The 2nd International Workshop on High Dimensional Data Mining (HDM’14)
In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2014)

 

The workshop will take place on 14 December 2014, in room Madrid 5.

 

Accepted Papers

 

SC219 Adaptive Semi-Supervised Dimensionality Reduction
Jia Wei

 

SC207 High dimensional Matrix Relevance Learning
Frank-Michael Schleif, Thomas Villmann, and Xibin Zhu

SC203 Boosting for Vote Learning in High-dimensional kNN Classification
Nenad Tomasev

DM515 Random KNN
Shengqiao Li, James Harner, and Donald Adjeroh

 

SC204 SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data
Amardeep Kaur and Amitava Datta

SC208 Who Wrote This? Textual Modeling with Authorship Attribution in Big Data
Naruemon Pratanwanich and Pietro Lio

SC211 Renyi Divergence based Generalization for Learning of Classification Restricted Boltzmann Machines
Qian Yu, Yuexian Hou, Xiaozhao Zhao, and Guochen Cheng

SC216 Two Approaches of Using Heavy Tails in High Dimensional EDA
Momodou Sanyang, Hanno Muehlbrandt, and Ata Kaban

SC201 Out-of-Sample Error Estimation: the Blessing of High Dimensionality
Luca Oneto, Alessandro Ghio, Sandro Ridella, Jorge Luis Reyes Ortiz, and Davide Anguita


DM283 Dimensionality Reduction Based Similarity Visualization for Neural Gas
Kadim Tasdemir

 

Invited Talk

Bob Durrant: The Unreasonable Effectiveness of Random Projections in Computer Science [slides]

Abstract: Random projection is fast becoming a workhorse approach for high-dimensional data mining, with applications in clustering, regression, classification and low-rank matrix approximation amongst others. In the first half of this talk I will briefly survey some of the historical motivations for random projection and the applications these have inspired. In the second half I will focus on my work with Ata Kaban and Jakramate Bootkrajang, which takes some different perspectives leading to some novel theory and simple, yet effective, algorithms for classification and unconstrained real-valued optimization specifically aimed at very high-dimensional domains.
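To give a flavour of the topic, here is a minimal sketch (added for illustration only, not taken from the talk): a Gaussian random projection with entries of variance 1/k approximately preserves pairwise Euclidean distances, as guaranteed by the Johnson-Lindenstrauss lemma. All dimensions and parameter choices below are arbitrary examples.

```python
# Sketch: Gaussian random projection approximately preserves pairwise distances.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 30, 5000, 500            # 30 points in 5000-d, projected down to 500-d

X = rng.standard_normal((n, d))                 # high-dimensional data
R = rng.standard_normal((d, k)) / np.sqrt(k)    # random matrix, entries N(0, 1/k)
Y = X @ R                                       # projected data

def pdists(A):
    # pairwise Euclidean distances between the rows of A
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

D_hi, D_lo = pdists(X), pdists(Y)
mask = ~np.eye(n, dtype=bool)                   # ignore zero self-distances
ratios = D_lo[mask] / D_hi[mask]
print(ratios.min(), ratios.max())               # ratios concentrate around 1
```

Despite reducing the dimensionality tenfold, every projected distance stays within a few percent of its original value, which is what makes random projection useful as a cheap preprocessing step for distance-based methods.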

 

Program

 

 9:00- 9:10

Welcome message
HDM’14 Chairs

 9:10-10:00

The Unreasonable Effectiveness of Random Projections in Computer Science (Invited Talk) [slides]
Bob Durrant

10:00-10:15

Coffee break

 

Morning session: Reducing the curses of high dimensionality

10:15-10:40

Dimensionality Reduction Based Similarity Visualization for Neural Gas
Kadim Tasdemir

10:40-11:05

Adaptive Semi-Supervised Dimensionality Reduction
Jia Wei

11:05-11:30

SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data
Amardeep Kaur and Amitava Datta

11:30-11:55

Who Wrote This? Textual Modeling with Authorship Attribution in Big Data
Naruemon Pratanwanich and Pietro Lio

11:55-12:20

High dimensional Matrix Relevance Learning
Frank-Michael Schleif, Thomas Villmann, and Xibin Zhu

12:20-14:05

Lunch break

 

Afternoon session: In search of the blessings of high dimensionality

14:05-14:30

Boosting for Vote Learning in High-dimensional kNN Classification
Nenad Tomasev

14:30-14:55

Random KNN
Shengqiao Li, James Harner, and Donald Adjeroh

14:55-15:20

Out-of-Sample Error Estimation: the Blessing of High Dimensionality
Luca Oneto, Alessandro Ghio, Sandro Ridella, Jorge Luis Reyes Ortiz, and Davide Anguita

15:20-15:45

Renyi Divergence based Generalization for Learning of Classification Restricted Boltzmann Machines
Qian Yu, Yuexian Hou, Xiaozhao Zhao, and Guochen Cheng

15:45-16:00

Coffee break

16:00-16:25

Two Approaches of Using Heavy Tails in High Dimensional EDA
Momodou Sanyang, Hanno Muehlbrandt, and Ata Kaban

16:25-16:50

Discussion & Closing

 

Description of Workshop

 

Stanford statistician David Donoho predicted that the 21st century would be the century of data. "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." -- D. Donoho, 2000.

 

Beyond any doubt, unprecedented technological advances are leading to increasingly high dimensional data sets in all areas of science, engineering, and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, and web and market basket analysis, among many others. The number of features in such data is often of the order of thousands or millions, far larger than the available sample size.

 

A number of issues make classical data analysis methods inadequate, questionable, or inefficient at best when faced with high dimensional data spaces:

 1. High dimensional geometry defeats our intuition rooted in low dimensional experiences, and this makes data presentation and visualisation particularly challenging.

 2. Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counter-intuitive for the data mining practitioner. For instance, distance concentration is the phenomenon whereby the contrast between pairwise distances may vanish as the dimensionality increases. This can render the notion of nearest neighbour meaningless, along with the many methods that rely on a notion of distance.

 3. Spurious correlations and misleading estimates may result when trying to fit complex models whose effective dimensionality is too large compared to the number of available data points.

 4. The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.

 5. The computational cost of processing high dimensional data, or of carrying out optimisation over high dimensional parameter spaces, is often prohibitive.
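The distance concentration effect in point 2 is easy to reproduce empirically. The following sketch (an illustration added here, not drawn from any of the papers above; sample sizes and dimensions are arbitrary) measures the relative contrast between the farthest and nearest neighbour of a random query among i.i.d. uniform points, which shrinks as the dimensionality grows.

```python
# Sketch: relative distance contrast vanishes as dimensionality increases.
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(d, n=1000):
    # n uniform random points in the unit hypercube [0,1]^d, plus one query point
    X = rng.uniform(size=(n, d))
    q = rng.uniform(size=d)
    dist = np.linalg.norm(X - q, axis=1)        # distances from query to all points
    return (dist.max() - dist.min()) / dist.min()

for d in (2, 10, 100, 1000):
    print(d, relative_contrast(d))              # contrast shrinks as d grows
```

In low dimensions the farthest point is many times farther than the nearest; in 1000 dimensions all points sit at nearly the same distance from the query, so "nearest neighbour" carries little discriminative information.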

 

Topics

 

This workshop aims to promote new advances and research directions to address the curses and uncover and exploit the blessings of high dimensionality in data mining. Topics of interest include (but are not limited to):

 

- Systematic studies of how the curse of dimensionality affects data mining methods

- New data mining techniques that exploit some properties of high dimensional data spaces

- Theoretical underpinning of mining data whose dimensionality is larger than the sample size

- Stability and reliability analyses for data mining in high dimensions

- Adaptive and non-adaptive dimensionality reduction for noisy high dimensional data sets

- Methods of random projections, compressed sensing, and random matrix theory applied to high dimensional data mining and high dimensional optimisation

- Models of low intrinsic dimension, such as sparse representation, manifold models, latent structure models, and studies of their noise tolerance

- Classification of high dimensional complex data sets

- Functional data mining

- Data presentation and visualisation methods for very high dimensional data sets

- Data mining applications to real problems in science, engineering or businesses where the data is high dimensional

 

Paper submission

High quality original submissions are solicited for oral and poster presentation at the workshop. Papers must not exceed 8 pages and must follow the IEEE ICDM format requirements of the main conference. All submissions will be peer-reviewed, and all accepted workshop papers will be published in the proceedings by the IEEE Computer Society Press.


Important dates

Early-cycle submission deadline: August 17, 2014; late-cycle submission deadline: September 26, 2014.

Notifications to authors: October 10, 2014

Workshop date: December 14, 2014

 

Program committee

Robert J. Durrant - University of Waikato, NZ

Barbara Hammer - Clausthal University of Technology, Germany

Ata Kaban - University of Birmingham, UK

John A. Lee - Universite Catholique de Louvain, Belgium

Milos Radovanovic - University of Novi Sad, Serbia

Stephan Gunnemann - Carnegie Mellon University

Yiming Ying - University of Exeter, UK

Michael Biehl - University of Groningen

Carlotta Domeniconi - George Mason University

Mehmed Kantardzic - University of Louisville

Udo Seiffert - University of Magdeburg

Frank-Michael Schleif - University of Birmingham, UK

Peter Tino - University of Birmingham, UK

Guoxian Yu - Southwest University

Thomas Villmann - University of Applied Sciences Mittweida, Germany

Michel Verleysen - Universite Catholique de Louvain, Belgium

 

Workshop organisers

Dr. Ata Kaban

School of Computer Science, University of Birmingham, UK

Dr. Frank-Michael Schleif

School of Computer Science, University of Birmingham, UK

Prof. Thomas Villmann

University of Applied Sciences Mittweida, Saxony, Germany