The 3rd International Workshop on High Dimensional Data Mining (HDM’15)
In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2015)

14 November 2015, Atlantic City, NJ, USA.




2:00 - 3:00

HDM'15 Session 1

HARAM: a Hierarchical ARAM neural network for large-scale text classification
Fernando Benites and Elena Sapozhnikova

Finding Subspace Clusters using Ranked Neighborhoods
Emin Aksehirli, Siegfried Nijssen, Matthijs van Leeuwen, and Bart Goethals

3:00 - 3:30

Coffee break

3:30 - 5:30

HDM'15 Session 2

PaMPa-HD: a Parallel MapReduce-based frequent Pattern miner for High-Dimensional data
Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti, and Pietro Michiardi

New quality indexes for optimal clustering model identification with high dimensional data
Jean-Charles Lamirel and Pascal Cuxac


Pruned simple model sets for fast exact recovery of image
Basarab Matei and Younès Bennani

Fast LMNN Algorithm Through Random Sampling
Kaiyuan Wu


Description of Workshop


Over a decade ago, Stanford statistician David Donoho predicted that the 21st century would be the century of data: "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." -- D. Donoho, 2000.


Unprecedented technological advances are producing increasingly high dimensional data sets in all areas of science, engineering and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, and web and market basket analysis, among many others. The number of features in such data is often in the thousands or millions, which is much larger than the available sample size.

For a number of reasons, classical data analysis methods become inadequate, questionable, or at best inefficient when faced with high dimensional data spaces:

 1. High dimensional geometry defeats intuition rooted in low dimensional experience, which makes data presentation and visualisation particularly challenging.

 2. Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counterintuitive for the data mining practitioner. For instance, distance concentration is the phenomenon whereby the contrast between pairwise distances may vanish as the dimensionality increases.

 3. Spurious correlations and misleading estimates may result when fitting complex models whose effective dimensionality is too large compared to the number of available data points.

 4. The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.

 5. The computational cost of processing high dimensional data, or of carrying out optimisation over high dimensional parameter spaces, is often prohibitive.
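Distance concentration (point 2) is easy to observe numerically. The following minimal sketch is an illustration only, not a method from any workshop paper. It assumes points drawn uniformly from the unit hypercube and uses the relative contrast (d_max - d_min) / d_min of distances from one random query as the measure. The function name `relative_contrast` and the chosen sizes are hypothetical.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Relative contrast (d_max - d_min) / d_min of Euclidean distances from
    one random query to n_points random points, all drawn uniformly from the
    unit hypercube [0, 1]^dim (an illustrative assumption)."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    # Distance from the query to each freshly sampled point.
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast typically drops sharply as dim grows: the nearest and
    # farthest points end up almost equally far from the query.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(200, dim):.3f}")
```

As this ratio approaches zero, "nearest" and "farthest" neighbours become hard to tell apart. This is one reason distance-based mining methods can degrade in high dimensions.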




This workshop aims to promote new advances and research directions that address the curses of high dimensionality in data mining and that uncover and exploit its blessings. Topics of interest include all aspects of high dimensional data mining, including the following:

- Systematic studies of how the curse of dimensionality affects data mining methods

- Models of low intrinsic dimension: sparse representation, manifold models, latent structure models, large margin models, and others

- How to exploit intrinsic dimension in optimisation tasks for data mining?

- New data mining techniques that scale with the intrinsic dimension, or exploit some properties of high dimensional data spaces

- Dimensionality reduction

- Methods of random projections, compressed sensing, and random matrix theory applied to high dimensional data mining and high dimensional optimisation

- Theoretical underpinnings of mining data whose dimensionality is larger than the sample size

- Classification, regression, clustering, visualisation of high dimensional complex data sets

- Functional data mining

- Data presentation and visualisation methods for very high dimensional data sets

- Data mining applications to real problems in science, engineering or business where the data is high dimensional
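Random projections appear among the topics above. The sketch below is one illustration and is not taken from any workshop paper. It applies a Gaussian random projection in the Johnson-Lindenstrauss style to synthetic data and reports how well pairwise distances survive. The function names, the sizes d, k and n, and the synthetic Gaussian data are hypothetical choices for this example.

```python
import math
import random


def gaussian_random_projection(points, k, seed=0):
    """Map each d-dimensional point x to (1/sqrt(k)) * R x, where R is a
    k x d matrix with i.i.d. N(0, 1) entries. With this scaling, squared
    Euclidean distances are preserved in expectation."""
    rng = random.Random(seed)
    d = len(points[0])
    R = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(r * x_j for r, x_j in zip(row, x)) for row in R]
        for x in points
    ]


def distance_ratios(original, projected):
    """Ratios ||f(x_i) - f(x_j)|| / ||x_i - x_j|| over all pairs i < j."""
    n = len(original)
    return [
        math.dist(projected[i], projected[j]) / math.dist(original[i], original[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    d, k, n = 1000, 200, 10  # illustrative sizes only
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ratios = distance_ratios(X, gaussian_random_projection(X, k))
    # Ratios near 1 mean pairwise distances survived the projection.
    print(f"min ratio={min(ratios):.3f}  max ratio={max(ratios):.3f}")
```

The projected dimension k controls the distortion. The spread of the ratios shrinks roughly like 1/sqrt(k) and does not depend on the original dimension d, which is what makes the technique attractive for very high dimensional data.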


Paper submission

High quality original submissions are solicited for oral and poster presentation at the workshop. Papers must not exceed 8 pages and must follow the IEEE ICDM format requirements of the main conference. All submissions will be peer-reviewed, and all accepted workshop papers will be published in the proceedings by the IEEE Computer Society Press.

Important dates

Submission deadline extended to: August 3, 2015.

Notifications to authors: September 1, 2015.

Workshop date: November 13, 2015.


Program committee

Arthur Zimek – Ludwig-Maximilians-Universitaet, Muenchen, Germany

Ata Kaban – University of Birmingham, UK

Barbara Hammer – University of Bielefeld, Germany

Bob Durrant – Waikato University, NZ

John A. Lee – Universite Catholique de Louvain, Belgium

Mark Last – Ben-Gurion University of the Negev, Israel

Mehmed Kantardzic – University of Louisville, USA

Michael E. Houle – National Institute of Informatics, Japan

Milos Radovanovic – University of Novi Sad, Serbia

Nenad Tomasev – Google, Mountain View, CA, USA

Peter Tino – University of Birmingham, UK

Stephan Gunnemann – Carnegie Mellon University, USA

Udo Seiffert – Fraunhofer IFF Magdeburg & University of Magdeburg, Germany

Yiming Ying – SUNY, NY, USA


Workshop organisation & contact

Dr. Ata Kaban

School of Computer Science, University of Birmingham, UK

