The 2nd International Workshop on High Dimensional Data Mining (HDM'14)
In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2014)
The workshop will take place on 14 December 2014, in room Madrid 5.
Accepted Papers
SC219  Adaptive Semi-Supervised Dimensionality Reduction
       Jia Wei
SC207  High-dimensional Matrix Relevance Learning
       Frank-Michael Schleif, Thomas Villmann, and Xibin Zhu
SC203  Boosting for Vote Learning in High-dimensional kNN Classification
       Nenad Tomasev
DM515  Random KNN
       Shengqiao Li, James Harner, and Donald Adjeroh
SC204  SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data
       Amardeep Kaur and Amitava Datta
SC208  Who Wrote This? Textual Modeling with Authorship Attribution in Big Data
       Naruemon Pratanwanich and Pietro Lio
SC211  Renyi Divergence based Generalization for Learning of Classification Restricted Boltzmann Machines
       Qian Yu, Yuexian Hou, Xiaozhao Zhao, and Guochen Cheng
SC216  Two Approaches of Using Heavy Tails in High Dimensional EDA
       Momodou Sanyang, Hanno Muehlbrandt, and Ata Kaban
SC201  Out-of-Sample Error Estimation: the Blessing of High Dimensionality
       Luca Oneto, Alessandro Ghio, Sandro Ridella, Jorge Luis Reyes-Ortiz, and Davide Anguita
DM283  Dimensionality Reduction Based Similarity Visualization for Neural Gas
       Kadim Tasdemir
Invited Talk
Bob Durrant: The Unreasonable Effectiveness of Random Projections in Computer Science [slides]
Abstract:
Random projection is fast becoming a workhorse approach for high-dimensional data mining, with applications in clustering, regression, classification, and low-rank matrix approximation, amongst others. In the first half of this talk I will briefly survey some of the historical motivations for random projection and the applications these have inspired. In the second half I will focus on my work with Ata Kaban and Jakramate Bootkrajang, which takes some different perspectives leading to some novel theory and simple, yet effective, algorithms for classification and unconstrained real-valued optimization specifically aimed at very high-dimensional domains.
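The distance-preserving behaviour that makes random projections useful can be seen in a minimal NumPy sketch of a Gaussian random projection (the dimensions, seed, and scaling below are illustrative assumptions, not details from the talk):

```python
import numpy as np

rng = np.random.default_rng(42)

n, d, k = 500, 10_000, 200   # n points, original dimension d, target dimension k
X = rng.standard_normal((n, d))

# Gaussian random projection: entries scaled by 1/sqrt(k) so that
# pairwise Euclidean distances are approximately preserved.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(f"original distance {orig:.2f}, projected {proj:.2f}, ratio {proj / orig:.3f}")
```

The ratio concentrates around 1 as the target dimension k grows, which is the Johnson–Lindenstrauss phenomenon underpinning these applications.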
Program
9:00–9:10    Welcome message
9:10–10:00   The Unreasonable Effectiveness of Random Projections in Computer Science (Invited Talk) [slides]
10:00–10:15  Coffee break

Morning session: Reducing the curses of high dimensionality
10:15–10:40  Dimensionality Reduction Based Similarity Visualization for Neural Gas
10:40–11:05  Adaptive Semi-Supervised Dimensionality Reduction
11:05–11:30  SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data
11:30–11:55  Who Wrote This? Textual Modeling with Authorship Attribution in Big Data
11:55–12:20  High-dimensional Matrix Relevance Learning
12:20–14:05  Lunch break

Afternoon session: In search of the blessings of high dimensionality
14:05–14:30  Boosting for Vote Learning in High-dimensional kNN Classification
14:30–14:55  Random KNN
14:55–15:20  Out-of-Sample Error Estimation: the Blessing of High Dimensionality
15:20–15:45  Renyi Divergence based Generalization for Learning of Classification Restricted Boltzmann Machines
15:45–16:00  Coffee break
16:00–16:25  Two Approaches of Using Heavy Tails in High Dimensional EDA
16:25–16:50  Discussion & Closing
Description of Workshop
Stanford statistician David Donoho predicted that the 21st century will be the century of data: "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." – D. Donoho, 2000.
Beyond any doubt, unprecedented technological advances lead to increasingly high dimensional data sets in all areas of science, engineering, and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, and web and market basket analysis, among many others. The number of features in such data is often of the order of thousands or millions – much larger than the available sample size.
A number of issues
make classical data analysis methods inadequate, questionable, or inefficient
at best when faced with high dimensional data spaces:
1. High dimensional geometry defeats our intuition rooted in low dimensional experiences, and this makes data presentation and visualisation particularly challenging.
2. Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counterintuitive for the data mining practitioner. For instance, distance concentration is the phenomenon that the contrast between pairwise distances may vanish as the dimensionality increases. This makes the notion of nearest neighbour meaningless, together with a number of methods that rely on a notion of distance.
3. Bogus correlations and misleading estimates may result when trying to fit complex models for which the effective dimensionality is too large compared to the number of data points available.
4. The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.
5. The computational cost of processing high dimensional data or carrying out optimisation over high dimensional parameter spaces is often prohibitive.
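The distance-concentration phenomenon in point 2 can be observed directly in a small simulation (a NumPy sketch under the illustrative assumption of i.i.d. uniform data; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# For i.i.d. uniform data, the relative contrast between the farthest
# and nearest neighbour of a query point shrinks as dimensionality grows.
for d in [2, 10, 100, 1000]:
    X = rng.random((1000, d))        # 1000 points in the unit cube [0, 1]^d
    q = rng.random(d)                # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast = {contrast:.3f}")
```

As d increases, all points end up at nearly the same distance from the query, which is what renders nearest-neighbour distinctions unreliable.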
Topics
This
workshop aims to promote new advances and research directions to address the
curses and uncover and exploit the blessings of high dimensionality in data
mining. Topics of interest include (but are not limited to):
- Systematic studies of how the curse of dimensionality affects data mining methods
- New data mining techniques that exploit some properties of high dimensional data spaces
- Theoretical underpinning of mining data whose dimensionality is larger than the sample size
- Stability and reliability analyses for data mining in high dimensions
- Adaptive and non-adaptive dimensionality reduction for noisy high dimensional data sets
- Methods of random projections, compressed sensing, and random matrix theory applied to high dimensional data mining and high dimensional optimisation
- Models of low intrinsic dimension, such as sparse representation, manifold models, latent structure models, and studies of their noise tolerance
- Classification of high dimensional complex data sets
- Functional data mining
- Data presentation and visualisation methods for very high dimensional data sets
- Data mining applications to real problems in science, engineering, or business where the data is high dimensional
Paper submission
High quality original submissions are solicited for oral and poster presentation at the workshop. Papers should not exceed a maximum of 8 pages and must follow the IEEE ICDM format requirements of the main conference. All submissions will be peer-reviewed, and all accepted workshop papers will be published in the proceedings by the IEEE Computer Society Press. Submit your paper here.
Important dates
Early-cycle submission deadline: August 17, 2014; late-cycle submission deadline: September 26, 2014.
Notifications to authors: October 10, 2014
Workshop date: 14 December 2014
Program committee
Robert J. Durrant – University of Waikato, NZ
Barbara Hammer – Clausthal
Ata Kaban
John A. Lee – Universite Catholique de Louvain
Milos Radovanovic – University of Novi Sad, Serbia
Stephan Gunnemann – Carnegie Mellon University
Yiming Ying – University of Exeter, UK
Michael Biehl
Carlotta Domeniconi
Mehmed Kantardzic
Udo Seiffert
Frank-Michael Schleif
Peter Tino – University of Birmingham, UK
Guoxian Yu – Southwest University
Thomas Villmann – University of Applied Sciences Mittweida
Michel Verleysen – Universite Catholique de Louvain, Belgium
Workshop organisers
University of Applied Sciences Mittweida, Saxony, Germany