The 2nd International Workshop on High Dimensional Data Mining (HDM'14)
In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2014)
The workshop will take place on 14 December 2014, in room Madrid 5.
Accepted Papers
SC219  Adaptive Semi-Supervised Dimensionality Reduction
       Jia Wei
SC207  High-dimensional Matrix Relevance Learning
       Frank-Michael Schleif, Thomas Villmann, and Xibin Zhu
SC203  Boosting for Vote Learning in High-dimensional kNN Classification
       Nenad Tomasev
DM515  Random KNN
       Shengqiao Li, James Harner, and Donald Adjeroh
SC204  SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data
       Amardeep Kaur and Amitava Datta
SC208  Who Wrote This? Textual Modeling with Authorship Attribution in Big Data
       Naruemon Pratanwanich and Pietro Lio
SC211  Renyi Divergence based Generalization for Learning of Classification Restricted Boltzmann Machines
       Qian Yu, Yuexian Hou, Xiaozhao Zhao, and Guochen Cheng
SC216  Two Approaches of Using Heavy Tails in High Dimensional EDA
       Momodou Sanyang, Hanno Muehlbrandt, and Ata Kaban
SC201  Out-of-Sample Error Estimation: the Blessing of High Dimensionality
       Luca Oneto, Alessandro Ghio, Sandro Ridella, Jorge Luis Reyes-Ortiz, and Davide Anguita
DM283  Dimensionality Reduction Based Similarity Visualization for Neural Gas
       Kadim Tasdemir
Invited Talk
Bob Durrant: The Unreasonable Effectiveness of Random Projections in Computer Science [slides]
Abstract:
Random projection is fast becoming a workhorse approach for high-dimensional data mining, with applications in clustering, regression, classification, and low-rank matrix approximation, amongst others. In the first half of this talk I will briefly survey some of the historical motivations for random projection and the applications these have inspired. In the second half I will focus on my work with Ata Kaban and Jakramate Bootkrajang, which takes some different perspectives leading to some novel theory and simple, yet effective, algorithms for classification and unconstrained real-valued optimization specifically aimed at very high-dimensional domains.
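The distance-preserving behaviour that makes random projections useful can be seen in a minimal NumPy sketch of a Gaussian random projection (the dimensions, seed, and scaling below are illustrative assumptions, not details from the talk):

```python
import numpy as np

rng = np.random.default_rng(42)

n, d, k = 500, 10_000, 200   # n points, original dimension d, target dimension k
X = rng.standard_normal((n, d))

# Gaussian random projection: entries scaled by 1/sqrt(k) so that
# pairwise Euclidean distances are approximately preserved.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(f"original distance {orig:.2f}, projected {proj:.2f}, ratio {proj / orig:.3f}")
```

The ratio concentrates around 1 as the target dimension k grows, which is the Johnson–Lindenstrauss phenomenon underpinning these applications.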
Program
9:00–9:10    Welcome message
9:10–10:00   The Unreasonable Effectiveness of Random Projections in Computer Science (Invited Talk) [slides]
10:00–10:15  Coffee break

Morning session: Reducing the curses of high dimensionality
10:15–10:40  Dimensionality Reduction Based Similarity Visualization for Neural Gas
10:40–11:05  Adaptive Semi-Supervised Dimensionality Reduction
11:05–11:30  SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data
11:30–11:55  Who Wrote This? Textual Modeling with Authorship Attribution in Big Data
11:55–12:20  High-dimensional Matrix Relevance Learning
12:20–14:05  Lunch break

Afternoon session: In search of the blessings of high dimensionality
14:05–14:30  Boosting for Vote Learning in High-dimensional kNN Classification
14:30–14:55  Random KNN
14:55–15:20  Out-of-Sample Error Estimation: the Blessing of High Dimensionality
15:20–15:45  Renyi Divergence based Generalization for Learning of Classification Restricted Boltzmann Machines
15:45–16:00  Coffee break
16:00–16:25  Two Approaches of Using Heavy Tails in High Dimensional EDA
16:25–16:50  Discussion & Closing
Description of Workshop
Stanford statistician David Donoho predicted that the 21st century will be the century of data: "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." – D. Donoho, 2000.
Beyond any doubt, unprecedented technological advances lead to increasingly high dimensional data sets in all areas of science, engineering, and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, and web and market basket analysis, among many others. The number of features in such data is often of the order of thousands or millions – much larger than the available sample size.
A number of issues
make classical data analysis methods inadequate, questionable, or inefficient
at best when faced with high dimensional data spaces:
1. High dimensional geometry defeats our intuition rooted in low dimensional experiences, and this makes data presentation and visualisation particularly challenging.
2. Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counterintuitive for the data mining practitioner. For instance, distance concentration is the phenomenon that the contrast between pairwise distances may vanish as the dimensionality increases. This makes the notion of nearest neighbour meaningless, together with a number of methods that rely on a notion of distance.
3. Bogus correlations and misleading estimates may result when trying to fit complex models for which the effective dimensionality is too large compared to the number of data points available.
4. The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.
5. The computational cost of processing high dimensional data or carrying out optimisation over high dimensional parameter spaces is often prohibitive.
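The distance-concentration phenomenon in point 2 can be observed directly in a small simulation (a NumPy sketch under the illustrative assumption of i.i.d. uniform data; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# For i.i.d. uniform data, the relative contrast between the farthest
# and nearest neighbour of a query point shrinks as dimensionality grows.
for d in [2, 10, 100, 1000]:
    X = rng.random((1000, d))        # 1000 points in the unit cube [0, 1]^d
    q = rng.random(d)                # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast = {contrast:.3f}")
```

As d increases, all points end up at nearly the same distance from the query, which is what renders nearest-neighbour distinctions unreliable.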
Topics
This
workshop aims to promote new advances and research directions to address the
curses and uncover and exploit the blessings of high dimensionality in data
mining. Topics of interest include (but are not limited to):
- Systematic studies of how the curse of dimensionality affects data mining methods
- New data mining techniques that exploit some properties of high dimensional data spaces
- Theoretical underpinning of mining data whose dimensionality is larger than the sample size
- Stability and reliability analyses for data mining in high dimensions
- Adaptive and non-adaptive dimensionality reduction for noisy high dimensional data sets
- Methods of random projections, compressed sensing, and random matrix theory applied to high dimensional data mining and high dimensional optimisation
- Models of low intrinsic dimension, such as sparse representation, manifold models, latent structure models, and studies of their noise tolerance
- Classification of high dimensional complex data sets
- Functional data mining
- Data presentation and visualisation methods for very high dimensional data sets
- Data mining applications to real problems in science, engineering, or business where the data is high dimensional
Paper submission
High quality original submissions are solicited for oral and poster presentation at the workshop. Papers should not exceed a maximum of 8 pages and must follow the IEEE ICDM format requirements of the main conference. All submissions will be peer-reviewed, and all accepted workshop papers will be published in the proceedings by the IEEE Computer Society Press. Submit your paper here.
Important dates
Early-cycle submission deadline: August 17, 2014; late-cycle submission deadline: September 26, 2014.
Notifications to authors: October 10, 2014
Workshop date: 14 December 2014
Program committee
Robert J. Durrant – University of Waikato, NZ
Barbara Hammer – Clausthal
Ata Kaban
John A. Lee – Universite Catholique de Louvain
Milos Radovanovic – University of Novi Sad, Serbia
Stephan Gunnemann – Carnegie Mellon University
Yiming Ying – University of Exeter, UK
Michael Biehl
Carlotta Domeniconi
Mehmed Kantardzic
Udo Seiffert
Frank-Michael Schleif
Peter Tino – University of Birmingham, UK
Guoxian Yu – Southwest University
Thomas Villmann – University of Applied Sciences Mittweida
Michel Verleysen – Universite Catholique de Louvain, Belgium
Workshop organisers
University of Applied Sciences Mittweida, Saxony, Germany