The 1st International Workshop on High Dimensional
Data Mining (HDM)
In conjunction with the
IEEE International Conference on Data
Mining (IEEE ICDM 2013) in
News
*** Extended deadline: 17 August ***
The authors of selected high-quality papers from the workshop will be
invited to extend their work for inclusion in a special issue of an
international journal, to be organised shortly after the workshop.
Description
Some 13 years ago, Stanford statistician D. Donoho predicted that the 21st
century would be the century of data: "We can say
with complete confidence that in the coming century, high-dimensional data
analysis will be a very significant activity, and completely new methods of
high-dimensional data analysis will be developed; we just don't know what they
are yet." -- D. Donoho, 2000.
Indeed, unprecedented technological advances have led to increasingly high
dimensional data sets in all areas of science, engineering and business.
These include genomics and proteomics, biomedical imaging, signal processing,
astrophysics, finance, web, and market basket analysis, among many others. The
number of features in such data is often of the order of thousands or millions
-- that is, much larger than the available sample size. This renders classical
data analysis methods inadequate, questionable, or inefficient at best, and
calls for new approaches.
Some of the manifestations of the curse of dimensionality are the following:
- High dimensional geometry defeats our intuition rooted in low dimensional
  experiences, so that data presentation and visualisation become particularly
  challenging.
- Distance concentration is a phenomenon of high dimensional probability spaces
  in which the contrast between pairwise distances vanishes as the
  dimensionality increases -- this makes distances meaningless, and affects
  all methods that rely on a notion of distance.
- Bogus correlations and misleading estimates may result when trying to fit
  complex models for which the effective dimensionality is too large compared
  to the number of data points available.
- The accumulation of noise may confound our ability to find low dimensional
  intrinsic structure hidden in the high dimensional data.
- The computational cost of processing high dimensional data is often
  prohibitive.
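As a rough illustration of the distance concentration effect listed above, the following minimal sketch measures the relative contrast (d_max - d_min) / d_min between a query point and a random sample. The uniform distribution on the unit cube, the sample size of 200, the chosen dimensions and the helper name relative_contrast are all assumptions made for this illustration; they are not taken from the workshop material.

```python
import math
import random


def relative_contrast(n_points, dim, rng):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points random points, all uniform on [0, 1]^dim."""
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


rng = random.Random(0)  # fixed seed so that runs are repeatable
contrasts = {dim: relative_contrast(200, dim, rng) for dim in (2, 20, 200, 2000)}
for dim, contrast in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={contrast:.3f}")
```

Under these assumptions the printed contrast should shrink markedly as the dimension grows, so the nearest and farthest points end up almost equally far from the query.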
Topics
This workshop aims to promote new advances and research directions to address the
curses, and to uncover and exploit the blessings of high dimensionality in data
mining. Topics of interest range from theoretical foundations, to algorithms
and implementation, to applications and empirical studies of mining high
dimensional data, including (but not limited to) the following:
o Systematic studies of how the curse
of dimensionality affects data mining methods
o New data mining techniques that
exploit some properties of high dimensional data spaces
o Theoretical underpinnings of mining
data whose dimensionality is larger than the sample size
o Stability and reliability analyses
for data mining in high dimensions
o Adaptive and non-adaptive dimensionality
reduction for noisy high dimensional data sets
o Methods of random projections,
compressed sensing, and random matrix theory applied to high dimensional data
mining
o Models of low intrinsic dimension,
such as sparse representation, manifold models, latent structure models, and
studies of their noise tolerance
o Classification, regression, and
clustering of high dimensional complex data sets
o Functional data mining
o Data presentation and visualisation
methods for very high dimensional data sets
o Data mining applications to real
problems in science, engineering or business where the data is high
dimensional
Paper submission
High quality original submissions are solicited for oral and poster presentation at
the workshop. Papers must not exceed 8 pages, and must follow
the IEEE ICDM format
requirements of the main conference. All submissions will be peer-reviewed,
and all accepted workshop papers will be published in the proceedings by the
IEEE Computer Society Press. Submit your paper here.
Important dates
Submission deadline:
Notifications to authors:
Workshop:
Programme committee
Adam Kowalczyk - Victoria Research Laboratory, NICTA,
Arthur Zimek - LMU
Barbara Hammer - Clausthal
Ata Kaban - University of Birmingham, UK
John A. Lee - Universite Catholique de Louvain,
Laurens van der Maaten - Delft University of Technology, The Netherlands
Mark Last - University of the Negev,
Milos Radovanovic -
Pierre Alquier - University College
Robert J. Durrant - University of Waikato, NZ
Stephan Gunnemann -
Yiming Ying - University of Exeter, UK
Workshop organiser
School of Computer Science,
Related links & resources: Analytics, Big Data, Data Mining, & Data
Science Resources