The 7th ICDM Workshop on High Dimensional Data Mining
In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2019)
China, November 8-11, 2019.
The workshop will take place in Room 305.
Session chair: Xi Zhang
– 8:30 – Arrival & get to know each other
– 8:00 Jiacheng Yang, Bin Chen, and Shu-Tao Xia, Mean-Removed Product Quantization for Approximate Nearest Neighbor Search
9:00 – 10:00 Invited Talk: Yulong Li / Keegan
– 10:30 Coffee break
– 11:00 Xi Zhang and Ata Kaban, Experiments with Random Projections Ensembles: Linear versus Quadratic Discriminants
– 11:30 Jinyu Li, Yu Pan, Hongfeng Yu, and Qi Zhang, Prediction Approach for Ising Model Estimation
11:30 – 11:40 Adahlia Charles, Underdetermined blind source separation for hard clipped stereophonic mixtures
– 11:50 Qi Xu, Random Projection Ensembles for Clustering
– 12:00 Shouvik Mani, Expert-guided Regularization via Distance Metric Learning
Description of Workshop
a decade ago, Stanford statistician David Donoho
predicted that the 21st century will be the century of data. "We can say
with complete confidence that in the coming century, high-dimensional data
analysis will be a very significant activity, and completely new methods of
high-dimensional data analysis will be developed; we just don't know what they
are yet." -- D. Donoho, 2000.
Unprecedented technological advances
lead to increasingly high dimensional data sets in all areas of science,
engineering and businesses. These include genomics and proteomics, biomedical
imaging, signal processing, astrophysics, finance, web and market basket
analysis, among many others. The number of features in such data is often of
the order of thousands or millions - that is much larger than the available
sample size.
For a number of reasons, classical data analysis methods are inadequate, questionable, or inefficient at best when faced with high dimensional data spaces:
1. High dimensional geometry defeats our
intuition rooted in low dimensional experiences, and this makes data
presentation and visualisation particularly challenging.
2. Phenomena that occur in high dimensional probability spaces, such as the
concentration of measure, are counter-intuitive for the data mining practitioner.
For instance, distance concentration is the phenomenon that the contrast
between pair-wise distances may vanish as the dimensionality increases.
3. Bogus correlations and misleading estimates
may result when trying to fit complex models for which the effective
dimensionality is too large compared to the number of data points available.
4. The accumulation of noise may confound our ability to find low dimensional
intrinsic structure hidden in the high dimensional data.
5. The computation cost of processing high dimensional data or carrying out
optimisation over high dimensional parameter spaces is often prohibitive.
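The distance concentration effect mentioned in the list above can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not part of the workshop description: points are drawn uniformly from the unit hypercube, distances are Euclidean, and the "relative contrast" (farthest minus nearest distance, divided by the nearest distance) is used as one possible measure of contrast. The function name `relative_contrast` and all parameter values are illustrative only.

```python
import numpy as np


def relative_contrast(n_points, dim, rng):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points points drawn uniformly from [0, 1]^dim.
    The uniform distribution is an assumption made for illustration."""
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(data - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


rng = np.random.default_rng(0)
dims = [2, 10, 100, 1000]

# Average the contrast over a few repetitions for each dimensionality.
contrasts = [
    float(np.mean([relative_contrast(500, d, rng) for _ in range(20)]))
    for d in dims
]

for d, c in zip(dims, contrasts):
    print(f"dim={d:5d}  relative contrast={c:.3f}")
```

Under these assumptions the printed contrast should shrink as the dimension grows, meaning the nearest and farthest neighbours of a query become hard to tell apart. Other data distributions or distance functions may behave differently.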
This workshop aims to promote new advances and research directions to address the curses,
as well as to uncover and exploit the blessings of high dimensionality in data
mining. We would like to particularly encourage submissions that define and exploit
some notion of "intrinsic dimension" or more generic "intrinsic
structure" in learning and/or optimisation
problems that allows solving high dimensional data mining tasks more reliably
and more efficiently.
Topics of interest include (but are not limited to) the following:
- What are some useful notions of intrinsic structure for high dimensional data mining?
- How to devise data mining algorithms that scale with a suitable notion of intrinsic
- Plausible models of low intrinsic structure, such as sparse representation, manifold models, latent space models, and studies of their noise tolerance.
- Systematic studies of how the curse of dimensionality affects data mining
- New data mining techniques that exploit some properties of high dimensional data spaces.
- Theoretical underpinning of data mining where the data dimension is larger than the sample size.
- Adaptive and non-adaptive dimensionality reduction for high dimensional data sets.
- Random projections, and random matrix theory applied to high dimensional data mining.
- Classification, regression, clustering, and visualisation of high dimensional complex data sets.
- Functional data mining.
- Data mining applications to real problems in science, engineering or businesses where the data is high dimensional.
High quality original submissions are solicited for oral and poster presentation at
the workshop. The page limit for workshop papers
is 8 pages in the standard IEEE 2-column format (https://www.ieee.org/conferences/publishing/templates.html), including the
bibliography and any appendices. Reviewing is triple-blind,
so please do not include author-identifying information.
Papers must be formatted according to the IEEE Computer Society proceedings manuscript style, following IEEE ICDM 2019 submission guidelines, which are the same as for the
main conference (except the page limit). All
accepted workshop papers will be published in the IEEE Computer Society Digital
Library (CSDL) and IEEE Xplore, and indexed by EI.
· Submission deadline: extended until August 16, 2019
· Workshop paper notifications: September 4, 2019
· Camera-ready deadline for the final version of accepted papers: September 8, 2019
Every workshop paper must have at least one full paid conference registration in order
to be published. Check the main conference pages for details.
Jo Bootkrajang – Chiang Mai University, Thailand
Arthur Flexer – Austrian Research Institute for AI, Austria
Ata Kaban – University of Birmingham
Mehmed Kantardzic – University of Louisville, USA
Minqing Li – University of Birmingham, UK
Luca Oneto – University of Pisa, Italy
Momodou Sanyang – University of The
– University of Applied Sciences Würzburg, Germany
Huseyin Seker – Newcastle upon Tyne, UK
Guoxian Yu – Southwest University, China
organisation & contact
School of Computer Science, University of
Related Links & resources: Analytics, Big Data, Data Mining, & Data