The 10th ICDM Workshop on High Dimensional Data Mining (HDM’22) – 28 November
In conjunction with the IEEE International Conference on Data Mining
28 November – 1 December 2022, Orlando, Florida
Program
Time (Eastern Time) | Title | Presenter/Author
1:00-1:10 | Opening Remarks | Ata Kaban (organizer)
1:10-1:30 | Unknown Type Streaming Feature Selection via Maximal Information Coefficient | Peng Zhou, Yunyun Zhang, Yuanting Yan, and Shu Zhao
1:30-1:50 | An Efficient and Reliable Tolerance-Based Algorithm for Principal Component Analysis | Michael Yeh and Ming Gu
1:50-2:10 | Unsupervised DeepView: Uncertainty Visualization of High Dimensional Data | Carina Newen and Emmanuel Müller
2:10-3:30 | Polytopal Complex Construction and Use in Persistent Homology | Rohit Singh and Philip Wilsey
3:30-3:50 | AARS: A novel adaptive archive-based efficient counting method for machine learning applications | Sajib K. Biswas, Pranab K. Muhuri, and Uttam K. Roy
3:50-4:00 | Closing Remarks | Ata Kaban (organizer)
Description of Workshop
Over a decade ago, Stanford statistician David Donoho predicted that the 21st century would be the century of data. "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." -- D. Donoho, 2000.
Unprecedented technological advances lead to increasingly high dimensional data sets in all areas of science, engineering and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, web and market basket analysis, among many others. The number of features in such data is often of the order of thousands or millions, which is much larger than the available sample size.
For a number of reasons, classical data analysis methods are inadequate, questionable, or inefficient at best when faced with high dimensional data spaces:
1. High dimensional geometry defeats our intuition rooted in low dimensional experiences, and this makes data presentation and visualisation particularly challenging.
2. Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counter-intuitive for the data mining practitioner. For instance, distance concentration is the phenomenon that the contrast between pair-wise distances may vanish as the dimensionality increases.
3. Bogus correlations and misleading estimates may result when trying to fit complex models for which the effective dimensionality is too large compared to the number of data points available.
4. The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.
5. The computational cost of processing high dimensional data or carrying out optimisation over high dimensional parameter spaces is often prohibitive.
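The distance concentration effect mentioned in item 2 can be illustrated with a small simulation. The following Python sketch is not part of the workshop material; it is only a rough illustration under stated assumptions: the data are drawn uniformly from the unit cube, the function name relative_contrast, the sample size of 200 and the listed dimensions are all hypothetical choices made for this example. It computes the relative contrast (d_max - d_min) / d_min between the farthest and nearest point from a random query, which would be expected to shrink as the dimension grows.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min of Euclidean distances
    from one random query point to n_points random points.

    Assumption for this sketch: all points are i.i.d. uniform on [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# Print the contrast for a few illustrative dimensions.
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Under these assumptions the printed contrast should fall steadily with the dimension, which is one way to see why nearest-neighbour style methods can lose discriminative power in high dimensional spaces. Other data distributions or distance measures may behave differently.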
Topics
This workshop aims to promote new advances and research directions to address the curses, as well as to uncover and exploit the blessings of high dimensionality in data mining.
We encourage submissions that define and exploit some notion of "intrinsic dimension" or more generic "intrinsic structure" in learning and/or optimisation problems that allows solving high dimensional data mining tasks more reliably and more efficiently.
Topics of interest include (but are not limited to) the following:
- What are some useful notions of intrinsic structure for high dimensional data mining?
- How to devise data mining algorithms that scale with a suitable notion of intrinsic structure?
- Plausible models of low intrinsic structure, such as sparse representation, manifold models, latent space models, and studies of their noise tolerance.
- Systematic studies of how the curse of dimensionality affects data mining methods.
- New data mining techniques that exploit some properties of high dimensional data spaces.
- Theoretical underpinning of data mining where the data dimension is larger than the sample size.
- Adaptive and non-adaptive dimensionality reduction for high dimensional data sets.
- Random projections, and random matrix theory applied to high dimensional data mining.
- Classification, regression, clustering, and visualisation of high dimensional complex data sets.
- Functional data mining.
- Data mining applications to real problems in science, engineering or business where the data is high dimensional.
Paper submission
High-quality original submissions are solicited for oral and poster presentation at the workshop. The page limit for workshop papers is 8 pages in the standard IEEE 2-column format (https://www.ieee.org/conferences/publishing/templates.html), including the bibliography and any appendices. Reviewing is triple-blind, so please do not include author-identifying information.
All papers must be formatted according to the IEEE Computer Society proceedings manuscript style, following IEEE ICDM 2021 submission guidelines, which are the same as for the main conference (except the page limit). All accepted workshop papers will be published in the IEEE Computer Society Digital Library (CSDL) and IEEE Xplore, and indexed by EI.
Important dates
Submission deadline: 17 September 2022 (AoE) via Submission site
Workshop paper notifications: 8 October 2022
Workshop day: 28 November 2022
Registration & Expenses
Every workshop paper must have at least one fully paid conference registration in order to be published. Check the main conference pages for details.
Program committee
TBC
Workshop organisation & contact
School of Computer Science, University of Birmingham, UK
Previous editions:
Related Links & resources: Analytics, Big Data, Data Mining, & Data Science Resources