The 6th ICDM Workshop on High Dimensional Data Mining (HDM’18)
In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2018)
Singapore, November 17, 2018.
Schedule
1:30 – 1:50: Mihir Shekhar, Lini Thomas, and Kamalakar Karlapalem,
High Dimensional Clustering: A Strongly Connected Component Clustering Solution (SCCC)
1:50 – 2:10: Bishal Deb, Ankita Sarkar, Nupur Kumari, Akash Rupela, Piyush Gupta, and Balaji Krishnamurthy,
Multimapper: Data Density Sensitive Topological Visualization
2:10 – 2:30: Mohammed Wasid and Rashid Ali,
Clustering Approach for Multidimensional Recommender Systems
2:30 – 2:50: Donglin Wang,
HILLS: Hierarchical Indoor Localization for Large-Scale Architectural Complex
2:50 – 3:10: Ryan Moulton and Yunjiang Jiang,
Maximally Consistent Sampling and the Jaccard Index of Probability Distributions
3:10 – 3:30: Coffee break
3:30 – 3:50: Hyunmin Lee, Zhen Hao Wu, and Zhaolei Zhang,
Dimension Reduction on Open Data using Variational Autoencoder
3:50 – 4:10: Ao Yin and Chunkai Zhang,
BOFE: Anomaly Detection in Linear Time Based On Feature Estimation
4:10 – 4:30: Yingjing Lu,
Mining Connections between Domains through Latent Space Mapping
4:30 – 4:50: Qizhi Zhang, Kuang-Chih Lee, Hongying Bao, Yuan You, Wenjie Li, and Dongbai Guo,
Large scale classification in deep neural network with Label Mapping
4:50 – 5:10: Youssef Hmamouche, Lotfi Lakhal, and Alain Casali,
Predictors Extraction in Time Series using Authorities-Hubs Ranking
5:10 – 5:30: Arthur Flexer, Monika Dörfler, Jan Schlüter, and Thomas Grill,
Hubness as a case of technical algorithmic bias in music recommendation
Description of Workshop
Over a decade ago, Stanford statistician David Donoho predicted that the 21st century will be the century of data. "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." -- D. Donoho, 2000.
Unprecedented technological advances lead to increasingly high dimensional data sets in all areas of science, engineering and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, web and market basket analysis, among many others. The number of features in such data is often of the order of thousands or millions, which is much larger than the available sample size.
For a number of reasons, classical data analysis methods are inadequate, questionable, or inefficient at best when faced with high dimensional data spaces:
1. High dimensional geometry defeats our intuition rooted in low dimensional experiences, and this makes data presentation and visualisation particularly challenging.
2. Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counter-intuitive for the data mining practitioner. For instance, distance concentration is the phenomenon that the contrast between pair-wise distances may vanish as the dimensionality increases.
3. Bogus correlations and misleading estimates may result when trying to fit complex models for which the effective dimensionality is too large compared to the number of data points available.
4. The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.
5. The computational cost of processing high dimensional data or carrying out optimisation over high dimensional parameter spaces is often prohibitive.
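The distance concentration effect mentioned in item 2 can be observed with a small simulation. The following is only an illustrative sketch under stated assumptions: points are drawn uniformly from the unit hypercube, distances are Euclidean, and the function name `relative_contrast` and its parameters are hypothetical choices made for this example, not something taken from the workshop material.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in [0, 1]^dim.

    Assumption: data are i.i.d. uniform on the unit hypercube. Other
    distributions or distance measures may behave differently.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the relative contrast for a few dimensionalities.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed contrast should shrink as `dim` grows, meaning the nearest and farthest points from the query become hard to tell apart by distance alone.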
Topics
This workshop aims to promote new advances and research directions to address the curses and uncover and exploit the blessings of high dimensionality in data mining.
This year we would like to particularly encourage submissions that define and exploit some notion of "intrinsic dimension" or "intrinsic structure" in the learning or optimisation problem that allows solving high dimensional data mining tasks more reliably and more efficiently.
Topics of interest include all aspects of high dimensional data mining, including the following:
- Systematic studies of how the curse of dimensionality affects data mining methods
- Models of low intrinsic dimension: sparse representation, manifold models, latent structure models, large margin, other?
- How to exploit intrinsic dimension in optimisation tasks for data mining?
- New data mining techniques that scale with the intrinsic dimension, or exploit some properties of high dimensional data spaces
- Dimensionality reduction
- Methods of random projections, compressed sensing, and random matrix theory applied to high dimensional data mining and high dimensional optimisation
- Theoretical underpinning of mining data whose dimensionality is larger than the sample size
- Classification, regression, clustering, visualisation of high dimensional complex data sets
- Functional data mining
- Data presentation and visualisation methods for very high dimensional data sets
- Data mining applications to real problems in science, engineering or business where the data is high dimensional
Paper submission
High quality original submissions are solicited for oral and poster presentation at the workshop.
The page limit of workshop papers is 8 pages in the standard IEEE 2-column format (https://www.ieee.org/conferences/publishing/templates.html), including the bibliography and any possible appendices. Reviewing is triple-blind! Therefore, please do not include author identifying information.
All papers must be formatted according to the IEEE Computer Society proceedings manuscript style, following IEEE ICDM 2018 submission guidelines, which are the same as for the main conference (except the page limit), and available at http://icdm2018.org/calls/call-for-papers/.
Accepted papers will be included in the IEEE ICDM 2018 Workshops Proceedings volume published by IEEE Computer Society Press, and will also be included in the IEEE Xplore Digital Library. The workshop proceedings will be on a CD separate from the CD of the main conference. The CD is produced by IEEE Conference Publishing Services (CPS).
Important dates
· Submission deadline: August 7, 2018
· Workshop paper notifications: September 4, 2018
· Camera-ready deadline for the final version of accepted papers: September 15, 2018
Registration & Expenses
Every workshop paper must have at least one fully paid conference registration in order to be published. Check the main conference pages for details.
Program committee
Fabrizio Angiulli
Michael E. Houle
Ata Kaban
Mehmed Kantardzic
Milos Radovanovic
Nenad Tomasev
Guoxian Yu
Workshop organisation & contact
School of Computer Science, University of Birmingham, UK
Previous editions:
Related Links & resources: Analytics, Big Data, Data Mining, & Data Science Resources