The 4th ICDM Workshop on High Dimensional Data Mining (HDM’16)
In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2016)
Barcelona, Spain, December 12, 2016.
Program
09:30 – 10:30
- An Empirical Analysis of Hubness in Unsupervised Distance-Based Outlier Detection. Arthur Flexer
- Semi-supervised similarity preserving co-selection. Raywat Makkhongkaew, Khalid Benabdeslem
- Regression on High-dimensional Inputs. Alexander Kuleshov, Alexander Bernstein
10:30 – 11:00
Coffee break
11:00 – 13:00
- Clustering Based On MultiView Diffusion Maps. Ofir Lindenbaum, Arie Yeredor, Amir Averbuch
- Topic Extraction Method from Millions of Tweets based on Fast Feature Selection Technique CWC. Takako Hashimoto, Dave Shepard, Kuboyama Tetsuji, Shin Kilho
- Modelling Sentence Generation from Sum of Word Embedding Vectors as a Mixed Integer Programming Problem. Lyndon White, Roberto Togneri, Wei Liu, Mohammed Bennamoun
- Robust Local Scaling using Conditional Quantiles of Graph Similarities. Jayaraman J. Thiagarajan, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Bhavya Kailkhura
- Aggregating Tree for Searching in Billion Scale High Dimensional Data. Shicong Liu, Junru Shao, Hongtao Lu
- Random Projection Clustering on Streaming Data. Lee Carraher, Philip Wilsey, Anindya Moitra, Sayantan Dey
13:00 – 14:30
Lunch break
Accepted Papers
DM1027 - Lee Carraher, Philip Wilsey, Anindya Moitra, Sayantan Dey: Random Projection Clustering on Streaming Data
S14202 - Arthur Flexer: An Empirical Analysis of Hubness in Unsupervised Distance-Based Outlier Detection
S14204 - Takako Hashimoto, Dave Shepard, Kuboyama Tetsuji, and Shin Kilho: Topic Extraction Method from Millions of Tweets based on Fast Feature Selection Technique CWC
S14203 - Jayaraman J. Thiagarajan, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, and Bhavya Kailkhura: Robust Local Scaling using Conditional Quantiles of Graph Similarities
S14201 - Alexander Kuleshov and Alexander Bernstein: Regression on High-dimensional Inputs
DM630 - Ofir Lindenbaum, Arie Yeredor, Amir Averbuch: Clustering Based On MultiView Diffusion Maps
DM324 - Shicong Liu, Junru Shao, Hongtao Lu: Aggregating Tree for Searching in Billion Scale High Dimensional Data
DM572 - Raywat Makkhongkaew, Khalid Benabdeslem: Semi-supervised similarity preserving co-selection
DM639 - Lyndon White, Roberto Togneri, Wei Liu, Mohammed Bennamoun: Modelling Sentence Generation from Sum of Word Embedding Vectors as a Mixed Integer Programming Problem
Description of Workshop
Over a decade ago, Stanford statistician David Donoho predicted that the 21st century would be the century of data: "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." -- D. Donoho, 2000.
Unprecedented technological advances lead to increasingly high dimensional data sets in all areas of science, engineering and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, and web and market basket analysis, among many others. The number of features in such data is often of the order of thousands or millions, much larger than the available sample size.
For a number of reasons, classical data analysis methods become inadequate, questionable, or at best inefficient when faced with high dimensional data spaces:
1. High dimensional geometry defeats our intuition, which is rooted in low dimensional experience, and this makes data presentation and visualisation particularly challenging.
2. Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counter-intuitive for the data mining practitioner. For instance, distance concentration is the phenomenon that the contrast between pair-wise distances may vanish as the dimensionality increases (see the short sketch after this list).
3. Bogus correlations and misleading estimates may result when trying to fit complex models for which the effective dimensionality is too large compared to the number of data points available.
4. The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.
5. The computational cost of processing high dimensional data or carrying out optimisation over a high dimensional parameter space is often prohibitive.
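As a brief illustration of point 2, the following minimal Python sketch (assuming only numpy; the sample size, dimensionalities, and the use of a standard Gaussian sample are arbitrary illustrative choices, not part of any workshop material) estimates the relative contrast between the largest and smallest pairwise Euclidean distances and shows how it shrinks as the dimensionality grows.

```python
import numpy as np

def relative_contrast(n_points=200, dim=10, seed=0):
    """Relative contrast (d_max - d_min) / d_min of pairwise Euclidean
    distances for a synthetic i.i.d. standard Gaussian sample."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_points, dim))
    # Pairwise squared Euclidean distances via the Gram matrix.
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = np.sqrt(d2[np.triu_indices(n_points, k=1)])  # distinct pairs only
    return (d.max() - d.min()) / d.min()

# The contrast shrinks steadily as the dimensionality grows:
# distances between points become nearly indistinguishable.
for dim in (2, 10, 100, 1000, 10000):
    print(f"dim = {dim:6d}   relative contrast = {relative_contrast(dim=dim):.3f}")
```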
Topics
This workshop aims to promote new advances and research directions to address the curses, and to uncover and exploit the blessings, of high dimensionality in data mining. Topics of interest include all aspects of high dimensional data mining, including the following:
- Systematic studies of how the curse of dimensionality affects data mining methods
- Models of low intrinsic dimension: sparse representation, manifold models, latent structure models, large margin, other?
- How to exploit intrinsic dimension in optimisation tasks for data mining?
- New data mining techniques that scale with the intrinsic dimension, or exploit some properties of high dimensional data spaces
- Dimensionality reduction
- Methods of random projections, compressed sensing, and random matrix theory applied to high dimensional data mining and high dimensional optimisation (a brief random projection sketch follows this list)
- Theoretical underpinning of mining data whose dimensionality is larger than the sample size
- Classification, regression, clustering, and visualisation of high dimensional complex data sets
- Functional data mining
- Data presentation and visualisation methods for very high dimensional data sets
- Data mining applications to real problems in science, engineering or business where the data is high dimensional
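To make the random projections topic above concrete, here is a minimal sketch (assuming only numpy; the synthetic Gaussian data and the sizes n_points, d_orig, d_proj are arbitrary illustrative choices) of a scaled Gaussian random projection, which approximately preserves pairwise Euclidean distances in the spirit of the Johnson-Lindenstrauss lemma.

```python
import numpy as np

# Synthetic data; the sample size and dimensions are illustrative only.
rng = np.random.default_rng(1)
n_points, d_orig, d_proj = 100, 5000, 500
X = rng.standard_normal((n_points, d_orig))

# Scaled Gaussian random projection matrix: E[||x R||^2] = ||x||^2.
R = rng.standard_normal((d_orig, d_proj)) / np.sqrt(d_proj)
Y = X @ R

def pairwise_distances(Z):
    """Euclidean distances between all distinct pairs of rows of Z."""
    sq = (Z ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.sqrt(d2[np.triu_indices(Z.shape[0], k=1)])

ratios = pairwise_distances(Y) / pairwise_distances(X)
print(f"distance ratios after projection: mean = {ratios.mean():.3f}, "
      f"std = {ratios.std():.3f}")   # mean close to 1, small spread
```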
Paper submission
High quality original submissions are solicited for oral and poster presentation at the workshop. The page limit for workshop papers is 8 pages in the standard IEEE 2-column format (http://www.ieee.org/).
All papers must be formatted according to the IEEE Computer Society proceedings manuscript style, following the IEEE ICDM 2016 submission guidelines available at http://icdm2016.eurecat.org. All submissions will be peer-reviewed, and all accepted papers will be included in the IEEE ICDM 2016 Workshops Proceedings volume published by IEEE Computer Society Press, and will also be included in the IEEE Xplore Digital Library. The workshop proceedings will be on a CD separate from that of the main conference, produced by IEEE Conference Publishing Services (CPS).
Important dates
Extended submission deadline: August 22, 2016. Submissions are now closed.
Notifications to authors: September 15, 2016.
Camera-ready instructions & deadline: see the "Author's Final Paper Formatting and Submission Instructions" webpage (Online Author Kit) for the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW 2016): http://www.ieeeconfpublishing.
Workshop date: December 12, 2016.
Registration & Expenses
Every workshop paper must have at least one full paid conference registration in order to be published. Check the main conference pages for details: http://icdm2016.eurecat.org/registration/
Program committee
Stephan Gunnemann: Carnegie Mellon University, USA
Michael E. Houle: National Institute of Informatics, Japan
Ata Kaban: University of Birmingham, United Kingdom
Mehmed Kantardzic: University of Louisville, USA
Mark Last: Ben-Gurion University of the Negev, Israel
Milos Radovanovic: University of Novi Sad, Serbia
Nenad Tomasev: Google, Mountain View, CA, USA
Nakul Verma: Janelia Research Campus, HHMI, USA
Guoxian Yu: Southwest University, China
Arthur Zimek: University of Southern Denmark, Denmark
Workshop organisation & contact
School of Computer Science, University of Birmingham, UK
Previous editions: