The 4th ICDM Workshop on High Dimensional Data Mining (HDM’16)
In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2016)

Barcelona, Spain, December 12, 2016.

 

Program

09:30 – 10:30 

An Empirical Analysis of Hubness in Unsupervised Distance-Based Outlier Detection

Arthur Flexer

 

Semi-supervised similarity preserving co-selection

Raywat Makkhongkaew, Khalid Benabdeslem

 

Regression on High-dimensional Inputs

Alexander Kuleshov, Alexander Bernstein

10:30 – 11:00

Coffee break

11:00 – 13:00 


Clustering Based On MultiView Diffusion Maps

Ofir Lindenbaum, Arie Yeredor, Amir Averbuch

 

Topic Extraction Method from Millions of Tweets based on Fast Feature Selection Technique CWC

Takako Hashimoto, Dave Shepard, Kuboyama Tetsuji, Shin Kilho

 

Modelling Sentence Generation from Sum of Word Embedding Vectors as a Mixed Integer Programming Problem

Lyndon White, Roberto Togneri, Wei Liu, Mohammed Bennamoun

 

Robust Local Scaling using Conditional Quantiles of Graph Similarities

Jayaraman J. Thiagarajan, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Bhavya Kailkhura

 

Aggregating Tree for Searching in Billion Scale High Dimensional Data

Shicong Liu, Junru Shao, Hongtao Lu

 

Random Projection Clustering on Streaming Data

Lee Carraher, Philip Wilsey, Anindya Moitra, Sayantan Dey

 

13:00 – 14:30

Lunch break

 

Accepted Papers

DM1027 - Lee Carraher, Philip Wilsey, Anindya Moitra, Sayantan Dey: Random Projection Clustering on Streaming Data

S14202 - Arthur Flexer: An Empirical Analysis of Hubness in Unsupervised Distance-Based Outlier Detection

S14204 - Takako Hashimoto, Dave Shepard, Kuboyama Tetsuji, and Shin Kilho: Topic Extraction Method from Millions of Tweets based on Fast Feature Selection Technique CWC

S14203 - Jayaraman J. Thiagarajan, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, and Bhavya Kailkhura: Robust Local Scaling using Conditional Quantiles of Graph Similarities

S14201 - Alexander Kuleshov and Alexander Bernstein: Regression on High-dimensional Inputs

DM630 - Ofir Lindenbaum, Arie Yeredor, Amir Averbuch: Clustering Based On MultiView Diffusion Maps

DM324 - Shicong Liu, Junru Shao, Hongtao Lu: Aggregating Tree for Searching in Billion Scale High Dimensional Data

DM572 - Raywat Makkhongkaew, Khalid Benabdeslem: Semi-supervised similarity preserving co-selection

DM639 - Lyndon White, Roberto Togneri, Wei Liu, Mohammed Bennamoun: Modelling Sentence Generation from Sum of Word Embedding Vectors as a Mixed Integer Programming Problem

Description of Workshop

 

Over a decade ago, Stanford statistician David Donoho predicted that the 21st century will be the century of data. "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." -- D. Donoho, 2000.

 

Unprecedented technological advances lead to increasingly high dimensional data sets in all areas of science, engineering and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, web and market basket analysis, among many others. The number of features in such data is often of the order of thousands or millions, which is much larger than the available sample size.

For a number of reasons, classical data analysis methods are inadequate, questionable, or inefficient at best when faced with high dimensional data spaces:

 1. High dimensional geometry defeats our intuition rooted in low dimensional experiences, and this makes data presentation and visualisation particularly challenging.

 2. Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counter-intuitive for the data mining practitioner. For instance, distance concentration is the phenomenon that the contrast between pair-wise distances may vanish as the dimensionality increases.

3. Bogus correlations and misleading estimates may result when trying to fit complex models for which the effective dimensionality is too large compared to the number of data points available.

 4. The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.

 5. The computational cost of processing high dimensional data or carrying out optimisation over high dimensional parameter spaces is often prohibitive.
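
The distance concentration effect mentioned in point 2 can be observed in a small simulation. The sketch below is only an illustration under assumptions that are not part of the workshop description: points are drawn uniformly from the unit cube, distances are Euclidean, and the function name relative_contrast, the sample size and the chosen dimensions are hypothetical. It computes the relative contrast (d_max - d_min) / d_min between a random query and a random sample, which tends to shrink as the dimensionality grows.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min of Euclidean distances
    from one random query to n_points random points in [0, 1]^dim.

    Uniform data and the default parameters are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the contrast for a few dimensionalities; it should
    # decrease noticeably from the first line to the last.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

With these settings the nearest and farthest points differ by a large factor in 2 dimensions, while in 1000 dimensions they lie at almost the same distance from the query, which is why nearest-neighbour notions become less informative in such spaces.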

 

Topics

 

This workshop aims to promote new advances and research directions to address the curses and uncover and exploit the blessings of high dimensionality in data mining. Topics of interest include all aspects of high dimensional data mining, including the following:

- Systematic studies of how the curse of dimensionality affects data mining methods

- Models of low intrinsic dimension: sparse representation, manifold models, latent structure models, large margin, other?

- How to exploit intrinsic dimension in optimisation tasks for data mining?

- New data mining techniques that scale with the intrinsic dimension, or exploit some properties of high dimensional data spaces

- Dimensionality reduction

- Methods of random projections, compressed sensing, and random matrix theory applied to high dimensional data mining and high dimensional optimisation

- Theoretical underpinning of mining data whose dimensionality is larger than the sample size

- Classification, regression, clustering, visualisation of high dimensional complex data sets

- Functional data mining

- Data presentation and visualisation methods for very high dimensional data sets

- Data mining applications to real problems in science, engineering or businesses where the data is high dimensional

 

Paper submission

High quality original submissions are solicited for oral and poster presentation at the workshop. The page limit of workshop papers is 8 pages in the standard IEEE 2-column format (http://www.ieee.org/conferences_events/conferences/publishing/templates.html), including the bibliography and any possible appendices.

All papers must be formatted according to the IEEE Computer Society proceedings manuscript style, following the IEEE ICDM 2016 submission guidelines available at http://icdm2016.eurecat.org. All submissions will be peer-reviewed, and all accepted papers will be included in the IEEE ICDM 2016 Workshops Proceedings volume published by IEEE Computer Society Press, and will also be included in the IEEE Xplore Digital Library. The workshop proceedings will be on a CD separate from the CD of the main conference. The CD is produced by IEEE Conference Publishing Services (CPS).


Important dates

Extended submission deadline: August 22, 2016. Submissions are now closed.

Notifications to authors: September 15, 2016.

 

Camera ready instructions & deadline: The following is a URL link to the "Author's Final Paper Formatting and Submission Instructions" Webpage (Online Author Kit) for 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW 2016):
http://www.ieeeconfpublishing.org/cpir/authorKit.asp?Facility=CPS_Dec&ERoom=ICDMW+2016

 

Workshop date: December 12, 2016.

 

Registration & Expenses

Every workshop paper must have at least one full paid conference registration in order to be published. Check the main conference pages for details: http://icdm2016.eurecat.org/registration/

 

Program committee

Stephan Gunnemann : Carnegie Mellon University, USA
Michael E. Houle : National Institute of Informatics, Japan
Ata Kaban : University of Birmingham, United Kingdom
Mehmed Kantardzic : University of Louisville, USA
Mark Last : Ben-Gurion University of the Negev, Israel
Milos Radovanovic : University of Novi Sad, Serbia
Nenad Tomasev : Google, Mountain View, CA, USA
Nakul Verma : Janelia Research Campus, HHMI, USA
Guoxian Yu : Southwest University, China
Arthur Zimek : University of Southern Denmark, Denmark

 

Workshop organisation & contact

Dr. Ata Kaban

School of Computer Science, University of Birmingham, UK

 

Previous editions:

HDM’15

HDM’14

HDM’13

 

 

Related Links & resources: Analytics, Big Data, Data Mining, & Data Science Resources