All tutorials will be held in the combined room Alto & Mezzo & Tempo on the first floor.
The following tutorials have been accepted for presentation at IEEE ICDM 2012:
Abstract:
High-dimensional data in Euclidean space pose special challenges to data mining
algorithms. These challenges are often indiscriminately subsumed under the term
"curse of dimensionality". More concrete aspects include the so-called "distance
concentration effect", the presence of irrelevant attributes that conceal
relevant information, and plain efficiency issues. Only in the last few years
has the task of unsupervised outlier detection seen new specialized solutions
for tackling high-dimensional data in Euclidean space. These approaches fall
mainly into two categories: those that consider subspaces (subsets of
attributes) in the definition of outliers, and those that do not. The former
specifically address the presence of irrelevant attributes. The latter account
for irrelevant attributes only implicitly, if at all, and are more concerned
with general issues of efficiency and effectiveness. Nevertheless, both types of
specialized outlier detection algorithms tackle challenges specific to
high-dimensional data. In this tutorial, we discuss in detail those aspects of
the "curse of dimensionality" that are most important for outlier detection,
and we survey specialized outlier detection algorithms from both categories.
Abstract:
Detecting anomalies and events in data is a vital task, with numerous
applications in security, finance, health care, law enforcement, and many
others. While many techniques have been developed in past years for spotting
outliers and anomalies in unstructured collections of multi-dimensional points,
graph data has become ubiquitous, and techniques for structured graph data have
recently become a focus of research. Because objects in graphs exhibit
long-range correlations, a suite of novel techniques has been developed for
abnormality detection in graph
data. The goal of this tutorial is to provide a general, comprehensive
overview of the state-of-the-art methods for anomaly, event, and fraud
detection in data represented as graphs. As a key contribution, we provide a
thorough exploration of both data mining and machine learning algorithms for
these detection tasks. We give a general framework for the algorithms,
categorized by setting: unsupervised vs. (semi-)supervised, and static vs.
dynamic data. We focus on the scalability and effectiveness of
the methods, and highlight results on crucial real-world applications,
including accounting fraud and opinion spam detection.
Abstract:
Uncertain data is inherent in many important applications, particularly in the
context of big data analytics, such as environmental surveillance, healthcare
informatics, customer-relationship management, market analysis, and
quantitative economics research. When tackling big data, it is almost
impossible to avoid modeling and analyzing uncertainty and probability.
Analyzing and mining large collections of uncertain data has become an
important task and has attracted increasing interest from both the data mining
research community and industry.
In this tutorial, with big data analytics as the overarching backdrop, we will
present a systematic yet compact review of mining uncertain and probabilistic
data, covering motivations and application examples, problems, challenges,
fundamental principles, state-of-the-art methods, interesting open problems,
and future directions. We will emphasize big data analytics applications,
connections among various mining and analytics tasks, fundamental principles,
and open problems.
We assume that the audience is familiar with the basic concepts of probability
and statistics. However, no deep background knowledge of statistics, sampling,
probability, or any other mathematical field is assumed. We will use plenty of
examples to explain the ideas and intuitions.