Tutorials

All tutorials will be held in the combined Alto & Mezzo & Tempo room on the first floor.

The following tutorials have been accepted for presentation at IEEE ICDM 2012:

  • Tutorial 1: Outlier Detection in High Dimensional Data.
    Arthur Zimek (University of Alberta), Erich Schubert and Hans-Peter Kriegel (Ludwig-Maximilians-Universität München)

    Tutorial website

    Abstract:
    High dimensional data in Euclidean space pose special challenges to data mining
    algorithms. These challenges are often indiscriminately subsumed under the term
    "curse of dimensionality", with more concrete aspects being the so-called
    "distance concentration effect", the presence of irrelevant attributes
    concealing relevant information, or simply efficiency issues. Only in the last
    few years has the task of unsupervised outlier detection found new specialized
    solutions for tackling high dimensional data in Euclidean space. These
    approaches fall mainly into two categories: those that consider subspaces
    (subsets of attributes) in the definition of outliers and those that do not.
    The former specifically address the presence of irrelevant attributes; the
    latter consider the presence of irrelevant attributes implicitly at best and
    are more concerned with general issues of efficiency and effectiveness.
    Nevertheless, both types of specialized outlier detection algorithms tackle
    challenges specific to high dimensional data. In this tutorial, we discuss in
    detail those aspects of the "curse of dimensionality" that are most important
    for outlier detection, and we survey specialized outlier detection algorithms
    from both categories.

  • Tutorial 2: What is Strange in Large Networks? Graph-based Irregularity and Fraud Detection.
    Leman Akoglu and Christos Faloutsos (Carnegie Mellon University)

    Tutorial website

    Abstract:
    Detecting anomalies and events in data is a vital task, with numerous
    applications in security, finance, health care, law enforcement, and many
    others. While many techniques have been developed in past years for spotting
    outliers and anomalies in unstructured collections of multi-dimensional points,
    the growing ubiquity of graph data has recently shifted attention to techniques
    for structured graph data. Because objects in graphs exhibit long-range
    correlations, a set of novel techniques has been developed for abnormality
    detection in graph data. The goal of this tutorial is to provide a general,
    comprehensive overview of the state-of-the-art methods for anomaly, event, and
    fraud detection in data represented as graphs. As a key contribution, we
    provide a thorough exploration of both data mining and machine learning
    algorithms for these detection tasks. We give a general framework for the
    algorithms, categorized by setting: unsupervised vs. (semi-)supervised, and
    static vs. dynamic data. We focus on the scalability and effectiveness aspects of
    the methods, and highlight results on crucial real-world applications,
    including accounting fraud and opinion spam detection.

  • Tutorial 3: Mining Uncertain and Probabilistic Data for Big Data Analytics.
    Jian Pei (Simon Fraser University)

    Tutorial website

    Abstract:
    Uncertain data is inherent in many important applications, particularly in the
    context of big data analytics, such as environmental surveillance, healthcare
    informatics, customer relationship management, market analysis, and
    quantitative economics research. When tackling big data, it is almost
    impossible to avoid modeling and analyzing uncertainty and probability.
    Analyzing and mining large collections of uncertain data has become an
    important task and has attracted growing interest from both the data mining
    research community and industry.

    In this tutorial, with big data analytics as the overarching backdrop, we will
    present a systematic yet compact review of mining uncertain and probabilistic
    data, covering motivations and application examples, problems, challenges,
    fundamental principles, state-of-the-art methods, interesting open problems,
    and future directions. We will emphasize big data analytics applications,
    connections among various mining and analytics tasks, fundamental principles,
    and open problems.

    We assume that the audience is familiar with basic concepts of probability and
    statistics. However, no deep background knowledge of statistics, sampling,
    probability, or any other mathematical principles is assumed. We will use
    plenty of examples to explain the ideas and intuitions.
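
The "distance concentration effect" named in the Tutorial 1 abstract can be illustrated with a small experiment. The Python sketch below is an illustration only, not material from the tutorial: it assumes points drawn uniformly from the unit hypercube and measures the relative contrast (d_max - d_min) / d_min of Euclidean distances from one random query point to all other points. The function name relative_contrast and its parameters are hypothetical choices for this example.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 1000, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one random
    query point to n_points points drawn uniformly from the unit hypercube.

    Assumption (illustrative only): uniform data in [0, 1]^dim; real data sets
    with correlated or clustered attributes behave differently.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    # Small values mean nearest and farthest neighbors are nearly equidistant.
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"d={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Under these assumptions, the relative contrast shrinks as the dimensionality grows, so the nearest and farthest neighbors of a point become nearly equidistant. This is one reason why distance-based outlier scores can lose discriminative power in high dimensional spaces.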