Tuesday, 11 December
13:30 – 15:30
Room: Rembrandt & Permeke
Session Chair: Takashi Washio
13:30 | Effective and Robust Mining of Temporal Subspace Clusters | Hardy Kremer, Stephan Günnemann, Arne Held, and Thomas Seidl | DM667
Mining temporal multivariate data by clustering is an important research topic. In today's complex data, interesting patterns are often bound to neither the full dimensional nor the full temporal extent of the data domain. This challenge is met by temporal subspace clustering methods. Their effectiveness, however, is impeded by aspects unavoidable in real-world data: misalignments between time series, for example caused by out-of-sync sensors, and measurement errors. Under these conditions, existing temporal subspace clustering approaches miss the patterns contained in the data. In this paper, we propose a novel clustering method that mines temporal subspace clusters reflected by sets of objects and relevant intervals. We enable flexible handling of misaligned time series by adaptively shifting time series in the time domain, and we achieve robustness to measurement errors by allowing a certain fraction of deviating values at each relevant point in time. We show the effectiveness of our method in experiments on real and synthetic data.
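
The two ideas named in the abstract, aligning time series by small shifts and tolerating a bounded fraction of deviating values at each time point, can be illustrated with a minimal Python sketch. This is an illustrative toy on synthetic data, not the authors' algorithm; all thresholds and function names are assumptions.

import numpy as np

def best_shift(series, reference, max_shift=4):
    # Try integer shifts in [-max_shift, max_shift] and keep the one that
    # minimizes the mean squared distance to the reference on the overlap.
    best, best_err = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = series[s:], reference[:len(reference) - s]
        else:
            a, b = series[:s], reference[-s:]
        err = np.mean((a - b) ** 2)
        if err < best_err:
            best, best_err = s, err
    return best

def forms_cluster(block, eps=0.3, max_dev_fraction=0.2):
    # block: objects x time values over a candidate relevant interval.
    # At every time point, at most max_dev_fraction of the objects may
    # deviate from the (median) cluster profile by more than eps.
    center = np.median(block, axis=0)
    deviating = np.abs(block - center) > eps
    return bool(np.all(deviating.mean(axis=0) <= max_dev_fraction))

rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 4 * np.pi, 80))
series = [np.roll(base, int(k)) + rng.normal(0, 0.05, 80) for k in rng.integers(-2, 3, size=6)]

# Align every series to the first one before checking the cluster condition.
aligned = [np.roll(s, -best_shift(s, series[0])) for s in series]
block = np.stack(aligned)[:, 10:60]        # a candidate relevant time interval
print("cluster on interval?", forms_cluster(block))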
13:50 | Student-t based Robust Spatio-Temporal Prediction | Yang Chen, Feng Chen, Jing Dai, T. Charles Clancy, and Yao-Jan Wu | DM847
This paper describes an efficient and effective design of Robust Spatio-Temporal Prediction based on Student's t distribution, namely St-RSTP, which provides estimations based on observations over spatio-temporal neighbors. The proposed St-RSTP is more resilient to outliers and other small departures from model assumptions than its ancestor, the Spatio-Temporal Random Effects (STRE) model. STRE is a state-of-the-art statistical model with linear-order complexity for large-scale processing. However, it assumes Gaussian observations, which has the well-known limitation of non-robustness. In our St-RSTP design, the measurement error follows Student's t distribution instead of a traditional Gaussian distribution. This design reduces the influence of outliers and improves prediction quality, but it renders the inference problem analytically intractable. We propose a novel approximate inference approach that approximates the model in a form that separates the high-dimensional latent variables into groups, and then estimates the posterior distributions of the different groups of variables separately within the framework of Expectation Propagation. As a desirable property, our approximate approach reduces to standard STRE-based prediction when the degrees of freedom of the Student's t distribution are set to infinity. Extensive experimental evaluations on both simulated and real-life data sets demonstrate the robustness and the efficiency of our Student-t prediction model. The proposed approach provides critical functionality for stochastic processes on spatio-temporal data.
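
A minimal Python sketch of the robustness idea behind St-RSTP, replacing a Gaussian error model with a Student's t one so that outliers are downweighted. This is not the authors' STRE or Expectation Propagation machinery, just the likelihood swap on synthetic data with assumed parameters.

import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
clean = rng.normal(loc=5.0, scale=1.0, size=200)
data = np.concatenate([clean, [50.0, 60.0, 55.0]])   # a few gross outliers

# Gaussian assumption: the MLE of the location is the sample mean,
# which is pulled towards the outliers.
gaussian_estimate = data.mean()

# Student's t assumption (fixed small degrees of freedom): maximize the
# t log-likelihood over the location; heavy tails downweight outliers.
nu = 3.0
neg_loglik = lambda mu: -stats.t.logpdf(data, df=nu, loc=mu, scale=1.0).sum()
student_estimate = optimize.minimize_scalar(
    neg_loglik, bounds=(data.min(), data.max()), method="bounded").x

print(f"Gaussian location estimate : {gaussian_estimate:.2f}")
print(f"Student-t location estimate: {student_estimate:.2f}")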
14:10 | Robust Prediction and Outlier Detection for Spatial Dataset | Xutong Liu, Feng Chen, and Chang-Tien Lu | DM876
Spatial kriging is a widely used predictive model for spatial datasets. In the spatial kriging model, the observations are assumed to be Gaussian for computational convenience. However, its predictive accuracy can be significantly compromised if the observations are contaminated by outliers. This deficiency can be systematically addressed by increasing the robustness of the spatial kriging model using heavy-tailed distributions, such as the Huber, Laplace, and Student's t distributions. This paper presents a novel Robust and Reduced Rank Spatial Kriging Model (R3-SKM), which is resilient to the influence of outliers and allows for fast spatial inference. Furthermore, three effective and efficient algorithms are proposed based on the R3-SKM framework that perform robust parameter estimation, spatial prediction, and spatial outlier detection with linear-order time complexity. Extensive experiments on both simulated and real data sets demonstrated the robustness and efficiency of our proposed techniques.
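
The robustness principle mentioned in the abstract, using a heavy-tailed or Huber-type loss so that outlying observations get limited influence, can be illustrated with a small sketch. This is a generic iteratively reweighted Huber location estimate on synthetic data, not the R3-SKM model or its algorithms.

import numpy as np

def huber_weights(residuals, c=1.345):
    # Huber weights: 1 inside the threshold, c/|r| outside, so large
    # residuals (outliers) get proportionally less influence.
    a = np.abs(residuals)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def irls_location(y, c=1.345, iters=20):
    # Iteratively reweighted estimate of a location parameter under the
    # Huber loss (a stand-in for the heavy-tailed likelihoods named above).
    mu = np.median(y)
    scale = np.median(np.abs(y - mu)) * 1.4826 + 1e-12   # robust scale (MAD)
    for _ in range(iters):
        w = huber_weights((y - mu) / scale, c)
        mu = np.sum(w * y) / np.sum(w)
    return mu

rng = np.random.default_rng(1)
field = rng.normal(10.0, 1.0, 300)
field[:5] = 100.0                      # contaminate a few sites with outliers
print("plain mean :", field.mean())
print("Huber IRLS :", irls_location(field))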
14:30 | Understanding Data Completeness in Network Monitoring Systems | Flip Korn, Ruilin Liu, and Hui (Wendy) Wang | DM884
In many networks, including Internet Service Providers, transportation monitoring systems, and the electric grid, measurements from a set of objects are continuously taken over time and used for important decisions such as provisioning and pricing. It is therefore vital to understand the completeness and reliability of such data. Given the large volume of data generated by these systems, rather than enumerating the times and objects incurring missing or spurious data, it is more effective to provide patterns (groups of tuples) concisely summarizing trends that may not otherwise be readily observable. In this paper, we define the Graph Tableau Discovery Problem, in which the measured tuples can be thought of as edges in a bipartite graph on an ordered attribute (time) and an unordered attribute (object identifiers). We define the problem of finding an optimal summary, show that it is NP-complete, and then provide a polynomial-time approximation algorithm with guarantees for finding a good summary. Experiments on real and synthetic data demonstrate the effectiveness and efficiency of our approach.
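
A toy illustration of summarizing missing data over a time-by-object grid with a greedy covering heuristic follows. This is not the paper's tableau discovery algorithm or its approximation guarantee; the matrix, budget, and pattern shapes are assumptions meant only to convey the flavor of a concise summary.

import numpy as np

def greedy_summary(missing, k=3):
    # Toy greedy summary of a boolean time-by-object "missing data" matrix:
    # repeatedly report whichever whole row (time) or whole column (object)
    # covers the most not-yet-covered missing cells, up to k patterns.
    covered = np.zeros_like(missing, dtype=bool)
    patterns = []
    for _ in range(k):
        gain = missing & ~covered
        row_gain, col_gain = gain.sum(axis=1), gain.sum(axis=0)
        if max(row_gain.max(), col_gain.max()) == 0:
            break
        if row_gain.max() >= col_gain.max():
            t = int(row_gain.argmax())
            patterns.append(("time", t, int(row_gain[t])))
            covered[t, :] = True
        else:
            o = int(col_gain.argmax())
            patterns.append(("object", o, int(col_gain[o])))
            covered[:, o] = True
    return patterns

rng = np.random.default_rng(2)
missing = rng.random((24, 10)) < 0.05       # sparse background noise
missing[8:12, :] = True                     # an outage across all objects
missing[:, 3] = True                        # one object missing all day
print(greedy_summary(missing, k=2))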
14:50 | Inferring the Root Cause in Road Traffic Anomalies | Sanjay Chawla, Yu Zheng, and Jiafeng Hu | DM711
We propose a novel two-step mining and optimization framework for inferring the root cause of anomalies that appear in road traffic data. We model road traffic as a time-dependent flow on a network formed by partitioning a city into regions bounded by major roads. In the first step we identify link anomalies based on their deviation from their historical traffic profiles. However, link anomalies on their own shed very little light on what caused them to be anomalous. In the second step we take a generative approach by modeling the flow in the network in terms of the origin-destination (OD) matrix, which physically relates the latent flow between origins and destinations to the observable flow on the links. The key insight is that instead of using all of the link traffic as the observable vector, we use only the link anomaly vector. By solving an L1 inverse problem we infer the routes (the origin-destination pairs) that gave rise to the link anomalies. Experiments on a very large GPS data set consisting of nearly eight hundred million data points demonstrate that we can discover routes that clearly explain the appearance of link anomalies. The use of optimization techniques to explain observable anomalies in a generative fashion is, to the best of our knowledge, entirely novel.
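
The second step described above amounts to an L1 inverse problem: recover a sparse route-anomaly vector consistent with the observed link-anomaly vector. A minimal sketch using a toy routing matrix (the matrix, its size, and the solver choice are assumptions, not the authors' setup) might look like this:

import numpy as np
from scipy.optimize import linprog

# Hypothetical routing matrix A: A[i, j] = 1 if route j uses link i.
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
], dtype=float)

true_x = np.array([0.0, 2.0, 0.0, 0.0])   # only route 1 is anomalous
b = A @ true_x                            # observed link-anomaly vector

# Sparsest non-negative route-anomaly vector consistent with the link
# anomalies: minimize sum(x) subject to A x = b, x >= 0, i.e. an L1
# problem expressed as a linear program.
res = linprog(c=np.ones(A.shape[1]), A_eq=A, b_eq=b,
              bounds=[(0, None)] * A.shape[1], method="highs")
print("recovered route anomalies:", np.round(res.x, 3))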
15:10 | A Stochastic Model for Context-Aware Anomaly Detection in Indoor Location Traces | Chuanren Liu, Hui Xiong, Yong Ge, Wei Geng, and Matt Perkins | DM720
Rapid growth in the development of real-time location system solutions has led to increased interest in indoor location-aware services, such as hospital asset management. Although there are extensive studies in the literature on the analysis of outdoor location traces, studies of indoor location traces remain limited and fragmented. To that end, in this paper we provide a focused study of indoor location traces collected by sensors attached to medical devices in a hospital environment. Along this line, we first introduce some unique properties of these indoor location traces. We show that they can capture the movement patterns of the medical devices, which are tightly coupled with the workflow in the controlled hospital environment. Based on this observation, we propose a stochastic model for context-aware anomaly detection in indoor location traces, which exploits the hospital workflow and models the movements of medical devices as transitions in finite state machines. Specifically, we first develop a density-based method to identify hotspots with high levels of abnormal activity in the indoor environment. The discovered hotspots serve as the context for nearby trajectories. Then, we introduce an N-gram based method for measuring the degree of anomaly with respect to the detected hotspots, which is able to predict missing events possibly caused by devices being stolen. In addition, to address the noisy nature of indoor sensor networks, we propose an iterative algorithm to estimate the transition probabilities. This algorithm makes it possible to effectively recover missing location records, which are critical for estimating abnormality. Finally, experimental results on real-world data sets validate the effectiveness of the proposed context-aware anomaly detection method for identifying abnormal events.
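
The N-gram scoring idea can be illustrated with a small bigram sketch: learn transition probabilities over hotspot symbols from normal trajectories and flag trajectories whose transitions are improbable. The sequences, smoothing, and scoring below are illustrative assumptions, not the paper's model.

from collections import Counter
from math import log

def bigram_model(sequences, smoothing=1.0):
    # Estimate P(next | current) over hotspot symbols with add-one smoothing.
    pair_counts, state_counts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            state_counts[a] += 1
    V = len(vocab)
    return lambda a, b: (pair_counts[(a, b)] + smoothing) / (state_counts[a] + smoothing * V)

def anomaly_score(seq, prob):
    # Average negative log transition probability; larger means more anomalous.
    return -sum(log(prob(a, b)) for a, b in zip(seq, seq[1:])) / max(len(seq) - 1, 1)

# Hypothetical hotspot sequences reflecting a normal device workflow.
normal = [list("ABCD"), list("ABCD"), list("ABCCD"), list("ABD")]
prob = bigram_model(normal)
print("normal trajectory :", anomaly_score(list("ABCD"), prob))
print("unusual trajectory:", anomaly_score(list("ADCA"), prob))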