Thursday, 13 December
16:00 – 18:00
Room: Salle des nations II
Session Chair: Li Xiaoli
16:00 | Online Induction of Probabilistic Real Time Automata if($paper->short_paper) echo(" (Short)");?> Jana Schmidt and Stefan Kramer |
DM342 |
Probabilistic real time automata (PRTAs) are a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix tree acceptor (PTA) and the merge procedure based on clustering of the states. These two steps can be very time intensive when a PRTA is to be induced for massive or even unbounded data sets. The latter one can be efficiently processed, as there exist scalable online clustering algorithms. However, the creation of the PTA still can be very time consuming. To overcome this problem, we propose a genuine online PRTA induction approach that incorporates new instances by first collapsing them and then using a maximum frequent pattern based clustering. The approach is tested against a predefined synthetic automaton and real-world data sets, for which the approach is scalable and stable. Moreover, we present a broad evaluation on a real world disease group data set that shows the applicability of such a model to the analysis of medical processes. | ||
16:20 | Online Maritime Abnormality Detection using Gaussian Processes and Extreme Value Theory if($paper->short_paper) echo(" (Short)");?> Mark Smith, Stephen Reece, Stephen Roberts, and Iead Rezek |
DM755 |
Novelty, or abnormality, detection aims to identify patterns within data streams that do not conform to expected behaviour. This paper introduces a novelty detection technique using a combination of Gaussian Processes and extreme value theory to identify anomalous behaviour in streaming data. The proposed combination of continuous and count stochastic processes is a principled approach towards dynamic extreme value modeling that accounts for the dynamics in the time series, the streaming nature of its observation as well as its sampling process. The approach is tested on both synthetic and real data, showing itself to be effective in our primary application of maritime vessel track analysis. | ||
16:40 | Utilizing Real-World Transportation Data for Accurate Traffic Prediction if($paper->short_paper) echo(" (Short)");?> Bei Pan, Ugur Demiryurek, and Cyrus Shahabi |
DM279 |
For the first time, real-time high-fidelity spatiotemporal data on transportation networks of major cities have become available. This gold mine of data can be utilized to learn about traffic behavior at different times and locations, potentially resulting in major savings in time and fuel, the two important commodities of 21st century. As a first step towards the utilization of this data, in this paper, we study the real-world data collected from Los Angeles County transportation network in order to incorporate the data's intrinsic behavior into a time-series mining technique to enhance its accuracy for traffic prediction. In particular, we utilized the spatiotemporal behaviors of rush hours and events to perform a more accurate prediction of both short-term and long-term average speed on road-segments, even in the presence of infrequent events (e.g., accidents). Our result shows that taking historical rush-hour behavior we can improve the accuracy of traditional predictors by up to 67% and 78% in short-term and long-term predictions, respectively. Moreover, we can incorporate the impact of an accident to improve the prediction accuracy by up to 91%. | ||
17:00 | Granger Causality for Time-Series Anomaly Detection if($paper->short_paper) echo(" (Short)");?> Huida Qiu, Yan Liu, Niranjan A Subrahmanya, and Weichang Li |
DM274 |
Recent developments in industrial systems provide us with a large amount of time series data from sensors, logs, system settings and physical measurements, etc. These data are extremely valuable for providing insights about the complex systems and could be used to detect anomalies at early stages. However, the special characteristics of these time series data, such as high dimensions and complex dependencies between variables, as well as its massive volume, pose great challenges to existing anomaly detection algorithms. In this paper, we propose Granger graphical models as an effective and scalable approach for anomaly detection whose results can be readily interpreted. Specifically, Granger graphical models are a family of graphical models that exploit the temporal dependencies between variables by applying L1-regularized learning to Granger causality. Our goal is to efficiently compute a robust "correlation anomaly" score for each variable via Granger graphical models that can provide insights on the possible reasons of anomalies. We evaluate the effectiveness of our proposed algorithms on both synthetic and application datasets. The results show the proposed algorithm achieves significantly better performance than other baseline algorithms and is scalable for large-scale applications. | ||
17:12 | Progressive Mining of Transition Dynamics for Autonomous Control if($paper->short_paper) echo(" (Short)");?> Steven Loscalzo, Robert Wright, Kevin Acunto, and Lei Yu |
DM760 |
Autonomous agents are emerging in diverse areas and many rely on reinforcement learning (RL) to learn optimal control policies by acting in the environment. This form of learning generates large amounts of transition dynamics data, which can be mined to improve the agent's understanding of the environment. There could be many uses for this data, here we focus on mining it to identify a relevant feature subspace. This is vital since RL performs poorly in high-dimensional spaces, such as those that autonomous agents would commonly face in real-world problems. This paper demonstrates the necessity and feasibility of integrating data mining into the learning process while an agent is learning, enabling it to learn to act by both acting and understanding. Doing so requires overcoming challenges regarding data quantity and quality, and difficulty measuring feature relevance with respect to the control policy. We propose the progressive mining framework to address these challenges by relying on cyclic interaction between data mining and RL. We show that a feature selection algorithm developed under this framework, PROFESS, can improve RL scalability better than a competing approach. | ||
17:24 | Socialized Gaussian Process Model for Human Behavior Prediction in a Health Social Network if($paper->short_paper) echo(" (Short)");?> Yelong Shen, Ruoming Jin, Dejing Dou, Nafisa Chowdhury, Junfeng Sun, Brigitta Piniewski, and David Kil |
DM927 |
Modeling and predicting human behaviors, such as the activity level and intensity, is the key to prevent the cascades of obesity, and help spread wellness and healthy behavior in a social network. In this work, we propose a Socialized Gaussian Process (SGP) for socialized human behavior modeling. In the proposed SGP model, we naturally incorporates human's personal behavior factor and social correlation factor into a unified model, where basic Gaussian Process model is leveraged to capture individual's personal behavior pattern. Furthermore, we extend the Gaussian Process Model to socialized Gaussian Process (SGP) which aims to capture social correlation phenomena in the social network. The detailed experimental evaluation has shown the SGP model achieves the best prediction accuracy compared with other baseline methods. | ||
17:36 | Mining Personal Context-Aware Preferences for Mobile Users if($paper->short_paper) echo(" (Short)");?> Hengshu Zhu, Enhong Chen, Kuifei Yu, Huanhuan Cao, Hui Xiong, and Jilei Tian |
DM419 |
In this paper, we illustrate how to extract personal context-aware preferences from the context-rich device logs (i.e., context logs) for building novel personalized context-aware recommender systems. A critical challenge along this line is that the context log of each individual user may not contain sufficient data for mining his/her context-aware preferences. Therefore, we propose to first learn common context-aware preferences from the context logs of many users. Then, the preference of each user can be represented as a distribution of these common context-aware preferences. Specifically, we develop two approaches for mining common context-aware preferences based on two different assumptions, namely, context independent and context dependent assumptions, which can fit into different application scenarios. Finally, extensive experiments on a real-world data set show that both approaches are effective and outperform baselines with respect to mining personal context-aware preferences for mobile users. | ||
17:48 | Topic Models Over Spoken Language if($paper->short_paper) echo(" (Short)");?> Niketan Pansare, Chris Jermaine, Peter Haas, and Nitendra Rajput |
DM894 |
Virtually all work on topic modeling has assumed that the topics are to be learned over a text-based document corpus. However, there exist important applications where topic models must be learned over an audio corpus of spoken language. Unfortunately, speech-to-text programs can have very low accuracy. We therefore propose a novel topic model for spoken language that incorporates a statistical model of speech-to-text software behavior. Crucially, our model exploits the uncertainty numbers returned by the software. Our ideas apply to any domain in which it would be useful to build a topic model over data in which uncertainties are explicitly represented. |