You are here

Session Details

Dynamics

Thursday, 13 December
16:00 – 18:00
Room: Salle des nations I
Session Chair: Nikolaj Tatti

16:00 Efficient Episode Mining of Dynamic Event Streams short_paper) echo(" (Short)");?>
Debprakash Patnaik, Srivatsan Laxman, Badrish Chandramouli, and Naren Ramakrishnan
DM477

Discovering frequent episodes over event sequences is an important data mining problem. Existing methods typically require multiple passes over the data, rendering them unsuitable for streaming contexts. We present the first streaming algorithm for mining frequent episodes over a window of recent events in the stream. We derive approximation guarantees for our algorithm in terms of: (i) the separation of frequent episodes from infrequent ones, and (ii) the rate of change of stream characteristics. Our parameterization of the problem provides a new sweet spot in the tradeoff between making distributional assumptions over the stream and algorithmic efficiencies of mining. We illustrate how this yields significant benefits when mining practical streams from neuroscience and telecommunications logs.

16:20 Dynamic Boolean Matrix Factorizations short_paper) echo(" (Short)");?>
Pauli Miettinen
DM880

Boolean matrix factorization is a method to decompose a binary matrix into two binary factor matrices. Akin to other matrix factorizations, the factor matrices can be used for various data analysis tasks. Many (if not most) real-world data sets are dynamic, though, meaning that new information is recorded over time. Incorporating this new information into the factorization can require a re-computation of the factorization -- something we cannot do if we want to keep our factorization up-to-date after each update. This paper proposes a method to dynamically update the Boolean matrix factorization when new data is added to the data base. This method is extended with a mechanism to improve the factorization with a trade-off in speed of computation. The method is tested with a number of real-world and synthetic data sets including studying its efficiency against off-line methods. The results show that with good initialization the proposed online and dynamic methods can beat the state-of-the-art offline Boolean matrix factorization algorithms.

16:40 Local and Global Algorithms for Learning Dynamic Bayesian Networks short_paper) echo(" (Short)");?>
Nguyen Xuan Vinh, Madhu Chetty, Ross Coppel, and Pramod P. Wangikar
DM230

Learning optimal Bayesian networks (BN) from data is NP-hard in general. Nevertheless, certain BN classes with additional topological constraints, such as the dynamic BN (DBN) models, widely applied in specific fields such as systems biology, can be efficiently learned in polynomial time. Such algorithms have been developed for the Bayesian-Dirichlet (BD), Minimum Description Length (MDL), and Mutual Information Test (MIT) scoring metrics. The BD-based algorithm admits a large polynomial bound, hence it is impractical for even modestly sized networks. The MDL-and MIT-based algorithms admit much smaller bounds, but require a very restrictive assumption that all variables have the same cardinality, thus significantly limiting their applicability. In this paper, we first propose an improvement to the MDL-and MIT-based algorithms, dropping the equicardinality constraint, thus significantly enhancing their generality. We also explore local Markov blanket based algorithms for constructing BN in the context of DBN, and show an interesting result: under the faithfulness assumption, the mutual information test based local Markov blanket algorithms yield the same network as learned by the global optimization MIT-based algorithm. Experimental validation on small and large scale genetic networks demonstrates the effectiveness of our proposed approaches.

17:00 Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media short_paper) echo(" (Short)");?>
Himabindu Lakkaraju, Indrajit Bhattacharya, and Chiranjib Bhattacharyya
DM829

We study the problem of analyzing influence of various factors affecting individual messages posted in social media. The problem is challenging because of various types of influences propagating through the social media network that act simultaneously on any user. Additionally, the topic composition of the influencing factors and the susceptibility of users to these influences evolve over time. This problem has not been studied before, and off-the-shelf models are unsuitable for this purpose. To capture the complex interplay of these various factors, we propose a new non-parametric model called the Dynamic Multi-Relational Chinese Restaurant Process. This accounts for the user network for data generation and also allows the parameters to evolve over time. Designing inference algorithms for this model suited for large scale social-media data is another challenge. To this end, we propose a scalable and multi-threaded inference algorithm based on online Gibbs Sampling. Extensive evaluations on large-scale Twitter and Face book data show that the extracted topics when applied to authorship and commenting prediction outperform state-of-the-art baselines. More importantly, our model produces valuable insights on topic trends and user personality trends beyond the capability of existing approaches.

17:20 Time Constrained Influence Maximization in Social Networks short_paper) echo(" (Short)");?>
Bo Liu, Gao Cong, Dong Xu, and Yifeng Zeng
DM382

Influence maximization is a fundamental research problem in social networks. Viral marketing, one of its applications, is to get a small number of users to adopt a product, which subsequently triggers a large cascade of further adoptions by utilizing "Word-of-Mouth"; effect in social networks. Influence maximization problem has been extensively studied recently. However, none of the previous work considers the time constraint in the influence maximization problem. In this paper, we propose the time constrained influence maximization problem. We show that the problem is NP-hard, and prove the monotonicity and submodularity of the time constrained influence spread function. Based on this, we develop a greedy algorithm with performance guarantees. To improve the algorithm scalability, we propose two Influence Spreading Path based methods. Extensive experiments conducted over four public available datasets demonstrate the efficiency and effectiveness of the Influence Spreading Path based methods.

17:40 Computational Television Advertising short_paper) echo(" (Short)");?>
Suhrid Balakrishnan, Sumit Chopra, David Applegate, and Simon Urbanek
DM923

Ever wonder why that Kia Ad ran during Iron Chef? Traditional advertising methodology on television is a fascinating mix of marketing, branding, measurement, and predictive modeling. While still a robust business, it is at risk with the recent growth of online and time-shifted (recorded) television. A particular issue is that traditional methods for television advertising are far less efficient than their counterparts in the online world which employ highly sophisticated computational techniques. This paper formalizes an approach to eliminate some of these inefficiencies by recasting the process of television advertising media campaign generation in a computational framework. We describe efficient mathematical approaches to solve for the task of finding optimal campaigns for specific target audiences. In two case studies, our campaigns report gains in key operational metrics of up to 56% compared to campaigns generated by traditional methods.