You are here

Session Details

Social Networks 2

Wednesday, 12 December
10:00 – 12:00
Room: Salle des nations II
Session Chair: Jie Tang

10:00 Sequential Network Change Detection with Its Applications to Ad Impact Relation Analysis short_paper) echo(" (Short)");?>
Yu Hayashi and Kenji Yamanishi
DM300

We are concerned with the issue of tracking changes of variable dependencies from multivariate time series. Conventionally, this issue has been addressed in the batch scenario where the whole data set is given at once, and the change detection must be done in a retrospective way. This paper addresses this issue in a sequential scenario where multivariate data are sequentially input and the detection must be done in a sequential fashion. We propose a new method for sequential tracking of variable dependencies. In it we employ a Bayesian network as a representation of variable dependencies. The key ideas of our method are, 1) we extend the theory of dynamic model selection (DMS), which has been developed in the batch-learning scenario, into the sequential setting, and apply it to our issue, 2) we conduct the change detection sequentially using dynamic programming per a window where we employ the Hoeffding's bound to automatically determine the window size. We empirically demonstrate that our proposed method is able to perform change detection more efficiently than a conventional batch method. Further, we give a new framework of an application of variable dependency change detection, which we call Ad Impact Relation analysis (AIR). In it, we detect the time point when a commercial message advertisement has given an impact on the market and effectively visulaize the impact through network changes. We employ real data sets to demonstrate the validity of AIR.

10:20 GUISE: Uniform Sampling of Graphlets for Large Graph Analysis short_paper) echo(" (Short)");?>
Mansurul A Bhuiyan, Mahmudur Rahman, Mahmuda Rahman, and Mohammad Al Hasan
DM947

Graphlet frequency distribution (GFD) has recently become popular for characterizing large networks. However, the computation of GFD for a network requires the exact count of embedded graphlets in that network, which is a computationally expensive task. As a result, it is practically infeasible to compute the GFD for even a moderately large network. In this paper, we propose GUISE, which uses a Markov Chain Monte Carlo (MCMC) sampling method for constructing the approximate GFD of a large network. Our experiments on networks with millions of nodes show that GUISE obtains the GFD within few minutes, whereas the exhaustive counting based approach takes several days.

10:40 Reliable Clustering on Uncertain Graphs short_paper) echo(" (Short)");?>
Lin Liu, Ruoming Jin, Charu Aggarwal, and Yelong Shen
DM871

Many graphs in practical applications are not deterministic, but are probabilistic in nature because the existence of the edges is inferred with the use of a variety of statistical approaches. In this paper, we will examine the problem of clustering uncertain graphs. Uncertain graphs are best clustered with the use of a possible worlds model in which the most reliable clusters are discovered in the presence of uncertainty. Reliable clusters are those which are not likely to be disconnected in the context of different instantiations of the uncertain graph. We present experimental results which illustrate the effectiveness of our model and approach.

11:00 Towards Annotating Media Contents through Social Diffusion Analysis short_paper) echo(" (Short)");?>
Tong Xu, Dong Liu, Enhong Chen, Huanhuan Cao, and Jilei Tian
DM333

Recently, the boom of media contents on the Internet raises challenges in managing them effectively and thus requires automatic media annotation techniques. Motivated by the observation that media contents are usually shared frequently in online communities and thus have a lot of social diffusion records, we propose a novel media annotating approach depending on these social diffusion records instead of metadata. The basic assumption is that the social diffusion records reflect the common interests (CI) between users, which can be analyzed for generating annotations. With this assumption, we present a novel CI-based social diffusion model and translate the automatic annotating task into the CI-based diffusion maximization (CIDM) problem. Moreover, we propose to solve the CIDM problem through two optimization tasks, corresponding to the training and test stages in supervised learning. Extensive experiments on real-world data sets show that our approach can effectively generate high quality annotations, and thus demonstrate the capability of social diffusion analysis in annotating media.

11:12 Inferring the Underlying Structure of Information Cascades short_paper) echo(" (Short)");?>
Bo Zong, Yinghui Wu, Ambuj K. Singh, and Xifeng Yan
DM385

In social networks, information and influence diffuse among users as cascades. While the importance of studying cascades has been recognized in various applications, it is difficult to observe the complete structure of cascades in practice. In this paper we study the cascade inference problem following the independent cascade model, and provide a full treatment from complexity to algorithms: (a) we propose the idea of consistent trees as the inferred structures for cascades, these trees connect source nodes and observed nodes with paths satisfying the constraints from the observed temporal information. (b) We introduce metrics to measure the likelihood of consistent trees as inferred cascades, as well as several optimization problems for finding them. (c) We show that the decision problems for consistent trees are in general NP-complete, and that the optimization problems are hard to approximate. (d) We provide approximation algorithms with performance guarantees on the quality of the inferred cascades, as well as heuristics. We experimentally verify the efficiency and effectiveness of our inference algorithms, using real and synthetic data.

11:24 Detecting Spam and Promoting Campaigns in the Twitter Social Network short_paper) echo(" (Short)");?>
Xianchao Zhang, Shaoping Zhu, and Wenxin Liang
DM440

The Twitter social network has become a target platform for both promoters and stammers to disseminate their target messages. There are a large number of campaigns containing coordinated spam or promoting accounts in Twitter, which are more harmful than the traditional methods, such as email spamming. Since traditional solutions mainly check individual accounts or messages, it is an urgent task to detect spam and promoting campaigns in Twitter. In this paper, we propose a scalable framework to detect both spam and promoting campaigns. Our framework consists of three steps: firstly linking accounts who post URLs for similar purposes, secondly extracting candidate campaigns which may exist for spam or promoting purpose and finally distinguishing their intents. One salient aspect of the framework is introducing a URL-driven estimation method to measure the similarity between accounts' purposes of posting URLs, the other one is proposing multiple features to distinguish the candidate campaigns based on a machine learning method. Over a large-scale dataset from Twitter, we can extract the actual campaigns with high precision and recall and distinguish the majority of the candidate campaigns correctly.

11:36 Towards Automatic Image Understanding and Mining via Social Curation short_paper) echo(" (Short)");?>
Katsuhiko Ishiguro, Akisato Kimura, and Koh Takeuchi
DM533

The amount and variety of multimedia data such as images, movies and music available on over social networks are increasing rapidly. However, the ability to analyze and exploit these unorganized multimedia data remains inadequate, even with state-of-the-art media processing techniques. Our finding in this paper is that the emerging social curation service is a promising information source for the automatic understanding and mining of images distributed and exchanged via social media. One remarkable virtue of social curation service datasets is that they are weakly supervised: the content in the service is manually collected, selected and maintained by users. This is very different from other social information sources, and we can utilize this characteristics for media content mining without expensive media processing techniques. In this paper we present a machine learning system for predicting view counts of images in social curation data as the first step to automatic image content evaluation. Our experiments confirm that the simple features extracted from a social curation corpus are much superior in terms of count prediction than the gold-standard image features of computer vision research.

11:48 IRIE: Scalable and Robust Influence Maximization in Social Networks short_paper) echo(" (Short)");?>
Kyomin Jung, Wooram Heo, and Wei Chen
DM613

Influence maximization is the problem of selecting top k seed nodes in a social network to maximize their influence coverage under certain influence diffusion models. In this paper, we propose a novel algorithm IRIE that integrates the advantages of influence ranking (IR) and influence estimation (IE) methods for influence maximization in both the independent cascade (IC) model and its extension IC-N that incorporates negative opinion propagations. Through extensive experiments, we demonstrate that IRIE matches the influence coverage of other algorithms while scales much better than all other algorithms. Moreover IRIE is much more robust and stable than other algorithms both in running time and memory usage for various density of networks and cascade size. It runs up to two orders of magnitude faster than other state-of-the-art algorithms such as PMIA for large networks with tens of millions of nodes and edges, while using only a fraction of memory.