You are here

Session Details

Graphs and Networks

Thursday, 13 December
16:00 – 18:00
Room: Rembrandt & Permeke
Session Chair: George Karypis

16:00 CT-IC: Continuously activated and Time-restricted Independent Cascade Model for Viral Marketing short_paper) echo(" (Short)");?>
Wonyeol Lee, Jinha Kim, and Hwanjo Yu
DM832

Influence maximization problem with applications to viral marketing has gained much attention. Underlying influence diffusion models affect influence maximizing nodes because they focus on difference aspect of influence diffusion. Nevertheless, existing diffusion models overlook two important aspects of real-world marketing - continuous trials and time restriction. This paper proposes a new realistic influence diffusion model called Continously activated and Time-restricted IC (CT-IC) model which generalizes the IC model by embedding the above two aspects. We first prove that CT-IC model satisfies two crucial properties -- monotonicity and submodularity. We then provide an efficient method for calculating exact influence spread when a social network is restricted to a directed tree and a simple path. Finally, we propose a scalable algorithm for influence maximization under CT-IC model called CT-IPA. Our experiments show that CT-IC model provides seeds of higher influence spread than IC model and CT-IPA is four orders of magnitude faster than the greedy algorithm while providing similar influence spread to the greedy algorithm.

16:12 Community-Affiliation Graph Model for Overlapping Network Community Detection short_paper) echo(" (Short)");?>
Jaewon Yang and Jure Leskovec
DM917

One of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Communities in networks often overlap as nodes can belong to multiple communities at once. Identifying such overlapping communities is crucial for the understanding the structure as well as the function of real-world networks. Even though community structure in networks has been widely studied in the past, practically all research makes an implicit assumption that overlaps between communities are less densely connected than the non-overlapping parts themselves. Here we validate this assumption on 6 large scale social, collaboration and information networks where nodes explicitly state their community memberships. By examining such ground-truth communities we find that the community overlaps are more densely connected than the non-overlapping parts, which is in sharp contrast to the conventional wisdom that community overlaps are more sparsely connected than the communities themselves. Practically all existing community detection methods fail to detect communities with dense overlaps. We propose Community-Affiliation Graph Model, a model-based community detection method that builds on bipartite node-community affiliation networks. Our method successfully captures overlapping, non-overlapping as well as hierarchically nested communities, and identifies relevant communities more accurately than the state-of-the-art methods in networks ranging from biological to social and information networks.

16:24 Predicting Directed Links using Nondiagonal Matrix Decompositions short_paper) echo(" (Short)");?>
Jérôme Kunegis and Jörg Fliege
DM489

We present a method for trust prediction based on no diagonal decompositions of the asymmetric adjacency matrix of a directed network. The method we propose is based on a no diagonal decomposition into directed components (DEDICOM), which we use to learn the coefficients of a matrix polynomial of the network's adjacency matrix. We show that our method can be used to compute better low-rank approximations to a polynomial of a network's adjacency matrix than using the singular value decomposition, and that a higher precision can be achieved at the task of predicting directed links than by undirected or bipartite methods.

16:36 Mining User Mobility Features for Next Place Prediction in Location-based Services short_paper) echo(" (Short)");?>
Anastasios Noulas, Salvatore Scellato, Neal Lathia, and Cecilia Mascolo
DM746

Mobile location-based services are thriving, providing an unprecedented opportunity to collect fine grained spatio-temporal data about the places users visit. This multi-dimensional source of data offers new possibilities to tackle established research problems on human mobility, but it also opens avenues for the development of novel mobile applications and services. In this work we study the problem of predicting the next venue a mobile user will visit, by exploring the predictive power offered by different facets of user behavior. We first analyze about 35 million check-ins made by about 1 million Foursquare users in over 5 million venues across the globe, spanning a period of five months. We then propose a set of features that aim to capture the factors that may drive users' movements. Our features exploit information on transitions between types of places, mobility flows between venues, and spatio-temporal characteristics of user check-in patterns. We further extend our study combining all individual features in two supervised learning models, based on linear regression and M5 model trees, resulting in a higher overall prediction accuracy. We find that the supervised methodology based on the combination of multiple features offers the highest levels of prediction accuracy: M5 model trees are able to rank in the top fifty venues one in two user check-ins, amongst thousands of candidate items in the prediction list.

16:48 Spatial Interpolation using Multiple Regression short_paper) echo(" (Short)");?>
Orlando Ohashi and Luís Torgo
DM785

Many real world data mining applications involve analyzing geo-referenced data. Frequently, this type of data sets are incomplete in the sense that not all geographical coordinates have measured values of the variable(s) of interest. This incompleteness may be caused by poor data collection, measurement errors, costs management and many other factors. These missing values may cause several difficulties in many applications. Spatial imputation/interpolation methods try to fill in these unknown values in geo-referenced data sets. In this paper we propose a new spatial imputation method based on machine learning algorithms and a series of data pre-processing steps. The key distinguishing factor of this method is allowing the use of data from faraway regions, contrary to the state of the art on spatial data mining. Images (e.g. from a satellite or video surveillance cameras) may also suffer from this incompleteness where some pixels are missing, which again may be caused by many factors. An image can be seen as a spatial data set in a Cartesian coordinates system, where each pixel (location) registers some value (e.g. degree of gray on a black and white image). Being able to recover the original image from a partial or incomplete version of the reality is a key application in many domains (e.g. surveillance, security, etc.). In this paper we evaluate our general methodology for spatial interpolation on this type of problems. Namely, we check the ability of our method to fill in unknown pixels on several images. We compare it to state of the art methods and provide strong experimental evidence of the advantages of our proposal.