You are here

Session Details

Privacy and Security

Tuesday, 11 December
13:30 – 15:30
Room: Salle des nations II
Session Chair: Bettina Berendt

13:30 Reconstructing Graphs from Neighborhood Data short_paper) echo(" (Short)");?>
Dora Erdos, Rainer Gemulla, and Evimaria Terzi
DM387

Consider a social network and suppose that we are given the number of common friends between each pair of users. Can we reconstruct the underlying network? Similarly, consider a set of documents and the words that appear in them. If we know the number of common words for every pair of documents, as well as the number of common documents for every pair of words, can we infer which words appear in which documents? In this paper, we develop a general methodology for answering questions like the ones above. We formalize these questions in what we call the Reconstruct problem: Given information about the common neighbors of nodes in a network, our goal is to reconstruct the hidden binary matrix that indicates the presence or absence of relationships between individual nodes. We propose an effective and practical heuristic, which exploits properties of the singular value decomposition of the hidden binary matrix. More specifically, we show that using the available neighborhood information, we can reconstruct the hidden matrix by finding the components of its singular value decomposition and then combining them appropriately. Our extensive experimental study suggests that our methods are able to reconstruct binary matrices of different characteristics with up to 100% accuracy.

13:50 Differentially Private Histogram Publishing through Lossy Compression short_paper) echo(" (Short)");?>
Gergely Acs, Claude Castelluccia, and Rui Chen
DM928

Differential privacy has emerged as one of the most promising privacy models for private data release. It can be used to release different types of data, and, in particular, histograms, which provide useful summaries of a dataset. Several differentially private histogram releasing schemes have been proposed recently. However, most of them directly add noise to the histogram counts, resulting in undesirable accuracy. In this paper, we propose two sanitization techniques that exploit the inherent redundancy of real-life datasets in order to boost the accuracy of histograms. They lossily compress the data and sanitize the compressed data. Our first scheme is an optimization of the Fourier Perturbation Algorithm (FPA) presented in {RN10}. It improves the accuracy of the initial FPA by a factor of 10. The other scheme relies on clustering and exploits the redundancy between bins. Our extensive experimental evaluation over various real-life and synthetic datasets demonstrates that our techniques preserve very accurate distributions and considerably improve the accuracy of range queries over attributed histograms.

14:10 Class Probability Estimates are Unreliable for Imbalanced Data (and How to Fix Them) short_paper) echo(" (Short)");?>
Byron C. Wallace and Issa J. Dahabreh
DM578

Obtaining good probability estimates is imperative for many applications. The increased uncertainty and typically asymmetric costs surrounding rare events increases this need. Experts (and classification systems) often rely on probabilities to inform decisions. However, we demonstrate that class probability estimates attained via supervised learning in imbalanced scenarios systematically underestimate the probabilities for minority class instances, despite ostensibly good overall calibration. To our knowledge, this problem has not previously been explored. Motivated by our exposition of this issue, we propose a simple, effective and theoretically motivated method to mitigate the bias of probability estimates for imbalanced data that bags estimators calibrated over balanced bootstrap samples. This approach drastically improves performance on the minority instances without greatly affecting overall calibration. We show that additional uncertainty can be exploited via a Bayesian approach by considering posterior distributions over bagged probability estimates.

14:30 A General Framework for Publishing Privacy Protected and Utility Preserved Graph short_paper) echo(" (Short)");?>
Mingxuan Yuan, Lei Chen, Weixiong Rao, and Hong Mei
DM494

The privacy protection of graph data has become more and more important in recent years. Many works have been proposed to publish a privacy preserving graph. All these works prefer publishing a graph, which guarantees the protection of certain privacy with the smallest change to the original graph. However, there is no guarantee on how the utilities are preserved in the published graph. In this paper, we propose a general fine-grained adjusting framework to publish a privacy protected and utility preserved graph. With this framework, the data publisher can get a trade-off between the privacy and utility according to his customized preferences. We used the protection of a weighted graph as an example to demonstrate the implementation of this framework.

14:42 Privacy-preserving SimRank over Distributed Information Network short_paper) echo(" (Short)");?>
Yu-Wei Chu, Chih-Hua Tai, Ming-Syan Chen, and Philip S. Yu
DM682

Information network analysis has drawn a lot attention in recent years. Among all the aspects of network analysis, similarity measure of nodes has been shown useful in many applications, such as clustering, link prediction and community identification, to name a few. As linkage data in a large network is inherently sparse, it is noted that collecting more data can improve the quality of similarity measure. This gives different parties a motivation to cooperate. In this paper, we address the problem of link-based similarity measure of nodes in an information network distributed over different parties. Concerning the data privacy, we propose a privacy-preserving Sim Rank protocol based on fully-homomorphic encryption to provide cryptographic protection for the links.

14:54 Mining Permission Request Patterns from Android and Facebook Applications short_paper) echo(" (Short)");?>
Mario Frank, Ben Dong, Adrienne Porter Felt, and Dawn Song
DM370

Android and Face book provide third-party applications with access to users' private data and the ability to perform potentially sensitive operations (e.g., post to a user's wall or place phone calls). As a security measure, these platforms restrict applications' privileges with permission systems: users must approve the permissions requested by applications before the applications can make privacy-or security-relevant API calls. However, recent studies have shown that users often do not understand permission requests and are unsure of which permissions are typical for applications. As a first step towards simplifying permission systems, we cluster a corpus of 188,389 Android applications and 27,029 Face book applications to find patterns in permission requests. Using a method for Boolean matrix factorization to find overlapping clusters of permissions, we find that Face book permission requests follow a clear structure that can be fitted well with only five patterns, whereas Android applications demonstrate more complex permission requests. We also find that low-reputation applications often deviate from the permission request patterns that we identified for high-reputation applications, which suggests that permission request patterns can be indicative of user satisfaction or application quality.

15:06 Multiple Kernel Learning Clustering with an Application to Malware short_paper) echo(" (Short)");?>
Blake Anderson, Curtis Storlie, and Terran Lane
DM716

With the increasing prevalence of richer, more complex data sources, learning with multiple views is becoming more widespread. Multiple kernel learning (MKL) has been developed to address this problem, but in general, the solutions provided by traditional MKL are restricted to a classification objective function. In this work, we develop a novel multiple kernel learning algorithm that is based on a spectral clustering objective function which is able to find an optimal kernel weight vector for the clustering problem. We go on to show how this optimization problem can be cast as a semidefinite program and efficiently solved using off-the-shelf interior point methods.

15:18 Risks of Friendships on Social Networks short_paper) echo(" (Short)");?>
Cuneyt Gurcan Akcora, Barbara Carminati, and Elena Ferrari
DM674

In this paper, we explore the risks of friends in social networks caused by their friendship patterns, by using real life social network data and starting from a previously defined risk model. Particularly, we observe that risks of friendships can be mined by analyzing users' attitude towards friends of friends. This allows us to give new insights into friendship and risk dynamics on social networks.