You are here

Session Details

Efficient Algorithms 1

Tuesday, 11 December
13:30 – 15:30
Room: Salle des nations I
Session Chair: Joost Kok

13:30 Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems short_paper) echo(" (Short)");?>
Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, and Inderjit Dhillon

Matrix factorization, when the matrix has missing values, has become one of the leading techniques for recommender systems. To handle web-scale datasets with millions of users and billions of ratings, scalability becomes an important issue. Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD) are two popular approaches to compute matrix factorization. There has been a recent flurry of activity to parallelize these algorithms. However, due to the cubic time complexity in the target rank, ALS is not scalable to large-scale datasets. On the other hand, SGD conducts efficient updates but usually suffers from slow convergence that is sensitive to the parameters. Coordinate descent, a classical optimization approach, has been used for many other large-scale problems, but its application to matrix factorization for recommender systems has not been explored thoroughly. In this paper, we show that coordinate descent based methods have a more efficient update rule compared to ALS, and are faster and have more stable convergence than SGD. We study different update sequences and propose the CCD++ algorithm, which updatesrank-one factors one by one. In addition, CCD++ can be easily parallelized on both multi-core and distributed systems. We empirically show that CCD++ is much faster than ALS and SGD in both settings. As an example, on a synthetic dataset with 2 billion ratings, CCD++ is 4 times faster than both SGD and ALS using a distributed system with 20 machines.

13:50 Tuple MapReduce: Beyond classic MapReduce short_paper) echo(" (Short)");?>
Pedro Ferrera, Ivan de Prado, Eric Palacios, Jose Luis Fernandez-Marquez, and Giovanna Di Marzo Serugendo

This paper proposes Tuple Map Reduce, a new foundational model extending Map Reduce with the notion of tuples. Tuple Map Reduce allows to bridge the gap between the low-level constructs provided by Map Reduce and higher-level needs required by programmers, such as compound records, sorting or joins. This paper presents as well Pangool, an open-source framework implementing Tuple Map Reduce. Pangool eases the design and implementation of applications based on Map Reduce and increases their flexibility, still maintaining Hadoop's performance.

14:10 Parallelization with Multiplicative Algorithms for Big Data Mining short_paper) echo(" (Short)");?>
Dijun Luo, Chris Ding, and Heng Huang

We propose a nontrivial strategy to parallelize a series of data mining and machine learning problems, including 1-class and 2-class support vector machines, nonnegative least square problems, and l1 regularized regression (LASSO) problems. Our strategy fortunately leads to extremely simple multiplicative algorithms which can be straightforwardly implemented in parallel computational environments, such as Map Reduce, or CUDA. We provide rigorous analysis of the correctness and convergence of the algorithm. We demonstrate the scalability and accuracy of our algorithms in comparison with other current leading algorithms.

14:30 Efficient Pattern-Based Time Series Classification on GPU short_paper) echo(" (Short)");?>
Kai-Wei Chang, Biplab Deka, Wen-Mei W. Hwu, and Dan Roth

Time series shapelet discovery algorithm finds subsequences from a set of time series for use as primitives for time series classification. This algorithm has drawn a lot of interest because of the interpretability of its results. However, computation requirements restrict the algorithm from dealing with large data sets and may limit its application in many domains. In this paper, we address this issue by redesigning the algorithm for implementation on highly parallel Graphics Process Units (GPUs). We investigate several concepts of GPU programming and propose a dynamic programming algorithm that is suitable for implementation on GPUs. Results show that the proposed GPU implementation significantly reduces the running time of the shapelet discovery algorithm. For example, on the largest sample dataset from the original authors, the running time is reduced from half a day to two minutes.

14:50 GPU-Accelerated Feature Selection for Outlier Detection using the Local Kernel Density Ratio short_paper) echo(" (Short)");?>
Fatemeh Azmandian, Ayse Yilmazer, Jennifer G. Dy, Javed A. Aslam, and David R. Kaeli

Effective outlier detection requires the data to be described by a set of features that captures the behavior of normal data while emphasizing those characteristics of outliers which make them different than normal data. In this work, we present a novel non-parametric evaluation criterion for filter-based feature selection which caters to outlier detection problems. The proposed method seeks the subset of features that represents the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared to popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features, as well as the number of data points.

15:10 Scalable Training of Sparse Linear SVM short_paper) echo(" (Short)");?>
Guo-Xun Yuan and Kwan-Liu Ma

Sparse linear support vector machines have been widely applied to variable selection in many applications. For large data, managing the cost of training a sparse model with good predication performance is an essential topic. In this work, we propose a scalable training algorithm for large-scale data with millions of examples and features. We develop a dual alternating direction method for solving L1-regularized linear SVMs. The learning procedure simply involves quadratic programming in the same form as the standard SVM dual, followed by a soft-thresholding operation. The proposed training algorithm possesses two favorable properties. First, it is a decomposable algorithm by which a large problem can be reduced to small ones. Second, the sparsity of intermediate solutions is maintained throughout the training process. It naturally promotes the solution sparsity by soft-thresholding. We demonstrate that, by experiments, our method outperforms state-of-the-art approaches on large-scale benchmark data sets. We also show that it is well suited for training large sparse models on a distributed system.