You are here

Session Details

Classification 2

Wednesday, 12 December
10:00 – 12:00
Room: Rembrandt & Permeke
Session Chair: David Martens

10:00 The mixture of multi-kernel relevance vector machines model short_paper) echo(" (Short)");?>
Konstantinos Blekas and Aristidis Likas

We present a new regression mixture model where each mixture component is a multi-kernel version of the Relevance Vector Machine (RVM). In the proposed model, we exploit the enhanced modeling capability of RVMs due to their embedded sparsity enforcing properties. Moreover, robustness is achieved with respect to the kernel parameters, by employing a weighted multi-kernel scheme. The mixture model is trained using the maximum a posteriori (MAP) approach, where the Expectation Maximization (EM) algorithm is applied offering closed form update equations for the model parameters. An incremental learning methodology is also presented to tackle the parameter initialization problem of the EM algorithm. The efficiency of the proposed mixture model is empirically demonstrated on the time series clustering problem using various artificial and real benchmark datasets and by performing comparisons with other regression mixture models.

10:20 Self-Training with Selection-by-Rejection short_paper) echo(" (Short)");?>
Yan Zhou, Murat Kantarcioglu, and Bhavani Thuraisingham

Practical machine learning and data mining problems often face shortage of labeled training data. Self-training algorithms are among the earliest attempts of using unlabeled data to enhance learning. Traditional self-training algorithms label unlabeled data on which classifiers trained on limited training data have the highest confidence. In this paper, a self-training algorithm that decreases the disagreement region of hypotheses is presented. The algorithm supplements the training set with self-labeled instances. Only instances that greatly reduce the disagreement region of hypotheses are labeled and added to the training set. Empirical results demonstrate that the proposed self-training algorithm can effectively improve classification performance.

10:40 Co-Labeling: A New Multi-view Learning Approach for Ambiguous Problems short_paper) echo(" (Short)");?>
Wen Li, Lixin Duan, Ivor Wai-Hung Tsang, and Dong Xu

We propose a multi-view learning approach called co-labeling which is applicable for several machine learning problems where the labels of training samples are uncertain, including semi-supervised learning (SSL), multi-instance learning (MIL) and max-margin clustering (MMC). Particularly, we first unify those problems into a general ambiguous problem in which we simultaneously learn a robust classifier as well as find the optimal training labels from a finite label candidate set. To effectively utilize multiple views of data, we then develop our co-labeling approach for the general multi-view ambiguous problem. In our work, classifiers trained on different views can teach each other by iteratively passing the predictions of training samples from one classifier to the others. The predictions from one classifier are considered as label candidates for the other classifiers. To train a classifier with a label candidate set for each view, we adopt the Multiple Kernel Learning (MKL) technique by constructing the base kernel through associating the input kernel calculated from input features with one label candidate. Compared with the traditional co-training method which was specifically designed for SSL, the advantages of our co-labeling are two-fold: 1) it can be applied to other ambiguous problems such as MIL and MMC, 2) it is more robust by using the MKL method to integrate multiple labeling candidates obtained from different iterations and biases. Promising results on several real-world multi-view data sets clearly demonstrate the effectiveness of our proposed co-labeling for both MIL and SSL.

11:00 Graph-oriented Learning via Automatic Group Sparsity for Data Analysis short_paper) echo(" (Short)");?>
Yuqiang Fang, Ruili Wang, and Bin Dai

The key task in graph-oriented learning is con-structing an informative graph to model the geometrical and discriminant structure of a data manifold. Since traditional graph construction methods are sensitive to noise and less datum-adaptive to changes in density, a new graph construction method so-called L1-Graph has been proposed [1] recently. A graph construction method needs to have two important properties: sparsity and locality. However, the L1-Graph is strong in sparsity property, but weak in locality. In order to overcome such limitation, we propose a new method of constructing an informative graph using automatic group sparse regularization based on the work of L1-Graph, which is called as group sparse graph (GroupSp-Graph). The newly developed GroupSp-Graph has the same noise-insensitive property as L1-Graph, and also can successively preserve the group and local information in the graph. In other words, the proposed group sparse graph has both properties of sparsity and locality simultaneously. Furthermore, we integrate the proposed graph with several graph-oriented learning algorithms: spectral em-bedding, spectral clustering, subspace learning and manifold regularized non-negative matrix factorization. The empirical studies on benchmark data sets show that the proposed algo-rithms achieve considerable improvement over classic graph constructing methods and the L1-Graph method in various learning task.

11:20 Ensemble Pruning via Constrained Eigen-Optimization short_paper) echo(" (Short)");?>
Linli Xu, Bo Li, and Enhong Chen

An ensemble is composed of a set of base learners that make predictions jointly. The generalization performance of an ensemble has been justified both theoretically and in practice. However, existing ensemble learning methods sometimes produce unnecessarily large ensembles, with an expense of extra computational costs and memory consumption. The purpose of ensemble pruning is to select a subset of base learners with comparable or better prediction performance. In this paper, we formulate the ensemble pruning problem into a combinatorial optimization problem with the goal to maximize the accuracy and diversity at the same time. Solving this problem exactly is computationally hard. Fortunately, we can relax and reformulate it as a constrained eigenvector problem, which can be solved with an efficient algorithm that is guaranteed to converge globally. Convincing experimental results demonstrate that this optimization based ensemble pruning algorithm outperforms the state-of-the-art heuristics in the literature.

11:40 Healing Truncation Bias : Self-weighted Truncation framework for Dual Averaging short_paper) echo(" (Short)");?>
Hidekazu Oiwa, Shin Matsushima, and Hiroshi Nakagawa

We propose a new truncation framework for online supervised learning. Learning a compact predictive model in an online setting has recently attracted a great deal of attention. The combination of online learning with sparsity-inducing regularization enables faster learning with a smaller memory space than a conventional learning framework. However, a simple combination of these triggers the truncation of weights whose corresponding features rarely appear, even if these features are crucial for prediction. Furthermore, it is difficult to emphasize these features in advance while preserving the advantages of online learning. We develop an extensional truncation framework to Dual Averaging, which retains rarely occurring but informative features. Our proposed framework integrates information on all previous sub gradients of the loss functions into a regularization term. Our enhancement of a conventional L1-regularization accomplishes the automatic adjustment of each feature's truncations. This extension enables us to identify and retain rare but informative features without preprocessing. In addition, our framework achieves the same computational complexity and regret bound as standard Dual Averaging. Experiments demonstrated that our framework outperforms other sparse online learning algorithms.