Session Details

Classification 3

Thursday, 13 December
10:00 – 12:00
Room: Rembrandt & Permeke
Session Chair: Srivatsan Laxman

10:00 Transductive Representation Learning for Cross-Lingual Text Classification
Yuhong Guo and Min Xiao
DM409

In cross-lingual text classification problems, it is costly and time-consuming to annotate documents for each individual language. To avoid the expensive re-labeling process, domain adaptation techniques can be applied to adapt a learning system trained in one language domain to another language domain. In this paper we develop a transductive subspace representation learning method to address domain adaptation for cross-lingual text classification. The proposed approach is formulated as a nonnegative matrix factorization problem and solved using an iterative optimization procedure. Our empirical study on cross-lingual text classification tasks shows that the proposed approach consistently outperforms a number of comparison methods.

10:12 Learning Target Predictive Function without Target Labels
Chun-Wei Seah, Ivor Wai-Hung Tsang, Yew-Soon Ong, and Qi Mao
DM427

When labeled samples are absent from a domain, referred to as the target domain, Domain Adaptation (DA) techniques come in handy. Generally, DA techniques assume that source domains are available which share a similar predictive function with the target domain. Two core challenges of DA typically arise: the variance that exists between source and target domains, and the inherent source hypothesis bias. In this paper, we first propose a Stability Transfer criterion for selecting relevant source domains with low variance. With this criterion, we introduce a TARget learning Assisted by Source Classifier Adaptation (TARASCA) method to address the two core challenges that have impeded the performance of DA techniques. To verify the robustness of TARASCA, extensive experimental studies are carried out with comparison to several state-of-the-art DA methods on the real-world Sentiment and Newsgroups datasets, where various settings for the class ratios of the source and target domains are considered.

10:24 Fast Kernel Sparse Representation Approaches for Classification
Yifeng Li and Alioune Ngom
DM316

Sparse representation involves two related procedures: sparse coding and dictionary learning. Learning a dictionary from data provides a concise knowledge representation. Learning a dictionary in a higher feature space might allow a better representation of a signal. However, with existing algorithms it is usually computationally expensive to learn a dictionary if the number of training samples and/or the number of dimensions is very large. In this paper, we propose a kernel dictionary learning framework for three models. We reveal that the optimization has dimension-free and parallel properties. We devise fast active-set algorithms for this framework. We investigate their performance on classification. Experimental results show that our kernel sparse representation approaches can obtain better accuracy than their linear counterparts. Furthermore, our active-set algorithms are faster than the existing interior-point and proximal algorithms.

10:36 Learning Attitudes and Attributes from Multi-Aspect Reviews
Julian McAuley, Jure Leskovec, and Dan Jurafsky
DM476

Most online reviews consist of plain-text feedback together with a single numeric score. However, understanding the multiple 'aspects' that contribute to users' ratings may help us to better understand their individual preferences. For example, a user's impression of an audio book presumably depends on aspects such as the story and the narrator, and knowing their opinions on these aspects may help us to recommend better products. In this paper, we build models for rating systems in which such dimensions are explicit, in the sense that users leave separate ratings for each aspect of a product. By introducing new corpora consisting of five million reviews, rated with between three and six aspects, we evaluate our models on three prediction tasks. First, we uncover which parts of a review discuss which of the rated aspects. Second, we summarize reviews by finding the sentences that best explain a user's rating. Finally, since aspect ratings are optional in many of the datasets we consider, we recover ratings that are missing from a user's evaluation. Our model matches state-of-the-art approaches on existing small-scale datasets, while scaling to the real-world datasets we introduce. Moreover, our model is able to 'disentangle' content and sentiment words: we automatically learn content words that are indicative of a particular aspect as well as the aspect-specific sentiment words that are indicative of a particular rating.

10:48 Simultaneously Combining Multi-View Multi-Label Learning with Maximum Margin Classification
Zheng Fang and Zhongfei (Mark) Zhang
DM511

Multiple feature views arise in various important data classification scenarios. However, finding a consensus feature view from multiple feature views for a classifier is still a challenging task. We present a new classification framework using the multi-label correlation information to address the problem of simultaneously combining multiple feature views and maximum margin classification. Under this framework, we propose a novel algorithm that iteratively computes the multiple view feature mapping matrices, the consensus feature view representation, and the coefficients of the classifier. Extensive experimental evaluations demonstrate the effectiveness and promise of this framework as well as the algorithm for discovering a consensus view from multiple feature views.

11:00 Decision Theory for Discrimination-aware Classification
Faisal Kamiran, Asim Karim, and Xiangliang Zhang
DM532

Social discrimination (e.g., against females) arising from data mining techniques is a growing concern worldwide. In recent years, several methods have been proposed for making classifiers learned over discriminatory data discrimination-aware. However, these methods suffer from two major shortcomings: (1) they require either modifying the discriminatory data or tweaking a specific classification algorithm, and (2) they are not flexible w.r.t. discrimination control and multiple sensitive attribute handling. In this paper, we present two solutions for discrimination-aware classification that require neither data modification nor classifier tweaking. Our first and second solutions exploit, respectively, the reject option of probabilistic classifier(s) and the disagreement region of general classifier ensembles to reduce discrimination. We relate both solutions to decision theory for a better understanding of the process. Our experiments using real-world datasets demonstrate that our solutions outperform existing state-of-the-art methods, especially at low discrimination, which is a significant advantage. The superior performance, coupled with flexible control over discrimination and easy applicability to multiple sensitive attributes, makes our solutions an important step forward in practical discrimination-aware classification.
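
The reject-option idea mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the decision rule, so everything here is an assumption made for illustration: the function name `reject_option_label`, the threshold `theta`, the flag `is_protected`, and the rule itself (inside the low-confidence region, give the favorable label 1 to the protected group and the unfavorable label 0 to everyone else; outside it, keep the classifier's usual decision).

```python
def reject_option_label(p_positive, is_protected, theta=0.7):
    """Label one instance from a posterior probability (illustrative rule only).

    p_positive   -- classifier's estimated probability of the favorable class (1)
    is_protected -- True if the instance belongs to the protected group
    theta        -- assumed confidence threshold, 0.5 < theta <= 1.0
    """
    if not 0.5 < theta <= 1.0:
        raise ValueError("theta must lie in (0.5, 1.0]")

    # Confidence of the most probable class.
    confidence = max(p_positive, 1.0 - p_positive)

    if confidence < theta:
        # Low-confidence ("reject") region: decide by group membership
        # instead of by the classifier's weak preference.
        return 1 if is_protected else 0

    # Confident prediction: keep the standard decision.
    return 1 if p_positive >= 0.5 else 0


if __name__ == "__main__":
    for p, protected in [(0.9, False), (0.6, False), (0.4, True), (0.1, True)]:
        print(p, protected, reject_option_label(p, protected))
```

In this sketch, widening the reject region by raising `theta` changes more borderline decisions, which is one way the amount of discrimination control could be tuned without touching the training data or the classifier.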

11:12 Active Label Correction
Umaa Rebbapragada, Carla E. Brodley, Damien Sulla-Menashe, and Mark Friedl
DM741

Active Label Correction (ALC) is an interactive method that cleans an established training set of mislabeled examples in conjunction with a domain expert. ALC presumes that the expert who conducts this review is either more accurate than the original annotator or has access to additional resources that ensure a high-quality label. A high-cost re-review is possible because ALC proceeds iteratively, scoring the full training set but selecting only small batches of examples that are likely mislabeled. The expert reviews each batch and corrects any mislabeled examples, after which the classifier is retrained and the process repeats until the expert terminates it. We compare several instantiations of ALC to fully automated methods that attempt to discard or correct label noise in a single pass. Our empirical results show that ALC outperforms single-pass methods in terms of selection efficiency and classifier accuracy. We evaluate the best ALC instantiation on our motivating task of detecting mislabeled and poorly formulated sites within a land cover classification training set from the geography domain.
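
The iterative loop described in this abstract (score the full training set, send a small batch of suspicious examples to the expert, correct, retrain, repeat) can be sketched in Python as follows. This is a toy under stated assumptions, not the authors' implementation: the one-dimensional nearest-centroid classifier, the `suspicion` score, the `expert` callback, and the stopping rule (`max_rounds`, or no suspicious examples left) are all hypothetical stand-ins chosen to keep the example self-contained.

```python
def train_centroids(xs, labels):
    """Stand-in classifier: the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def suspicion(x, y, centroids):
    """Positive when x lies closer to another class centroid than to the
    centroid of its current label y (assumed scoring rule)."""
    own = abs(x - centroids[y])
    other = min(abs(x - c) for k, c in centroids.items() if k != y)
    return own - other


def active_label_correction(xs, labels, expert, batch_size=2, max_rounds=10):
    """Iteratively ask `expert(index)` for the label of the most suspicious
    examples, retraining the classifier after every batch."""
    labels = list(labels)      # work on a copy
    reviewed = set()           # indices the expert has already checked
    for _ in range(max_rounds):
        centroids = train_centroids(xs, labels)
        if len(centroids) < 2:
            break              # nothing to compare against
        # Score the full training set, keep only unreviewed suspicious items.
        candidates = [
            (suspicion(x, labels[i], centroids), i)
            for i, x in enumerate(xs)
            if i not in reviewed
        ]
        candidates = [c for c in candidates if c[0] > 0]
        if not candidates:
            break              # no example looks mislabeled any more
        candidates.sort(reverse=True)
        for _, i in candidates[:batch_size]:
            labels[i] = expert(i)   # expert confirms or corrects the label
            reviewed.add(i)
    return labels


if __name__ == "__main__":
    xs = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
    true_labels = [0, 0, 0, 0, 1, 1, 1, 1]
    noisy_labels = [0, 1, 0, 0, 1, 1, 0, 1]   # two flipped labels
    cleaned = active_label_correction(xs, noisy_labels, lambda i: true_labels[i])
    print(cleaned)
```

The small `batch_size` reflects the point made in the abstract: only a few examples per round reach the expert, which is what makes a costly re-review affordable.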

11:24 Sparse Bayesian Adversarial Learning Using Relevance Vector Machine Ensembles
Yan Zhou, Murat Kantarcioglu, and Bhavani Thuraisingham
DM611

Data mining tasks are made more complicated when adversaries attack by modifying malicious data to evade detection. The main challenge lies in finding a robust learning model that is insensitive to unpredictable malicious data distribution. In this paper, we present a sparse relevance vector machine ensemble for adversarial learning. The novelty of our work is the use of individualized kernel parameters to model potential adversarial attacks during model training. We allow the kernel parameters to drift in the direction that minimizes the likelihood of the positive data. This step is interleaved with learning the weights and the weight priors of a relevance vector machine. Our empirical results demonstrate that an ensemble of such relevance vector machine models is more robust to adversarial attacks.

11:36 An AdaBoost Algorithm for Multiclass Semi-Supervised Learning
Jafar Tanha, Maarten van Someren, and Hamideh Afsarmanesh
DM792

We present an algorithm for multiclass semi-supervised learning, that is, learning from a limited amount of labeled data and plenty of unlabeled data. Existing semi-supervised algorithms use approaches such as one-versus-all to convert the multiclass problem into several binary classification problems, which is not optimal. We propose a multiclass semi-supervised boosting algorithm that solves multiclass classification problems directly. The algorithm is based on a novel multiclass loss function consisting of the margin cost on labeled data and two regularization terms on labeled and unlabeled data. Experimental results on a number of UCI datasets show that the proposed algorithm performs better than the state-of-the-art boosting algorithms for multiclass semi-supervised learning.

11:48 A Classification Based Framework For Concept Summarization
Dhruv Mahajan, Sundararajan Sellamanickam, Subhajit Sanyal, and Amit Madaan
DM517

In this paper we propose a novel classification-based framework for finding a small number of images that summarize a given concept. Our method exploits metadata information available with the images to get category information using Latent Dirichlet Allocation. Using this category information for each image, we solve the underlying classification problem by building a sparse classifier model for each concept. We demonstrate that the images that specify the sparse model form a good summary. In particular, our summary satisfies important properties such as likelihood, diversity and balance in both visual and semantic sense. Furthermore, the framework allows users to specify desired distributions over categories to create personalized summaries. Experimental results on seven broad query types show that the proposed method performs better than state-of-the-art methods.