We consider the problem of learning density mixture models for classification. Traditional mixture learning for density estimation focuses on models that represent the density accurately at all points in the sample space. Discriminative learning, on the other hand, aims at representing the density at the decision boundary. We introduce novel discriminative learning methods for mixtures of generative models.
Generative probabilistic models such as Bayesian networks (BNs) are an attractive choice in a number of data-driven modeling tasks. While such models are implicitly employed for joint density estimation, they have recently been shown to also yield performance comparable to sophisticated discriminative classifiers such as SVMs and C4.5 [1,2]. In classification settings, maximizing the conditional likelihood (CML) is known to achieve better classification performance than traditional Maximum Likelihood (ML) fitting. Unfortunately, the CML optimization problem is, in general, complex and has non-unique solutions. Typical CML solutions resort to gradient-based numerical optimization methods. Despite the improved classification performance, the gradient search makes standard approaches computationally demanding.
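For reference, the two criteria can be written side by side. The notation is assumed here for illustration (training pairs $(c_i, x_i)$ of class label and observation, model parameters $\theta$) and is not taken from the cited papers.

```latex
\begin{align}
\theta_{\mathrm{ML}}  &= \arg\max_{\theta} \sum_{i=1}^{n} \log P(c_i, x_i \mid \theta), \\
\theta_{\mathrm{CML}} &= \arg\max_{\theta} \sum_{i=1}^{n} \log P(c_i \mid x_i, \theta)
  = \arg\max_{\theta} \sum_{i=1}^{n} \Big[ \log P(c_i, x_i \mid \theta)
    - \log \sum_{c} P(c, x_i \mid \theta) \Big].
\end{align}
```

The second term of the CML objective couples the parameters of all classes through the normalizer, which is why the problem generally has no closed-form solution and is usually handled by gradient search.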
We focus in particular on the class of density mixture models. A mixture model has the potential to yield better classification performance than a single BN model, as well as to serve as a rich density estimator. Here again, typical CML learning relies on the same gradient search and suffers from its computational overhead. We formulate an efficient and theoretically sound approach to discriminative mixture learning that avoids parametric gradient optimization.
The proposed method exploits the properties of mixtures to simplify the complex learning task. Mixture components are added recursively, in a greedy fashion, while maximizing the conditional likelihood. More specifically, each iteration finds a new mixture component f that, when added to the current mixture F, maximally decreases the conditional loss. Functional gradient boosting yields a set of data weights with which the new component f is learned. Interestingly, our weighting scheme concentrates weight on the data points near the decision boundary, which is a desirable property for successful classification. In contrast, a generative (non-discriminative) recursive mixture model assigns higher weights to the data at the class centers, which is well suited to data fitting but less so to classification.
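The weighting behavior described above can be illustrated with a minimal sketch. It assumes the mixture is updated as F ← (1 − ε)F + εf and that the weights are the derivatives of the conditional log-likelihood with respect to the joint values F(c, x_i); the function name and data layout are hypothetical, and the cited papers may normalize the weights or treat negative weights differently.

```python
def conditional_gradient_weights(joint, labels):
    """Derivative of sum_i log F(c_i | x_i) w.r.t. each joint value F(c, x_i).

    joint[i][c] holds F(c, x_i) under the current mixture (assumed layout);
    labels[i] is the index of the true class c_i.
    """
    weights = []
    for row, true_class in zip(joint, labels):
        marginal = sum(row)  # F(x_i) = sum_c F(c, x_i)
        # Every class receives -1 / F(x_i) from the normalizer term.
        w_row = [-1.0 / marginal for _ in row]
        # The true class additionally receives +1 / F(c_i, x_i).
        w_row[true_class] += 1.0 / row[true_class]
        weights.append(w_row)
    return weights


# Two points with the same marginal F(x) = 0.5, both of true class 0.
joint = [
    [0.45, 0.05],  # confidently correct: F(c=0 | x) = 0.90
    [0.26, 0.24],  # near the boundary:   F(c=0 | x) = 0.52
]
weights = conditional_gradient_weights(joint, labels=[0, 0])
confident_w, boundary_w = weights[0][0], weights[1][0]
print(confident_w, boundary_w)
```

Under these assumptions, the true-class weight equals (1 − F(c_i | x_i)) / F(c_i, x_i), so the point near the boundary receives the larger weight, while the entries for the competing class are negative.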
A crucial benefit of this method is efficiency: finding a new f requires only ML learning on weighted data, which is relatively easy to do (e.g., by computing weighted sufficient statistics if f is in the exponential family). This makes the approach particularly suited to domains with complex component models (e.g., hidden Markov models (HMMs) in time-series classification) that are usually too complex for effective gradient search. In addition, the recursive approach benefits from optimal order estimation and insensitivity to the initial parameters.
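As a small illustration of ML learning on weighted data, the sketch below fits a one-dimensional Gaussian from weighted sufficient statistics. The Gaussian component, the function name, and the restriction to non-negative weights are assumptions made for the example; the component models used in the cited work (such as HMMs) are more complex.

```python
def weighted_gaussian_ml(xs, ws):
    """Weighted ML estimate (mean, variance) of a 1-D Gaussian.

    xs: observations; ws: non-negative per-observation weights.
    Only three weighted sufficient statistics are needed.
    """
    total = sum(ws)                                # sum_i w_i
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    s1 = sum(w * x for x, w in zip(xs, ws))        # sum_i w_i x_i
    s2 = sum(w * x * x for x, w in zip(xs, ws))    # sum_i w_i x_i^2
    mean = s1 / total
    var = s2 / total - mean * mean
    return mean, var


mean, var = weighted_gaussian_ml([0.0, 2.0], [1.0, 3.0])
print(mean, var)
```

No iterative optimization is involved: the estimate is a closed-form function of the weighted sums, which is the kind of saving the recursive approach relies on.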
We demonstrate the benefits of the proposed methods in an extensive set of evaluations on time-series sequence classification problems. In comparisons with state-of-the-art non-generative discriminative approaches, such as kernel-based classifiers, we show that the newly proposed approaches can yield performance comparable to or better than that of many standard methods.
- M. Kim and V. Pavlovic. "A Recursive Method for Discriminative Mixture Learning". Int'l Conf. Machine Learning (ICML). 2007.
- M. Kim and V. Pavlovic. "Discriminative Learning of Mixture of Bayesian Network Classifiers for Sequence Classification". IEEE Conf. Computer Vision and Pattern Recognition. 2006. pp. 268-275.
- M. Kim and V. Pavlovic. "Efficient Discriminative Learning of Mixture of Bayesian Network Classifiers for Sequence Classification". The Learning Workshop at Snowbird, Utah. April 4-7, 2006.
- M. Kim and V. Pavlovic. "Discriminative Mixture Models". New York Academy of Sciences (NYAS) Machine Learning Symposium, NY. Oct. 27, 2006.
- Bayesian Network Classifiers, N. Friedman, D. Geiger, and M. Goldszmidt, Machine Learning, 1997.
- Efficient Discriminative Learning of Bayesian Network Classifier via Boosted Augmented Naive Bayes, Y. Jing, V. Pavlovic, and J. M. Rehg, ICML, 2005.
- Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers, R. Greiner and W. Zhou, AAAI, 2002.
- Discriminative mixture weight estimation for large Gaussian mixture models, F. Beaufays, M. Weintraub, and Y. Konig, Proc. ICASSP, 337-340, 1999.
- Model-Based Motion Clustering Using Boosted Mixture Modeling, V. Pavlovic, CVPR 2004.
- Exploiting generative models in discriminative classifiers, T. Jaakkola and D. Haussler, NIPS 1998.