
Pai-Hsi Huang

I am a member of the SEQAM lab. I began working with Professor Vladimir Pavlovic in 2002. Under his advisement, I obtained a Master's degree in Computer Science in 2004 and a second Master's degree in Statistics in 2005. My personal webpage is located here.
 
Research Interests:
 
My research interests are in machine learning and data mining, which I currently apply to bioinformatics. I focus on interpretable statistical learning models; these models tend to offer more insight into the structure of the data and, perhaps, into the underlying generative process.
 
    * I am currently working on protein homology detection, with the goal of developing a principled way to use a generative model (in this case, a hidden Markov model) as a feature extractor and to feed the resulting features to an interpretable, discriminative model (for example, logistic regression) that performs the classification. More specifically, I am interested in developing interpretable models that offer insight into the underlying processes that generated the biosequences.
    * I am also interested in semi-supervised learning algorithms, which tap into abundant unlabeled data in the hope of augmenting the training set and thus lowering the variance of the estimates.
    * In my previous work, I used duration-explicit hidden Markov models to show that a set of critical positions, together with the distances (number of residues) between each neighboring pair, is sufficient to model a group of functionally related proteins. The required number of such critical positions is approximately a quarter of the average length of the proteins in the group.
    * I have also worked on protein secondary structure prediction as a course project. The problem was very challenging because, first, the only information we had was the primary sequence of the protein and, second, we needed to extract fixed-length features from variable-length protein sequences. We attempted to tackle this problem using SVMs and clustering methods.
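
As a rough illustration of two ideas mentioned in the list above (turning variable-length protein sequences into fixed-length features, and classifying them with a sparse, interpretable logistic model), here is a minimal sketch. It is not the method from the publications below: amino-acid composition is used as a simple stand-in for hidden-Markov-model-derived features, the L1 penalty is handled with a plain proximal-gradient loop, and every name (composition_features, fit_sparse_logistic, and so on) as well as the toy sequences are assumptions made up for this example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(seq):
    """Map a sequence of any length to 20 residue frequencies."""
    counts = [0] * len(AMINO_ACIDS)
    for residue in seq.upper():
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # ignore non-standard symbols such as X
            counts[idx] += 1
    total = sum(counts) or 1  # guard against empty input
    return [c / total for c in counts]


def soft_threshold(value, amount):
    """Proximal step for an L1 penalty: shrink value toward zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def predict_proba(w, b, x):
    """Logistic model: probability that x belongs to the positive class."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_sparse_logistic(X, y, l1=0.01, lr=0.5, epochs=500):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict_proba(w, b, xi) - yi for xi, yi in zip(X, y)]
        b -= lr * sum(errors) / n  # the bias is not penalized
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / n
                for j in range(d)]
        w = [soft_threshold(wj - lr * gj, lr * l1)
             for wj, gj in zip(w, grad)]
    return w, b


# Made-up toy sequences, for illustration only (not real protein data).
positives = ["AAAAGA", "AAALAAAA", "AAAVAAAAAA"]
negatives = ["KKKKGK", "KKLKKKKK", "KKKVKKKKKK"]
X = [composition_features(s) for s in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)

w, b = fit_sparse_logistic(X, y)
# Residues with non-zero weight are the ones the model relies on.
selected = {aa: round(wj, 2) for aa, wj in zip(AMINO_ACIDS, w) if wj != 0.0}
print(selected)
```

Because the L1 penalty drives the weights of uninformative features to exactly zero, the surviving weights can be read directly as the residues that separate the two groups, which is the sense in which such a model is interpretable.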
 
Publications:
 
    * Protein Homology Detection with Biologically Inspired Features and Interpretable Statistical Models. Pai-Hsi Huang and Vladimir Pavlovic. International Journal of Data Mining in Bioinformatics. Accepted for submission.
    * Sparse Logistic Classifiers for Interpretable Protein Homology Detection. Pai-Hsi Huang and Vladimir Pavlovic. IEEE International Conference on Data Mining (ICDM) 2006, under International Workshop on Data Mining in Bioinformatics, Hong Kong, China.
    * Protein Homology Detection using Sparse Profile Hidden Markov Models (Poster). Pai-Hsi Huang and Vladimir Pavlovic. Intelligent Systems for Molecular Biology (ISMB) 2006, Detroit, MI.
    * Inexpensive d-Dimensional Matchings. Ljubomir Perkovic, Eric Schmutz and Bae-Shi Huang. Random Structures and Algorithms, Vol 20, No. 1, 2002, 50-58.
 
Conference Presentations:
 
    * Sparse Logistic Classifiers for Interpretable Protein Homology Detection. Pai-Hsi Huang and Vladimir Pavlovic. IEEE International Conference on Data Mining (ICDM) 2006, under International Workshop on Data Mining in Bioinformatics, Hong Kong, China.
    * Protein Homology Detection using Sparse Profile Hidden Markov Models (Poster). Pai-Hsi Huang and Vladimir Pavlovic. Intelligent Systems for Molecular Biology (ISMB) 2006, Detroit, MI.