Estimation and Model Selection for Block Clustering with Mixtures : A Composite Likelihood Approach
- Author:
- Kuruppumullage Don, Prabhani
- Published:
- [University Park, Pennsylvania] : Pennsylvania State University, 2014.
- Physical Description:
- 1 electronic document
- Additional Creators:
- Lindsay, Bruce G. and Chiaromonte, Francesca
Access Online
- etda.libraries.psu.edu: Connect to this object online.
- Restrictions on Access:
- Open Access.
- Summary:
- Clustering is the task of finding useful and meaningful groups in data, such that members of a group are more similar to each other than to members of other groups. Many well-established statistical methods are used for clustering; among these, mixture-based approaches have several advantages and have become increasingly popular. In this thesis, we introduce a mixture-based approach for block clustering (i.e., simultaneous clustering of the rows and columns of a data matrix). We discuss the computational challenges that prevent the use of traditional likelihood approaches in this setting, and provide an alternative. We build a composite likelihood that overcomes the computational burden, and devise a nested Expectation-Maximization (EM) algorithm to estimate the block mixture model. Moreover, we develop two useful tools for model selection in block clustering. These can be used, in particular, to determine the number of row and column groups. We discuss how the gradient function can be used to assess the lack of fit of a block mixture model, and provide an EM gradient search algorithm to progress towards better-fitting models. Further, we develop a composite likelihood ratio test for comparing two block mixture models and incorporate it into a forward model selection method. We then apply our methods to two human genomics applications. In the first, we simultaneously cluster loci of enhanced microsatellite mutability and a large array of genomic features characterizing their environment. In the second, we perform the same type of analysis for so-called common fragile sites in the genome. Finally, we list some of the limitations of our methods, identifying challenges and discussing avenues for future developments.
- Dissertation Note:
- Ph.D. Pennsylvania State University 2014.
- Reproduction Note:
- Microfilm (positive). 1 reel ; 35 mm. (University Microfilms 36-47469)
- Technical Details:
- The full text of the dissertation is available as an Adobe Acrobat .pdf file ; Adobe Acrobat Reader required to view the file.
catkey: 13591869