Probabilistic Statistical Learning Methods for High-dimensional Multimodal Data
- Author
- Qiu, Lin
- Published
- [University Park, Pennsylvania] : Pennsylvania State University, 2021.
- Physical Description
- 1 electronic document
- Additional Creators
- Chinchilli, Vernon M., 1952-
Access Online
- etda.libraries.psu.edu, Connect to this object online.
- Graduate Program
- Restrictions on Access
- Open Access.
- Summary
- This thesis discusses novel statistical methods for analyzing high-dimensional multimodal data. Part one discusses three methods for multimodal data learning. First, we propose a model-based probabilistic approach to correlation and canonical correlation estimation for two sparse count datasets, motivated by integer-valued data from next-generation sequencing platforms. Second, in many scientific problems such as video surveillance, modern genomics, and finance, data are often collected over time from diverse domains and exhibit time-dependent heterogeneous properties. We propose a generative model based on a variational autoencoder and a recurrent neural network to infer the latent dynamics of multi-view longitudinal data. This method allows us to identify disentangled latent embeddings across multiple modalities while accounting for the time factor to achieve interpretable results. Third, we propose a deep interpretable variational canonical correlation analysis model for multi-view learning. This model is designed to disentangle the shared and view-specific variations in multi-view data and to achieve model interpretability. For all three methods, experiments on simulated and real data show our algorithms' advantages across domains. Part two discusses two methods for multiple hypothesis testing. First, we propose incorporating the feature hierarchy in a probabilistic black-box model to control the false discovery rate (FDR) in two-group multiple hypothesis testing problems. The deep learning architecture enables efficient optimization and gracefully handles high-dimensional hypothesis features. Extensive studies on synthetic and real datasets demonstrate that our algorithm yields more discoveries than state-of-the-art methods while controlling the FDR. Further, we propose a Bayesian differential analysis framework for multiple-group problems. Simulation studies demonstrate that this model accurately recovers the ground truth from the data.
We conclude the thesis with a discussion and directions for future work.
- Other Subject(s)
- Genre(s)
- Dissertation Note
- Ph.D. Pennsylvania State University 2021.
- Technical Details
- The full text of the dissertation is available as an Adobe Acrobat .pdf file ; Adobe Acrobat Reader required to view the file.
catkey: 34508877