Scalable and Accurate Algorithms for Search in Genomic Big Data and Analysis of Genetic Variants
- Author:
- Sun, Chen
- Published:
- [University Park, Pennsylvania] : Pennsylvania State University, 2018.
- Physical Description:
- 1 electronic document
- Additional Creators:
- Medvedev, Paul
Access Online
- etda.libraries.psu.edu, Connect to this object online.
- Restrictions on Access:
- Open Access.
- Summary:
- Next-generation sequencing technology has been extensively used in biological and medical research. With abundant genomic data generated, new challenges also arise: the growing volume of genomic data pushes the boundaries of current search and analysis methods. Two of the main challenges for genomic big data are how to search efficiently to identify datasets of interest, and how to analyze the large volume of data in depth to discover new knowledge in genomics. In this dissertation, we present four research achievements that aim to tackle these two challenges. The AllSome Sequence Bloom Tree data structure and its associated search algorithms are first introduced to help find datasets of interest, filter out irrelevant ones, and reduce the amount of data to analyze. To meet the demand for deeper analysis, several scalable algorithms for sequence analysis are introduced. Based on them, a genetic variant analysis toolkit is developed; it contains three methods (ISVDA, VarGeno, and VarMatch) that address different directions of small genetic variant study. ISVDA is an iterative small variant discovery algorithm that can detect small genetic variants that were previously hard to detect. VarGeno is a fast and accurate single nucleotide polymorphism genotyping tool. VarMatch is introduced to find high-confidence variants among multiple variant detection results. It can also be used to evaluate variant calling results.
- Dissertation Note:
- Ph.D. Pennsylvania State University 2018.
- Reproduction Note:
- Microfilm (positive). 1 reel ; 35 mm. (University Microfilms 13871926)
- Technical Details:
- The full text of the dissertation is available as an Adobe Acrobat .pdf file ; Adobe Acrobat Reader required to view the file.
catkey: 25793859