My lab develops statistical algorithms to interpret and integrate data generated by next-generation sequencing and other high-throughput experimental platforms, with the goal of improving the diagnosis and treatment of genetic diseases. Toward this aim, my lab is pursuing two main research directions: (1) detecting rare single nucleotide variants in next-generation sequencing data from heterogeneous cell populations, and (2) estimating genomic subtypes from complex data types in clinical cancer samples. Each direction has a dual purpose: advancing statistical methodology for analyzing massive, complex data sets, and advancing our understanding of the fundamental biological processes that lead to the development and progression of disease. Broadly, achieving these aims would mean better-targeted therapeutic strategies for many types of cancer, improved diagnostics for viral and other infectious diseases, and improved methods for statistical inference in large genomic data sets.
Detecting rare single nucleotide mutations in next-generation sequencing data is important for identifying resistance in viral populations, for cell-free DNA diagnostics, and for monitoring patients longitudinally during chemotherapy. However, accurate and scalable statistical methods for identifying variant alleles in massive next-generation sequencing data sets are lacking. We previously developed a hierarchical Bayesian statistical model and an associated inference algorithm to identify rare variants in next-generation sequencing data. We are now extending that model and applying it to study all mutations that modulate antibiotic resistance in target proteins.
An important unresolved question in the development of solid tumors is how cells with distinct genomic subtypes within the same tumor cooperate or interfere with one another, leading to tumor growth or resistance to therapy. A better understanding of this phenomenon will allow physicians to better assess the content of the entire heterogeneous tumor and to design combination therapies that target multiple subtypes simultaneously. We previously developed a mixed-membership statistical model that simultaneously learns a sparse biomarker signature for each subtype and a distribution over subtypes for each sample. The next steps toward understanding heterogeneous tumor development are to extend this model to integrate data from diverse genomic assays and to characterize the co-occurrence of genomic subtypes in primary cancer samples using massive, distributed genomic data sets.
Learn more at www.math.umass.edu/directory/faculty/patrick-flaherty
- BS Electrical Engineering, Rochester Institute of Technology
- PhD Electrical Engineering & Computer Science, UC Berkeley
- Postdoc Biochemistry, Stanford University