Biostatistics Seminar Series: Molin Wang, PhD

Friday, December 6, 2019 - 10:00am

Location: LSL N210

The Department of Biostatistics and Epidemiology welcomes Dr. Molin Wang, Associate Professor in the Departments of Epidemiology and Biostatistics at the Harvard T.H. Chan School of Public Health, and the Department of Medicine at Harvard Medical School and Brigham and Women’s Hospital, for a talk titled "Cox model analyses with disease subtypes defined by partially observed multiple markers" on Friday, December 6 beginning at 10:00 AM in LSL N210.

Abstract: Complex diseases are often analyzed using disease subtypes classified by multiple markers to study disease etiologic heterogeneity. In such molecular pathological epidemiology research, we consider a weighted Cox proportional hazards model to evaluate the effect of an exposure on various disease subtypes in the competing-risks setting when markers are partially or completely missing. We propose an augmented inverse probability weighted estimating equation method that enjoys a double robustness property. For illustration, we have applied the methods to examine the association between pack-years of smoking before age 30 and the incidence of colorectal cancer subtypes defined by a combination of four tumor molecular biomarkers (statuses of microsatellite instability, CpG island methylator phenotype, BRAF mutation, and KRAS mutation) in the Nurses’ Health Study cohort.

Areas: Disease etiologic heterogeneity, Missing data in competing risks settings

In molecular pathological epidemiology research, disease etiologic heterogeneity is often of interest; that is, whether the exposure-disease association varies across disease subtypes. When disease subtypes are defined by multiple markers, different sets of cases typically have missing values for different markers, so some cases whose subtype cannot be fully determined still carry partial information about it, because some, but not all, of their markers are missing. This talk will focus on statistical methods for evaluating disease etiologic heterogeneity under this missing data problem. R functions implementing the methods have been published so that other investigators can apply them.