Events

Upcoming

Efficient Bayesian Semiparametric Modeling and Variable Selection for Spatio-Temporal Transmission of Multiple Pathogens

Wednesday, April 2, 2025

Speaker: Dr. Nikolay Bliznyuk (University of Florida)

Mathematical modeling of infectious diseases plays an important role in the development and evaluation of intervention plans. These plans, such as the development of vaccines, are usually pathogen-specific, but laboratory confirmation of all pathogen-specific infections is rarely available. If an epidemic arises from the co-circulation of several pathogens, it is desirable to model these pathogens jointly in order to study disease transmissibility and help inform public health policy.

A major challenge in utilizing laboratory test data is that it is not available for every infected person. Appropriate imputation of the missing pathogen information often requires a prohibitive amount of computation. To address this, we extend our earlier hierarchical Bayesian multi-pathogen framework that uses a latent process to link the disease counts and the lab test data. Under the proposed model, imputation of the unknown pathogen-specific cases can be effectively avoided by exploiting the relationship between the multinomial and Poisson distributions. A variable selection prior is used to identify the risk factors and their proper functional form, respecting the linear-nonlinear hierarchy. The efficiency gains of the proposed model and the performance of the selection priors are examined through simulation studies and a real-data case study of hand, foot and mouth disease (HFMD) in China.
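The multinomial-Poisson relationship mentioned in the abstract is the classical fact that independent Poisson counts, conditioned on their total, are multinomially distributed. The following is a small simulation sketch of that fact (illustrative only, with made-up intensities; it is not the authors' model):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([2.0, 5.0, 3.0])   # hypothetical pathogen-specific intensities
n_sims = 200_000

# Simulate independent Poisson counts for each pathogen
counts = rng.poisson(lam, size=(n_sims, len(lam)))
totals = counts.sum(axis=1)

# Conditioned on a fixed total n, the vector of counts is
# Multinomial(n, lam / lam.sum())
n = 10
conditional = counts[totals == n]
empirical_props = conditional.mean(axis=0) / n

print(empirical_props)   # close to lam / lam.sum() = [0.2, 0.5, 0.3]
```

This equivalence is what lets a model work with total disease counts and pathogen proportions without imputing every unobserved pathogen-specific case.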

Past

Modeling Structure and Cross-Country Variability in Misclassification Matrices of Verbal Autopsy Cause-of-Death Classifiers

Wednesday, January 22, 2025

Speaker: Dr. Sandipan Pramanik (Johns Hopkins University)

Verbal autopsy (VA) algorithms are routinely employed in low- and middle-income countries to determine individual causes of death (COD). The CODs are then aggregated to estimate population-level cause-specific mortality fractions (CSMFs) essential for public health policymaking. However, VA algorithms often misclassify COD, introducing bias in CSMF estimates. A recent method, VA-calibration, addresses this bias by utilizing a VA misclassification matrix derived from limited labeled COD data collected in the CHAMPS project. Because labeled samples are scarce, the data are pooled across countries to improve estimation precision, thereby implicitly assuming homogeneity in misclassification rates across countries. In this presentation, I will highlight substantial cross-country heterogeneity in VA misclassification, challenging this homogeneity assumption and revealing its impact on VA-calibration's efficacy. To address this, I will propose a comprehensive country-specific VA misclassification matrix modeling framework for data-scarce settings. The framework introduces a novel base model that parsimoniously characterizes the misclassification matrix through two latent mechanisms: intrinsic accuracy and systematic preference. We theoretically prove that these mechanisms are identifiable from the data and manifest as a form of invariance in misclassification odds, a pattern evident in the CHAMPS data. Building on this, the framework then incorporates cross-country heterogeneity through interpretable effect sizes and uses shrinkage priors to balance the bias-variance tradeoff in misclassification matrix estimation. This effort broadens VA-calibration's applicability and strengthens ongoing efforts to use VA for mortality surveillance. I will illustrate this through simulations and applications to mortality surveillance projects, such as COMSA in Mozambique and CA CODE.
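To see why a misclassification matrix biases CSMF estimates and how calibration can correct for it, here is a minimal sketch with a hypothetical 3-cause matrix (the numbers are invented for illustration; the actual VA-calibration method is a Bayesian procedure, not a plain matrix inversion):

```python
import numpy as np

# Hypothetical misclassification matrix M, where M[i, j] is the probability
# that a death from true cause i is classified by the VA algorithm as cause j.
M = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
])

p_true = np.array([0.5, 0.3, 0.2])   # true CSMFs (unknown in practice)
q_va = p_true @ M                     # CSMFs as reported by the VA algorithm

# The naive estimate q_va is biased; calibration corrects it using M
p_calibrated = np.linalg.solve(M.T, q_va)

print(q_va)           # biased fractions
print(p_calibrated)   # recovers [0.5, 0.3, 0.2]
```

In practice M must itself be estimated from limited labeled data, which is why the talk's country-specific modeling of M, rather than a single pooled matrix, matters for the quality of the calibrated CSMFs.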

Fast Bayesian Functional Principal Components Analysis

Wednesday, December 11, 2024

Speaker: Joe Sartini (Johns Hopkins University)

Functional Principal Components Analysis (FPCA) is one of the most successful and widely used analytic tools for functional data exploration and dimension reduction. Standard implementations of FPCA estimate the principal components from the data but ignore their sampling variability in subsequent inferences. To address this problem, we propose Fast Bayesian Functional Principal Components Analysis (Fast BayesFPCA), which treats principal components as parameters on the Stiefel manifold. To ensure efficiency, stability, and scalability, we introduce three innovations: (1) project all eigenfunctions onto an orthonormal spline basis, reducing modeling considerations to a smaller-dimensional Stiefel manifold; (2) induce a uniform prior on the Stiefel manifold of the principal component spline coefficients via the polar representation of a matrix with entries following independent standard Normal priors; and (3) constrain sampling by leveraging the FPCA structure to improve stability. We demonstrate the improved credible interval coverage and computational efficiency of Fast BayesFPCA in comparison to existing software solutions. We then apply Fast BayesFPCA to actigraphy data from NHANES 2011-2014, a modeling task which could not be accomplished with existing MCMC-based Bayesian approaches.
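The polar-representation construction in innovation (2) can be sketched in a few lines: a matrix with iid standard Normal entries has a polar factor that is uniformly distributed on the Stiefel manifold of orthonormal frames. A minimal illustration (dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
K, Q = 10, 3   # spline basis dimension, number of principal components

# X has iid standard Normal entries; its polar factor U = X (X^T X)^{-1/2}
# is uniformly distributed on the Stiefel manifold of K x Q orthonormal frames.
X = rng.standard_normal((K, Q))

# Polar decomposition via the SVD: X = W diag(S) Vt, polar factor U = W Vt
W, S, Vt = np.linalg.svd(X, full_matrices=False)
U = W @ Vt

# Columns of U are orthonormal: U^T U = I_Q
print(np.allclose(U.T @ U, np.eye(Q)))
```

Parameterizing the orthonormal spline coefficients this way lets an MCMC sampler work with unconstrained Gaussian entries while the implied prior on the frame itself remains uniform.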

Air Pollution Monitoring

Thursday, December 5, 2024

Speaker: Dr. Chris Heaney, Matthew Aubourg, Bonita Salmerón (Johns Hopkins University)

This talk addresses data-related challenges in air pollution monitoring and its health impacts, focusing on South Baltimore and conducted in partnership with the South Baltimore Community Land Trust (represented by Greg Galen). Joint seminar with the JHU Causal Inference Working Group.

Backwards sequential Monte Carlo for efficient Bayesian optimal experimental design

Wednesday, November 13, 2024

Speaker: Andrew Chin (Johns Hopkins University)

The expected information gain (EIG) is a crucial quantity in Bayesian optimal experimental design (OED), quantifying how useful an experiment is by the amount we expect the posterior to differ from the prior. However, evaluating the EIG can be computationally expensive since it requires the posterior normalizing constant. A rich literature exists for estimating this normalizing constant, with sequential Monte Carlo (SMC) approaches among the gold standards. In this work, we leverage two idiosyncrasies of OED to improve the efficiency of EIG estimation via SMC. The first is that, in OED, we simulate the data and thus know the true underlying parameters. The second is that we ultimately care about the EIG, not the individual normalizing constants. This lets us create an EIG-specific SMC method that starts with a sample from the posterior and tempers backwards towards the prior. The key lies in the observation that, in certain cases, the Monte Carlo variance of SMC for the normalizing constant of a single dataset is significantly lower than the variance of the normalizing constants themselves across datasets. This suggests the potential to slightly increase variance while drastically decreasing computation time by reducing the SMC population, and taking this idea to the extreme gives rise to our method. We demonstrate our method on a simulated coupled spring-mass system, where we observe order-of-magnitude performance improvements.
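For context, the EIG that the talk's backwards-SMC method targets is EIG(d) = E[log p(y | theta, d) - log p(y | d)], and the standard baseline is a nested Monte Carlo estimator, whose cost comes from approximating the marginal p(y | d) for every simulated dataset. A toy sketch on a linear-Gaussian model (my own illustrative example, not the talk's method or spring-mass system), where the EIG has a closed form for checking:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: theta ~ N(0, 1), y | theta ~ N(d * theta, sigma^2),
# where the design d controls how informative the experiment is.
def eig_nested_mc(d, sigma=1.0, n_outer=2000, n_inner=2000):
    theta = rng.standard_normal(n_outer)
    y = d * theta + sigma * rng.standard_normal(n_outer)

    # log p(y | theta) at the true simulating parameters (known in OED)
    log_lik = -0.5 * np.log(2 * np.pi * sigma**2) - (y - d * theta) ** 2 / (2 * sigma**2)

    # log p(y) approximated by an inner Monte Carlo average over fresh prior draws;
    # this inner loop is the expensive normalizing-constant estimate
    theta_inner = rng.standard_normal(n_inner)
    log_lik_inner = (
        -0.5 * np.log(2 * np.pi * sigma**2)
        - (y[:, None] - d * theta_inner[None, :]) ** 2 / (2 * sigma**2)
    )
    log_marginal = np.logaddexp.reduce(log_lik_inner, axis=1) - np.log(n_inner)

    return np.mean(log_lik - log_marginal)

# Closed form for this model: EIG(d) = 0.5 * log(1 + d^2 / sigma^2)
d = 2.0
print(eig_nested_mc(d), 0.5 * np.log(1 + d**2))
```

The inner loop over `n_inner` prior draws for every outer dataset is exactly the kind of per-dataset cost that replacing nested Monte Carlo with a cheap, small-population SMC run aims to cut.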