Events

Upcoming

Check back for future events!

Past

Backwards sequential Monte Carlo for efficient Bayesian optimal experimental design

Wednesday, November 13, 2024

Speaker: Andrew Chin (Johns Hopkins University)

The expected information gain (EIG) is a crucial quantity in Bayesian optimal experimental design (OED), quantifying how useful an experiment is by the amount we expect the posterior to differ from the prior. However, evaluating the EIG can be computationally expensive since it requires the posterior normalizing constant. A rich literature exists for estimation of this normalizing constant, with sequential Monte Carlo (SMC) approaches being one of the gold standards. In this work, we leverage two idiosyncrasies of OED to improve the efficiency of EIG estimation via SMC. The first is that, in OED, we simulate the data and thus know the true underlying parameters. The second is that we ultimately care about the EIG, not the individual normalizing constants. This lets us create an EIG-specific SMC method that starts with a sample from the posterior and tempers backwards towards the prior. The key lies in the observation that, in certain cases, the Monte Carlo variance of SMC for the normalizing constant of a single dataset is significantly lower than the variance of the normalizing constants themselves across datasets. This suggests the potential to slightly increase variance while drastically decreasing computation time by reducing the SMC population, and taking this idea to the extreme gives rise to our method. We demonstrate our method on a simulated coupled spring-mass system, where we observe order-of-magnitude performance improvements.
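
For context, the EIG admits the standard nested-estimation form below; the notation is the generic OED formulation rather than the speaker's, so treat it as an illustrative sketch:

$$
\mathrm{EIG}(d) = \mathbb{E}_{p(\theta)\,p(y \mid \theta, d)}\big[\log p(y \mid \theta, d) - \log p(y \mid d)\big],
\qquad
p(y \mid d) = \int p(y \mid \theta, d)\, p(\theta)\, d\theta,
$$

$$
\widehat{\mathrm{EIG}}(d) = \frac{1}{N} \sum_{n=1}^{N} \Big[\log p(y_n \mid \theta_n, d) - \log \widehat{p}(y_n \mid d)\Big],
\qquad (\theta_n, y_n) \sim p(\theta)\, p(y \mid \theta, d).
$$

Each simulated dataset $y_n$ requires its own normalizing-constant estimate $\widehat{p}(y_n \mid d)$; the trade-off described in the abstract is between the per-dataset Monte Carlo variance of an SMC estimate of $\widehat{p}(y_n \mid d)$ and the variance of the bracketed terms across datasets.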

Neural Networks for Geospatial Data

Wednesday, October 16, 2024

Speaker: Wentao Zhan (Johns Hopkins University)

Geospatial data analysis has traditionally been model-based, with a mean model, customarily specified as a linear regression on the covariates, and a Gaussian process covariance model, encoding the spatial dependence. While non-linear machine learning algorithms like neural networks are increasingly used for spatial analysis, current approaches depart from the model-based setup and cannot explicitly incorporate spatial covariance.

In this talk, we will first give a brief introduction to geospatial modeling and several machine-learning-style extensions, followed by a focused discussion of NN-GLS, a novel neural network architecture we recently proposed.

NN-GLS falls within the traditional Gaussian process (GP) geostatistical model. It accommodates non-linear mean functions while retaining all other advantages of the GP framework, such as explicit modeling of the spatial covariance and prediction at new locations via kriging. NN-GLS admits a representation as a special type of graph neural network (GNN). This connection facilitates the use of standard neural network computational techniques for irregular geospatial data, enabling novel and scalable mini-batching, backpropagation, and kriging schemes.
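
As a rough sketch of the setup described above (schematic notation based on the abstract, not the talk's full specification), the data model is the usual geostatistical regression with a neural-network mean,

$$
Y(s_i) = f\big(X(s_i)\big) + \varepsilon(s_i), \qquad \varepsilon(\cdot) \sim \mathrm{GP}\big(0, \Sigma(\cdot, \cdot)\big),
$$

and, in place of the ordinary least squares loss used for a standard neural network, $f$ is estimated by minimizing a generalized least squares objective such as $\big(\mathbf{Y} - f(\mathbf{X})\big)^{\top} \Sigma^{-1} \big(\mathbf{Y} - f(\mathbf{X})\big)$, so the spatial covariance enters the fitting criterion explicitly.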

In addition, we provide a methodology for obtaining uncertainty bounds for estimation and prediction from NN-GLS. Theoretically, we show that NN-GLS is consistent for irregularly observed, spatially correlated data processes. We also provide finite-sample concentration rates that quantify the need to accurately model the spatial covariance in neural networks for dependent data. Simulations and an application to air pollution modeling will be presented to demonstrate the methodology.

Challenges and opportunities in the analysis of joint models of longitudinal and survival data

Wednesday, May 15, 2024

Speaker: Dr. Eleni-Rosalina Andrinopoulou (Erasmus University Medical Center)

The increasing availability of clinical measures, such as electronic medical records, has enabled the collection of diverse information, including multiple longitudinal measurements and survival outcomes. Consequently, there is a need for methods that can simultaneously examine the associations between exposures, longitudinal measurements, and survival outcomes. This statistical approach is known as joint modeling of longitudinal and survival data, which typically involves integrating linear mixed effects models for longitudinal measurements with Cox models for censored survival outcomes.

This method is motivated by various clinical scenarios. For instance, patients with Cystic Fibrosis, a genetic lung disorder, face risks like exacerbation, lung transplantation, or mortality, and are regularly monitored with multiple biomarkers. Similarly, patients recovering from stroke undergo longitudinal assessments to track their progress over time. Although these outcomes are biologically interconnected, they are often analyzed separately in practice.

Analyzing such complex data presents several challenges. One key challenge is accurately characterizing patients’ longitudinal profiles that influence survival outcomes. It’s commonly assumed that the underlying longitudinal values are associated with survival outcomes, but sometimes specific aspects of these profiles, like the rate of biomarker progression, affect the hazard differently. Choosing the right functional form for this relationship is crucial and requires careful investigation due to its potential impact on results.
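
For concreteness, a canonical specification from the joint-modeling literature (used here only to illustrate the functional-form choice, not necessarily the exact models discussed in the talk) is

$$
y_i(t) = m_i(t) + \epsilon_i(t), \qquad m_i(t) = \mathbf{x}_i(t)^{\top} \boldsymbol{\beta} + \mathbf{z}_i(t)^{\top} \mathbf{b}_i,
$$

$$
h_i(t) = h_0(t) \exp\!\big\{\boldsymbol{\gamma}^{\top} \mathbf{w}_i + \alpha\, m_i(t)\big\},
$$

where the mixed model describes the underlying longitudinal trajectory $m_i(t)$ and the association parameter $\alpha$ links its current value to the hazard. Replacing $\alpha\, m_i(t)$ with, say, $\alpha_1 m_i(t) + \alpha_2 m_i'(t)$ lets the rate of biomarker progression affect the hazard as well, which is precisely the kind of functional-form decision highlighted above.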

Another challenge arises from the high dimensionality of some datasets, such as registry data. Analyzing such comprehensive datasets using complex methodologies can be computationally expensive. Therefore, there’s a demand for algorithms capable of distributed analyses that can concurrently and impartially explore multiple correlated outcomes.

In this presentation, we will explore strategies to tackle these challenges effectively.

Bayesian extension of Multilevel Functional Principal Components Analysis with application to Continuous Glucose Monitoring Data

Wednesday, May 1, 2024

Speaker: Joe Sartini (Johns Hopkins University)

Multilevel functional principal components analysis (MFPCA) facilitates estimation of hierarchical covariance structures for functional data produced by wearable sensors, including continuous glucose monitors (CGM), all while accounting for covariate effects. There are several existing methods to efficiently fit these types of models, including the well-established fast covariance estimation approach and the recently proposed procedure of fitting appropriate localized mixed effects models and smoothing (Xiao et al. 2016; Leroux et al. 2023). However, these methods do not inherently account for uncertainty in the eigenfunctions during the estimation procedure. Most rely on bootstrap or asymptotic analytic results to perform inference after estimation. To address this, we fit MFPCA within a fully Bayesian framework using MCMC, treating the orthogonal eigenfunctions as random. A model constructed in this way automatically accounts for variability in eigenfunction estimation and its interplay with both features of the data and the assumed hierarchical structure. The flexibility of this method also makes it well suited to exploring the imposition of additional constraints on the eigenfunctions, such as mutual orthogonality across levels. We assess the convergence of our model using Grassmannian distances between the spaces spanned by the sampled eigenfunctions at each level. After performing validation using a variety of simulated functional data, we compare the results of our model to those of the prominent existing approaches using 4-hour windows of CGM data for persons with diabetes, centered around known mealtimes.
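
As a point of reference, a generic two-level MFPCA model with covariate effects can be written as follows (the exact specification in the talk may differ):

$$
Y_{ij}(t) = \mu(t) + \mathbf{x}_{ij}^{\top} \boldsymbol{\beta}(t) + \sum_{k} \xi_{ik}\, \phi_k^{(1)}(t) + \sum_{l} \zeta_{ijl}\, \phi_l^{(2)}(t) + \epsilon_{ij}(t),
$$

where $i$ indexes subjects, $j$ indexes repeated functional observations (for example, 4-hour post-meal CGM windows), $\phi_k^{(1)}$ and $\phi_l^{(2)}$ are the subject-level and within-subject eigenfunctions, and $\xi_{ik}$, $\zeta_{ijl}$ are mean-zero scores. In the fully Bayesian formulation above, the eigenfunctions are treated as random rather than as fixed plug-in estimates, which is what propagates their uncertainty into the inference.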