Events

Upcoming

Check back for future events!

Past

Fast Bayesian Functional Principal Components Analysis

Wednesday, December 11, 2024

Speaker: Joe Sartini (Johns Hopkins University)

Functional Principal Components Analysis (FPCA) is one of the most successful and widely used analytic tools for functional data exploration and dimension reduction. Standard implementations of FPCA estimate the principal components from the data but ignore their sampling variability in subsequent inferences. To address this problem, we propose Fast Bayesian Functional Principal Components Analysis (Fast BayesFPCA), which treats principal components as parameters on the Stiefel manifold. To ensure efficiency, stability, and scalability, we introduce three innovations: (1) projecting all eigenfunctions onto an orthonormal spline basis, reducing modeling considerations to a lower-dimensional Stiefel manifold; (2) inducing a uniform prior on the Stiefel manifold of the principal component spline coefficients via the polar representation of a matrix with entries following independent standard Normal priors; and (3) constraining sampling by leveraging the FPCA structure to improve stability. We demonstrate the improved credible interval coverage and computational efficiency of Fast BayesFPCA in comparison to existing software solutions. We then apply Fast BayesFPCA to actigraphy data from NHANES 2011-2014, a modeling task that could not be accomplished with existing MCMC-based Bayesian approaches.
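
The polar construction in innovation (2) is simple to illustrate. Below is a minimal numpy sketch, with hypothetical dimensions and variable names of our choosing rather than anything from the Fast BayesFPCA code: a matrix with independent standard Normal entries is mapped to its polar factor, which is uniformly distributed on the Stiefel manifold.

# Minimal sketch of the polar representation in innovation (2): if X has
# i.i.d. standard Normal entries, its polar factor U = X (X^T X)^{-1/2} is
# uniform on the Stiefel manifold. Dimensions below are illustrative only.
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(0)
p, k = 10, 3                           # spline basis size, number of components
X = rng.standard_normal((p, k))        # entries ~ N(0, 1)
U, _ = polar(X)                        # polar decomposition X = U P

assert np.allclose(U.T @ U, np.eye(k)) # U has orthonormal columns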

Air Pollution Monitoring

Thursday, December 5, 2024

Speakers: Dr. Chris Heaney, Matthew Aubourg, Bonita Salmerón (Johns Hopkins University)

This talk discusses data-related challenges in air pollution monitoring and its health impacts, with a focus on South Baltimore, in partnership with the South Baltimore Community Land Trust (represented by Greg Galen). Joint seminar with the JHU Causal Inference Working Group.

Backwards sequential Monte Carlo for efficient Bayesian optimal experimental design

Wednesday, November 13, 2024

Speaker: Andrew Chin (Johns Hopkins University)

The expected information gain (EIG) is a crucial quantity in Bayesian optimal experimental design (OED), quantifying how useful an experiment is by the amount we expect the posterior to differ from the prior. However, evaluating the EIG can be computationally expensive, since it requires the posterior normalizing constant. A rich literature exists on estimating this normalizing constant, with sequential Monte Carlo (SMC) approaches being one of the gold standards. In this work, we leverage two idiosyncrasies of OED to improve the efficiency of EIG estimation via SMC. The first is that, in OED, we simulate the data and thus know the true underlying parameters. The second is that we ultimately care about the EIG, not the individual normalizing constants. This lets us create an EIG-specific SMC method that starts with a sample from the posterior and tempers backwards towards the prior. The key lies in the observation that, in certain cases, the Monte Carlo variance of SMC for the normalizing constant of a single dataset is significantly lower than the variance of the normalizing constants themselves across datasets. This suggests the potential to slightly increase variance while drastically decreasing computation time by reducing the SMC population; taking this idea to the extreme gives rise to our method. We demonstrate our method on a simulated coupled spring-mass system, where we observe order-of-magnitude performance improvements.
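
For reference, the EIG discussed above has the following standard form (our notation, not necessarily the talk's); the second equality makes explicit why the evidence p(y | d), i.e. the posterior normalizing constant, is the computational bottleneck:

\[
\mathrm{EIG}(d)
= \mathbb{E}_{p(y \mid d)}\left[ D_{\mathrm{KL}}\big(p(\theta \mid y, d) \,\|\, p(\theta)\big) \right]
= \mathbb{E}_{p(\theta)\, p(y \mid \theta, d)}\left[ \log p(y \mid \theta, d) - \log p(y \mid d) \right]
\]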

Neural Networks for Geospatial Data

Wednesday, October 16, 2024

Speaker: Wentao Zhan (Johns Hopkins University)

Geospatial data analysis has traditionally been model-based, combining a mean model, customarily specified as a linear regression on the covariates, with a Gaussian process covariance model encoding the spatial dependence. While non-linear machine learning algorithms like neural networks are increasingly used for spatial analysis, current approaches depart from the model-based setup and cannot explicitly incorporate spatial covariance.
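
In symbols, this traditional setup is commonly written as follows (standard geostatistical notation, ours rather than the speaker's), with the response decomposed into a linear mean, a spatially correlated GP term, and independent noise:

\[
y(s) = x(s)^\top \beta + w(s) + \epsilon(s), \qquad
w(\cdot) \sim \mathrm{GP}\big(0, C(\cdot, \cdot\,; \theta)\big), \qquad
\epsilon(s) \overset{\text{iid}}{\sim} N(0, \tau^2)
\]

NN-GLS, discussed below, keeps this covariance structure but replaces the linear mean x(s)^T β with a nonlinear function estimated by a neural network.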

In this talk, we will first give a brief introduction to geospatial modeling and several machine-learning-style extensions, followed by a focused discussion of NN-GLS, a novel neural network architecture we recently proposed.

NN-GLS falls within the traditional Gaussian process (GP) geostatistical model. It accommodates non-linear mean functions while retaining all other advantages of GPs, such as explicit modeling of the spatial covariance and prediction at new locations via kriging. NN-GLS admits a representation as a special type of graph neural network (GNN). This connection facilitates the use of standard neural network computational techniques for irregular geospatial data, enabling novel and scalable mini-batching, backpropagation, and kriging schemes.
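
To convey the core idea, here is a minimal PyTorch sketch of a generalized least-squares loss for a neural network mean function. This is our illustration under simplifying assumptions, not the authors' implementation: the actual NN-GLS uses the GNN representation and scalable schemes rather than the dense Cholesky solve shown here.

# Illustrative GLS loss (y - f(X))^T Sigma^{-1} (y - f(X)) for a neural
# network mean f; a toy setup, not the NN-GLS reference code.
import torch

def gls_loss(f_X, y, Sigma):
    r = (y - f_X).unsqueeze(1)            # residuals, shape (n, 1)
    L = torch.linalg.cholesky(Sigma)      # Sigma = L L^T
    z = torch.cholesky_solve(r, L)        # z = Sigma^{-1} r
    return (r * z).sum()

n, d = 50, 4
X, y = torch.randn(n, d), torch.randn(n)
locs = torch.rand(n, 2)                   # spatial locations
Sigma = torch.eye(n) + 0.5 * torch.exp(-torch.cdist(locs, locs))  # nugget + exponential kernel
f = torch.nn.Sequential(torch.nn.Linear(d, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
loss = gls_loss(f(X).squeeze(-1), y, Sigma)
loss.backward()                           # gradients for f account for spatial dependence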

In addition, we provide a methodology to obtain uncertainty bounds for estimation and prediction from NN-GLS. Theoretically, we show that NN-GLS is consistent for irregularly observed, spatially correlated data processes. We also provide finite-sample concentration rates that quantify the need to accurately model the spatial covariance in neural networks for dependent data. Simulations and an application to air pollution modeling will be presented to demonstrate the methodology.