Verbal autopsy (VA) algorithms are routinely employed in low- and middle-income countries to determine individual causes of death (COD). The CODs are then aggregated to estimate population-level cause-specific mortality fractions (CSMFs), which are essential for public health policymaking. However, VA algorithms often misclassify COD, introducing bias into CSMF estimates. A recent method, VA-calibration, addresses this bias by using a VA misclassification matrix estimated from limited labeled COD data collected in the CHAMPS project. Because labeled samples are scarce, the data are pooled across countries to improve estimation precision, which implicitly assumes that misclassification rates are homogeneous across countries. In this presentation, I will highlight substantial cross-country heterogeneity in VA misclassification, challenging this homogeneity assumption and showing how it affects VA-calibration’s efficacy. To address this, I will propose a comprehensive framework for modeling country-specific VA misclassification matrices in data-scarce settings. The framework introduces a novel base model that parsimoniously characterizes the misclassification matrix through two latent mechanisms: intrinsic accuracy and systematic preference. We theoretically prove that these mechanisms are identifiable from the data and manifest as a form of invariance in misclassification odds, a pattern evident in the CHAMPS data. Building on this base model, the framework then incorporates cross-country heterogeneity through interpretable effect sizes and uses shrinkage priors to balance the bias-variance tradeoff in misclassification matrix estimation. This work broadens VA-calibration’s applicability and strengthens ongoing efforts to use VA for mortality surveillance. I will illustrate this through simulations and applications to mortality surveillance projects, such as COMSA in Mozambique and CA CODE.
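
The role of the misclassification matrix can be made concrete with a small numerical sketch. Everything below is illustrative rather than taken from the talk: the four causes, the accuracy and preference values, and the function names are hypothetical, and the parametrization `M[i, j] = a_i * 1{i == j} + (1 - a_i) * r_j` is only one plausible way to encode "intrinsic accuracy" and "systematic preference", assumed here for illustration. The calibration step is a plain linear inversion swapped in for the actual VA-calibration procedure, which the abstract does not specify in detail.

```python
import numpy as np

def base_misclassification(accuracy, preference):
    """Hypothetical two-mechanism matrix (an assumption, not the talk's model).

    Row i is the true cause, column j the VA-assigned cause:
    M[i, j] = a_i * 1{i == j} + (1 - a_i) * r_j,
    where a_i is an accuracy for cause i and r_j a preference for
    assigning cause j. Rows sum to 1 when the preferences sum to 1.
    """
    a = np.asarray(accuracy, dtype=float)
    r = np.asarray(preference, dtype=float)
    return np.diag(a) + np.outer(1.0 - a, r)

def calibrate(raw_csmf, M):
    """Simple linear-inversion calibration (illustrative only).

    Solves M^T p = q for the true CSMF p given the raw VA-based CSMF q,
    then clips negatives and renormalizes so p is a valid distribution.
    """
    p = np.linalg.solve(M.T, raw_csmf)
    p = np.clip(p, 0.0, None)
    return p / p.sum()

# Made-up values for four causes of death.
accuracy = [0.6, 0.5, 0.7, 0.4]
preference = [0.4, 0.3, 0.2, 0.1]
M = base_misclassification(accuracy, preference)

true_csmf = np.array([0.1, 0.4, 0.3, 0.2])
raw_csmf = M.T @ true_csmf          # what aggregating VA-assigned CODs would give
calibrated = calibrate(raw_csmf, M)

# Under this assumed form, the odds of assigning cause 2 versus cause 3
# are the same for true cause 0 and true cause 1.
odds_row0 = M[0, 2] / M[0, 3]
odds_row1 = M[1, 2] / M[1, 3]

print("raw CSMF:       ", np.round(raw_csmf, 3))
print("calibrated CSMF:", np.round(calibrated, 3))
print("odds (rows 0, 1):", odds_row0, odds_row1)
```

In this toy setting the raw CSMF differs from the true one, while inverting with the correct matrix recovers it. If `M` were instead estimated from pooled data and differed from the country's own matrix, the same inversion would leave residual bias, which is the concern the abstract raises about the homogeneity assumption.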