Data
Microdata pipeline
PolicyEngine constructs its representative household dataset through a 14-step pipeline. Public survey data are merged, stratified, and then cloned into 10 geographic variants per household. Each clone is simulated through PolicyEngine US with stochastic take-up. The clones are then calibrated via L0-regularized optimization against administrative targets at the national, state, and congressional district levels simultaneously. The result is 488 geographically representative datasets.
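The cloning and calibration steps described above can be illustrated with a minimal sketch. It is not the PolicyEngine implementation: the `Household` class, the random district assignment, the equal split of weight across clones, and the squared-relative-error fit term are all assumptions chosen for illustration. Only the clone count (10) and the use of an L0-style penalty (a count of nonzero weights) come from the text.

```python
import random
from dataclasses import dataclass, replace

N_CLONES = 10  # from the text: 10 geographic variants per household


@dataclass(frozen=True)
class Household:
    # Hypothetical record layout, for illustration only.
    household_id: int
    weight: float
    income: float
    district: str = ""


def clone_households(households, districts, rng):
    """Copy each household N_CLONES times.

    Assumption: each clone gets a randomly drawn district and an equal
    share of the original weight, so the total weight is preserved.
    """
    clones = []
    for hh in households:
        for _ in range(N_CLONES):
            clones.append(
                replace(
                    hh,
                    weight=hh.weight / N_CLONES,
                    district=rng.choice(districts),
                )
            )
    return clones


def l0_penalized_loss(clones, targets, lam):
    """Fit term plus an L0 penalty on the weights.

    Fit: sum of squared relative errors between weighted income totals
    and the (nonzero) per-district targets.
    Penalty: lam times the number of clones with nonzero weight.
    """
    totals = {district: 0.0 for district in targets}
    for clone in clones:
        if clone.district in totals:
            totals[clone.district] += clone.weight * clone.income
    fit = sum(((totals[d] - t) / t) ** 2 for d, t in targets.items())
    nonzero = sum(1 for clone in clones if clone.weight != 0)
    return fit + lam * nonzero
```

An optimizer would search for weights that lower this loss. The L0 term rewards setting many clone weights to exactly zero, which keeps each calibrated dataset small; the sketch only evaluates the objective and leaves the optimization itself out.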
National
Geography-specific
Census CPS ASEC: Base microdata
Input: ~150,000 persons → Output: ~150,000 persons
Census Bureau CPS ASEC (March supplement)
The Current Population Survey Annual Social and Economic Supplement (CPS ASEC) provides the base microdata. The March 2025 supplement covers tax year 2024 and includes ~200,000 individuals across ~60,000 households.
- 93+ person-level columns: demographics, income, employment, disability
- 10 tax unit columns: AGI, federal/state taxes, credits (EITC, CTC, ACTC)
- 35 SPM unit columns: income, benefits, taxes, poverty thresholds
- Geographic identifiers: state FIPS, county FIPS, NYC flag
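The column groups above sit at three entity levels (person, tax unit, SPM unit), and one person belongs to both a tax unit and an SPM unit. The sketch below shows how such linked tables can be rolled up from the person level. All column names and values are made up for illustration and are not the actual CPS ASEC or PolicyEngine variable names.

```python
from collections import defaultdict

# Hypothetical person-level rows. Each person carries the IDs of the
# tax unit and the SPM unit they belong to.
persons = [
    {"person_id": 1, "tax_unit_id": 10, "spm_unit_id": 100,
     "age": 42, "employment_income": 52_000.0},
    {"person_id": 2, "tax_unit_id": 10, "spm_unit_id": 100,
     "age": 40, "employment_income": 18_000.0},
    {"person_id": 3, "tax_unit_id": 11, "spm_unit_id": 100,
     "age": 19, "employment_income": 6_000.0},
]

# Hypothetical unit-level tables keyed by unit ID.
tax_units = {10: {"agi": 70_000.0}, 11: {"agi": 6_000.0}}
spm_units = {100: {"poverty_threshold": 35_000.0}}


def sum_by(rows, key, value):
    """Sum a person-level column within each group identified by `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# The same person-level column aggregates differently per entity level:
# two tax units here share a single SPM unit.
income_by_tax_unit = sum_by(persons, "tax_unit_id", "employment_income")
income_by_spm_unit = sum_by(persons, "spm_unit_id", "employment_income")
```

In this example the three persons form two tax units but a single SPM unit, which is why tax variables (such as AGI) and poverty variables (such as thresholds) are stored at different levels.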