Python package guide
policyengine.py for US tax and benefit analysis
Reference examples covering household impact, policy reforms, microsimulation over calibrated microdata, and regional breakdowns.
Core concepts
The Python guide now follows the unified policyengine package. Four concepts show up throughout the workflow:
Household calculator
Call pe.us.calculate_household(...) with plain Python dicts; the typed result exposes every variable in the model.
Datasets
Use pe.us.ensure_datasets() to load representative microdata, then feed it into Simulation.
Reforms as dicts
A reform is a {"param.path": value} dict. Same shape for reform= (household) and policy= (microsim).
Outputs
Aggregate, ChangeAggregate, and pe.us.economic_impact_analysis() turn simulations into analysis.
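Because a reform is just a dict, composing reforms is ordinary dict merging. The sketch below assumes the two CTC parameter paths quoted in this guide; the values are made up for illustration and nothing here calls policyengine itself.

```python
# Two single-parameter reforms. Paths are quoted from this guide;
# the values are illustrative assumptions, not policy proposals.
bigger_credit = {"gov.irs.credits.ctc.amount.base[0].amount": 3_000}
later_phase_out = {"gov.irs.credits.ctc.phase_out.threshold.JOINT": 500_000}

# Merging the dicts yields one combined reform. If both dicts set the
# same path, the right-hand dict wins.
combined = {**bigger_credit, **later_phase_out}

for path, value in combined.items():
    print(f"{path} -> {value:,}")
```

Per the description above, the resulting dict would then be passed as reform= to the household calculator or as policy= to a microsimulation.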
US entity hierarchy
Outputs come back at the entity level where a variable is defined. Everything else is a mapping operation.
| Entity | Scope | Example variables |
|---|---|---|
| person | Individual | employment_income, age, is_disabled |
| marital_unit | Married couple or single adult | joint return grouping |
| tax_unit | Tax filing unit | income_tax, ctc_value, eitc |
| spm_unit | SPM poverty unit | snap, housing_assistance |
| family | Related-by-blood grouping | family-level programmes |
| household | All people at one address | household_net_income, state_fips, rent |
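The "mapping operation" between entities can be pictured with plain Python. This is a sketch of the idea only, not policyengine's API: the records, the tax_unit_id link, and the EITC values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical person records, each pointing at the tax unit it belongs to.
people = [
    {"person_id": 0, "tax_unit_id": 0, "employment_income": 40_000},
    {"person_id": 1, "tax_unit_id": 0, "employment_income": 0},
    {"person_id": 2, "tax_unit_id": 1, "employment_income": 25_000},
]

# Aggregate up: sum a person-level variable within each tax unit.
tax_unit_income = defaultdict(float)
for person in people:
    tax_unit_income[person["tax_unit_id"]] += person["employment_income"]

# Project down: copy a tax-unit-level value onto every member of the unit.
tax_unit_eitc = {0: 1_200.0, 1: 600.0}  # made-up values
eitc_per_person = [tax_unit_eitc[person["tax_unit_id"]] for person in people]

print(dict(tax_unit_income))
print(eitc_per_person)
```

Summing moves a person-level variable up to its group, and broadcasting moves a group-level variable down to its members.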
Parameter types
Every reform target is a parameter. Knowing which shape a parameter has tells you how to reference it in a Policy.
gov.irs.credits.ctc.amount.base[0].amount
A scalar amount, rate, or threshold. Set a new value for a date range.
gov.irs.credits.refundable
A list of values, often names of variables that qualify for a rule.
gov.hmrc.income_tax.rates
Graduated thresholds and rates. Access via .thresholds, .rates, .amounts.
gov.irs.credits.ctc.phase_out.threshold.JOINT
Parameter broken down by an enum (filing status, age band, region).
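To show what "graduated thresholds and rates" means in practice, here is a minimal sketch of a marginal-rate schedule. The thresholds and rates are hypothetical and the two functions are illustrations, not part of policyengine; the real parameter objects expose .thresholds and .rates rather than plain lists.

```python
from bisect import bisect_right

# Hypothetical schedule; not taken from any real parameter.
thresholds = [0, 10_000, 40_000]
rates = [0.10, 0.20, 0.30]


def marginal_rate(income: float) -> float:
    """Rate of the bracket that `income` falls in (lower bound inclusive)."""
    index = max(0, bisect_right(thresholds, income) - 1)
    return rates[index]


def tax_due(income: float) -> float:
    """Apply each rate only to the slice of income inside its bracket."""
    upper_bounds = thresholds[1:] + [float("inf")]
    total = 0.0
    for lower, upper, rate in zip(thresholds, upper_bounds, rates):
        if income > lower:
            total += (min(income, upper) - lower) * rate
    return total


print(marginal_rate(25_000))
print(round(tax_due(25_000), 2))
```

A reform to such a parameter changes individual thresholds or rates, which is why it is addressed differently from a single scalar.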
Simulation
Household-level analysis
Per-household calculations with pe.us.calculate_household: reforms, variation grids, programmatic builders, tracing, and charts.
Start with pe.us.calculate_household()
For one explicit family or household, call calculate_household with plain Python dicts. No wrapper class, no situation dictionary - keyword args for people and each entity, plus a year. The result is a typed object with one attribute per entity section.
pip install "policyengine[us]"

import policyengine as pe
result = pe.us.calculate_household(
    # One dict per person - keys are any person-level variable on the US model.
    # Adults default to the same tax unit and household.
    people=[
        {"age": 35, "employment_income": 40000},  # primary earner
        {"age": 33},  # spouse
        {"age": 8},  # dependent
        {"age": 5},  # dependent
    ],
    # Tax-unit inputs (filing status, etc.)
    tax_unit={"filing_status": "JOINT"},
    # Household inputs. state_code is essentially always needed.
    household={"state_code": "TX"},
    # Year determines which parameter values apply.
    year=2026,
)
# Attribute access on the typed result. Group entities (tax_unit, spm_unit,
# household) are single objects; person sections are lists (result.person[0]).
print(f"Net income: ${result.household.household_net_income:,.0f}")
print(f"EITC: ${result.tax_unit.eitc:,.0f}")
print(f"SNAP: ${result.spm_unit.snap:,.0f}")

Net income: $51,261
EITC: $5,454
SNAP: $3,205
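A variation grid is the same call repeated over a range of inputs. The sketch below sweeps the primary earner's employment income for the example family. It takes the calculator as an argument so it can run here against a stand-in: stub_calculate and its formula are hypothetical, and earnings_sweep is an illustrative helper, not a policyengine function.

```python
from types import SimpleNamespace


def earnings_sweep(calculate, earnings_grid, year=2026):
    """Return (earnings, net income) pairs for the guide's example family.

    `calculate` is assumed to accept the same keyword arguments as
    pe.us.calculate_household and return a result with the same attributes.
    """
    rows = []
    for earnings in earnings_grid:
        result = calculate(
            people=[
                {"age": 35, "employment_income": earnings},
                {"age": 33},
                {"age": 8},
                {"age": 5},
            ],
            tax_unit={"filing_status": "JOINT"},
            household={"state_code": "TX"},
            year=year,
        )
        rows.append((earnings, result.household.household_net_income))
    return rows


def stub_calculate(people, tax_unit, household, year):
    """Stand-in with a made-up formula so the sketch runs without policyengine."""
    earnings = people[0]["employment_income"]
    net_income = 0.8 * earnings + 5_000
    return SimpleNamespace(household=SimpleNamespace(household_net_income=net_income))


for earnings, net in earnings_sweep(stub_calculate, [0, 20_000, 40_000]):
    print(f"{earnings:>7,} -> {net:>9,.0f}")
```

With the package installed, passing pe.us.calculate_household in place of stub_calculate should give real model results, assuming the call signature shown above.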
Microsimulation
Population-level analysis
Aggregate estimates over calibrated microdata: weighted totals, baseline-vs-reform impacts, regional slices, and distributional charts.
Representative datasets replace the old Microsimulation entry point
For population analysis, move to dataset-backed Simulation objects. pe.us.ensure_datasets() is the entry point: it loads cached HDF5 datasets when present and otherwise downloads and uprates them. Simulation.ensure() is the new canonical run method - it loads a cached result if available, otherwise runs and caches. pe.us.model supplies the pinned TaxBenefitModelVersion.
import policyengine as pe
from policyengine.core import Simulation
year = 2026
# ensure_datasets downloads from HuggingFace on first run, caches locally,
# and returns a {"<stem>_<year>": Dataset} dict.
datasets = pe.us.ensure_datasets(
    datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"],
    years=[year],
    data_folder="./data",
)
dataset = datasets[f"enhanced_cps_2024_{year}"]
# pe.us.model is the country model version pinned by this policyengine.py release.
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
# ensure() loads a cached run if available, otherwise runs and caches.
simulation.ensure()
output = simulation.output_dataset.data
print(output.household[["household_net_income", "household_tax"]].head())

         weight  household_net_income  household_tax
0      0.000000         162787.765625   49123.687500
1  94409.679688          13163.706055   -1462.839600
2      0.000000          14824.270508       0.000000
3      0.000000         144380.109375   33226.578125
4      0.000000         157499.937500   31022.513672
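The weight column explains why population figures must be weighted: in the five rows printed above, only one household carries any weight. The sketch below does the arithmetic by hand on those rows. It is an illustration of weighted totals and means only, and makes no claim about how the returned data frame applies weights internally.

```python
# (weight, household_net_income) pairs copied from the printed output above.
rows = [
    (0.000000, 162787.765625),
    (94409.679688, 13163.706055),
    (0.000000, 14824.270508),
    (0.000000, 144380.109375),
    (0.000000, 157499.937500),
]

total_weight = sum(weight for weight, _ in rows)

# Weighted total: each record counts for as many households as its weight.
weighted_total = sum(weight * income for weight, income in rows)
weighted_mean = weighted_total / total_weight

# Unweighted mean treats every record equally and is not a population estimate.
unweighted_mean = sum(income for _, income in rows) / len(rows)

print(f"weighted mean:   {weighted_mean:,.2f}")
print(f"unweighted mean: {unweighted_mean:,.2f}")
```

Zero-weight records drop out of the weighted figures entirely, so the two means differ sharply on this slice.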
# Old mental model:
# from policyengine_us import Microsimulation
# sim = Microsimulation(dataset=...)
#
# New policyengine.py mental model:
# import policyengine as pe
# datasets = pe.us.ensure_datasets(...)
# simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
import policyengine as pe
from policyengine.core import Simulation
# Reuses `datasets` and `year` from the previous example.
dataset = datasets[f"enhanced_cps_2024_{year}"]
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
simulation.ensure()
print(simulation.release_bundle["bundle_id"])
print(type(simulation.output_dataset.data.household).__name__)

us-4.0.0
MicroDataFrame
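The "load a cached run if available, otherwise run and cache" behaviour described for Simulation.ensure() is a common load-or-run pattern. The sketch below shows that pattern in generic form. It is not the library's implementation: load_or_run, the JSON cache file, and the result contents are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_or_run(cache_path: Path, run):
    """Return the cached result if present; otherwise run, cache, and return it."""
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    result = run()
    cache_path.write_text(json.dumps(result))
    return result


calls = []


def expensive_run():
    """Stand-in for a slow simulation; records each time it actually runs."""
    calls.append(1)
    return {"total": 123.0}  # made-up number


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "result.json"
    first = load_or_run(path, expensive_run)   # runs and writes the cache
    second = load_or_run(path, expensive_run)  # served from the cache

print(first == second, len(calls))
```

The second call returns the same result without re-running, which is what makes repeated calls cheap.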
Reproducibility
Pin, verify, export
A policyengine.py release pins a country model to an exact certified data artifact and refuses to mix a model with data it was not certified against. Pin the bundle in requirements, verify the two manifest layers, and emit a TRACE TRO for citations.
Pin the bundle and save it next to every output
The user-facing reproducibility boundary in policyengine.py is the certified runtime bundle. It pins a policyengine.py version to an exact country-model version AND an exact certified data artifact. v4 adds a hard certification check at import time: the installed country package must match the bundled manifest. The practical workflow: pin policyengine in requirements, and write simulation.release_bundle to disk alongside the results you publish.
# Step 1: pin the exact policyengine.py release in your environment.
# pip install "policyengine[us]==4.3.0"
# Step 2: capture the certified runtime bundle next to every output you save.
import json
from pathlib import Path
import policyengine as pe
from policyengine.core import Simulation
datasets = pe.us.ensure_datasets(years=[2026], data_folder="./data")
dataset = next(iter(datasets.values()))
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
simulation.ensure()
bundle = simulation.release_bundle
Path("outputs").mkdir(exist_ok=True)
Path("outputs/release_bundle.json").write_text(json.dumps(bundle, indent=2, default=str))
print("bundle_id:", bundle["bundle_id"])
print("country:", bundle["country_id"])
print("model:", bundle["model_package"], bundle["model_version"])
print("data:", bundle["data_package"], bundle["data_version"])
print("dataset:", bundle["dataset_filepath"])

bundle_id: us-4.0.0
country: us
model: policyengine-us 1.653.3
data: policyengine-us-data 1.73.0
dataset: ./data/enhanced_cps_2024_year_2026.h5
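Once the bundle is saved as JSON, a later run can be checked against it with the standard library alone. The sketch below compares a saved file with the pins you expect. bundle_mismatches is a hypothetical helper, the expected values are copied from the example output above, and the drifted data version 1.72.0 is invented for the demo.

```python
import json
import tempfile
from pathlib import Path

# Pins copied from the example output in this guide.
EXPECTED = {
    "bundle_id": "us-4.0.0",
    "model_version": "1.653.3",
    "data_version": "1.73.0",
}


def bundle_mismatches(path: Path, expected: dict) -> dict:
    """Return {key: (expected, found)} for every pinned key that differs."""
    saved = json.loads(path.read_text())
    return {
        key: (want, saved.get(key))
        for key, want in expected.items()
        if saved.get(key) != want
    }


# Demo against a stand-in file with one deliberately drifted field.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "release_bundle.json"
    demo.write_text(json.dumps({**EXPECTED, "data_version": "1.72.0"}))
    mismatches = bundle_mismatches(demo, EXPECTED)

print(mismatches)
```

In real use the path would be outputs/release_bundle.json, and an empty dict means the saved bundle matches the pins.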
References
Where to go after the walkthrough
The model explorer, the policyengine.py repo, and the release-bundle docs are the three sources of truth. Use the quick-reference block below to check the bundle attached to any simulation you have already run.
# After running a simulation, inspect the certified runtime bundle
print(simulation.release_bundle)

{'bundle_id': 'us-4.0.0', 'country_id': 'us', 'policyengine_version': '4.0.0',
 'model_package': 'policyengine-us', 'model_version': '1.653.3',
 'data_package': 'policyengine-us-data', 'data_version': '1.73.0',
 'default_dataset': 'enhanced_cps_2024',
 'certified_data_build_id': 'policyengine-us-data-1.73.0',
 'compatibility_basis': 'matching_data_build_fingerprint', ...}

Variables and parameters
Use the model explorer after the walkthrough when you need exact variable names or parameter paths.
Release bundles
The release-bundles doc describes the two-manifest layer, the fingerprint compatibility rule, and artifact states.
Working scripts
The checked-in examples in policyengine.py are the best place to look when you need a longer end-to-end pattern or paper-style reproduction.