Python package guide

policyengine.py for US tax and benefit analysis

Reference examples covering household impact, policy reforms, microsimulation over calibrated microdata, and regional breakdowns.

Core concepts

The Python guide now follows the unified policyengine package. Four concepts show up throughout the workflow:

Household calculator

Call pe.us.calculate_household(...) with plain Python dicts; the typed result exposes every variable in the model.

Datasets

Use pe.us.ensure_datasets() to load representative microdata, then feed it into Simulation.

Reforms as dicts

A reform is a {"param.path": value} dict. The same shape is used for reform= (household calculations) and policy= (microsimulation).

Outputs

Aggregate, ChangeAggregate, and pe.us.economic_impact_analysis() turn simulations into analysis.
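As a rough illustration of the reform shape described under "Reforms as dicts", the sketch below builds one reform dict and sanity-checks its keys. The helper check_reform is hypothetical and not part of policyengine; the parameter path is the single-value example listed under Parameter types, and the value 3000 is an arbitrary illustrative number.

```python
def check_reform(reform: dict) -> dict:
    """Hypothetical helper: confirm a reform looks like {"param.path": value}."""
    if not reform:
        raise ValueError("A reform needs at least one parameter path")
    for path in reform:
        # Parameter paths are dotted strings such as "gov.irs.credits...".
        if not isinstance(path, str) or "." not in path:
            raise ValueError(f"Not a dotted parameter path: {path!r}")
    return reform


# Illustrative value only; look up real values in the model explorer.
reform = check_reform({"gov.irs.credits.ctc.amount.base[0].amount": 3000})
print(reform)

# Per the guide, the same dict is passed as reform=... for a household
# calculation and as policy=... for a microsimulation.
```

Checking keys up front is a design choice for catching typos early; the library may well perform its own, stricter validation.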

US entity hierarchy

Each output comes back at the entity level where its variable is defined. Moving a value to another level (for example, summing person-level income up to the tax unit) is a mapping operation.

Entity        Scope                           Example variables
person        Individual                      employment_income, age, is_disabled
marital_unit  Married couple or single adult  joint return grouping
tax_unit      Tax filing unit                 income_tax, ctc_value, eitc
spm_unit      SPM poverty unit                snap, housing_assistance
family        Related-by-blood grouping       family-level programmes
household     All people at one address       household_net_income, state_fips, rent
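The "mapping operation" mentioned above can be sketched in plain Python. This is a minimal illustration, not the library's own implementation: the two helper functions and the sample membership ids are hypothetical, and it assumes each person record carries the id of the tax unit it belongs to.

```python
from collections import defaultdict

# Hypothetical data: four people, the first three share tax unit 0.
person_tax_unit_id = [0, 0, 0, 1]
employment_income = [40_000.0, 0.0, 0.0, 25_000.0]


def sum_to_group(values, group_ids):
    """Aggregate person-level values up to their group entity by summing."""
    totals = defaultdict(float)
    for value, group_id in zip(values, group_ids):
        totals[group_id] += value
    return dict(totals)


def project_to_person(group_values, group_ids):
    """Broadcast a group-level value back down to every member."""
    return [group_values[group_id] for group_id in group_ids]


tax_unit_income = sum_to_group(employment_income, person_tax_unit_id)
per_person = project_to_person(tax_unit_income, person_tax_unit_id)

print(tax_unit_income)  # {0: 40000.0, 1: 25000.0}
print(per_person)       # [40000.0, 40000.0, 40000.0, 25000.0]
```

Summing is only one possible upward mapping; others (any, max, count) follow the same pattern with a different reducer.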

Parameter types

Every reform target is a parameter. Knowing which shape a parameter has tells you how to reference it in a Policy.

Single value: gov.irs.credits.ctc.amount.base[0].amount

A scalar amount, rate, or threshold. Set a new value for a date range.

List: gov.irs.credits.refundable

A list of values, often names of variables that qualify for a rule.

Scale: gov.hmrc.income_tax.rates

Graduated thresholds and rates. Access via .thresholds, .rates, .amounts. The gov.hmrc prefix in the example path belongs to the UK model (HMRC is the UK tax authority).

Breakdown: gov.irs.credits.ctc.phase_out.threshold.JOINT

Parameter broken down by an enum (filing status, age band, region).
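To make the Scale type more concrete, here is a minimal sketch of how a list of thresholds and rates might turn into a tax amount. It assumes marginal-rate semantics (each rate applies only to the slice of income between its threshold and the next). The function name and all numbers are invented for illustration and are not actual parameter values or the library's code.

```python
def marginal_scale_tax(income, thresholds, rates):
    """Apply each rate to the slice of income above its threshold
    and below the next threshold (marginal-rate schedule)."""
    tax = 0.0
    for i, (threshold, rate) in enumerate(zip(thresholds, rates)):
        # The last bracket has no upper bound.
        upper = thresholds[i + 1] if i + 1 < len(thresholds) else float("inf")
        if income > threshold:
            tax += (min(income, upper) - threshold) * rate
    return tax


# Made-up schedule: 10% up to 10k, 20% up to 40k, 30% above that.
thresholds = [0, 10_000, 40_000]
rates = [0.10, 0.20, 0.30]

for income in (5_000, 25_000, 50_000):
    print(income, round(marginal_scale_tax(income, thresholds, rates), 2))
```

A reform to a scale would change individual thresholds or rates; the calculation itself stays the same.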

Simulation

Household-level analysis

Per-household calculations with pe.us.calculate_household: reforms, variation grids, programmatic builders, tracing, and charts.

Start with pe.us.calculate_household()

For one explicit family or household, call calculate_household with plain Python dicts. There is no wrapper class and no situation dictionary: pass keyword arguments for people and for each entity, plus a year. The result is a typed object with one attribute per entity section.

Install policyengine for US
pip install "policyengine[us]"
US household impact
import policyengine as pe

result = pe.us.calculate_household(
    # One dict per person - keys are any person-level variable on the US model.
    # Adults default to the same tax unit and household.
    people=[
        {"age": 35, "employment_income": 40000},  # primary earner
        {"age": 33},                              # spouse
        {"age": 8},                               # dependent
        {"age": 5},                               # dependent
    ],
    # Tax-unit inputs (filing status, etc.)
    tax_unit={"filing_status": "JOINT"},
    # Household inputs. state_code is essentially always needed.
    household={"state_code": "TX"},
    # Year determines which parameter values apply.
    year=2026,
)

# Attribute access on the typed result. Group entities (tax_unit, spm_unit,
# household) are single objects; person sections are lists (result.person[0]).
print(f"Net income: ${result.household.household_net_income:,.0f}")
print(f"EITC: ${result.tax_unit.eitc:,.0f}")
print(f"SNAP: ${result.spm_unit.snap:,.0f}")
Output
Net income: $51,261
EITC: $5,454
SNAP: $3,205
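Because the calculator takes plain dicts, varying one input is just a loop. The sketch below reuses the household from the example above and sweeps the primary earner's employment income. The function sweep_employment_income is a hypothetical helper, and fake_calculate is a stand-in (not a tax model) so the snippet runs without policyengine installed. This is a plain-loop illustration, independent of any built-in variation tooling the library provides.

```python
from types import SimpleNamespace


def sweep_employment_income(calculate, incomes, year=2026):
    """Return (income, household_net_income) pairs for the example household."""
    rows = []
    for income in incomes:
        result = calculate(
            people=[
                {"age": 35, "employment_income": income},  # primary earner
                {"age": 33},
                {"age": 8},
                {"age": 5},
            ],
            tax_unit={"filing_status": "JOINT"},
            household={"state_code": "TX"},
            year=year,
        )
        rows.append((income, result.household.household_net_income))
    return rows


def fake_calculate(people, tax_unit, household, year):
    """Stand-in for illustration only: net income equals total earnings."""
    earnings = sum(p.get("employment_income", 0) for p in people)
    return SimpleNamespace(
        household=SimpleNamespace(household_net_income=earnings)
    )


for income, net in sweep_employment_income(fake_calculate, [0, 20_000, 40_000]):
    print(income, net)
```

With the package installed, passing pe.us.calculate_household in place of fake_calculate should produce real values, assuming the call signature shown in the example above.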

Microsimulation

Population-level analysis

Aggregate estimates over calibrated microdata: weighted totals, baseline-vs-reform impacts, regional slices, and distributional charts.

Representative datasets replace the old Microsimulation entry point

For population analysis, move to dataset-backed Simulation objects. pe.us.ensure_datasets() is the entry point: it loads cached HDF5 datasets when present and otherwise downloads and uprates them. Simulation.ensure() is the new canonical run method - it loads a cached result if available, otherwise runs and caches. pe.us.model supplies the pinned TaxBenefitModelVersion.

US dataset-backed simulation
import policyengine as pe
from policyengine.core import Simulation

year = 2026
# ensure_datasets downloads from HuggingFace on first run, caches locally,
# and returns a {"<stem>_<year>": Dataset} dict.
datasets = pe.us.ensure_datasets(
    datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"],
    years=[year],
    data_folder="./data",
)
dataset = datasets[f"enhanced_cps_2024_{year}"]

# pe.us.model is the country model version pinned by this policyengine.py release.
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
# ensure() loads a cached run if available, otherwise runs and caches.
simulation.ensure()

output = simulation.output_dataset.data
print(output.household[["household_net_income", "household_tax"]].head())
Output
         weight  household_net_income  household_tax
0      0.000000         162787.765625   49123.687500
1  94409.679688          13163.706055   -1462.839600
2      0.000000          14824.270508       0.000000
3      0.000000         144380.109375   33226.578125
4      0.000000         157499.937500   31022.513672
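The weight column is what turns sample rows into population estimates, and several rows above carry a weight of zero. As a minimal sketch of why that matters, the snippet below compares an unweighted mean with a weighted one using the five rows shown (values rounded). The helper functions are hypothetical plain-Python stand-ins and not the Aggregate API.

```python
def weighted_total(values, weights):
    """Population total: each sample value scaled by its weight."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Population mean: weighted total divided by total weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("All weights are zero")
    return weighted_total(values, weights) / total_weight


# First five rows of the output above, rounded to two decimals.
weights = [0.0, 94409.68, 0.0, 0.0, 0.0]
net_income = [162787.77, 13163.71, 14824.27, 144380.11, 157499.94]

print(f"Unweighted mean: {sum(net_income) / len(net_income):,.0f}")   # 98,531
print(f"Weighted mean:   {weighted_mean(net_income, weights):,.0f}")  # 13,164
```

Only the one row with a non-zero weight contributes to the weighted figure, which is why unweighted summaries of microdata can be badly misleading.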
Old Microsimulation mental model -> new Simulation mental model
# Old mental model:
#   from policyengine_us import Microsimulation
#   sim = Microsimulation(dataset=...)
#
# New policyengine.py mental model:
#   import policyengine as pe
#   datasets = pe.us.ensure_datasets(...)
#   simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
import policyengine as pe
from policyengine.core import Simulation

# Reuses `datasets` and `year` from the previous example.
dataset = datasets[f"enhanced_cps_2024_{year}"]
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
simulation.ensure()

print(simulation.release_bundle["bundle_id"])
print(type(simulation.output_dataset.data.household).__name__)
Output
us-4.0.0
MicroDataFrame
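The "load a cached run if available, otherwise run and cache" behaviour described for Simulation.ensure() is a common load-or-compute pattern. The sketch below illustrates that pattern in generic terms only. The function ensure_result, the JSON cache file, and the result key are all assumptions made for this example and say nothing about how policyengine actually stores or keys its cached runs.

```python
import json
import tempfile
from pathlib import Path


def ensure_result(cache_path: Path, compute):
    """Return the cached result if the file exists; otherwise compute,
    write it to the cache, and return it."""
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(result))
    return result


calls = []


def expensive_run():
    """Hypothetical stand-in for a long-running simulation."""
    calls.append(1)
    return {"household_net_income_total": 123.0}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "run.json"
    first = ensure_result(cache, expensive_run)   # computes and caches
    second = ensure_result(cache, expensive_run)  # served from the cache

print(first == second, len(calls))  # True 1
```

The practical consequence is that repeated calls are cheap, but a stale cache must be cleared deliberately if the inputs change.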

Reproducibility

Pin, verify, export

A policyengine.py release pins a country model to an exact certified data artifact and refuses to mix a model with data it was not certified against. Pin the bundle in requirements, verify the two manifest layers, and emit a TRACE TRO for citations.

Pin the bundle and save it next to every output

The user-facing reproducibility boundary in policyengine.py is the certified runtime bundle. It pins a policyengine.py version to both an exact country-model version and an exact certified data artifact. v4 adds a hard certification check at import time: the installed country package must match the bundled manifest. The practical workflow is to pin policyengine in your requirements and to write simulation.release_bundle to disk alongside the results you publish.

Pin the release and capture the bundle
# Step 1: pin the exact policyengine.py release in your environment.
# pip install "policyengine[us]==4.3.0"

# Step 2: capture the certified runtime bundle next to every output you save.
import json
from pathlib import Path

import policyengine as pe
from policyengine.core import Simulation

datasets = pe.us.ensure_datasets(years=[2026], data_folder="./data")
dataset = next(iter(datasets.values()))
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
simulation.ensure()

bundle = simulation.release_bundle

Path("outputs").mkdir(exist_ok=True)
Path("outputs/release_bundle.json").write_text(json.dumps(bundle, indent=2, default=str))

print("bundle_id:", bundle["bundle_id"])
print("country:", bundle["country_id"])
print("model:", bundle["model_package"], bundle["model_version"])
print("data:", bundle["data_package"], bundle["data_version"])
print("dataset:", bundle["dataset_filepath"])
Output
bundle_id: us-4.0.0
country: us
model: policyengine-us 1.653.3
data: policyengine-us-data 1.73.0
dataset: ./data/enhanced_cps_2024_year_2026.h5
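Once a bundle has been saved next to the outputs, a later session can compare it against the bundle of a fresh run before reusing or citing old results. The sketch below is one way to do that comparison, assuming both bundles are available as plain dicts with the keys printed above. The helper bundle_mismatches and the "1.74.0" version are hypothetical.

```python
import json

# Keys taken from the bundle fields printed above.
KEYS_TO_COMPARE = ("bundle_id", "model_version", "data_version")


def bundle_mismatches(saved, current, keys=KEYS_TO_COMPARE):
    """Return {key: (saved_value, current_value)} for every key that differs."""
    return {
        key: (saved.get(key), current.get(key))
        for key in keys
        if saved.get(key) != current.get(key)
    }


# In practice `saved` would come from outputs/release_bundle.json and
# `current` from simulation.release_bundle; literals keep this runnable.
saved = json.loads(
    '{"bundle_id": "us-4.0.0", "model_version": "1.653.3", "data_version": "1.73.0"}'
)
current = dict(saved, data_version="1.74.0")  # hypothetical newer data build

print(bundle_mismatches(saved, saved))    # {}
print(bundle_mismatches(saved, current))  # {'data_version': ('1.73.0', '1.74.0')}
```

An empty result means the saved outputs were produced under the same model and data versions as the current environment.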

References

Where to go after the walkthrough

The model explorer, the policyengine.py repo, and the release-bundle docs are the three sources of truth. Use the quick-reference block below to check the bundle attached to any simulation you have already run.

Inspect the certified runtime bundle
# After running a simulation, inspect the certified runtime bundle
print(simulation.release_bundle)
Output
{'bundle_id': 'us-4.0.0', 'country_id': 'us', 'policyengine_version': '4.0.0', 'model_package': 'policyengine-us', 'model_version': '1.653.3', 'data_package': 'policyengine-us-data', 'data_version': '1.73.0', 'default_dataset': 'enhanced_cps_2024', 'certified_data_build_id': 'policyengine-us-data-1.73.0', 'compatibility_basis': 'matching_data_build_fingerprint', ...}
Model explorer

Variables and parameters

Use the model explorer after the walkthrough when you need exact variable names or parameter paths.

Reproducibility

Release bundles

The release-bundles doc describes the two-manifest layer, the fingerprint compatibility rule, and artifact states.

Examples

Working scripts

The checked-in examples in policyengine.py are the best place to look when you need a longer end-to-end pattern or paper-style reproduction.