Python package guide

`policyengine.py` for US tax and benefit analysis

Reference examples covering household impact, policy reforms, microsimulation over calibrated microdata, and regional breakdowns.

Core concepts

The Python guide now follows the unified policyengine package. Four concepts show up throughout the workflow:

Household calculator

Call pe.us.calculate_household(...) with plain Python dicts; the typed result exposes every variable in the model.

Datasets

Use pe.us.ensure_datasets() to load representative microdata, then feed it into Simulation.

Reforms as dicts

A reform is a {"param.path": value} dict. The same shape is used for reform= (household calculator) and policy= (microsimulation).
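As a minimal sketch of that dict shape, the snippet below builds two single-parameter reforms and merges them. The parameter paths are the ones listed under Parameter types in this guide. The dollar values and the `combine_reforms` helper are hypothetical and not part of policyengine, and the value format for non-scalar parameters is not covered here.

```python
def combine_reforms(*reforms: dict) -> dict:
    """Merge reform dicts, refusing to silently overwrite a parameter."""
    merged = {}
    for reform in reforms:
        for path, value in reform.items():
            if path in merged and merged[path] != value:
                raise ValueError(f"Conflicting values for {path!r}")
            merged[path] = value
    return merged


# Hypothetical values, for illustration only.
ctc_amount = {"gov.irs.credits.ctc.amount.base[0].amount": 3000}
ctc_phase_out = {"gov.irs.credits.ctc.phase_out.threshold.JOINT": 400000}

reform = combine_reforms(ctc_amount, ctc_phase_out)
for path, value in reform.items():
    print(f"{path} -> {value}")
```

The merged dict has the shape that would be passed as reform= or policy=, as described above.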

Outputs

Aggregate, ChangeAggregate, and pe.us.economic_impact_analysis() turn simulations into analysis.

US entity hierarchy

Each variable is returned at the entity level where it is defined. Moving a value to another level, such as summing a person-level amount up to the household, is a mapping operation.

Entity	Scope	Example variables
`person`	Individual	`employment_income`, `age`, `is_disabled`
`marital_unit`	Married couple or single adult	Joint return grouping
`tax_unit`	Tax filing unit	`income_tax`, `ctc_value`, `eitc`
`spm_unit`	SPM poverty unit	`snap`, `housing_assistance`
`family`	Related-by-blood grouping	Family-level programs
`household`	All people at one address	`household_net_income`, `state_fips`, `rent`
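To illustrate what a mapping operation between entity levels means, here is a plain-Python sketch that sums a person-level variable up to households and copies a household-level variable back down to people. The membership list and the values are made up, and the helper names are hypothetical; this is not the library's API, only the arithmetic behind the idea.

```python
from collections import defaultdict

# Hypothetical membership: position = person, value = household id.
person_household = [0, 0, 0, 1, 1]
# Hypothetical person-level variable.
employment_income = [40_000, 0, 0, 25_000, 18_000]
# Hypothetical household-level variable.
rent = {0: 12_000, 1: 9_600}


def sum_to_household(values, membership):
    """Aggregate a person-level variable to the household level."""
    totals = defaultdict(float)
    for value, household_id in zip(values, membership):
        totals[household_id] += value
    return dict(totals)


def broadcast_to_person(household_values, membership):
    """Copy a household-level variable onto each member."""
    return [household_values[household_id] for household_id in membership]


household_income = sum_to_household(employment_income, person_household)
person_rent = broadcast_to_person(rent, person_household)

print(household_income)
print(person_rent)
```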

Parameter types

Every reform target is a parameter. Knowing which shape a parameter has tells you how to reference it in a Policy.

Single value: gov.irs.credits.ctc.amount.base[0].amount

A scalar amount, rate, or threshold. Set a new value for a date range.

List: gov.irs.credits.refundable

A list of values, often names of variables that qualify for a rule.

Scale: gov.hmrc.income_tax.rates

Graduated thresholds and rates. Access via .thresholds, .rates, .amounts.

Breakdown: gov.irs.credits.ctc.phase_out.threshold.JOINT

Parameter broken down by an enum (filing status, age band, region).

Simulation

Household-level analysis

Per-household calculations with pe.us.calculate_household: reforms, variation grids, programmatic builders, tracing, and charts.

Start with pe.us.calculate_household()

For one explicit family or household, call calculate_household with plain Python dicts. There is no wrapper class and no situation dictionary: pass keyword arguments for people and for each entity, plus a year. The result is a typed object with one attribute per entity section.

Install policyengine for US

pip install "policyengine[us]"

US household impact

import policyengine as pe

result = pe.us.calculate_household(
    # One dict per person - keys are any person-level variable on the US model.
    # Adults default to the same tax unit and household.
    people=[
        {"age": 35, "employment_income": 40000},  # primary earner
        {"age": 33},                              # spouse
        {"age": 8},                               # dependent
        {"age": 5},                               # dependent
    ],
    # Tax-unit inputs (filing status, etc.)
    tax_unit={"filing_status": "JOINT"},
    # Household inputs. state_code is essentially always needed.
    household={"state_code": "TX"},
    # Year determines which parameter values apply.
    year=2026,
)

# Attribute access on the typed result. Group entities (tax_unit, spm_unit,
# household) are single objects; person sections are lists (result.person[0]).
print(f"Net income: ${result.household.household_net_income:,.0f}")
print(f"EITC: ${result.tax_unit.eitc:,.0f}")
print(f"SNAP: ${result.spm_unit.snap:,.0f}")

Output

Net income: $51,261
EITC: $5,454
SNAP: $3,205
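Because the inputs are plain dicts, a household can also be assembled programmatically. The sketch below builds keyword arguments in the same shape as the example above for a range of earnings. `build_household_kwargs` is a hypothetical helper, and the call to pe.us.calculate_household is left as a comment so the snippet runs without the package installed.

```python
def build_household_kwargs(earnings, child_ages, state_code="TX", year=2026):
    """Return keyword arguments shaped like the calculate_household example."""
    people = [
        {"age": 35, "employment_income": earnings},  # primary earner
        {"age": 33},                                 # spouse
    ]
    people += [{"age": age} for age in child_ages]   # dependents
    return {
        "people": people,
        "tax_unit": {"filing_status": "JOINT"},
        "household": {"state_code": state_code},
        "year": year,
    }


# One set of inputs per earnings level, from $0 to $80,000 in $20,000 steps.
grid = [
    build_household_kwargs(earnings, child_ages=[8, 5])
    for earnings in range(0, 80_001, 20_000)
]

for kwargs in grid:
    # With policyengine installed:
    #   result = pe.us.calculate_household(**kwargs)
    print(kwargs["people"][0]["employment_income"], len(kwargs["people"]))
```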


Microsimulation

Population-level analysis

Aggregate estimates over calibrated microdata: weighted totals, baseline-vs-reform impacts, regional slices, and distributional charts.

Representative datasets replace the old Microsimulation entry point

For population analysis, move to dataset-backed Simulation objects. pe.us.ensure_datasets() is the entry point: it loads cached HDF5 datasets when present and otherwise downloads and uprates them. Simulation.ensure() is the new canonical run method - it loads a cached result if available, otherwise runs and caches. pe.us.model supplies the pinned TaxBenefitModelVersion.

US dataset-backed simulation

import policyengine as pe
from policyengine.core import Simulation

year = 2026
# ensure_datasets downloads from HuggingFace on first run, caches locally,
# and returns a {"<stem>_<year>": Dataset} dict.
datasets = pe.us.ensure_datasets(
    datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"],
    years=[year],
    data_folder="./data",
)
dataset = datasets[f"enhanced_cps_2024_{year}"]

# pe.us.model is the country model version pinned by this policyengine.py release.
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
# ensure() loads a cached run if available, otherwise runs and caches.
simulation.ensure()

output = simulation.output_dataset.data
print(output.household[["household_net_income", "household_tax"]].head())

Output

         weight  household_net_income  household_tax
0      0.000000         162787.765625   49123.687500
1  94409.679688          13163.706055   -1462.839600
2      0.000000          14824.270508       0.000000
3      0.000000         144380.109375   33226.578125
4      0.000000         157499.937500   31022.513672
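The weight column is what turns sample rows into population estimates. As a worked illustration using only the five rows printed above, the sketch below computes a weighted total by hand. It is not the library's aggregation API, and a total over five rows is not a meaningful national estimate.

```python
# (weight, household_net_income, household_tax) copied from the output above.
rows = [
    (0.000000, 162787.765625, 49123.687500),
    (94409.679688, 13163.706055, -1462.839600),
    (0.000000, 14824.270508, 0.000000),
    (0.000000, 144380.109375, 33226.578125),
    (0.000000, 157499.937500, 31022.513672),
]


def weighted_total(values, weights):
    """Sum of value * weight: each row stands for `weight` households."""
    return sum(value * weight for value, weight in zip(values, weights))


weights = [row[0] for row in rows]
net_income_total = weighted_total([row[1] for row in rows], weights)
tax_total = weighted_total([row[2] for row in rows], weights)

print(f"Weighted net income: ${net_income_total:,.0f}")
print(f"Weighted household tax: ${tax_total:,.0f}")
```

Rows with zero weight contribute nothing, so only the second row affects these totals.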

Old Microsimulation mental model -> new Simulation mental model

# Old mental model:
#   from policyengine_us import Microsimulation
#   sim = Microsimulation(dataset=...)
#
# New policyengine.py mental model:
#   import policyengine as pe
#   datasets = pe.us.ensure_datasets(...)
#   simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
import policyengine as pe
from policyengine.core import Simulation

# Reuses `datasets` and `year` from the dataset-backed simulation example above.
dataset = datasets[f"enhanced_cps_2024_{year}"]
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
simulation.ensure()

print(simulation.release_bundle["bundle_id"])
print(type(simulation.output_dataset.data.household).__name__)

Output

us-4.0.0
MicroDataFrame


Reproducibility

Pin, verify, export

A policyengine.py release pins a country model to an exact certified data artifact and refuses to mix a model with data it was not certified against. Pin the bundle in requirements, verify the two manifest layers, and emit a TRACE TRO for citations.

Pin the bundle and save it next to every output

The user-facing reproducibility boundary in policyengine.py is the certified runtime bundle, which pins a policyengine.py version to both an exact country-model version and an exact certified data artifact. v4 adds a hard certification check at import time: the installed country package must match the bundled manifest. In practice, pin policyengine in your requirements and write simulation.release_bundle to disk alongside the results you publish.

Pin the release and capture the bundle

# Step 1: pin the exact policyengine.py release in your environment.
# pip install "policyengine[us]==4.3.0"

# Step 2: capture the certified runtime bundle next to every output you save.
import json
from pathlib import Path

import policyengine as pe
from policyengine.core import Simulation

datasets = pe.us.ensure_datasets(years=[2026], data_folder="./data")
dataset = next(iter(datasets.values()))
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)
simulation.ensure()

bundle = simulation.release_bundle

Path("outputs").mkdir(exist_ok=True)
Path("outputs/release_bundle.json").write_text(json.dumps(bundle, indent=2, default=str))

print("bundle_id:", bundle["bundle_id"])
print("country:", bundle["country_id"])
print("model:", bundle["model_package"], bundle["model_version"])
print("data:", bundle["data_package"], bundle["data_version"])
print("dataset:", bundle["dataset_filepath"])

Output

bundle_id: us-4.0.0
country: us
model: policyengine-us 1.653.3
data: policyengine-us-data 1.73.0
dataset: ./data/enhanced_cps_2024_year_2026.h5
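A saved bundle is most useful when it is checked again later. The sketch below compares a stored release_bundle.json against the versions you expect. The keys and values are copied from the output above; `find_bundle_mismatches` is a hypothetical helper, and a temporary directory stands in for the outputs folder so the snippet runs without policyengine installed.

```python
import json
import tempfile
from pathlib import Path


def find_bundle_mismatches(saved: dict, expected: dict) -> dict:
    """Return {key: (expected, saved)} for every expected key that differs."""
    return {
        key: (want, saved.get(key))
        for key, want in expected.items()
        if saved.get(key) != want
    }


# Values copied from the printed bundle above.
saved_bundle = {
    "bundle_id": "us-4.0.0",
    "country_id": "us",
    "model_package": "policyengine-us",
    "model_version": "1.653.3",
    "data_package": "policyengine-us-data",
    "data_version": "1.73.0",
}

# Round-trip through JSON on disk, as a published output would be.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "release_bundle.json"
    path.write_text(json.dumps(saved_bundle, indent=2))
    loaded = json.loads(path.read_text())

expected = {
    "bundle_id": "us-4.0.0",
    "model_version": "1.653.3",
    "data_version": "1.73.0",
}
mismatches = find_bundle_mismatches(loaded, expected)
print("bundle matches" if not mismatches else f"mismatches: {mismatches}")
```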


References

Where to go after the walkthrough

The model explorer, the policyengine.py repo, and the release-bundle docs are the three sources of truth. Use the quick-reference block below to check the bundle attached to any simulation you have already run.

Inspect the certified runtime bundle

# After running a simulation, inspect the certified runtime bundle
print(simulation.release_bundle)

Output

{'bundle_id': 'us-4.0.0', 'country_id': 'us', 'policyengine_version': '4.0.0', 'model_package': 'policyengine-us', 'model_version': '1.653.3', 'data_package': 'policyengine-us-data', 'data_version': '1.73.0', 'default_dataset': 'enhanced_cps_2024', 'certified_data_build_id': 'policyengine-us-data-1.73.0', 'compatibility_basis': 'matching_data_build_fingerprint', ...}

Model explorer

Variables and parameters

Use the model explorer after the walkthrough when you need exact variable names or parameter paths.

Reproducibility

Release bundles

The release-bundles doc describes the two-manifest layer, the fingerprint compatibility rule, and artifact states.

Examples

Working scripts

The checked-in examples in policyengine.py are the best place to look when you need a longer end-to-end pattern or paper-style reproduction.
