AI Bias in the U.S. Criminal Justice System: Risk Scores and Due Process
Algorithmic risk assessment tools are embedded in pretrial detention, sentencing, and parole decisions across dozens of U.S. jurisdictions, and the constitutional validity of their use remains contested in federal and state courts. This page examines how risk scoring instruments function, what structural factors produce racially and socioeconomically disparate outputs, where legal challenges have succeeded or stalled, and how due process doctrine applies to algorithmic decision-making in criminal proceedings. The analysis draws on published judicial opinions, federal agency guidance, and documented technical literature to provide a reference-grade account of a rapidly evolving legal domain.
- Definition and Scope
- Core Mechanics or Structure
- Causal Relationships or Drivers
- Classification Boundaries
- Tradeoffs and Tensions
- Common Misconceptions
- Checklist or Steps (Non-Advisory)
- Reference Table or Matrix
- References
Definition and Scope
Algorithmic risk assessment in the criminal justice context refers to the use of statistical models — often proprietary software — that assign a numerical score predicting the likelihood that a defendant will reoffend, fail to appear for trial, or commit a violent act. These instruments are distinct from clinical judgment or officer discretion: they aggregate static and dynamic variables into a scalar output that is then presented to a judge, pretrial services officer, or parole board. For a broader overview of AI systems in criminal courts, see AI in State Courts.
The scope of deployment is substantial. Pretrial risk assessments are used in 29 states plus Washington D.C. (Arnold Ventures, Pretrial Justice Reform, 2022). At the federal level, the First Step Act of 2018 (Pub. L. 115-391) directed the development of a risk-and-needs assessment system, PATTERN (Prisoner Assessment Tool Targeting Estimated Risk and Needs), which the Bureau of Prisons administers for post-conviction programming and earned-time-credit determinations. At the state level, instruments such as COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), the PSA (Public Safety Assessment), the LSI-R (Level of Service Inventory-Revised), and the Ohio Risk Assessment System (ORAS) are each authorized by statute or administrative rule in at least one jurisdiction.
Due process implications arise because these tools influence deprivations of liberty (pretrial detention, sentence length, and parole denial) while their methodologies are frequently shielded from public disclosure under trade secret claims. The Due Process Clause of the Fourteenth Amendment, as interpreted in Mathews v. Eldridge, 424 U.S. 319 (1976), requires procedures calibrated to the private interest at stake, the risk of erroneous deprivation under the procedures used, and the government's countervailing interest. How that balancing test applies to opaque algorithmic outputs is the central legal question treated on this page and explored further at Algorithmic Due Process.
Core Mechanics or Structure
Risk assessment instruments operate by assigning weighted scores to input variables drawn from criminal history records, demographic data, and, in some tools, self-reported survey responses. COMPAS, developed by Northpointe (now Equivant), uses 137 questions grouped into subscales covering criminal involvement, relationships, substance abuse, and residential stability (Northpointe, COMPAS Risk & Need Assessment System, 2015). Raw subscale scores are converted to decile scores of 1 to 10, normed against a comparison population, with higher deciles indicating higher predicted risk.
The PSA, developed by Arnold Ventures, is publicly documented and uses nine static factors drawn from administrative records, principally criminal history plus age at current arrest, with no defendant interview required. This contrasts with COMPAS's hybrid approach. The LSI-R uses 54 items covering ten domains, including accommodation, finances, and leisure activities. Each instrument's cut-off thresholds, the score levels at which judges receive "low," "medium," or "high" risk labels, are set by the vendor or the jurisdiction and are rarely validated on the local population before deployment.
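To make these mechanics concrete, the sketch below shows how a generic instrument of this kind could convert weighted item responses into a raw score, a decile, and the categorical label a decision-maker sees. Every weight, item name, and cut-off here is a hypothetical placeholder; the actual COMPAS and PSA scoring rules are not reproduced.

```python
# Illustrative sketch of a generic risk-score pipeline: weighted item responses
# -> raw score -> decile (relative to a norm sample) -> categorical label.
# All weights, items, and cut-offs are hypothetical, not any vendor's actual logic.
from bisect import bisect_right

ITEM_WEIGHTS = {                      # hypothetical item weights
    "prior_felony_convictions": 2.0,
    "age_bracket_at_arrest": -0.5,    # coded age bracket; older brackets lower the score here
    "prior_failure_to_appear": 1.5,
    "employment_instability": 1.0,    # survey-style item found in hybrid tools
}

# Hypothetical norm-sample cut points: nine boundaries define ten deciles.
NORM_DECILE_CUTS = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5, 10.0, 12.0, 14.5]

# Hypothetical jurisdiction-set bands mapping deciles to the label a judge sees.
CATEGORY_BANDS = {range(1, 5): "low", range(5, 8): "medium", range(8, 11): "high"}

def raw_score(responses):
    """Weighted sum of coded item responses."""
    return sum(ITEM_WEIGHTS[item] * value for item, value in responses.items())

def decile(raw):
    """Locate the raw score among the norm-sample cut points (1-10)."""
    return bisect_right(NORM_DECILE_CUTS, raw) + 1

def risk_label(dec):
    """Translate a decile into the categorical label reported to the court."""
    return next(label for band, label in CATEGORY_BANDS.items() if dec in band)

responses = {"prior_felony_convictions": 2, "age_bracket_at_arrest": 3,
             "prior_failure_to_appear": 1, "employment_instability": 2}
d = decile(raw_score(responses))
print(raw_score(responses), d, risk_label(d))   # 6.0 5 medium
```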
Validation studies are critical to understanding failure modes. ProPublica's 2016 analysis (Machine Bias) examined COMPAS scores assigned to thousands of defendants in Broward County, Florida, and found that Black defendants who did not reoffend were nearly twice as likely as white defendants to be falsely flagged as high risk (false positive rates of 44.9% vs. 23.5%), while white defendants who went on to reoffend were more likely to be falsely classified as low risk. Northpointe disputed the methodology, arguing that the instrument is equally accurate across racial groups when measured by calibration (similar observed recidivism rates at a given score) and by AUC (area under the curve), which measures how well scores rank-order defendants by risk. Both claims are mathematically defensible under different fairness definitions, which is the core of the technical tension. See COMPAS Risk Assessment Tools for a dedicated treatment.
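The two competing claims can coexist, which a toy calculation makes visible: a score can show equal positive predictive value in two groups with different base rates while producing unequal false positive and false negative rates. The counts below are fabricated for illustration and are not the Broward County data.

```python
# Illustrative only: equal positive predictive value (a calibration-style parity)
# across two groups with different base rates can coexist with unequal false
# positive rates. Counts are fabricated; they are not the Broward County data.
def rates(tp, fp, fn, tn):
    return {
        "PPV": tp / (tp + fp),                   # of those flagged high risk, share who reoffended
        "FPR": fp / (fp + tn),                   # of non-reoffenders, share flagged high risk
        "FNR": fn / (fn + tp),                   # of reoffenders, share flagged low risk
        "base_rate": (tp + fn) / (tp + fp + fn + tn),
    }

group_a = rates(tp=300, fp=200, fn=100, tn=400)  # higher base rate (0.40)
group_b = rates(tp=150, fp=100, fn=150, tn=600)  # lower base rate (0.30)

for name, grp in (("A", group_a), ("B", group_b)):
    print(name, {k: round(v, 3) for k, v in grp.items()})
# Both groups show PPV = 0.6, yet group A's FPR (0.333) is more than twice
# group B's (0.143), and the false negative rates diverge as well.
```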
Causal Relationships or Drivers
The disparate outputs produced by risk scoring tools are not random — they trace to identifiable structural inputs. Criminal history data reflects policing patterns: neighborhoods subjected to higher-density patrol generate more arrests per capita, independent of underlying offense rates. A defendant from a heavily policed zip code accumulates a longer rap sheet not solely because of behavior but because of surveillance density. Because criminal history is the dominant predictor variable in most instruments, it encodes and amplifies existing enforcement disparities.
Socioeconomic variables also enter indirectly. Employment status, residential stability, and education level — factors correlated with race and class due to documented historical inequities — appear as neutral predictors in risk models but function as proxies for protected characteristics. The U.S. Government Accountability Office noted in its 2023 report (GAO-23-105483) that federal agencies using algorithmic tools in benefit and enforcement decisions often lack adequate bias testing protocols, a finding with direct analogues in the criminal justice setting.
The feedback loop compounds over time: defendants labeled high-risk are detained pretrial, which increases the likelihood of conviction and reduces income stability, which feeds into later risk assessments. This dynamic is documented in academic literature on AI Pretrial Detention Decisions, where pretrial detention itself is shown to increase the probability of a guilty plea regardless of actual culpability.
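A deliberately simplified toy simulation can illustrate this compounding mechanism. All coefficients, thresholds, and variables below are invented for illustration; the sketch models no real instrument or population, only the ratchet described above.

```python
# Toy feedback loop: a high score triggers detention, detention adds a record
# event and worsens employment stability, and those inputs raise the next
# assessment's score. Every coefficient and threshold is invented.
def score(record_events, employment_instability):
    return 1.0 * record_events + 2.0 * employment_instability   # invented weights

def one_cycle(record_events, instability):
    s = score(record_events, instability)
    detained = s >= 6.0                                  # invented "high risk" cut-off
    if detained:
        record_events += 1                               # detention-linked record event
        instability = min(1.0, instability + 0.3)        # job loss while detained
    return s, detained, record_events, instability

record_events, instability = 5, 0.5
for cycle in range(4):
    s, detained, record_events, instability = one_cycle(record_events, instability)
    print(f"cycle {cycle}: score={s:.1f} detained={detained}")
# The score ratchets upward across cycles even though no new offense occurs.
```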
Classification Boundaries
Risk assessment instruments in the U.S. criminal justice system can be classified along three axes:
By decision point: Pretrial risk tools (PSA, Virginia Pretrial Risk Assessment Instrument – VPRAI) operate before conviction and inform release conditions. Presentence tools (COMPAS General Recidivism, LSI-R) inform sentencing. Post-conviction tools (PATTERN, LSI-R) inform parole and programming allocation.
By input methodology: Pure-record tools use only administrative criminal justice data (PSA). Hybrid tools combine records with structured interview responses (COMPAS, LSI-R). Survey-heavy tools rely primarily on self-report items covering attitudes and social circumstances.
By proprietary status: Open-source or publicly documented instruments (PSA, ORAS) allow independent validation. Proprietary instruments (COMPAS, LSI-R) restrict access to scoring logic under trade secret protection, creating the due process tension addressed in State v. Loomis, 881 N.W.2d 749 (Wis. 2016), where the Wisconsin Supreme Court held that a sentencing court's use of COMPAS did not violate due process because the court did not rely on the score mechanically; critics noted that the court never audited the algorithm itself.
Tradeoffs and Tensions
The most technically rigorous debate concerns the mathematical incompatibility of competing definitions of fairness. Computer scientists Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan demonstrated in a 2016 working paper (NBER Working Paper 22257) that three intuitively reasonable fairness criteria (calibration within groups, equal false positive rates, and equal false negative rates) cannot all be satisfied simultaneously unless base rates are equal across groups or the instrument predicts perfectly. Because recidivism base rates differ across demographic groups in existing datasets (themselves a product of differential enforcement), no imperfect algorithm can satisfy all three criteria at once. Courts and legislatures have not resolved which fairness metric should govern.
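The arithmetic behind the incompatibility can be expressed through a standard identity linking the error rates to the base rate; the presentation below follows the form common in the fairness literature rather than the exact notation of the working paper.

```latex
% For each group, let p denote the base rate of reoffending, PPV the positive
% predictive value, FPR the false positive rate, and FNR the false negative rate.
% Counting true and false positives and negatives yields the identity
\[
  \mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr).
\]
% If two groups share the same PPV (calibration) and the same FNR but differ in
% base rate p, the right-hand sides differ, so the FPRs must differ. Equalizing
% all three quantities across groups therefore requires equal base rates or an
% error-free instrument.
```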
The due process tension is equally unresolved. The Loomis court conditioned its holding on written advisements accompanying the COMPAS score and on the requirement that the score not be the determinative factor in sentencing. But requiring a judge to articulate a non-reliance disclaimer does not ensure actual non-reliance, and the instrument's numeric score can create an anchoring effect of the kind documented in behavioral economics literature. Related due process questions arise in AI Sentencing Guidelines and AI Parole and Probation Decisions, both of which involve liberty deprivations without full adversarial testing of algorithmic inputs.
The Sixth Amendment's confrontation principles, as extended through Crawford v. Washington, 541 U.S. 36 (2004), are also implicated when a vendor refuses to disclose source code under trade secret claims. At least one federal district court has ordered partial disclosure of COMPAS documentation in post-conviction proceedings, though no circuit-level precedent has established a constitutional right to full algorithmic transparency.
Common Misconceptions
Misconception 1: Risk scores predict individual behavior.
Risk instruments estimate group-level probabilities. A score of 8 out of 10 does not mean an 80% chance that this particular defendant will reoffend — it means the defendant resembles a historical group of which a higher proportion reoffended. Applying a population-level probability to an individual sentencing decision conflates statistical inference with individual prediction, a logical error documented by the MacArthur Foundation's Research Network on Law and Neuroscience.
Misconception 2: Removing race as an input variable eliminates racial bias.
COMPAS does not use race as a direct input variable. The disparate impact documented by ProPublica and others arises from proxy variables correlated with race — criminal history, socioeconomic indicators — not from a race field in the dataset. Excluding race while retaining its proxies does not eliminate disparate impact; it conceals the pathway.
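A synthetic example shows the proxy pathway: a scoring function that never reads a group field can still produce group-level score gaps when one of its inputs is shaped by a factor correlated with group membership. All quantities below are invented for illustration.

```python
# Synthetic illustration: a score that never reads the "group" field still
# differs by group when an input (prior arrests) is shaped by a factor that is
# correlated with group (neighborhood patrol intensity). All numbers are invented.
import random

random.seed(0)

def synth_person(group):
    patrol_intensity = 0.7 if group == "A" else 0.3   # invented disparity in surveillance
    prior_arrests = sum(random.random() < patrol_intensity for _ in range(6))
    return {"group": group, "prior_arrests": prior_arrests}

def score(person):
    return 1.5 * person["prior_arrests"]              # group is never an input

population = [synth_person("A") for _ in range(5000)] + \
             [synth_person("B") for _ in range(5000)]

for grp in ("A", "B"):
    group_scores = [score(p) for p in population if p["group"] == grp]
    print(grp, round(sum(group_scores) / len(group_scores), 2))
# Group A's mean score is materially higher even though the scoring function
# never sees the group field; the proxy carries the disparity.
```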
Misconception 3: Validated instruments are accurate.
Validation in this context typically means the instrument performed at a certain AUC on a historical dataset, not that it accurately predicts future behavior for any given individual. The GAO has flagged gaps in ongoing validation obligations for federal tools (GAO-23-105483), and instrument accuracy degrades when applied to populations demographically different from the validation sample.
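The sketch below computes AUC through its pairwise-ranking interpretation, the probability that a randomly chosen reoffender received a higher score than a randomly chosen non-reoffender, to show that the statistic describes ranking quality over a sample rather than the accuracy of any individual prediction. The scores and outcomes are fabricated for illustration.

```python
# AUC via its pairwise-ranking interpretation: the share of (reoffender,
# non-reoffender) pairs in which the reoffender received the higher score,
# counting ties as half. Scores and outcomes are fabricated for illustration.
def auc(scores, outcomes):
    pos = [s for s, y in zip(scores, outcomes) if y == 1]   # reoffended
    neg = [s for s, y in zip(scores, outcomes) if y == 0]   # did not reoffend
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores   = [2, 3, 4, 4, 5, 6, 7, 8, 9, 9]   # decile-style scores
outcomes = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]   # 1 = reoffended during follow-up

print(round(auc(scores, outcomes), 3))       # 0.86
# An AUC of 0.86 summarizes ranking quality across the whole sample; it does not
# mean that an individual with a high score has an 86% chance of reoffending.
```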
Misconception 4: Due process challenges have been uniformly unsuccessful.
While Loomis upheld COMPAS use, the legal landscape is not settled. In State v. Gordon, 261 A.3d 981 (Vt. 2021), Vermont's Supreme Court held that mechanical reliance on a risk score without adequate judicial reasoning violated sentencing requirements. California, New York, and Illinois have each enacted statutory or regulatory constraints on pretrial risk tool use through legislative actions passed between 2019 and 2023.
Checklist or Steps (Non-Advisory)
The following sequence describes the procedural elements typically present in a jurisdiction that formally incorporates risk assessment tools into criminal proceedings. This is a descriptive framework, not legal guidance.
Phase 1 — Instrument selection and procurement
- Jurisdiction identifies the decision point (pretrial, sentencing, parole)
- Procurement process specifies validation requirements for the local population
- Vendor contract addresses trade secret disclosure obligations, if any
- Legislative or administrative authorization is confirmed
Phase 2 — Assessment administration
- Trained assessor (pretrial services officer, probation officer) administers the instrument
- Data inputs are drawn from verified criminal justice records and, if applicable, structured interview
- Score is generated and mapped to a risk level category (low/medium/high)
Phase 3 — Score transmission
- Score report is included in pretrial release report, presentence investigation report (PSR), or parole dossier
- Defense counsel receives the report with sufficient time to respond before the relevant hearing
- Report includes the instrument's name, version, and the variables that contributed to the score
Phase 4 — Judicial or board use
- Decision-maker is instructed that the score is one factor among many, not a determinative variable
- Written decision articulates the weight given to the score and other individualized factors
- Record preserves the basis for a subsequent appellate challenge
Phase 5 — Post-decision review
- Appellate courts review whether the score was used mechanically or as one factor
- Defense counsel examines whether the instrument was validated on a comparable population
- Jurisdiction conducts periodic audits for disparate impact across demographic groups
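As an illustration of the Phase 5 audit step, the sketch below tallies high-risk classification rates and false positive rates by demographic group over a scored cohort. The field names, cut-off, and records are assumptions made for this sketch; no particular audit methodology is mandated by the framework above.

```python
# Illustrative disparate-impact audit over a scored cohort: computes, by group,
# the share classified high risk and the false positive rate. Field names,
# the cut-off, and the records are assumptions made for this sketch.
from collections import defaultdict

HIGH_RISK_DECILE = 8   # assumed jurisdiction cut-off

def audit(records):
    tallies = defaultdict(lambda: {"n": 0, "high": 0, "fp": 0, "non_reoffenders": 0})
    for rec in records:
        t = tallies[rec["group"]]
        flagged_high = rec["decile"] >= HIGH_RISK_DECILE
        t["n"] += 1
        t["high"] += flagged_high
        if not rec["reoffended"]:
            t["non_reoffenders"] += 1
            t["fp"] += flagged_high
    return {grp: {"high_risk_rate": t["high"] / t["n"],
                  "false_positive_rate": t["fp"] / t["non_reoffenders"]}
            for grp, t in tallies.items()}

cohort = [
    {"group": "A", "decile": 9, "reoffended": False},
    {"group": "A", "decile": 8, "reoffended": True},
    {"group": "A", "decile": 4, "reoffended": False},
    {"group": "B", "decile": 7, "reoffended": False},
    {"group": "B", "decile": 9, "reoffended": True},
    {"group": "B", "decile": 3, "reoffended": False},
]
print(audit(cohort))   # group A: 2/3 flagged high, FPR 0.5; group B: 1/3 flagged, FPR 0.0
```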
This framework maps to the procedural requirements analyzed in detail at AI Evidence Admissibility and AI Constitutional Law Questions.
Reference Table or Matrix
| Instrument | Developer | Input Type | Public Documentation | Primary Use Stage | Known Fairness Dispute |
|---|---|---|---|---|---|
| COMPAS | Equivant (Northpointe) | Records + interview (137 items) | Partial (sample report public) | Pretrial, sentencing, parole | ProPublica 2016 false positive disparity finding |
| PSA | Arnold Ventures | Records only (9 items) | Full (open methodology) | Pretrial | Less studied; validated in 40+ jurisdictions |
| LSI-R | Multi-Health Systems | Interview + records (54 items) | Proprietary | Sentencing, parole | Age and gender bias documented in academic literature |
| PATTERN | U.S. Dept. of Justice / BOP | Records (administrative) | Federal Register publication | Federal post-conviction (BOP programming) | 2019 DOJ OIG found Black and Hispanic inmates scored higher (DOJ OIG Report 20-028) |
| VPRAI | Virginia Department of Criminal Justice Services | Records only | Publicly available | Pretrial | Periodically revalidated by state agency |
| ORAS | Ohio Department of Rehabilitation and Correction | Records + interview | Publicly available | Pretrial, sentencing, supervision | Multiple academic revalidation studies published |
Definitions of key technical terms used in this table — AUC, calibration, false positive rate — are provided at AI Legal Definitions Glossary.
References
- Arnold Ventures — Pretrial Justice Reform
- First Step Act of 2018, Pub. L. 115-391
- U.S. Government Accountability Office, GAO-23-105483: Artificial Intelligence
- ProPublica — Machine Bias (Angwin et al., 2016)
- Northpointe / Equivant — COMPAS Risk & Need Assessment System (2015)
- NBER Working Paper 22257 — Kleinberg, Mullainathan, Raghavan (2016)
- DOJ Office of Inspector General Report 20-028 — PATTERN Risk Tool Review
- State v. Loomis, 881 N.W.2d 749 (Wis. 2016)
- Crawford v. Washington, 541 U.S. 36 (2004)
- Mathews v. Eldridge, 424 U.S. 319 (1976)
- MacArthur Foundation Research Network on Law and Neuroscience
- U.S. Bureau of Prisons — PATTERN Overview
- Virginia Department of Criminal Justice Services — VPRAI