March 8, 2026 · 4 min read · aiml.qa

AI Bias Audit: A Practical Guide for Startup CTOs

How to run an AI bias audit - what algorithmic bias is, which fairness metrics to use, how to choose the right criterion for your use case, and a worked credit scoring example.

Algorithmic bias occurs when an AI model produces systematically different outcomes for different demographic groups - outcomes that cannot be explained by legitimate differences in the features relevant to the prediction task.

Bias in AI is not a fringe concern. It is a mainstream regulatory, legal, and business risk. The EU AI Act, FCA guidance, CFPB guidance, and EEOC enforcement all create obligations around AI fairness for companies operating in regulated industries. Even outside regulated industries, a bias incident - a viral post showing that your AI treats different users differently - carries serious reputational risk.

Here is a practical guide for startup CTOs on how to audit for it.

What Causes Algorithmic Bias?

Historical bias in training data - If your training data reflects historical discrimination (e.g., loan approval data that historically approved fewer loans to women), a model trained to replicate that data will replicate the discrimination.

Representation bias - If certain demographic groups are underrepresented in the training data, the model will make less accurate predictions for those groups - it simply had less data about them to learn from.

Measurement bias - If the features used to train the model are measured differently across demographic groups (e.g., a proxy variable that is more accurate for one group than another), the model will use that feature differently across groups.

Feedback loops - In deployed systems, biased predictions can generate biased training data for the next model version, compounding the bias over time.

The Four Fairness Criteria

There is no single definition of algorithmic fairness. These are the four most commonly used criteria:

Demographic parity - The model should produce positive outcomes at the same rate across demographic groups. A credit model satisfies demographic parity if it approves loans at the same rate for all demographic groups.

Equalized odds - The model should have the same false positive rate AND false negative rate across demographic groups. A fraud model satisfies equalized odds if it flags legitimate transactions at the same rate (equal FPR) and misses actual fraud at the same rate (equal FNR) across demographic groups.

Equal opportunity - A relaxed version of equalized odds: the model should have the same true positive rate (equivalently, the same false negative rate) across groups. For a credit model, equal opportunity means creditworthy applicants are approved at the same rate regardless of demographic group.

Predictive rate parity - When the model predicts a positive outcome, it should be correct at the same rate across groups. A recidivism model satisfies predictive rate parity if its positive predictions (high risk) are accurate at the same rate across demographic groups.
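
Each of these criteria reduces to comparing a per-group rate computed from a confusion matrix. Here is a minimal sketch in plain NumPy (no particular fairness library assumed; the group_rates function name is our own) showing the rate behind each criterion:

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Per-group rates underlying the four fairness criteria.

    y_true: 1 = actual positive outcome (e.g. creditworthy), 0 otherwise
    y_pred: 1 = model predicts positive (e.g. approve), 0 otherwise
    group:  demographic group label for each row
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    out = {}
    for g in np.unique(group):
        m = group == g
        t, p = y_true[m], y_pred[m]
        tp = np.sum((t == 1) & (p == 1))
        fp = np.sum((t == 0) & (p == 1))
        fn = np.sum((t == 1) & (p == 0))
        tn = np.sum((t == 0) & (p == 0))
        out[g] = {
            "positive_rate": (tp + fp) / len(t),          # demographic parity
            "fpr": fp / (fp + tn) if fp + tn else None,   # equalized odds (with FNR)
            "fnr": fn / (fn + tp) if fn + tp else None,   # equalized odds / equal opportunity
            "ppv": tp / (tp + fp) if tp + fp else None,   # predictive rate parity
        }
    return out
```

Comparing these rates across groups, as differences or ratios, tells you which criteria the model violates and by how much.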

How to Choose the Right Criterion

Which criteria you can even satisfy together depends mathematically on the base rates of the outcome in your population: in most real-world datasets, where base rates differ across demographic groups, it is mathematically impossible to satisfy all of these criteria simultaneously (the impossibility theorem of algorithmic fairness).
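
One way to see why, using the quantities defined above (FPR, FNR, PPV) and writing p for a group's base rate (the share of actual positives in it), is the identity popularized by Chouldechova's analysis of recidivism scores:

FPR = p / (1 - p) × (1 - PPV) / PPV × (1 - FNR)

If two groups have different base rates p, then forcing PPV and FNR to be equal across them (predictive rate parity plus half of equalized odds) forces their FPRs apart - so the criteria cannot all hold at once unless the model is perfect or the base rates happen to match.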

The practically correct choice depends on your domain context and regulatory obligations:

  • Credit and lending: Equal opportunity is typically most relevant - creditworthy applicants from all groups should be approved
  • Fraud detection: Equalized odds is most relevant - false positives (legitimate customers blocked) and false negatives (actual fraud missed) should be equal across groups
  • Hiring AI: Demographic parity is most relevant - US equal employment opportunity practice assesses adverse impact by comparing selection rates across groups (the four-fifths rule)
  • Healthcare: Equal opportunity - patients who would benefit from treatment should be identified at equal rates

A Worked Example: Credit Scoring

A fintech startup has a credit scoring model and wants to audit it for bias across gender.

Step 1: Define the protected attribute and groups - Gender (Male / Female / Other). Note: “Other” may have insufficient sample size for statistical significance; document this.
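
A quick first check is simply group sizes, before any fairness math. A minimal sketch in pandas - the file name, the gender column, and the threshold are illustrative assumptions, not a standard:

```python
import pandas as pd

MIN_GROUP_SIZE = 500  # illustrative cut-off; derive yours from a power analysis

df = pd.read_csv("credit_applications.csv")   # hypothetical export with a 'gender' column
group_sizes = df["gender"].value_counts()
print(group_sizes)

too_small = group_sizes[group_sizes < MIN_GROUP_SIZE]
if not too_small.empty:
    print(f"Low-confidence groups (report, but flag): {list(too_small.index)}")
```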

Step 2: Measure performance by group:

Group     Approval rate    FPR (non-creditworthy approved)    FNR (creditworthy rejected)
Male      68%              4.2%                               11%
Female    54%              3.8%                               19%
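
To produce these numbers from labeled outcomes, you can reuse the group_rates helper and the df from the earlier sketches (the approved and creditworthy column names are again illustrative):

```python
# 'approved' = model decision, 'creditworthy' = ground-truth label (hypothetical columns)
rates = group_rates(df["creditworthy"], df["approved"], df["gender"])

fmt = lambda v: f"{v:.1%}" if v is not None else "n/a"   # small groups can yield undefined rates
for g, r in rates.items():
    print(f"{g}: approval={fmt(r['positive_rate'])}  FPR={fmt(r['fpr'])}  FNR={fmt(r['fnr'])}")
```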

Step 3: Calculate disparity metrics:

  • Approval rate disparity: 68% vs 54% - a 14 percentage point gap (demographic parity violation)
  • FNR disparity: 11% vs 19% - creditworthy female applicants are rejected at roughly 1.7 times the rate of creditworthy male applicants (equal opportunity violation)
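
The same figures expressed in code, continuing from the rates computed in Step 2 (the four-fifths comparison comes from US employment practice and is only a rough benchmark here):

```python
male, female = rates["Male"], rates["Female"]

approval_gap   = male["positive_rate"] - female["positive_rate"]   # 0.68 - 0.54 = 0.14
approval_ratio = female["positive_rate"] / male["positive_rate"]   # 0.54 / 0.68 ≈ 0.79
fnr_ratio      = female["fnr"] / male["fnr"]                       # 0.19 / 0.11 ≈ 1.7

print(f"Demographic parity gap:        {approval_gap:.1%}")
print(f"Selection-rate ratio:          {approval_ratio:.2f}  (four-fifths benchmark: 0.80)")
print(f"Equal opportunity (FNR ratio): {fnr_ratio:.2f}")
```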

Step 4: Investigate root cause - Is the disparity explained by legitimate features (income distribution, employment history) or by a spurious proxy variable correlating with gender?
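
Two quick, rough checks help here, sketched below with scikit-learn and pandas and continuing from the earlier df (feature and column names are illustrative): how well the model's features predict gender at all, and whether the approval gap persists within bands of a legitimate feature such as income.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Proxy check: if gender is highly predictable from the model's features,
# at least one feature is likely acting as a proxy for it.
numeric_features = ["income", "employment_years", "existing_debt"]   # illustrative names
X = df[numeric_features]
is_female = (df["gender"] == "Female").astype(int)
proxy_auc = cross_val_score(LogisticRegression(max_iter=1000), X, is_female,
                            scoring="roc_auc", cv=5).mean()
print(f"Gender predictable from features with AUC ≈ {proxy_auc:.2f}")

# Conditional check: does the approval gap persist within income bands?
df["income_band"] = pd.qcut(df["income"], q=5)
print(df.groupby(["income_band", "gender"], observed=True)["approved"].mean().unstack())
```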

Step 5: Remediate - Options include: reweighting training data, removing proxy variables, threshold adjustment by group (where legally permitted), or collecting additional training data for underrepresented groups.
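
As one concrete illustration of the reweighting option, here is a sketch of reweighing in the spirit of Kamiran and Calders: each (group, label) cell is weighted so that gender and the outcome label are independent in the weighted training data (column names as before):

```python
p_g  = df["gender"].value_counts(normalize=True)                 # P(group)
p_y  = df["creditworthy"].value_counts(normalize=True)           # P(label)
p_gy = df.groupby(["gender", "creditworthy"]).size() / len(df)   # P(group, label)

# weight = P(group) * P(label) / P(group, label), applied per row
df["sample_weight"] = [
    p_g[g] * p_y[y] / p_gy[(g, y)]
    for g, y in zip(df["gender"], df["creditworthy"])
]
# Retrain with these weights if your estimator supports them, e.g.
# model.fit(X_train, y_train, sample_weight=w_train), then re-run Steps 2-3.
```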

Step 6: Document - Record the disparity metrics, root cause investigation, and remediation actions taken. This documentation is required for EU AI Act conformity assessment and US fair lending regulatory review.
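
What the documentation artifact might look like in its simplest machine-readable form - the schema below is our own illustration, not a regulatory template:

```python
import json
from datetime import date

audit_record = {
    "model": "credit-scoring-v3",          # hypothetical model identifier
    "audit_date": date.today().isoformat(),
    "protected_attribute": "gender",
    "per_group_metrics": rates,            # from Step 2
    "findings": [
        "14 pp approval-rate gap (demographic parity violation)",
        "FNR 19% vs 11% (equal opportunity violation)",
    ],
    "root_cause_investigation": "proxy feature review in progress",
    "remediation_actions": ["reweigh training data", "remove proxy variables"],
}

with open("bias_audit_2026-03.json", "w") as f:
    json.dump(audit_record, f, indent=2, default=float)   # default=float guards against stray NumPy scalars
```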

Book an AI bias audit sprint if you want an independent assessment of your model’s fairness across demographic subgroups.

Ship AI You Can Trust.

Book a free 30-minute AI QA scope call with our experts. We review your model, data pipeline, or AI product - and show you exactly what to test before you ship.

Talk to an Expert