Why Citations Fail

Stanford research shows Westlaw AI-Assisted Research hallucinates 33% of the time. Lexis+ AI hallucinates 17%. By early 2026, researchers had catalogued over 1,200 cases worldwide where AI-generated citations failed in court — and those are just the ones that got caught. Here's what every litigator needs to know about how citations actually fail.

April 15, 2026 · 8 min · Cornerstone

Imagine filing a brief. You've spent weeks on the argument. The logic is tight, the authorities are stacked, the holding you cite is exactly on point. The judge reads it, pulls up the case — and it doesn't exist.

Not "doesn't quite support your argument." Not "was overturned." The case was never decided. The court never sat. The parties never sued each other. Your AI invented the whole thing, and you signed your name to it.

This isn't a hypothetical. It's happening right now, in courtrooms across America, at a scale that should terrify every lawyer who uses AI to draft anything.

The numbers

Stanford researchers ran a preregistered study across the leading legal AI research tools — the ones lawyers actually pay for, the ones that market themselves as "hallucination-free." The results:

Lexis+ AI hallucinated 17% of the time
Westlaw AI-Assisted Research hallucinated 33% of the time
GPT-4 (raw, no legal guardrails) hallucinated 43% of the time

That's not an edge case. For Westlaw, that's one in three responses from a tool your firm pays thousands a year to use. One in three.

General-purpose models performed even worse. In an earlier Stanford study, the same research team found hallucination rates between 58% and 88% on verifiable legal questions. Llama 2 got it wrong nearly nine times out of ten.

The pace of documented incidents has accelerated sharply. Damien Charlotin's running database of AI hallucination cases catalogued hundreds of incidents through 2025, then crossed 1,200 cases worldwide in early 2026 — and continues to grow weekly. Those are just the ones that got caught.

The graveyard

Citation hallucinations don't fail quietly. They fail in courtrooms. In front of judges. On the record.

Mata v. Avianca (S.D.N.Y. 2023) — The case that started it all. Steven Schwartz, a New York attorney with three decades of experience, submitted a brief containing six entirely fabricated cases generated by ChatGPT. When the court asked him to produce the opinions, he went back to ChatGPT and asked if they were real. ChatGPT said yes — and assured him the cases "can be found in reputable legal databases such as LexisNexis and Westlaw." They couldn't. Judge P. Kevin Castel imposed a single $5,000 sanction, jointly and severally, on Schwartz, his colleague Peter LoDuca, and their firm — and ordered them to personally notify every judge falsely credited as author of the invented opinions.

Five thousand dollars sounds light. It wasn't. The reputational damage was the real sentence. Schwartz's name is now the first thing that comes up when anyone Googles "AI hallucination lawyer." Thirty years of practice, reduced to a cautionary tale.

Whiting v. City of Athens, Tennessee (6th Cir. 2026) — The Sixth Circuit escalated. Two Tennessee attorneys submitted briefs containing more than two dozen fake or misrepresented citations across three consolidated appeals. The court found they "repeatedly misrepresented the record, cited non-existent cases, and cited cases for propositions of law that they did not even discuss, much less support." The order set $15,000 in punitive sanctions per attorney plus the appellees' attorney fees and double costs — totalling roughly $116,000 once the fee accounting was complete, among the largest AI hallucination penalties on record. Both attorneys were referred for potential disciplinary proceedings.

Lacey v. State Farm General Insurance Co. (C.D. Cal. 2025) — Attorneys from Ellis George LLP and K&L Gates LLP — not solo practitioners, but lawyers at major firms — submitted a brief where 9 of 27 citations were wrong. At least two cases were completely fabricated. Special Master Michael Wilner found they "collectively acted in a manner that was tantamount to bad faith" and imposed $31,100 in sanctions, jointly and severally.

Mavy v. Commissioner of Social Security Administration (D. Ariz. 2025) — 12 of 19 cited cases in the opening brief were fabricated, misleading, or unsupported. That's 63% of the brief's legal foundation — gone. The court revoked the attorney's pro hac vice status, struck the brief, and referred her to her state bar.

Southern District of Florida (2025) — Attorney James Martin Paul filed motions across eight related cases, each containing AI-hallucinated citations. The court dismissed four federal matters, ordered him to pay opposing counsel's attorney fees, required him to attach the sanctions order to any new complaint for two years, and referred him to the Florida Bar for discipline. Eight separate filings, eight separate failures of basic verification.

The MyPillow case — Coomer v. Lindell (D. Colo. 2025) — Two attorneys representing Mike Lindell, Christopher Kachouroff and Jennifer DeMaster, filed a document Judge Nina Wang found contained nearly 30 defective citations, including fabricated cases. They were sanctioned $3,000 each for violating Rule 11. The headlines wrote themselves.

And here's the twist that should keep you up at night. In California's first published opinion on AI-hallucinated citations, Noland v. Land of the Free, L.P. (Cal. Ct. App. 2025), the court imposed $10,000 in sanctions on attorney Amir Mostafavi after finding that 21 of the 23 case quotations in his opening brief were fabricated. When the court declined to award fees to the respondents, it noted that their counsel had not flagged the fabrications either. That is a quiet signal that judges are watching how AI hallucinations propagate through litigation, even when the formal duty to verify still rests with the filer.

Five ways a citation fails

Most people get this wrong: they think the risk is binary. The citation exists, or it doesn't.

It's not that simple. A citation can fail in five fundamentally different ways — and each one requires a different response.

1. Fabricated — The case doesn't exist at all. No docket number, no opinion, no parties. Pure invention. This is Mata v. Avianca territory — the AI generated something that looks like a citation but points to thin air.

2. Misattributed Holding — The case exists. You can look it up. But it doesn't say what your brief claims it says. The opinion addresses a different issue entirely, or the holding is about a related but distinct point of law. This is arguably the most dangerous failure mode because it passes a simple existence check. The case is real. The claim about it isn't.

3. Wrong Jurisdiction — A Ninth Circuit opinion cited as binding authority in a Second Circuit brief. A state appellate decision cited as if it came from a federal court. The case is real, the holding might even be relevant — but it has no binding force in the jurisdiction where you're filing.

4. Hallucinated Detail — The case exists and roughly supports the point. But the year is wrong. Or the court is wrong. Or one of the parties has been swapped. Or the vote count is fabricated. Close enough to pass a glance, wrong enough to undermine your credibility.

5. Coverage Gap — The case isn't in any public database. It might be an unreported decision, a restricted-access opinion, or a case from a jurisdiction that doesn't publish digitally. It's not fabricated — you just can't verify it, which means the court can't either.

Most citation checkers on the market today give you a binary answer: PASS or FAIL. The citation exists, or it doesn't. But "exists" doesn't mean "supports your argument." And "not found" doesn't mean "fabricated."
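To make the five categories concrete, here is a minimal Python sketch of how they might be represented, and how a failed lookup could be read as something other than a flat FAIL. Every name in it (CitationFailure, NEXT_STEP, classify_lookup) is hypothetical and chosen for illustration, and the suggested next steps are assumptions rather than legal guidance.

```python
from enum import Enum
from typing import Optional


class CitationFailure(Enum):
    """The five failure modes listed above (identifiers are illustrative)."""
    FABRICATED = "fabricated"
    MISATTRIBUTED_HOLDING = "misattributed holding"
    WRONG_JURISDICTION = "wrong jurisdiction"
    HALLUCINATED_DETAIL = "hallucinated detail"
    COVERAGE_GAP = "coverage gap"


# Assumed responses, one per category; a firm would set its own.
NEXT_STEP = {
    CitationFailure.FABRICATED: "Remove the citation and find real authority.",
    CitationFailure.MISATTRIBUTED_HOLDING: "Reread the opinion and restate or replace the proposition.",
    CitationFailure.WRONG_JURISDICTION: "Recast as persuasive authority or find binding authority.",
    CitationFailure.HALLUCINATED_DETAIL: "Correct the year, court, or parties against the opinion.",
    CitationFailure.COVERAGE_GAP: "Obtain a copy of the opinion and verify it by hand.",
}


def classify_lookup(found: bool, database_covers_court: bool) -> Optional[CitationFailure]:
    """Interpret an existence lookup without collapsing it to PASS/FAIL."""
    if found:
        # The case exists; holding and jurisdiction still need checking.
        return None
    if not database_covers_court:
        # "Not found" in a database that never indexed this court proves nothing.
        return CitationFailure.COVERAGE_GAP
    # Not found where it should be: treat as likely fabricated.
    return CitationFailure.FABRICATED
```

The point of the sketch is the middle branch: a miss in a database that does not cover the court is a coverage gap to resolve by hand, not evidence of invention.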

The real cost

Sanctions are the visible cost. The invisible ones are worse.

A malpractice claim from a client whose case was damaged by a fabricated citation. A bar investigation that drags on for months. The referral that doesn't come because a partner at another firm Googled your name and found a sanctions order. The client who quietly switches firms because they read about AI hallucinations in the news and aren't sure you're checking.

And here's what's changed: judges are no longer surprised. They're expecting it. Courts across the country are implementing AI disclosure requirements. Some require lawyers to certify that no AI-generated content was submitted without verification. The grace period for "I didn't know AI could do that" is over.

The question isn't whether you'll encounter a hallucinated citation. At a 17–33% failure rate from the tools lawyers actually use, you already have. The question is whether you caught it before the judge did.
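A rough back-of-the-envelope calculation shows why "you already have" is a fair bet. The sketch below assumes each research query fails independently at the study's per-response rate, which is a simplification: the Stanford rates come from benchmark queries, and real usage may differ. The function name is made up for this example.

```python
def chance_of_at_least_one(rate: float, queries: int) -> float:
    """Probability that at least one of `queries` responses is hallucinated,
    assuming each fails independently with probability `rate`."""
    return 1.0 - (1.0 - rate) ** queries


for rate in (0.17, 0.33):
    probability = chance_of_at_least_one(rate, 10)
    print(f"{rate:.0%} per query, 10 queries: {probability:.0%}")
# prints:
# 17% per query, 10 queries: 84%
# 33% per query, 10 queries: 98%
```

Under those assumptions, ten queries are enough to make at least one hallucinated response more likely than not by a wide margin.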

What checking actually looks like

Good verification isn't a yes-or-no lookup. It's a three-layer process:

Layer 1 — Does this case exist? Cross-reference against public court databases. Match the parties, the docket, the court, the year. If it's not there, stop.

Layer 2 — Does the opinion say what the brief claims? Read the actual text of the opinion. Compare the holding to the proposition in your brief. This is where most failures hide — behind real case names with wrong claims.

Layer 3 — Does the authority bind? Is the cited court in the right jurisdiction? Is the opinion from the right level of the court hierarchy? Is persuasive authority being presented as binding?

Any tool that skips Layer 2 and Layer 3 is giving you a false sense of security. You've confirmed the case exists — congratulations. You still don't know if your brief is accurate.
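The three layers can be sketched as a short pipeline that stops at the first failure. This is a simplified illustration, not a description of any real product: the Citation and Opinion types, the in-memory database, and the naive_supports check are all assumptions. In particular, Layer 2 in practice means a person (or a carefully reviewed tool) reading the opinion, and the substring match here only stands in for that step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class Citation:
    case_name: str
    court: str
    year: int
    proposition: str        # what the brief claims the case holds
    cited_as_binding: bool


@dataclass(frozen=True)
class Opinion:
    court: str
    year: int
    text: str


def verify(
    cite: Citation,
    database: Dict[str, Opinion],
    supports: Callable[[str, str], bool],
    binding_courts: Set[str],
) -> str:
    """Run the three layers in order and report the first one that fails."""
    # Layer 1: does the case exist, with matching court and year?
    opinion = database.get(cite.case_name)
    if opinion is None:
        return "Layer 1: no matching case found"
    if (opinion.court, opinion.year) != (cite.court, cite.year):
        return "Layer 1: case found, but court or year does not match"
    # Layer 2: does the opinion say what the brief claims?
    if not supports(opinion.text, cite.proposition):
        return "Layer 2: opinion does not support the stated proposition"
    # Layer 3: does the authority bind in the filing court?
    if cite.cited_as_binding and opinion.court not in binding_courts:
        return "Layer 3: persuasive authority presented as binding"
    return "Passed all three layers"


def naive_supports(opinion_text: str, proposition: str) -> bool:
    """Placeholder for Layer 2; real review means reading the opinion."""
    return proposition.lower() in opinion_text.lower()


# Toy data: a placeholder case name, not a real opinion.
database = {
    "Example v. Sample": Opinion("9th Cir.", 2019, "We hold that notice must be in writing."),
}
cite = Citation("Example v. Sample", "9th Cir.", 2019, "notice must be in writing", True)
print(verify(cite, database, naive_supports, {"2d Cir.", "U.S."}))
# prints: Layer 3: persuasive authority presented as binding
```

In the toy run, the case exists and the text supports the proposition, so an existence-only checker would pass it. It still fails at Layer 3, because a Ninth Circuit opinion is being cited as binding in a filing where only Second Circuit and Supreme Court authority binds.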

What this means for your firm

Knowing the names of the cases where lawyers were sanctioned isn't enough. The question is whether your firm has a process for catching what KeyCite and Shepard's pass through. A five-minute audit on every outgoing brief, one that produces the partner-facing PDF your malpractice carrier expects to see, costs less than a single billable hour. A brief that goes out with a fabricated cite costs far more.

This is why we built TrustCitation

TrustCitation doesn't give you a traffic light. It gives you a diagnostic report.

Every citation runs through all three layers. When something fails, you don't just get a red flag — you get the specific failure category, a confidence score, the relevant passage from the actual opinion, and a plain-English explanation of what went wrong and how to fix it.

Because "this citation failed" isn't useful. "This citation attributes a holding the court never made — here's what the court actually said" is.
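For readers who think in data structures, a finding of this kind might look something like the sketch below. This is a hypothetical shape built only from the four elements named above (category, confidence score, passage, explanation); it is not TrustCitation's actual schema or API, and the Finding and render names, the placeholder case, and the sample text are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    citation: str
    category: str       # one of the five failure categories
    confidence: float   # 0.0 to 1.0
    passage: str        # relevant text from the actual opinion
    explanation: str    # plain-English account of what went wrong

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def render(finding: Finding) -> str:
    """Format one finding as a short, readable report entry."""
    return "\n".join([
        f"{finding.citation}: {finding.category} (confidence {finding.confidence:.0%})",
        f'Opinion text: "{finding.passage}"',
        f"What went wrong: {finding.explanation}",
    ])


# Placeholder content, not a real case or a real tool output.
example = Finding(
    citation="Example v. Sample",
    category="misattributed holding",
    confidence=0.9,
    passage="We hold that notice must be in writing.",
    explanation="The brief says the court approved oral notice; the opinion requires written notice.",
)
print(render(example))
```

The design choice worth noting is that the passage and explanation travel with the flag, so whoever reviews the report can judge the finding without rerunning the search.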

Knowing that your citations are wrong has been possible for years. Knowing why they're wrong — and exactly how to fix each one before you file — that's what was missing.

TrustCitation's validation methodology is grounded in published epistemological research on AI knowledge verification. Read the research paper →

Check your last brief.

Five-minute audit. Partner-facing PDF. Free for the first brief, no signup needed.

Check your last brief →