Mata v. Avianca: What KeyCite Missed and What a Real Audit Would Have Caught

Six fabricated cases. A 30-year veteran's name on the brief. A $5,000 sanction and a career rewritten as a cautionary tale. Here's what the existing cite-check workflow actually verified that day — and what it didn't.

April 28, 2026 · 9 min · Audit Walkthrough


By the time the judge in Mata v. Avianca held the brief in his hands, the cite-check had already happened. That's the part of the story most retellings skip.

Steven Schwartz did not file a brief that nobody had checked. He filed a brief he believed had been verified — by ChatGPT, which assured him the cases "can be found in reputable legal databases such as LexisNexis and Westlaw." When the court demanded the underlying opinions, Schwartz went back to ChatGPT a second time and asked the same question. ChatGPT confirmed again. The cases existed. The cases were real. They weren't.

What broke down wasn't a missing process. What broke down was a process that had no way of catching the failure it was supposed to catch. And that gap, the distance between "the system said yes" and "the system actually checked," is the gap our entire firm-level cite-check workflow sits inside in 2026.

This piece walks the Mata brief through the audit it should have had. Not the audit ChatGPT pretended to run, and not the audit KeyCite would have produced. The audit that would have flagged what the partners on the case needed to see.

The cases that didn't exist

The brief filed in Mata v. Avianca cited six fabricated cases. Here they are, reproduced as cited:

Varghese v. China Southern Airlines Co. Ltd., 925 F.3d 1339 (11th Cir. 2019)
Shaboon v. Egyptair, 2013 IL App (1st) 111279-U (Ill. App. Ct. 2013)
Petersen v. Iran Air, 905 F. Supp. 2d 121 (D.D.C. 2012)
Martinez v. Delta Air Lines, Inc., 2019 WL 4639462 (Tex. App. Sept. 25, 2019)
Estate of Durden v. KLM Royal Dutch Airlines, 2017 WL 2418825 (Ga. Ct. App. June 7, 2017)
Miller v. United Airlines, Inc., 174 F.3d 366 (2d Cir. 1999)

Five of those reporters look almost right. The volume numbers fall in plausible ranges. The court abbreviations match the kind of court a hypothetical aviation case would actually go to. The party names are the kind of party names that show up in real airline-tort litigation. The dates are recent enough to feel current and old enough not to be obviously verifiable from memory.

That's the thing about a well-trained language model: it doesn't hallucinate randomly. It hallucinates plausibly. The shape of the citation is correct. The case behind the shape doesn't exist.
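The gap between shape and substance can be made concrete. What follows is a minimal sketch, not any vendor's actual check: three simplified regular expressions (our own assumptions, far looser than a full citation grammar) that test only whether a citation string is well formed. All six fabricated citations pass.

```python
import re

# Simplified, illustrative patterns. Real citation grammars are much richer.
CITATION_SHAPES = [
    # Federal reporters, e.g. "925 F.3d 1339" or "905 F. Supp. 2d 121"
    re.compile(r"\b\d{1,4} (?:F\. Supp\.(?: 2d| 3d)?|F\.(?:2d|3d|4th)?) \d{1,5}\b"),
    # Westlaw identifiers, e.g. "2019 WL 4639462"
    re.compile(r"\b\d{4} WL \d+\b"),
    # Illinois public-domain citations, e.g. "2013 IL App (1st) 111279-U"
    re.compile(r"\b\d{4} IL App \(\d(?:st|nd|rd|th)\) \d+(?:-U)?"),
]


def looks_well_formed(citation: str) -> bool:
    """Return True if the string contains a recognizable citation shape.

    This says nothing about whether the cited opinion exists.
    """
    return any(pattern.search(citation) for pattern in CITATION_SHAPES)


MATA_CITATIONS = [
    "Varghese v. China Southern Airlines Co. Ltd., 925 F.3d 1339 (11th Cir. 2019)",
    "Shaboon v. Egyptair, 2013 IL App (1st) 111279-U (Ill. App. Ct. 2013)",
    "Petersen v. Iran Air, 905 F. Supp. 2d 121 (D.D.C. 2012)",
    "Martinez v. Delta Air Lines, Inc., 2019 WL 4639462 (Tex. App. Sept. 25, 2019)",
    "Estate of Durden v. KLM Royal Dutch Airlines, 2017 WL 2418825 (Ga. Ct. App. June 7, 2017)",
    "Miller v. United Airlines, Inc., 174 F.3d 366 (2d Cir. 1999)",
]

for cite in MATA_CITATIONS:
    print(looks_well_formed(cite), cite)
```

A format check like this is useful for catching typos. It cannot distinguish a real opinion from an invented one, which is exactly why a plausible hallucination sails through it.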

What KeyCite did, and didn't, do

The brief was almost certainly KeyCited before it was filed. Most briefs are. Most paralegals do at least one Westlaw or Lexis pass before a senior associate signs off, and most senior associates do at least a spot-check before a partner signs.

What KeyCite is built to do is well-defined and important:

It tells you whether a case has been overruled, distinguished, criticized, or limited by later courts.
It tells you whether the current treatment status is good law, bad law, or somewhere in between.
It cross-references related authorities on similar points of law.
It links a recognized citation string to the matching opinion in the Westlaw corpus, when one exists.

What KeyCite is not built to do is also important, and rarely articulated out loud:

It does not tell you whether the holding cited in your brief matches the holding actually present in the opinion.
It does not tell you whether the opinion exists at all, in the rare case the AI invented it from whole cloth.
It does not read your brief. It looks at citation strings.

So what happened in Mata? KeyCite ran. KeyCite returned what it always returns: a treatment indicator on each citation string the system could find. For a fabricated case, KeyCite's behavior is not a red error message. It is, more often, a quiet absence of treatment. The tool was never designed for this failure mode. It cannot tell you that Varghese v. China Southern Airlines, 925 F.3d 1339 doesn't exist; it can only tell you that it has nothing to add about it.

In a world where 800+ cases of AI-fabricated citations have been documented by mid-2026, the absence of a KeyCite indicator next to a citation is no longer a neutral signal. It is the first thing that should fail an audit.
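One way to encode that rule, as a sketch: treat a missing citator result as a failure instead of a pass. The `triage` function, its input values, and its status labels are all hypothetical; they do not reflect KeyCite's actual output format or any real API.

```python
from typing import Optional


def triage(treatment: Optional[str]) -> str:
    """Map a citator result to an audit status.

    `treatment` is a hypothetical stand-in for whatever a citator returns.
    None means the tool had nothing to say about the citation.
    """
    if treatment is None:
        # Silence is not approval: the opinion may not exist at all.
        return "FAIL: no record found, verify existence"
    if treatment in {"overruled", "reversed"}:
        return "FAIL: negative treatment"
    if treatment in {"distinguished", "criticized", "limited"}:
        return "REVIEW: cautionary treatment"
    return "PASS"


print(triage(None))
print(triage("criticized"))
print(triage("good law"))
```

The design choice that matters is the first branch. A workflow that lets "nothing found" fall through to a pass is the workflow that let Varghese through.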

What a real audit would have flagged

An audit that actually checked these citations — not a treatment scan, but a real verification pass — would have produced a one-page diagnostic report before the brief left the building. Reconstructing what that report would have shown, citation by citation:

1. Varghese v. China Southern Airlines — Layer 1 fail. No matching case found in CourtListener's federal database for the cited reporter. No matching docket in the 11th Circuit's PACER index for 2019. The volume number is plausible but unmatched. Recommended status: RED — Fabricated. Recommended action: confirm with the drafting attorney; do not file as cited.

2. Shaboon v. Egyptair — Layer 1 fail. The "2013 IL App (1st)" series exists, but no opinion at the cited slip-opinion number can be located. The combination of an Illinois state appellate court adjudicating a claim against Egyptair is also structurally unusual — Illinois state courts do not typically have jurisdiction over a foreign sovereign-owned airline absent specific factual ties. Recommended status: RED — Fabricated.

3. Petersen v. Iran Air — Layer 1 fail. No matching opinion in the District of Columbia federal docket index for 2012 at the cited reporter. The Iran Air party name does appear in unrelated litigation; this specific case does not. Recommended status: RED — Fabricated.

4. Martinez v. Delta Air Lines — Layer 1 fail. No corresponding Texas Court of Appeals opinion found at the WL identifier. Recommended status: RED — Fabricated.

5. Estate of Durden v. KLM Royal Dutch Airlines — Layer 1 fail. No corresponding Georgia Court of Appeals opinion found at the WL identifier. Recommended status: RED — Fabricated.

6. Miller v. United Airlines, Inc., 174 F.3d 366 (2d Cir. 1999) — Layer 1 partial match. A Miller v. United Airlines opinion does exist, but the cited reporter and year do not match any opinion in the Second Circuit's published record at that volume. Recommended status: YELLOW — Verify with drafting attorney. This is the failure mode that's hardest to catch by eye. The party names are familiar. The reporter is real. The case behind the reporter is not the case being cited.

A diagnostic report with five RED flags and one YELLOW would have stopped the filing. It would have stopped it in three minutes — not three weeks, not three months, not after the judge had read the brief and asked for the opinions.
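The report above reduces to a small data structure. This is a minimal sketch with hypothetical names (`Finding`, `Status`, `should_block_filing`) and a blocking rule of our own choosing: anything short of GREEN holds the filing. The statuses are the ones reconstructed above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    GREEN = "verified"
    YELLOW = "verify with drafting attorney"
    RED = "fabricated"


@dataclass(frozen=True)
class Finding:
    citation: str
    status: Status
    note: str


# The six findings as reconstructed in the walkthrough above.
FINDINGS = [
    Finding("Varghese v. China Southern Airlines", Status.RED,
            "no matching case at the cited reporter"),
    Finding("Shaboon v. Egyptair", Status.RED,
            "no opinion at the cited slip-opinion number"),
    Finding("Petersen v. Iran Air", Status.RED,
            "no matching opinion at the cited reporter"),
    Finding("Martinez v. Delta Air Lines", Status.RED,
            "no opinion at the WL identifier"),
    Finding("Estate of Durden v. KLM Royal Dutch Airlines", Status.RED,
            "no opinion at the WL identifier"),
    Finding("Miller v. United Airlines, Inc.", Status.YELLOW,
            "party names match, reporter and year do not"),
]


def summarize(findings):
    """Count findings per status for the report header."""
    return Counter(f.status for f in findings)


def should_block_filing(findings) -> bool:
    """Assumed rule: any RED or YELLOW finding holds the brief."""
    return any(f.status is not Status.GREEN for f in findings)


counts = summarize(FINDINGS)
print(counts[Status.RED], "RED,", counts[Status.YELLOW], "YELLOW")
print("Block filing:", should_block_filing(FINDINGS))
```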

The artifact, not the catch

Here's what's easy to miss in this story: catching the citations was the small part. Producing something the partner could act on was the big part.

A paralegal who runs a verification pass and finds five problems has done valuable work. A paralegal who hands the supervising attorney a one-page PDF — case caption at the top, six citations in a table, severity tags color-coded, a 90-second summary at the bottom — has done valuable work that the partner can defend on Monday morning. The first is invisible. The second is the answer to the next carrier renewal questionnaire.

We sometimes forget that the deepest cost of Mata wasn't the $5,000 sanction. It was the absence of a paper trail. There was no document in the file showing that someone, somewhere in the firm, had verified those citations against the underlying record before the brief went out the door. There was a timestamp on a ChatGPT conversation. There was no audit log.

A real cite-check artifact does three things at once. It catches the failure. It tells the supervising attorney what the failure is. And it leaves a record showing the firm had a process. In 2026, with carriers writing AI questions into renewal forms and bar opinions giving the supervisory duty under Rules 5.1 and 5.3 real teeth, the third thing is no longer optional.
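The record-keeping piece is the simplest to sketch. The example below is illustrative only: the field names, the `make_audit_record` and `append_to_log` helpers, and the JSON-lines log format are assumptions, not a description of any particular product. The idea is to tie a timestamp and a reviewer to a hash of the exact brief text that was checked.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_audit_record(brief_text: str, results: dict, checked_by: str) -> dict:
    """Build one audit entry for a verification pass (hypothetical schema)."""
    return {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "checked_by": checked_by,
        # Hash of the exact text reviewed, so later edits are detectable.
        "brief_sha256": hashlib.sha256(brief_text.encode("utf-8")).hexdigest(),
        "results": results,
    }


def append_to_log(record: dict, path: str) -> None:
    """Append the record as one JSON line; earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")


record = make_audit_record(
    brief_text="(full text of the brief as reviewed)",
    results={"Varghese v. China Southern Airlines": "RED"},
    checked_by="paralegal-on-duty",
)
print(json.dumps(record, indent=2))
```

A ChatGPT timestamp shows that a question was asked. A record like this shows what was checked, by whom, and against which version of the brief.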

What's changed since 2023

The bench is no longer surprised. It is expecting hallucinated citations and treating their presence as a failure of supervision rather than a quirk of new technology. The 2026 New York opinion in 2026-NY-Slip-Op-26014 extended sanction liability to a supervising attorney who had not personally used AI but had failed to keep abreast of it; the court framed the failure as one of professional currency, not technical competence. "I didn't know AI could do that" stopped being a defense around the time Mata was decided. "I supervise people who use it but I'm not personally familiar" stopped being a defense in 2026.

What this means operationally is straightforward and uncomfortable. The firms that haven't yet built a documented citation-verification step into the brief workflow are the firms whose names will appear in the next round of trade-press headlines. The firms that have built one will be the ones quietly answering carrier questionnaires with confidence and continuing to file.

The cases will keep coming. The Charlotin database is adding two to three new fabricated-citation orders a day. By the time you finish reading this piece, the count is no longer the count it was when this piece was published. Treatment indicators alone won't catch them. Three-layer audits will. The artifact your paralegal can hand your partner before the brief leaves the firm is the difference between Mata happening to your firm and Mata being the cautionary tale you cite to junior associates when you're explaining the workflow.

That artifact is what TrustCitation is. The audit is the product. The PDF is the bridge.

Check your last brief.

Five-minute audit. Partner-facing PDF. Free for the first brief, no signup needed.
