Before (Baseline)
15.0%
After (Fine-Tuned)
95.0%
Key Efficiency
6.3x
Improvement in machine-parseability
Before vs. After Comparison
● Baseline Model Output
"Here is the extraction for the invoice.
Note that tax was not listed.
```json
{
  "vendor": "Cloud Net",
  "inv_no": "CN-1",
  "total": 500.0,
  "date": "Jan 10 2024"
}
```
I hope this helps!"
Errors: Markdown fences, Prose prefix, Non-ISO date, Key name mismatch.
● Fine-Tuned Model Output
{
  "vendor": "Cloud Net",
  "invoice_number": "CN-1",
  "date": "2024-01-10",
  "due_date": null,
  "currency": "USD",
  "subtotal": 500.0,
  "tax": null,
  "total": 500.0,
  "line_items": [
    {"description": "Service charge", "quantity": 1.0, "unit_price": 500.0}
  ]
}
Success: Clean JSON, Mandatory keys present, ISO date format.
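The 95% figure is a machine-parseability rate. A minimal sketch of how such a check could be scored, assuming the schema shown above (the `REQUIRED_KEYS` set and `is_machine_parseable` name are illustrative, not the actual evaluation harness):

```python
import json
from datetime import date

# Assumed mandatory schema, inferred from the sample output above.
REQUIRED_KEYS = {"vendor", "invoice_number", "date", "currency",
                 "subtotal", "tax", "total", "line_items"}

def is_machine_parseable(output: str) -> bool:
    """True when the raw model output is directly usable downstream:
    valid JSON, all mandatory keys present, ISO-8601 date."""
    try:
        doc = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(doc, dict) or not REQUIRED_KEYS.issubset(doc):
        return False
    try:
        date.fromisoformat(doc["date"])  # rejects "Jan 10 2024"
    except (TypeError, ValueError):
        return False
    return True
```

Under a check like this, the baseline output above fails on all three criteria at once, while the fine-tuned output passes.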
Deep Failure Analysis
We didn't stop at the headline accuracy. We analyzed five specific edge cases where the fine-tuned model still failed (e.g., European number formats, nested tables). This level of failure analysis is what drives production-grade AI.
01
Thousands Separator Confusion
Misread '.' as a decimal point in European-formatted amounts (e.g. 1.500,00), where it is a thousands separator.
02
Multi-Page Continuity
Lost line-item context across page breaks in long purchase orders.
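The separator confusion in failure 01 can also be patched deterministically after extraction. A minimal sketch of locale-agnostic amount normalization (the `parse_amount` helper is hypothetical; the three-digit-tail heuristic is an assumption and is genuinely ambiguous for inputs like "1,500"):

```python
def parse_amount(text: str) -> float:
    """Normalize an amount that may use either convention:
    '1,500.00' (US) or '1.500,00' (European)."""
    text = text.strip().replace(" ", "")
    if "," in text and "." in text:
        # The rightmost separator is the decimal mark.
        if text.rfind(",") > text.rfind("."):
            text = text.replace(".", "").replace(",", ".")  # European
        else:
            text = text.replace(",", "")                    # US
    elif "," in text:
        head, _, tail = text.rpartition(",")
        # A lone comma before exactly 3 digits is ambiguous; we assume
        # it is a thousands separator, otherwise a decimal mark.
        text = (head + tail) if len(tail) == 3 else head + "." + tail
    return float(text)
```

Pushing this kind of normalization into deterministic post-processing, rather than asking the model to do locale arithmetic, is usually the more reliable design.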
Prompting vs. Fine-Tuning
"While few-shot prompting can achieve high accuracy on simple tasks, fine-tuning remains the superior choice for production systems requiring extreme reliability, low latency, and consistent structural adherence across diverse layouts."
Read Full Analysis →