Before (Baseline)
15.0%
After (Fine-Tuned)
95.0%
Key Efficiency
6.3x
Improvement in machine-parseability
Before vs. After Comparison
● Baseline Model Output
"Here is the extraction for the invoice.
Note that tax was not listed.
```json
{
  "vendor": "Cloud Net",
  "inv_no": "CN-1",
  "total": 500.0,
  "date": "Jan 10 2024"
}
```
I hope this helps!"
Errors: Markdown fences, Prose prefix, Non-ISO date, Key name mismatch.
● Fine-Tuned Model Output
{
  "vendor": "Cloud Net",
  "invoice_number": "CN-1",
  "date": "2024-01-10",
  "due_date": null,
  "currency": "USD",
  "subtotal": 500.0,
  "tax": null,
  "total": 500.0,
  "line_items": [
    {"description": "Service charge", "quantity": 1.0, "unit_price": 500.0}
  ]
}
Success: Clean JSON, Mandatory keys present, ISO date format.
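The 95% figure is a machine-parseability rate. A minimal sketch of how such a check could be scored, assuming the schema shown above (the `REQUIRED_KEYS` set and `is_machine_parseable` name are illustrative, not the actual evaluation harness):

```python
import json
from datetime import date

# Assumed mandatory schema, inferred from the sample output above.
REQUIRED_KEYS = {"vendor", "invoice_number", "date", "currency",
                 "subtotal", "tax", "total", "line_items"}

def is_machine_parseable(output: str) -> bool:
    """True when the raw model output is directly usable downstream:
    valid JSON, all mandatory keys present, ISO-8601 date."""
    try:
        doc = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(doc, dict) or not REQUIRED_KEYS.issubset(doc):
        return False
    try:
        date.fromisoformat(doc["date"])  # rejects "Jan 10 2024"
    except (TypeError, ValueError):
        return False
    return True
```

Under a check like this, the baseline output above fails on all three criteria at once, while the fine-tuned output passes.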
Deep Failure Analysis
We didn't stop at the headline accuracy. We analyzed five specific edge cases where the fine-tuned model still failed (e.g., European number formats, nested tables). This level of failure analysis is what drives production-grade AI.
01
Thousands Separator Confusion
Misread '.' as a decimal point in European-formatted amounts (e.g. 1.500,00), where it is a thousands separator.
02
Multi-Page Continuity
Lost line-item context across page breaks in long purchase orders.
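The separator confusion in failure 01 can also be patched deterministically after extraction. A minimal sketch of locale-agnostic amount normalization (the `parse_amount` helper is hypothetical; the three-digit-tail heuristic is an assumption and is genuinely ambiguous for inputs like "1,500"):

```python
def parse_amount(text: str) -> float:
    """Normalize an amount that may use either convention:
    '1,500.00' (US) or '1.500,00' (European)."""
    text = text.strip().replace(" ", "")
    if "," in text and "." in text:
        # The rightmost separator is the decimal mark.
        if text.rfind(",") > text.rfind("."):
            text = text.replace(".", "").replace(",", ".")  # European
        else:
            text = text.replace(",", "")                    # US
    elif "," in text:
        head, _, tail = text.rpartition(",")
        # A lone comma before exactly 3 digits is ambiguous; we assume
        # it is a thousands separator, otherwise a decimal mark.
        text = (head + tail) if len(tail) == 3 else head + "." + tail
    return float(text)
```

Pushing this kind of normalization into deterministic post-processing, rather than asking the model to do locale arithmetic, is usually the more reliable design.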
Prompting vs. Fine-Tuning
"While few-shot prompting can achieve high accuracy on simple tasks, fine-tuning remains the superior choice for production systems requiring extreme reliability, low latency, and consistent structural adherence across diverse layouts."
Read Full Analysis →