Financial automation often stalls on invoices that follow no common format. This project shows how to bridge the gap between "pixels" and "database entries" by combining layout-aware spatial OCR with LLM reasoning.
Problem: The Heterogeneity of Paper
Our finance team was manually entering data from 1,000+ vendors, each with their own invoice layout.
- Rule-based OCR: Failed as soon as a font changed or a logo moved.
- Manual Overhead: 3 FTEs dedicated entirely to data entry.
- Error Rates: A 5% entry error rate caused significant reconciliation issues at quarter-end.
Solution: Structured Extraction Pipeline
I designed a "Vision-to-JSON" pipeline that combines layout-aware spatial OCR with the semantic understanding of GPT-4o.
Engineering Insight: The "Cold Extraction" Challenge
[!TIP] Traditional OCR returns only text fragments and their coordinates. By converting the OCR results into a Spatially-Aware Markdown representation before feeding them to the LLM, we preserved the "Visual Table" relationships between cells, which markedly improved extraction accuracy for nested line items.
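To make the idea concrete, here is a minimal sketch of how word-level OCR output could be turned into a Markdown table. It is not the project's actual converter: the `Word` type, the `y_tol` tolerance, and the rule that the header row defines the column positions are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical OCR token: text plus the position of its bounding box."""
    text: str
    x: float  # left edge of the box
    y: float  # vertical centre of the box


def group_rows(words, y_tol=5.0):
    """Cluster words into visual rows by vertical proximity."""
    rows = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        # A word joins the current row if it sits close to that row's first word.
        if rows and abs(rows[-1][0].y - word.y) <= y_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [sorted(row, key=lambda w: w.x) for row in rows]


def to_markdown_table(words, y_tol=5.0):
    """Render OCR words as a Markdown table, keeping row/column relationships."""
    rows = group_rows(words, y_tol)
    if not rows:
        return ""
    # Assumption: the first row is the header and its words mark the columns.
    anchors = [w.x for w in rows[0]]
    lines = []
    for i, row in enumerate(rows):
        cells = [[] for _ in anchors]
        for w in row:
            # Assign each word to the column whose anchor is horizontally nearest.
            col = min(range(len(anchors)), key=lambda c: abs(anchors[c] - w.x))
            cells[col].append(w.text)
        lines.append("| " + " | ".join(" ".join(c) for c in cells) + " |")
        if i == 0:
            lines.append("|" + " --- |" * len(anchors))
    return "\n".join(lines)


sample = [
    Word("Item", 10, 100), Word("Qty", 200, 101), Word("Price", 300, 99),
    Word("Widget", 10, 120), Word("2", 202, 121), Word("9.50", 298, 120),
    Word("Steel", 10, 140), Word("bolt", 45, 140),
    Word("10", 200, 140), Word("0.25", 300, 141),
]
print(to_markdown_table(sample))
```

The LLM then receives a table in which "Steel bolt", "10", and "0.25" share a row, instead of a flat stream of tokens with coordinates attached.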
The "Reliability" Layer
- Azure Document Intelligence: Chosen for its robust layout-aware OCR which handles rotated text, watermarks, and complex grid structures better than open-source alternatives.
- Pydantic Validation & Self-Correction: We used Instructor to force the LLM into a strict Pydantic schema. If validation failed (e.g., a missing Tax ID or an impossible date), the system automatically triggered a "Reflection" pass, sending the error back to the LLM for self-correction.
- Source Mapping: Every extracted field is mapped back to its original bounding box coordinates in the PDF. This ensures that a human auditor can verify the "Ground Truth" in a single click.
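The validate-then-reflect loop described above can be sketched as follows. To keep the example dependency-free, a plain dataclass and a hand-written `validate` function stand in for the Pydantic schema and Instructor, and a stub function stands in for the model. The field names, the `call_llm` callable, the prompt wording, and the retry limit of three are all assumptions, not the project's actual values.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class Invoice:
    tax_id: str
    invoice_date: date
    total_amount: float


def validate(raw):
    """Return an Invoice, or raise ValueError listing every broken rule."""
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    errors = []
    tax_id = str(raw.get("tax_id") or "").strip()
    if not tax_id:
        errors.append("tax_id is missing")
    parsed_date = None
    try:
        parsed_date = date.fromisoformat(str(raw.get("invoice_date")))
    except ValueError:
        errors.append("invoice_date is not a valid ISO date")
    else:
        if parsed_date > date.today():
            errors.append("invoice_date is in the future")
    total = None
    try:
        total = float(raw.get("total_amount"))
    except (TypeError, ValueError):
        errors.append("total_amount is not a number")
    if errors:
        raise ValueError("; ".join(errors))
    return Invoice(tax_id, parsed_date, total)


def extract_with_reflection(document_text, call_llm, max_attempts=3):
    """Ask the model for JSON; on failure, feed the error back and retry."""
    prompt = ("Extract tax_id, invoice_date, total_amount as JSON:\n"
              + document_text)
    for _ in range(max_attempts):
        reply = call_llm(prompt)
        try:
            # json.JSONDecodeError is a ValueError, so malformed JSON
            # also triggers a reflection pass.
            return validate(json.loads(reply))
        except ValueError as err:
            prompt += (f"\n\nYour previous answer was rejected: {err}. "
                       "Return corrected JSON.")
    raise RuntimeError(f"no valid extraction after {max_attempts} attempts")


# Stub model: the first reply has an empty tax ID and an impossible date,
# the second reply is corrected.
_replies = iter([
    '{"tax_id": "", "invoice_date": "2024-02-30", "total_amount": 120.5}',
    '{"tax_id": "TAX-001", "invoice_date": "2024-02-29", "total_amount": 120.5}',
])
prompts_seen = []


def fake_llm(prompt):
    prompts_seen.append(prompt)
    return next(_replies)


invoice = extract_with_reflection("(invoice text)", fake_llm)
print(invoice)
```

Collecting all validation errors before raising means a single reflection pass can address several problems at once, rather than spending one retry per field.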
Security & PII
Handling invoices involves sensitive financial data and Tax IDs. We implemented an anonymization layer that masks PII (Personally Identifiable Information) before the data is logged for monitoring, supporting compliance with local data protection laws.
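One way such a masking layer could sit in front of the logs is a `logging.Filter` that rewrites each record before it is emitted. This is a sketch only: the two regular expressions are hypothetical placeholders (real tax ID formats vary by jurisdiction), and the logger name is invented for the example.

```python
import logging
import re

# Hypothetical patterns for illustration; a production layer would need
# jurisdiction-specific rules and likely more categories of PII.
PII_PATTERNS = [
    (re.compile(r"\b\d{2}-\d{7}\b"), "[TAX_ID]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
]


def mask_pii(text):
    """Replace every match of a known PII pattern with a placeholder token."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text


class PiiMaskingFilter(logging.Filter):
    """Mask PII in the fully formatted message before any handler sees it."""

    def filter(self, record):
        # Format first so values passed as %-style arguments are masked too.
        record.msg = mask_pii(record.getMessage())
        record.args = ()
        return True


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("invoice_pipeline")
logger.addFilter(PiiMaskingFilter())
logger.warning("Validation failed for %s (tax id %s)",
               "billing@acme.example", "12-3456789")
```

Attaching the filter to the logger rather than to individual call sites means new log statements are masked by default.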
Impact: Automation at Scale
The transition from manual entry to the LLM pipeline delivered:
- 60% reduction in manual data entry time, allowing the team to focus on high-level financial analysis.
- 99% accuracy on core fields like "Total Amount" and "Invoice Date" after implementing the self-correction loop.
- Auditability: Quarter-end reconciliations dropped from weeks to days due to the machine-verifiable source mapping.