Repeatability
High
The same six fields must be extracted from every invoice, and the output schema is fixed. While vendor layouts vary, the extraction logic is structurally identical across all 45 documents, which strongly favors automation.
Ambiguity Tolerance
High
Success criteria are crisp: a CSV with five named columns plus a JSON line-items field (six fields in total), populated from each invoice. There is little interpretive ambiguity about what 'done' looks like, though edge cases such as missing payment terms need a defined fallback.
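A minimal sketch of the fixed output schema and a defined fallback for missing values, using only the Python standard library. The column names and the NOT_FOUND marker are hypothetical; the assessment says only that there are five named columns plus a JSON line-items field, so substitute the real names.

```python
import csv
import io
import json

# Hypothetical column names: five scalar columns plus a JSON line-items field.
COLUMNS = ["vendor", "invoice_number", "invoice_date",
           "total_amount", "payment_terms", "line_items"]
MISSING = "NOT_FOUND"  # assumed fallback marker for fields absent from an invoice


def to_row(extracted: dict) -> dict:
    """Map one invoice's extracted values onto the fixed schema."""
    row = {}
    for name in COLUMNS[:-1]:
        value = extracted.get(name)
        # Missing or empty scalar fields get an explicit marker, never a blank.
        row[name] = MISSING if value in (None, "") else str(value)
    # Line items are serialized as a JSON array inside a single CSV cell.
    row["line_items"] = json.dumps(extracted.get("line_items") or [])
    return row


def write_csv(records, out) -> None:
    """Write a header row followed by one row per extracted invoice."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for record in records:
        writer.writerow(to_row(record))


# Example: this invoice has no payment terms, so the fallback marker is written.
buf = io.StringIO()
write_csv([{
    "vendor": "Acme",
    "invoice_number": "INV-1",
    "invoice_date": "2024-01-05",
    "total_amount": "120.00",
    "line_items": [{"description": "Widget", "amount": "120.00"}],
}], buf)
print(buf.getvalue())
```

An explicit marker keeps a missing field distinguishable from an extraction that silently returned an empty string, which makes the later audit easier.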
Data & Tool Availability
High
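The PDFs are the only input needed, and mature tools exist for this pipeline: PDF parsers, OCR engines (e.g., Tesseract, AWS Textract, Azure Form Recognizer), and LLM-based extraction. A local parser or OCR engine such as Tesseract needs no external APIs or live credentials; the hosted options (AWS Textract, Azure Form Recognizer, hosted LLMs) would require account credentials.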
Error Cost
Medium
A wrong total amount or misread payment term could cause a late payment or accounting discrepancy, which is a real but recoverable business error. The output is a CSV that a human can audit before it enters any financial system, limiting downstream damage.
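One way to support the human audit is an automated consistency check that flags rows whose line items do not add up to the extracted total. This is a sketch under assumptions: the "amount" key on each line item is hypothetical, and the check assumes tax and shipping appear as their own line items. If they do not, a mismatch is expected and the rule would need adjusting.

```python
from decimal import Decimal, InvalidOperation


def needs_audit(total_amount, line_items) -> bool:
    """Return True when a row should be checked by a human before use."""
    try:
        total = Decimal(total_amount)
        # Convert via str so float inputs do not carry binary rounding noise.
        items_sum = sum(
            (Decimal(str(item["amount"])) for item in line_items),
            Decimal("0"),
        )
    except (InvalidOperation, KeyError, TypeError):
        # Unparseable or missing numbers are themselves a reason to audit.
        return True
    return total != items_sum


# Example: the second invoice's items sum to 100.00, not 120.00, so it is flagged.
print(needs_audit("120.00", [{"amount": "100.00"}, {"amount": "20.00"}]))
print(needs_audit("120.00", [{"amount": "100.00"}]))
```

Decimal is used instead of float so that currency values compare exactly.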
Human Judgment Required
Low
No taste, ethics, or relationship context is needed — this is pure structured extraction. The only judgment calls are handling ambiguous OCR output or unusual invoice formats, which can be flagged for human review rather than requiring human execution throughout.
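Flagging for review could be sketched as below, assuming the extraction step returns a (value, confidence) pair for each field. That structure, the field names, and the 0.85 threshold are illustrative assumptions, not something the assessment specifies.

```python
from typing import Dict, List, Optional, Tuple

REVIEW_THRESHOLD = 0.85  # assumed cutoff; tune against a hand-checked sample

# Hypothetical shape: field name -> (extracted value or None, confidence in [0, 1]).
Fields = Dict[str, Tuple[Optional[str], float]]


def fields_needing_review(fields: Fields,
                          threshold: float = REVIEW_THRESHOLD) -> List[str]:
    """List the fields a human should check: missing or low-confidence values."""
    return sorted(
        name
        for name, (value, confidence) in fields.items()
        if value in (None, "") or confidence < threshold
    )


# Example: one field read cleanly, one read with low confidence, one missing.
invoice = {
    "total_amount": ("120.00", 0.99),
    "payment_terms": ("Net 3O", 0.61),
    "vendor": (None, 0.0),
}
print(fields_needing_review(invoice))
```

Invoices with an empty result can pass straight into the CSV, while the rest go to a review queue, so people handle only the exceptions.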