COOKBOOK_DATA_EXTRACTION
Turn unstructured text (PDFs, emails) into structured JSON data.
OVERVIEW
Extracting structured data from unstructured documents is a classic NLP task. In this recipe, we'll build an agent that extracts invoice details as JSON and validates them against a schema.
DEFINE_THE_SCHEMA
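The trick is to define a "no-op" tool whose only purpose is to describe the output structure. Nothing ever executes the tool. The model is required to call it, and the arguments of that call are the extracted data, shaped by the tool's parameter schema.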
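A minimal sketch of this step follows. It assumes OpenAI-style function calling over the Chat Completions REST endpoint (a `tools` array plus a forced `tool_choice`) and Node 18+ for the global `fetch`. The `Invoice` fields, the `record_invoice` tool name, and the `parseInvoice` and `extractInvoice` helpers are hypothetical. The tool's `parameters` are ordinary JSON Schema, and the returned arguments are re-checked in code because the model is not guaranteed to follow the schema.

```typescript
// Hypothetical output shape; adjust the fields to your invoices.
interface Invoice {
  invoiceNumber: string;
  vendorName: string;
  issueDate: string; // ISO 8601, e.g. "2024-03-01"
  total: number;
  currency: string; // ISO 4217 code, e.g. "USD"
}

// The "no-op" tool. Nothing ever executes it: its JSON Schema `parameters`
// exist only to tell the model what shape the call's arguments must take.
const recordInvoiceTool = {
  type: "function",
  function: {
    name: "record_invoice",
    description: "Record the structured details of an invoice.",
    parameters: {
      type: "object",
      properties: {
        invoiceNumber: { type: "string" },
        vendorName: { type: "string" },
        issueDate: { type: "string", description: "ISO 8601 date" },
        total: { type: "number" },
        currency: { type: "string", description: "ISO 4217 code" },
      },
      required: ["invoiceNumber", "vendorName", "issueDate", "total", "currency"],
    },
  },
};

// The model returns the arguments as a JSON string that may not match the
// schema, so check every field before trusting it.
function parseInvoice(argsJson: string): Invoice {
  const data: unknown = JSON.parse(argsJson);
  if (typeof data !== "object" || data === null) {
    throw new Error("tool arguments are not a JSON object");
  }
  const o = data as Record<string, unknown>;
  const str = (key: string): string => {
    const v = o[key];
    if (typeof v !== "string") throw new Error(`${key} must be a string`);
    return v;
  };
  const num = (key: string): number => {
    const v = o[key];
    if (typeof v !== "number" || !Number.isFinite(v)) throw new Error(`${key} must be a number`);
    return v;
  };
  return {
    invoiceNumber: str("invoiceNumber"),
    vendorName: str("vendorName"),
    issueDate: str("issueDate"),
    total: num("total"),
    currency: str("currency"),
  };
}

// Force the model to "call" the no-op tool; the call's arguments are the extraction.
async function extractInvoice(text: string, apiKey: string, model: string): Promise<Invoice> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({
      model,
      messages: [
        { role: "system", content: "Extract the invoice details from the user's text." },
        { role: "user", content: text },
      ],
      tools: [recordInvoiceTool],
      tool_choice: { type: "function", function: { name: "record_invoice" } },
    }),
  });
  if (!res.ok) throw new Error(`API error ${res.status}: ${await res.text()}`);
  const body = (await res.json()) as {
    choices: { message: { tool_calls?: { function: { arguments: string } }[] } }[];
  };
  const call = body.choices[0]?.message.tool_calls?.[0];
  if (!call) throw new Error("model did not call record_invoice");
  return parseInvoice(call.function.arguments);
}
```

Forcing `tool_choice` keeps the model from answering in free text, so a successful response carries a `record_invoice` call whose arguments can be parsed. Any validation failure surfaces as an exception instead of silently bad data.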
TESTING_EXTRACTION
Run the agent against a raw-text invoice and compare the extracted fields with the values you expect, to verify extraction accuracy.
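One way to make this check repeatable is a small harness that compares the agent's output, field by field, with hand-written expected values. This is a sketch under stated assumptions: the `Invoice` fields, the made-up sample invoice, and the `diffInvoice` and `testExtraction` helpers are hypothetical. The agent is passed in as any async function from raw text to an `Invoice`.

```typescript
// Hypothetical field set; keep it in sync with your extraction schema.
interface Invoice {
  invoiceNumber: string;
  vendorName: string;
  issueDate: string;
  total: number;
  currency: string;
}

// A made-up raw-text invoice and the values a correct extraction should yield.
const RAW_INVOICE = `ACME SUPPLIES LTD
Invoice No: INV-2024-0042
Date: 1 March 2024

2 x Widget @ 40.00     80.00
1 x Shipping           15.50
TOTAL DUE (USD)        95.50`;

const EXPECTED: Invoice = {
  invoiceNumber: "INV-2024-0042",
  vendorName: "ACME SUPPLIES LTD",
  issueDate: "2024-03-01",
  total: 95.5,
  currency: "USD",
};

// Compare field by field. Strings are trimmed and compared case-insensitively;
// numbers must agree to within half a cent.
function diffInvoice(actual: Invoice, expected: Invoice): string[] {
  const mismatches: string[] = [];
  for (const key of Object.keys(expected) as (keyof Invoice)[]) {
    const a = actual[key];
    const e = expected[key];
    const ok =
      typeof a === "number" && typeof e === "number"
        ? Math.abs(a - e) < 0.005
        : String(a).trim().toLowerCase() === String(e).trim().toLowerCase();
    if (!ok) mismatches.push(`${key}: expected ${JSON.stringify(e)}, got ${JSON.stringify(a)}`);
  }
  return mismatches;
}

// Run the agent on one invoice, log any mismatches, and report pass/fail.
async function testExtraction(
  extract: (text: string) => Promise<Invoice>,
  raw: string,
  expected: Invoice,
): Promise<boolean> {
  const mismatches = diffInvoice(await extract(raw), expected);
  if (mismatches.length > 0) console.error("Extraction mismatches:\n  " + mismatches.join("\n  "));
  return mismatches.length === 0;
}
```

Comparing individual fields rather than whole JSON strings shows which value the model got wrong. The loose string comparison avoids failing a run just because the model normalized capitalization.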
COST_OPTIMIZATION
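For high-volume workloads, switch to a cheaper model such as `gpt-3.5-turbo` or `mistral-small` once you have verified that the prompt works reliably, and re-run your extraction tests against the cheaper model before relying on it.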
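That verification can be automated: run the same labeled invoices through each candidate model and keep the cheapest one that clears a pass-rate bar. This is a minimal sketch. The `Extractor` signature, the exact-match rule, and the 95% default threshold are illustrative assumptions, and the model names are whatever your provider accepts.

```typescript
// One labeled example: raw text plus the fields a correct extraction must produce.
interface Case {
  raw: string;
  expected: Record<string, string | number>;
}

// Hypothetical extractor signature: run the extraction prompt with a given model.
type Extractor = (model: string, raw: string) => Promise<Record<string, unknown>>;

// Fraction of cases where every expected field matches exactly.
async function passRate(extract: Extractor, model: string, cases: Case[]): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    try {
      const out = await extract(model, c.raw);
      if (Object.entries(c.expected).every(([k, v]) => out[k] === v)) passed++;
    } catch {
      // Any thrown error (API failure, schema-validation failure) counts as a miss.
    }
  }
  return cases.length === 0 ? 0 : passed / cases.length;
}

// Try models cheapest-first; return the first whose pass rate meets the bar.
async function pickModel(
  extract: Extractor,
  modelsCheapestFirst: string[],
  cases: Case[],
  minPassRate = 0.95,
): Promise<string | undefined> {
  for (const model of modelsCheapestFirst) {
    const rate = await passRate(extract, model, cases);
    console.log(`${model}: ${(rate * 100).toFixed(1)}% of cases passed`);
    if (rate >= minPassRate) return model;
  }
  return undefined;
}
```

Ordering candidates cheapest-first means the loop stops at the first model that is good enough. An `undefined` result signals that no candidate met the bar on your labeled set.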