Intelligent Document Processing in Action: Lessons from DoorDash’s AI-Powered Menu System

Figure: The Intelligent Document Processing workflow: menu photo → OCR/LLM transcription → guardrail model → high-confidence automation or human review.

Introduction

Intelligent Document Processing (IDP) is one of the most practical and impactful applications of artificial intelligence today. It’s the backbone of countless enterprise workflows — from processing invoices and contracts to digitizing healthcare records, government applications, and compliance documents. Yet despite the hype around large language models (LLMs), anyone who has tried to automate real-world document processing knows the same truth: documents are messy.

Inconsistent layouts, incomplete scans, and poor-quality images can turn even the smartest AI pipeline into a liability. That’s why the recent work from DoorDash’s engineering team is so valuable. While their focus was on digitizing restaurant menus, the lessons apply far beyond food delivery. In many ways, their system is a case study in practical IDP at scale.

This article explores how DoorDash approached the challenge, why guardrails matter as much as models, and what enterprises can learn when implementing IDP solutions across industries.

The Challenge of Real-World Documents

DoorDash faced the same obstacles that plague every organization working with unstructured or semi-structured documents.

Inconsistent Structures
- Menus aren’t standardized. Some use multiple columns, others have decorative fonts, and categories are often scattered across the page. OCR systems frequently scramble text order, which leads to LLMs pairing the wrong items and attributes.
- Enterprises see the same with contracts, invoices, and compliance forms that vary wildly in design.
Incomplete Inputs
- Menu photos are often cropped or partial, leaving out critical sections. LLMs, when given incomplete context, tend to “hallucinate” or guess, which produces inaccurate records.
- The enterprise equivalent is a healthcare intake form with missing sections, or a scanned tax form missing a page.
Poor Image Quality
- Dim lighting, glare, cluttered backgrounds, and skewed angles reduce OCR accuracy and cascade errors into LLMs.
- In the enterprise, blurry scans or photocopies produce the same downstream issues.

These three problems — inconsistency, incompleteness, and poor quality — aren’t just food delivery issues. They’re the universal hurdles of IDP.

The Baseline Approach: OCR → LLM

DoorDash’s engineering team began with a straightforward pipeline:

Step 1: Use Optical Character Recognition (OCR) to extract raw text from photos.
Step 2: Pass the text to a Large Language Model to structure it into categories, items, and attributes.

As a prototype, it worked. Machines could, in principle, take a menu photo and output a digital version. But scaling revealed cracks: frequent mismatches, misplaced attributes, and errors that made the output unreliable.
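
The two-step baseline can be sketched roughly as follows. This is a minimal illustration, not DoorDash's implementation: `run_ocr` and `call_llm` are hypothetical callables (the source does not name the OCR engine or the model), and the `Menu` / `MenuItem` schema and the JSON reply format are assumptions made for the example.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MenuItem:
    name: str
    price: float
    attributes: List[str] = field(default_factory=list)


@dataclass
class Menu:
    # Maps a category name (e.g. "Starters") to the items listed under it.
    categories: Dict[str, List[MenuItem]] = field(default_factory=dict)


def parse_llm_output(raw_json: str) -> Menu:
    """Turn the LLM's JSON reply into a Menu, raising on malformed structure."""
    data = json.loads(raw_json)
    menu = Menu()
    for category, items in data["categories"].items():
        menu.categories[category] = [
            MenuItem(
                name=item["name"],
                price=float(item["price"]),
                attributes=list(item.get("attributes", [])),
            )
            for item in items
        ]
    return menu


def transcribe(
    photo: bytes,
    run_ocr: Callable[[bytes], str],  # hypothetical OCR step
    call_llm: Callable[[str], str],   # hypothetical LLM step, assumed to return JSON
) -> Menu:
    text = run_ocr(photo)       # Step 1: photo -> raw text
    raw_json = call_llm(text)   # Step 2: raw text -> structured JSON
    return parse_llm_output(raw_json)
```

Note that nothing in this sketch checks whether the structured output is actually right. It only checks that it parses, which is exactly the gap that shows up at scale.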

This highlights a critical point for enterprises: a proof-of-concept pipeline is not the same as a production system. Moving from demo to deployment requires more than bigger models — it requires architectural guardrails.

Introducing Guardrails: The Systemic Fix

The breakthrough came when DoorDash added what they called a guardrail model.

At its core, this was a classifier that predicted whether an AI-generated transcription was good enough to trust. If the output met the accuracy threshold, it was published automatically. If not, it was routed to human reviewers.
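
How the threshold is chosen is not described in the source. One common approach, sketched here under that assumption, is to score a labeled validation set and pick the lowest cutoff at which the auto-published transcriptions still meet a target precision. The function name and the data layout are hypothetical.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    scored: List[Tuple[float, bool]],  # (guardrail score, was the transcription correct?)
    target_precision: float,
) -> Optional[float]:
    """Return the lowest cutoff whose accepted set meets target_precision, or None."""
    best = None
    for cutoff in sorted({score for score, _ in scored}, reverse=True):
        accepted = [ok for score, ok in scored if score >= cutoff]
        precision = sum(accepted) / len(accepted)
        if precision >= target_precision:
            best = cutoff  # a lower cutoff automates more cases at the same quality bar
    return best
```

A lower cutoff sends more menus down the automated path, so the target precision is effectively the knob that trades reviewer workload against published errors.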

Multi-View Features

What made the guardrail effective was its multi-source feature engineering:

Image-level features: Blurriness, glare, clutter, or skew.
OCR features: Token order, confidence scores, junk text, and reliability signals.
LLM features: Internal consistency, completeness, and coverage.

By combining these perspectives, the guardrail directly attacked the three major failure modes: inconsistent structures, incomplete menus, and poor photo quality.
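
To make the three views concrete, here is a small sketch that assembles one feature vector per transcription. The specific features (mean OCR confidence, junk-token ratio, word coverage) are illustrative assumptions rather than DoorDash's actual feature set, and `blur_score` is assumed to come from a separate image-quality step that is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OcrToken:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the OCR engine


def ocr_features(tokens: List[OcrToken]) -> Tuple[float, float]:
    """Mean OCR confidence and the share of tokens with no letters or digits."""
    if not tokens:
        return 0.0, 1.0
    mean_conf = sum(t.confidence for t in tokens) / len(tokens)
    junk = sum(1 for t in tokens if not any(c.isalnum() for c in t.text))
    return mean_conf, junk / len(tokens)


def llm_coverage(tokens: List[OcrToken], llm_item_names: List[str]) -> float:
    """Share of alphabetic OCR words that reappear in the LLM's item names."""
    ocr_words = {t.text.lower() for t in tokens if t.text.isalpha()}
    if not ocr_words:
        return 0.0
    llm_words = {w.lower() for name in llm_item_names for w in name.split()}
    return len(ocr_words & llm_words) / len(ocr_words)


def guardrail_features(
    blur_score: float,            # image view (assumed precomputed elsewhere)
    tokens: List[OcrToken],       # OCR view
    llm_item_names: List[str],    # LLM view
) -> List[float]:
    mean_conf, junk_ratio = ocr_features(tokens)
    return [blur_score, mean_conf, junk_ratio, llm_coverage(tokens, llm_item_names)]
```

Low coverage hints that the LLM dropped or invented content, while a high junk ratio or low confidence points back to the photo or the OCR step.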

A Broader Lesson for IDP

Guardrails represent more than a technical fix. They embody a governance mindset for AI adoption. Just as societies need checks and balances, enterprises need systems that decide when automation is safe and when human oversight is essential.

Why Simpler Models Sometimes Win

Interestingly, DoorDash discovered that the best-performing guardrail wasn’t a deep neural network. Instead, it was a model built with LightGBM, a gradient-boosted decision tree framework.

LightGBM outperformed CNNs, ResNets, and Vision Transformers in both accuracy and efficiency.
The reason was simple: limited labeled data. Complex neural nets often underperform when sample sizes are small, while gradient-boosted trees over a compact set of engineered features tend to hold up well.
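
For readers unfamiliar with the technique, the sketch below shows the core idea of gradient boosting on a tiny tabular dataset, using one-split trees ("stumps"). It is a toy stand-in written in plain Python, not LightGBM itself, which adds histogram-based splitting, leaf-wise tree growth, and regularization. The feature layout and hyperparameters are assumptions for illustration.

```python
import math
from typing import List, Tuple

# A stump is (feature index, threshold, value if x <= threshold, value otherwise).
Stump = Tuple[int, float, float, float]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _raw_score(stumps: List[Stump], x: List[float]) -> float:
    return sum(lv if x[j] <= t else rv for j, t, lv, rv in stumps)


def _fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Least-squares stump: the single split that best predicts the residuals."""
    mean = sum(residuals) / len(residuals)
    best: Stump = (0, math.inf, mean, mean)  # fallback when no split helps
    best_err = sum((r - mean) ** 2 for r in residuals)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best


def fit_boosted_stumps(
    X: List[List[float]], y: List[int], rounds: int = 20, lr: float = 0.3
) -> List[Stump]:
    stumps: List[Stump] = []
    for _ in range(rounds):
        # Negative gradient of log-loss w.r.t. the raw score: label - probability.
        residuals = [yi - _sigmoid(_raw_score(stumps, xi)) for xi, yi in zip(X, y)]
        j, t, lv, rv = _fit_stump(X, residuals)
        stumps.append((j, t, lr * lv, lr * rv))  # shrink each step by the learning rate
    return stumps


def predict_proba(stumps: List[Stump], x: List[float]) -> float:
    """Estimated probability that a transcription is good enough to publish."""
    return _sigmoid(_raw_score(stumps, x))
```

Each round fits a small tree to the errors the ensemble still makes, so the model has few parameters per step and can learn useful cutoffs from a handful of labeled rows.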

This is a crucial reminder for enterprise AI teams: don’t overengineer. The most sophisticated architecture isn’t always the best. Pragmatism, not hype, should guide technology choices.

Designing a Human-in-the-Loop Production Pipeline

With guardrails in place, DoorDash built a full production pipeline that balanced automation with human review:

Validation: Basic checks confirm the menu photo is usable.
Transcription: OCR+LLM pipeline produces structured data.
Guardrail Inference: The guardrail scores each transcription from its multi-view features to predict whether it is accurate enough to publish.
Routing: High-confidence outputs are automated; low-confidence outputs go to humans.

This hybrid approach created scalable efficiency:

Machines handle the easy cases at lightning speed.
Humans focus on edge cases where judgment is critical.

For enterprises adopting IDP, this is the blueprint: automation plus human oversight, mediated by guardrails.
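
The four stages above can be wired together in a few lines. In this sketch every stage is injected as a callable, so `is_usable`, `transcribe`, and `guardrail_score` are hypothetical placeholders; the default threshold and the "rejected" outcome for unusable photos are also assumptions, since the source does not say what happens when validation fails.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    route: str  # "rejected", "auto_publish", or "human_review"
    transcription: Optional[dict] = None
    score: Optional[float] = None


def process_photo(
    photo: bytes,
    is_usable: Callable[[bytes], bool],              # hypothetical validation check
    transcribe: Callable[[bytes], dict],             # hypothetical OCR+LLM step
    guardrail_score: Callable[[bytes, dict], float], # hypothetical guardrail model
    threshold: float = 0.9,                          # assumed value, tuned in practice
) -> Outcome:
    # 1. Validation: stop early if the photo cannot be processed at all.
    if not is_usable(photo):
        return Outcome(route="rejected")
    # 2. Transcription: produce structured data from the photo.
    transcription = transcribe(photo)
    # 3. Guardrail inference: estimate how trustworthy the transcription is.
    score = guardrail_score(photo, transcription)
    # 4. Routing: automate confident cases, send the rest to reviewers.
    route = "auto_publish" if score >= threshold else "human_review"
    return Outcome(route=route, transcription=transcription, score=score)
```

Keeping the stages as separate callables means any one of them can be swapped out without touching the routing logic.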

Evolution Toward Multimodal LLMs

AI research moves fast, and DoorDash quickly began testing multimodal LLMs that can process both images and text directly.

Strengths: Better at understanding layouts, columns, and context.
Weaknesses: More brittle with poor-quality photos.

Instead of replacing the old pipeline, DoorDash ran both pipelines in parallel:

OCR+LLM provided stability across noisy inputs.
Multimodal LLMs excelled with unusual layouts.
The guardrail decided which result to trust.

This hybrid system delivered the best of both worlds — a pragmatic example of how to integrate emerging models without throwing away proven systems.
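
The source does not spell out how the guardrail arbitrates between the two pipelines. One plausible reading, sketched here as an assumption, is to score each candidate transcription and keep the higher-scoring one, still falling back to human review when even the winner is below the threshold. All names are hypothetical.

```python
from typing import Callable, Dict, Tuple


def choose_transcription(
    candidates: Dict[str, dict],               # pipeline name -> its transcription
    guardrail_score: Callable[[dict], float],  # hypothetical guardrail model
    threshold: float,
) -> Tuple[str, str, float]:
    """Return (winning pipeline name, route, winning score)."""
    scores = {name: guardrail_score(t) for name, t in candidates.items()}
    winner = max(scores, key=scores.get)
    route = "auto_publish" if scores[winner] >= threshold else "human_review"
    return winner, route, scores[winner]
```

Because the arbiter only sees scores, a third pipeline could be added later by putting one more entry in `candidates`.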

Lessons for Intelligent Document Processing (IDP)

The DoorDash case study illustrates several universal lessons for IDP adoption:

Guardrails are essential.
- AI needs supervision. Guardrails provide an accuracy filter, ensuring automation scales responsibly.
Humans remain in the loop.
- The goal isn’t replacing people but focusing their effort where it matters most.
Simplicity often beats sophistication.
- LightGBM outperforming transformers is a perfect reminder: fit the tool to the problem, not the hype.
Preprocessing matters.
- De-noising, de-skewing, and glare reduction can dramatically improve downstream OCR/LLM accuracy.
Hybrid systems are the future.
- OCR, LLMs, and multimodal models can coexist, each covering the other’s weaknesses.

Applications Beyond Menus

The same IDP framework applies to nearly every industry:

Finance: Automating invoice processing, expense receipts, and account reconciliation.
Healthcare: Digitizing patient forms, lab results, and medical records.
Legal: Parsing contracts, compliance filings, and case documents.
Government: Processing permits, tax forms, and benefits applications.

In the Microsoft/.NET Ecosystem

For organizations already invested in Microsoft technologies, the parallels are clear:

ML.NET can train guardrail-style classifiers, including LightGBM-based ones through its Microsoft.ML.LightGbm package.
Azure AI Document Intelligence provides OCR and document understanding at enterprise scale.
Semantic Kernel can orchestrate hybrid pipelines that combine LLMs, guardrails, and human review.

By drawing on these tools, businesses can apply the DoorDash blueprint directly within their existing .NET environments.

A Stoic Reflection on Guardrails

There’s a philosophical dimension here too. Stoicism teaches us that wisdom is not simply knowing what to do, but knowing when to act and when to hold back. DoorDash’s system embodies this principle: automation proceeds only when the guardrail judges it wise.

Enterprises adopting IDP can learn from this. The goal isn’t unchecked automation, but automation that recognizes its limits — and defers to human judgment when necessary.

Conclusion

Automating menu transcription may seem like a niche problem, but DoorDash’s solution is a masterclass in applied Intelligent Document Processing.

They began with a simple OCR-to-LLM pipeline.
They recognized its limitations and added guardrails.
They embraced human-in-the-loop design for balance.
They evolved toward multimodal models without abandoning pragmatism.

For enterprises everywhere, the lessons are clear: guardrails, hybrid models, and human oversight aren’t optional — they’re the path to scalable, responsible automation.

IDP isn’t just about extracting text from documents. It’s about building systems that balance intelligence with humility — ensuring that automation accelerates progress without sacrificing accuracy or trust.

Keith Baldwin