The traditional way to ship a document parser is to train one. You collect a thousand or more labelled examples per document type, stand up a fine-tuning pipeline, run evals, ship a model artifact, and roll it into production. It's a real engineering discipline and the people doing it are good at their job — but the timeline it produces is the timeline of a model release.
Mindee, one of the most established players in this space, shipped nine new pre-trained models in 2024. Across twelve months, that's roughly one new format every five weeks. That cadence is what the economics of training force on you, not a sign that anyone is slacking.
The other side of that coin is what happens when a formatchanges — a bank rolls a new column layout, a payroll system reorders sections, an invoice template gets a redesign. Every change is a re-training project on its own. New labels, new evals, new release. Weeks, not hours.