XLiteOCR turns documents and images into structured, machine-readable data without sending a single page to a third party. It runs on ordinary CPUs, needs no GPU, and is free to download, run, and modify.
Every page comes back as clean, typed data your systems can act on, not just a wall of characters.
Accurate text recognition with precise bounding boxes and confidence scores for every line, from images and multi-page PDFs alike.
A capability most engines skip: the detected color of the text in each region, returned as hex, RGB, and a readable color name. Tell a red stamp from black body copy.
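To make the color output concrete, here is a minimal Python sketch of how a per-line result with a bounding box, confidence score, and detected color could be used to separate a red stamp from black body copy. The `Line` shape, the small `PALETTE`, the nearest-color rule, and the sample data are all assumptions made for illustration; they are not XLiteOCR's actual API or color table.

```python
from dataclasses import dataclass

# Hypothetical palette for illustration only; not XLiteOCR's color table.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "gray": (128, 128, 128),
}


def rgb_to_hex(rgb):
    """Format an (r, g, b) tuple of 0-255 ints as a lowercase hex string."""
    r, g, b = rgb
    return f"#{r:02x}{g:02x}{b:02x}"


def nearest_color_name(rgb):
    """Return the palette name with the smallest squared RGB distance."""
    def dist2(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name]))
    return min(PALETTE, key=dist2)


@dataclass
class Line:
    """Assumed shape of one recognized line (hypothetical, not the real schema)."""
    text: str
    bbox: tuple        # (x0, y0, x1, y1) in pixels
    confidence: float  # 0.0 to 1.0
    rgb: tuple         # detected text color


# Made-up sample data standing in for an OCR result.
lines = [
    Line("Invoice 2024-001", (40, 30, 320, 58), 0.98, (18, 18, 20)),
    Line("PAID", (400, 60, 520, 110), 0.91, (200, 30, 40)),
]

# Keep confident lines whose color maps to "red", e.g. a stamp.
stamps = [
    ln.text
    for ln in lines
    if ln.confidence >= 0.9 and nearest_color_name(ln.rgb) == "red"
]

for ln in lines:
    print(ln.text, rgb_to_hex(ln.rgb), nearest_color_name(ln.rgb))
```

Squared distance in RGB space is the simplest possible matching rule; a perceptual color space would name borderline shades more reliably.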
Structured mode reconstructs document layout into markdown with typed blocks (title, text, table) and exports tables as clean HTML, ready to ingest.
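As a rough illustration of what consuming typed blocks might look like, the Python sketch below walks a list of title, text, and table blocks and emits markdown, with tables rendered as HTML. The block dictionaries, their `type`, `text`, and `rows` keys, and both helper functions are hypothetical stand-ins, not the actual structured-mode output format.

```python
from html import escape


def table_to_html(rows):
    """Render rows of strings as a plain HTML table; the first row is the header."""
    head, *body = rows
    parts = ["<table>"]
    parts.append("<tr>" + "".join(f"<th>{escape(c)}</th>" for c in head) + "</tr>")
    for row in body:
        parts.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


def blocks_to_markdown(blocks):
    """Convert assumed typed blocks (title, text, table) to a markdown string."""
    out = []
    for block in blocks:
        kind = block["type"]
        if kind == "title":
            out.append("# " + block["text"])
        elif kind == "text":
            out.append(block["text"])
        elif kind == "table":
            # Markdown allows inline HTML, so tables are embedded as-is.
            out.append(table_to_html(block["rows"]))
        else:
            raise ValueError(f"unknown block type: {kind}")
    return "\n\n".join(out)


# Made-up sample blocks standing in for a structured-mode result.
blocks = [
    {"type": "title", "text": "Quarterly Report"},
    {"type": "text", "text": "Totals by region."},
    {"type": "table", "rows": [["Region", "Total"], ["North", "120"]]},
]

markdown = blocks_to_markdown(blocks)
print(markdown)
```

Cell text is passed through `html.escape` so that characters such as `<` in recognized text cannot break the table markup.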
Logos, diagrams, and line art are vectorized to scalable SVG, so figures stay crisp at any size instead of being trapped as pixels. Preserve vital context in technical manuals and engineering schematics.
Documents are processed in memory and never written to disk. Nothing leaves your infrastructure, which keeps regulated and sensitive data under your control.
No GPU, no specialized hardware, no cloud accelerator bill. Deploy it on the servers you already run.
Cloud OCR APIs meter you per page and route your documents through someone else's servers. XLiteOCR does neither.
Process a thousand pages or a million; there is no per-page fee, so you pay only for the hardware you run it on.
Documents never leave your network. That is the difference between a vendor questionnaire you can pass and one you cannot.
It installs and runs inside your environment, fully under your control. No external service to manage, no third party in the loop.
The full source is Apache-2.0 licensed. Read it, audit it, modify it, and run it indefinitely. No black box, no lock-in, no metered cloud API.
XLiteOCR fits wherever paper and PDFs become data, on infrastructure you control.
Pull line items, totals, and tables into your finance systems without shipping financial documents to a third party.
Digitize intake forms, contracts, and archives at scale, keeping regulated records inside your own walls.
Wire document capture into your own back-office workflows and data pipelines, with no per-call metering to budget around.
Recover text color and vectorize logos and diagrams to SVG for downstream design and reporting pipelines.
XLiteOCR is released under the Apache License 2.0 and is assembled only from permissively licensed components: Apache 2.0, MIT, BSD, and similar. There is no GPL or AGPL anywhere in the stack, so there are no copyleft obligations or licensing surprises to clear before you put it into production.
Upload an image or PDF and watch XLiteOCR return text, per-region color, tables, and figures in seconds, all running on a plain CPU.
Open the live demo