XLiteOCR
Open-source OCR by EdgeXene

Open-source OCR you self-host.
Free, private, and Apache-2.0 licensed.

XLiteOCR turns documents and images into structured, machine-readable data without sending a single page to a third party. It runs on ordinary CPUs, needs no GPU, and is free to download, run, and modify.

More than text. Structure.

Every page comes back as clean, typed data your systems can act on, not just a wall of characters.

Text and layout

Accurate text recognition with precise bounding boxes and confidence scores for every line, from images and multi-page PDFs alike.
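
As an illustration of how per-line confidence scores can be put to work, here is a minimal sketch that splits recognized lines into those safe to ingest automatically and those to route to human review. The `Line` record, its field names, the 0.0 to 1.0 confidence range, and the 0.85 threshold are assumptions made for this example, not XLiteOCR's documented output format.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical shape of one recognized line (assumed, not the real schema)."""
    text: str
    box: tuple[int, int, int, int]  # assumed (x0, y0, x1, y1) in pixels
    confidence: float               # assumed to range from 0.0 to 1.0


def split_by_confidence(lines, threshold=0.85):
    """Return (accepted, needs_review), partitioned at the given threshold."""
    accepted = [line for line in lines if line.confidence >= threshold]
    needs_review = [line for line in lines if line.confidence < threshold]
    return accepted, needs_review


# Example input with made-up values.
lines = [
    Line("Invoice 1042", (40, 30, 310, 62), 0.98),
    Line("T0tal due", (40, 500, 190, 528), 0.61),
]
accepted, needs_review = split_by_confidence(lines)
```

The threshold is a tuning knob: raising it sends more lines to review, lowering it trusts the engine more.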

Per-region color

A capability most engines skip: the detected color of the text in each region, returned as hex, RGB, and a readable color name. Tell a red stamp from black body copy.
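
To show how the three color representations relate, the sketch below converts an RGB triple to hex and picks the closest entry from a small name palette. It illustrates how a downstream consumer might use the values and is not a description of how XLiteOCR detects color; the five-color palette and both function names are hypothetical.

```python
# Tiny illustrative palette; a real name list would be much larger.
NAMED_COLORS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
}


def rgb_to_hex(rgb):
    """Format an (r, g, b) triple of 0-255 integers as a #rrggbb string."""
    r, g, b = rgb
    return f"#{r:02x}{g:02x}{b:02x}"


def nearest_color_name(rgb):
    """Return the palette name with the smallest squared RGB distance."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, NAMED_COLORS[name]))
    return min(NAMED_COLORS, key=distance)


stamp = (200, 30, 40)   # made-up color of a stamp region
body = (20, 20, 20)     # made-up color of body text
is_stamp_red = nearest_color_name(stamp) == "red"
```

Plain Euclidean distance in RGB is the simplest choice; a perceptual color space would match human judgment more closely.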

Tables and markdown

Structured mode reconstructs document layout into markdown with typed blocks (title, text, table) and exports tables as clean HTML, ready to ingest.
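
A table exported as HTML can be loaded into rows using nothing more than Python's standard library. The sketch below assumes a simple table made of `tr`, `th`, and `td` elements with no nesting or merged cells; the sample markup is made up for the example and is not actual XLiteOCR output.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_rows(html):
    """Parse one simple HTML table into a list of rows of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = (
    "<table><tr><th>Item</th><th>Total</th></tr>"
    "<tr><td>Widget</td><td>9.50</td></tr></table>"
)
rows = table_to_rows(sample)
```

From here the rows can be written to CSV or inserted into a database without any third-party dependency.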

Figures to SVG

Logos, diagrams, and line art are vectorized to scalable SVG, so figures stay crisp at any size instead of being trapped as pixels. Preserve vital context in technical manuals and engineering schematics.

Private by design

Documents are processed in memory and never written to disk. Nothing leaves your infrastructure, which keeps regulated and sensitive data in your control.

Runs on plain CPUs

No GPU, no specialized hardware, no cloud accelerator bill. Deploy it on the servers you already run.

Why teams choose to self-host

Cloud OCR APIs meter you per page and route your documents through someone else's servers. XLiteOCR does neither: there is no per-page meter, and it runs entirely on your own servers.

No per-page fees

Process a thousand pages or a million; there is no per-page charge, so the only cost is the hardware you run it on.

Your data stays put

Documents never leave your network. That can be the difference between passing a vendor security questionnaire and failing it.

You own the deployment

It installs and runs inside your environment, fully under your control. No external service to manage, no third party in the loop.

Open and inspectable

The full source is Apache-2.0 licensed. Read it, audit it, modify it, and run it indefinitely. No black box, no lock-in, no metered cloud API.

Built for document-heavy work

Wherever paper and PDFs become data, on infrastructure you control.

Invoices and receipts

Pull line items, totals, and tables into your finance systems without shipping financial documents to a third party.

Forms and records

Digitize intake forms, contracts, and archives at scale, keeping regulated records inside your own walls.

Internal automation

Wire document capture into your own back-office workflows and data pipelines, with no per-call metering to budget around.

Brand and design assets

Recover text color and vectorize logos and diagrams to SVG for downstream design and reporting pipelines.

Apache-2.0, top to bottom

XLiteOCR is released under the Apache License 2.0 and is assembled only from permissively licensed components: Apache 2.0, MIT, BSD, and similar. There is no GPL or AGPL anywhere in the stack, so there are no copyleft obligations or licensing surprises to clear before you put it into production.

Apache 2.0 · MIT · BSD · No GPL / AGPL · No GPU · Self-hosted

See it read a document live

Upload an image or PDF and watch XLiteOCR return text, per-region color, tables, and figures in seconds, all running on a plain CPU.

Open the live demo