100% private — files stay on your device

OCR — extract text from a scanned PDF

Turns images of text into actual text using the Tesseract OCR engine, running entirely in your browser via WebAssembly. Choose the document language for best accuracy, then download the recognized text. Scans at 300 dpi give the best results.
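Suppose the PDF pages are rasterized before recognition. That is an assumption about the pipeline, not something this page states. The render scale then follows from PDF's unit of measure, the point (1/72 inch): scale = dpi / 72. The hypothetical helper `renderSize` below works out the pixel size of a page at a target resolution.

```typescript
// PDF page sizes are expressed in points; 1 pt = 1/72 inch.
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;
  height: number;
}

/**
 * Hypothetical helper: pixel dimensions of a page rendered at `dpi`.
 * `widthPt` and `heightPt` are the page size in PDF points.
 */
function renderSize(widthPt: number, heightPt: number, dpi: number): PixelSize {
  const scale = dpi / POINTS_PER_INCH; // 300 dpi -> scale of about 4.17
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

// US Letter is 612 x 792 pt.
console.log(renderSize(612, 792, 300)); // { width: 2550, height: 3300 }
```

By the same arithmetic, an A4 page (595.28 × 841.89 pt) comes out at roughly 2480 × 3508 pixels at 300 dpi.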

    OCR is CPU-intensive; limiting pages speeds it up.
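OCR work is done page by page, so total time grows roughly with the number of pages processed. This page does not say how the limit is entered. The sketch below assumes a hypothetical page-range syntax such as "1-3, 5" and turns it into the list of pages to recognize. Both `parsePageRange` and the syntax are illustrative only.

```typescript
/**
 * Hypothetical parser: turn a page-range string such as "1-3, 5" into
 * sorted, de-duplicated 1-based page numbers within [1, pageCount].
 * Throws on malformed input.
 */
function parsePageRange(spec: string, pageCount: number): number[] {
  const pages = new Set<number>();
  for (const raw of spec.split(",")) {
    const part = raw.trim();
    if (part === "") continue; // tolerate stray or trailing commas
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (match === null) throw new Error(`Invalid page range: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start) throw new Error(`Invalid page range: "${part}"`);
    // Clamp to the document length so "10-999" means "page 10 to the end".
    for (let p = start; p <= Math.min(end, pageCount); p++) pages.add(p);
  }
  return Array.from(pages).sort((a, b) => a - b);
}

const pageCount = 12;
console.log(parsePageRange("1-3, 5, 2", pageCount)); // [1, 2, 3, 5]
console.log(parsePageRange("10-999", pageCount)); // [10, 11, 12]
```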

    How to use the OCR PDF tool

    1. Select your scanned PDF, or drag and drop it onto the page.

    2. Pick the document language.

    3. Click “Run OCR” and download the recognized text (.txt).

    Your files stay on your device

    This tool runs entirely in your browser using JavaScript and WebAssembly. There is no upload step and no server processing. Open your browser’s network panel to confirm that no document data is transmitted. It even keeps working offline once the page has loaded.

    Frequently asked questions

    How accurate is the OCR?

    On clean 300-dpi scans of printed text, Tesseract typically reads 95–99% of characters correctly. Accuracy drops with low resolution, skewed pages, handwriting or decorative fonts.
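Character-level accuracy is usually stated as 1 − CER. The character error rate is CER = (S + D + I) / N: the substitutions, deletions and insertions needed to turn the OCR output into a correct reference transcript, divided by the reference length N. This is the standard way to measure it. It is not necessarily how the 95–99% figure above was obtained. The sketch assumes you have a hand-checked reference text, and it counts UTF-16 code units rather than user-perceived characters.

```typescript
/** Levenshtein edit distance between two strings (UTF-16 code units). */
function editDistance(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution, or a free match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Character error rate of `ocrOutput` against a non-empty `reference`. */
function characterErrorRate(reference: string, ocrOutput: string): number {
  if (reference.length === 0) throw new Error("reference must be non-empty");
  return editDistance(reference, ocrOutput) / reference.length;
}

// One misread character ("i" read as "1") in a 20-character reference.
const cer = characterErrorRate("recognition accuracy", "recogn1tion accuracy");
console.log(`CER ${cer.toFixed(2)}, accuracy ${(1 - cer).toFixed(2)}`); // CER 0.05, accuracy 0.95
```

Because insertions count as errors, a very noisy output can push CER above 1. In that case "accuracy" is no longer a meaningful percentage.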

    Why does the first run take longer?

    The OCR engine (about 4 MB) and the language model for your chosen language are downloaded once and then cached by the browser, so later runs skip that step. The recognition itself runs on your CPU.
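This page does not say how the tool persists the engine and language files; the browser HTTP cache, Cache Storage and IndexedDB are the usual options. The sketch below only illustrates the "download once, reuse afterwards" idea. It uses an in-memory map and a hypothetical loader, `fakeFetchModel`. Concurrent requests for the same language share one in-flight download, and a failed download is forgotten so that it can be retried.

```typescript
/**
 * Wrap an async loader so each key is fetched at most once; later calls
 * reuse the same Promise. In-memory illustration only (lost on page reload).
 */
function memoizeAsync<T>(load: (key: string) => Promise<T>): (key: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (key: string) => {
    let pending = cache.get(key);
    if (pending === undefined) {
      pending = load(key);
      // Drop failed downloads so the next call can retry.
      pending.catch(() => cache.delete(key));
      cache.set(key, pending);
    }
    return pending;
  };
}

let downloads = 0;
// Hypothetical stand-in for fetching a language model such as "eng".
async function fakeFetchModel(lang: string): Promise<string> {
  downloads++;
  return `model:${lang}`;
}
const getModel = memoizeAsync(fakeFetchModel);

async function demo(): Promise<void> {
  await getModel("eng");
  await getModel("eng"); // served from the cache, no second download
  await getModel("deu");
  console.log(downloads); // 2
}
demo();
```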

    Is my scanned document private?

    Yes. Unlike most online OCR services, this tool runs recognition in your browser via WebAssembly. The pages are never uploaded; only the OCR engine and its language data are downloaded.

    Is it safe to use this tool with confidential documents?

    Yes — and verifiably so. PDFAgent has no upload step: your file is processed by JavaScript running in your own browser and never leaves your device. You can open your browser’s network panel (or even go offline after loading the page) and confirm that no document data is transmitted.
