OCR — extract text from a scanned PDF
Turns images of text into actual text using the Tesseract OCR engine, running entirely in your browser via WebAssembly. Choose the document language for best accuracy, then download the recognized text. Scans at 300 dpi give the best results.
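If you are wondering what 300 dpi means in pixels, the arithmetic is simple: PDF page sizes are measured in points, and one point is 1/72 of an inch. The sketch below is only an illustration of that conversion. The helper name `pixelSizeAtDpi` is hypothetical, and it assumes a page is rasterized to an image before recognition, which this page does not state about the tool itself.

```typescript
// PDF page sizes are expressed in points; 1 point = 1/72 inch.
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;  // pixels
  height: number; // pixels
  scale: number;  // multiplier from points to pixels
}

// Hypothetical helper (not part of the tool): pixel dimensions needed
// to rasterize a page of the given size at a target resolution.
function pixelSizeAtDpi(widthPt: number, heightPt: number, dpi: number = 300): PixelSize {
  const scale = dpi / POINTS_PER_INCH;
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
    scale,
  };
}

// An A4 page is roughly 595 x 842 points.
const a4 = pixelSizeAtDpi(595, 842);
console.log(a4.width, a4.height); // 2479 3508
```

So an A4 page at 300 dpi is about 2479 × 3508 pixels. A scan well below that size has less detail per character for the engine to work with.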
How to use the OCR PDF tool
Select or drag your scanned PDF.
Pick the document language.
Click “Run OCR” and download the recognized text (.txt).
Your files stay on your device
This tool runs entirely in your browser using JavaScript and WebAssembly. There is no upload step and no server processing: open your browser’s network panel and you will see that no document data is transmitted. Once the page, the OCR engine and your language data have loaded, it even keeps working offline.
Frequently asked questions
How accurate is the OCR?
On clean 300-dpi scans of printed text, Tesseract typically reads 95–99% of characters correctly. Accuracy drops with low resolution, skewed pages, handwriting or decorative fonts.
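To check accuracy on your own documents, you can compare the recognized text with a hand-typed reference. The sketch below uses one common definition, character accuracy = 1 − (edit distance ÷ reference length). This is an assumption about how to measure it, not necessarily how the figures above were obtained, and the function names are hypothetical.

```typescript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions needed to turn `a` into `b`.
function editDistance(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character accuracy = 1 - (edit distance / reference length), clamped at 0.
function characterAccuracy(reference: string, recognized: string): number {
  if (reference.length === 0) return recognized.length === 0 ? 1 : 0;
  const errors = editDistance(reference, recognized);
  return Math.max(0, 1 - errors / reference.length);
}

// Two typical OCR confusions ("l" read as "1", "O" read as "0") in 20 characters.
const truth = "Invoice total: OK 12";
const ocr = "Invoice tota1: 0K 12";
console.log(characterAccuracy(truth, ocr)); // 0.9
```

Two wrong characters out of 20 gives 90%, which shows how quickly a few misreads pull a short sample below the typical range; measure on a full page for a fairer number.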
Why does the first run take longer?
The OCR engine (about 4 MB) and the language model for your chosen language are downloaded once and then cached by the browser. The recognition itself runs on your CPU.
Is my scanned document private?
Yes. Unlike most online OCR services, recognition runs in your browser via WebAssembly. The pages are never uploaded; the only things downloaded are the OCR engine and the language model for your chosen language.
Is it safe to use this tool with confidential documents?
Yes, and verifiably so. PDFAgent has no upload step: your file is processed by JavaScript and WebAssembly running in your own browser and never leaves your device. You can open your browser’s network panel (or even go offline once the page and the OCR engine have loaded) and confirm that no document data is transmitted.