Bulk OCR PDF — Make Scanned PDFs Searchable

Drop one or more scanned PDF files below. Each page is rendered and OCR'd in your browser, then an invisible text layer is embedded — giving you fully searchable PDFs. No upload. Nothing leaves your device.

📄
Drop PDF files here or click to choose
Multiple files OK · Only .pdf accepted
🔒 Your files stay on your device — no server, no upload

How it works

Every step runs inside your browser tab. No file ever touches a server.

1. Render pages: PDF.js reads each page and draws it onto an HTML canvas at your chosen DPI. Scanned images become crisp bitmaps ready for OCR.
2. OCR with Tesseract: Tesseract.js runs in a Web Worker, so it won't freeze the page. It returns every word with its bounding-box coordinates and confidence score.
3. Embed text layer: pdf-lib copies the original PDF bytes and overlays each word as invisible text, positioned precisely to match the scanned image. The visual appearance is unchanged.
4. Bundle & download: All processed files are packed into a single ZIP by JSZip and downloaded to your device. Open any result in a PDF viewer and use Ctrl/Cmd + F to search.
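Steps 1 and 3 imply a unit conversion: the canvas works in pixels with the origin at the top-left, while PDF pages are measured in points (72 per inch) with the origin at the bottom-left. The sketch below shows one way that mapping could look. It is an illustration, not this tool's actual code: the names `PixelBox`, `PdfPlacement`, `dpiToScale`, and `pixelBoxToPdf` are hypothetical, and it assumes an unrotated page with no crop-box offset. The `x0/y0/x1/y1` shape mirrors the bounding boxes Tesseract.js reports for words.

```typescript
// Hypothetical helper types; not taken from the tool's source.
interface PixelBox { x0: number; y0: number; x1: number; y1: number; }
interface PdfPlacement { x: number; y: number; width: number; fontSize: number; }

const POINTS_PER_INCH = 72;

// Render scale for a given DPI (1.0 means 72 dpi, i.e. one pixel per point).
function dpiToScale(dpi: number): number {
  return dpi / POINTS_PER_INCH;
}

// Convert a word box in canvas pixels to a text position in PDF points.
// Assumes an unrotated page whose origin is the bottom-left corner.
function pixelBoxToPdf(box: PixelBox, dpi: number, pageHeightPt: number): PdfPlacement {
  const pxToPt = POINTS_PER_INCH / dpi;
  return {
    x: box.x0 * pxToPt,
    // Flip the y axis: the bottom edge of the box becomes the text baseline.
    y: pageHeightPt - box.y1 * pxToPt,
    width: (box.x1 - box.x0) * pxToPt,
    // Use the box height as a rough font size for the invisible text.
    fontSize: (box.y1 - box.y0) * pxToPt,
  };
}

// Example: a word on a US Letter page (792 pt tall) rendered at 144 dpi.
const placement = pixelBoxToPdf({ x0: 144, y0: 288, x1: 432, y1: 324 }, 144, 792);
console.log(placement); // { x: 72, y: 630, width: 144, fontSize: 18 }
```

Using the box height as the font size is a simplification; a real implementation would likely also stretch or shrink the text horizontally so that its width matches the box.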

Accuracy depends on scan quality and DPI. For crisp, straight scans, 200 dpi is usually enough. Use 300 dpi for dense text or small fonts.
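Higher DPI is not free: the rendered bitmap grows with the square of the resolution. As a rough illustration, the sketch below estimates the canvas size and raw RGBA memory for one page. The function name `renderCost` is hypothetical, and the figures cover only the uncompressed canvas buffer, not whatever else the browser or the OCR engine allocates.

```typescript
interface RenderCost { widthPx: number; heightPx: number; rgbaBytes: number; }

// Estimate canvas dimensions and raw RGBA buffer size for one page.
// Page size is given in PDF points (72 per inch).
function renderCost(pageWidthPt: number, pageHeightPt: number, dpi: number): RenderCost {
  const widthPx = Math.round((pageWidthPt * dpi) / 72);
  const heightPx = Math.round((pageHeightPt * dpi) / 72);
  // A canvas stores 4 bytes per pixel (red, green, blue, alpha).
  return { widthPx, heightPx, rgbaBytes: widthPx * heightPx * 4 };
}

// US Letter is 612 x 792 points.
console.log(renderCost(612, 792, 200)); // 1700 x 2200 px, 14,960,000 bytes
console.log(renderCost(612, 792, 300)); // 2550 x 3300 px, 33,660,000 bytes
```

Going from 200 to 300 dpi multiplies the pixel count by 2.25, which is one reason to reserve 300 dpi for scans that need it.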

Frequently asked questions

Are my files really never uploaded?
Yes. This tool uses the browser's built-in File API: files are read from disk directly into memory in your browser tab. The page renderer (PDF.js), the OCR engine (Tesseract.js), the PDF library (pdf-lib), and the ZIP library (JSZip) are all JavaScript libraries that run locally. Once the page has finished loading, you can disconnect from the internet and the tool still works.
Will the invisible text layer affect the PDF visually?
No. The embedded text has opacity 0, so it is completely invisible. The original scanned image remains exactly as it was. The only change is that the PDF now has a text layer that search tools (Ctrl/Cmd + F, Spotlight, Windows Search, Acrobat "Find") can index.
How accurate is the OCR?
On clean, straight, high-contrast scans, Tesseract 4 (LSTM engine) typically reaches 95–99 % character accuracy for English. Accuracy drops for handwritten text, very small fonts (<8 pt), heavy compression artifacts, or skewed pages. Using 300 dpi instead of 150 dpi can recover 3–8 % accuracy on borderline scans. The tool embeds whatever Tesseract recognizes, so a word that was misread will not be found by search.
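Because Tesseract.js reports a confidence score (0 to 100) for each word, those scores can give a rough sense of how well a page was recognized. The sketch below is a hypothetical helper, not something this page says the tool does: the name `summarizeConfidence` and the cutoff of 60 are illustrative assumptions, and confidence is only a proxy for real accuracy.

```typescript
// Minimal shape of an OCR word; only the fields used here.
interface OcrWord { text: string; confidence: number; }

interface ConfidenceSummary { wordCount: number; meanConfidence: number; lowConfidenceCount: number; }

// Summarize per-word confidence for one page.
// `lowThreshold` is an arbitrary illustrative cutoff, not a Tesseract default.
function summarizeConfidence(words: readonly OcrWord[], lowThreshold = 60): ConfidenceSummary {
  let total = 0;
  let low = 0;
  for (const word of words) {
    total += word.confidence;
    if (word.confidence < lowThreshold) low += 1;
  }
  return {
    wordCount: words.length,
    // Avoid dividing by zero on a blank page.
    meanConfidence: words.length > 0 ? total / words.length : 0,
    lowConfidenceCount: low,
  };
}

const summary = summarizeConfidence([
  { text: "Invoice", confidence: 90 },
  { text: "t0tal", confidence: 50 },
  { text: "due", confidence: 70 },
]);
console.log(summary); // { wordCount: 3, meanConfidence: 70, lowConfidenceCount: 1 }
```

A page with a low mean or many low-confidence words is a reasonable candidate for rescanning or for a retry at 300 dpi.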
How many files can I process at once?
There is no hard limit: you can drop 50 files if you want. Files are processed one at a time to avoid running out of memory, so the queue works through them sequentially. A 10-page scanned document at 200 dpi typically takes 30–90 seconds depending on your CPU, so a batch of 50 such documents would take roughly 25 to 75 minutes. You can leave the tab open and return when the download button appears.
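The one-at-a-time queue described above can be sketched as a plain loop that awaits each file before starting the next. This is a minimal illustration under stated assumptions: `processSequentially` and its `worker` and `onProgress` parameters are hypothetical names, and the real tool's queue may differ.

```typescript
// Run an async worker over items strictly one at a time, so only one
// item's working memory is held at any moment.
async function processSequentially<T, R>(
  items: readonly T[],
  worker: (item: T, index: number) => Promise<R>,
  onProgress?: (done: number, total: number) => void,
): Promise<R[]> {
  const results: R[] = [];
  let done = 0;
  for (const item of items) {
    // Awaiting inside the loop is what keeps the work sequential.
    results.push(await worker(item, done));
    done += 1;
    if (onProgress) onProgress(done, items.length);
  }
  return results;
}

// Usage with a stand-in worker; a real one would render, OCR, and rebuild a PDF.
processSequentially(["a.pdf", "b.pdf"], async (name) => name.toUpperCase(), (done, total) =>
  console.log(`${done}/${total} files finished`),
).then((names) => console.log(names)); // [ 'A.PDF', 'B.PDF' ]
```

By contrast, `Promise.all(items.map(worker))` would start every file at once, which is faster to write but holds every rendered page in memory together.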
What does "bulk OCR PDF files browser offline" actually mean?
It means running Optical Character Recognition (OCR) on many scanned PDF files in a single batch, entirely inside your web browser, without an internet connection to a processing server. Cloud-based OCR services from vendors such as Adobe, Google, and ABBYY send your file to their servers. This tool does not. Your legal documents, medical records, or confidential scans never leave your machine.
My PDF already has text — will OCR help?
If your PDF already has a text layer (for example, it was exported from a word processor rather than scanned), OCR is unnecessary: it won't hurt, but it won't help either. This tool is designed for image-only PDFs such as scanned paper documents, photos converted to PDF, fax archives, and similar sources. If you're unsure, try pressing Ctrl/Cmd + F in your PDF viewer: if you can search and highlight text, the PDF already has a text layer.
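The same check can be done in code. PDF.js exposes `getTextContent()` on a page, which resolves to a list of items whose `str` field holds extracted text. The sketch below tests whether a page yields any non-whitespace text. It is an illustration only: `PageLike`, `TextContentLike`, and `pageHasTextLayer` are hypothetical names that describe just the part of the PDF.js page object used here, so the example can run against a stub.

```typescript
// Minimal structural types covering only what this check needs.
interface TextContentLike { items: Array<{ str?: string }>; }
interface PageLike { getTextContent(): Promise<TextContentLike>; }

// Returns true if the page has at least `minChars` non-whitespace characters
// of extractable text, which suggests a text layer is already present.
async function pageHasTextLayer(page: PageLike, minChars = 1): Promise<boolean> {
  const content = await page.getTextContent();
  let count = 0;
  for (const item of content.items) {
    // Some items (marked-content markers) carry no `str`.
    count += (item.str ?? "").replace(/\s+/g, "").length;
    if (count >= minChars) return true;
  }
  return false;
}

// Stub pages standing in for real PDF.js page objects.
const scannedPage: PageLike = { getTextContent: async () => ({ items: [] }) };
const typedPage: PageLike = { getTextContent: async () => ({ items: [{ str: "Hello" }] }) };

pageHasTextLayer(scannedPage).then((has) => console.log(has)); // false
pageHasTextLayer(typedPage).then((has) => console.log(has)); // true
```

Raising `minChars` guards against scans that contain only a stray page number or stamp as real text.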