- Are my files really never uploaded?
- Yes. This tool uses the browser's built-in File API — files are read from disk directly into memory in your browser tab. The OCR engine (Tesseract.js) and the PDF library (pdf-lib) are both JavaScript libraries that run locally. You can disconnect from the internet after the page loads and the tool still works.
- Will the invisible text layer affect the PDF visually?
- No. The embedded text has opacity 0 — it is completely invisible. The original scanned image remains exactly as it was. The only change is that the PDF now has a text layer that search tools (Ctrl+F, Spotlight, Windows Search, Acrobat "Find") can index and search.
- How accurate is the OCR?
- Tesseract 4 (LSTM engine) performs well on clean, straight, high-contrast scans, typically reaching 95–99% character accuracy for English. Accuracy drops for handwritten text, very small fonts (under 8 pt), heavy compression artifacts, or skewed pages. Scanning at 300 dpi instead of 150 dpi can improve accuracy by 3–8% on borderline scans. The tool embeds whatever Tesseract recognizes, so search results are only as good as the recognition.
- How many files can I process at once?
- There is no hard limit; you can drop 50 files if you want. Files are processed one at a time, in queue order, to avoid running out of memory. A 10-page scanned document at 200 dpi typically takes 30–90 seconds depending on your CPU, and a batch takes roughly that long per document, so large batches run considerably longer. You can leave the tab open and return when the download button appears.
- What does "bulk OCR PDF files browser offline" actually mean?
- It means running Optical Character Recognition (OCR) on many scanned PDF files in a single batch, entirely inside your web browser, with no connection to a processing server. Cloud-based OCR services (Adobe Acrobat, Adobe Scan, Google Drive, ABBYY FineReader online) send your file to their servers; this tool does not. Your legal documents, medical records, or confidential scans never leave your machine.
- My PDF already has text — will OCR help?
- If your PDF already has a text layer (created from a word processor, not a scanner), OCR is not needed: it won't hurt, but it won't help either. This tool is designed for image-only PDFs such as scanned paper documents, photos converted to PDF, fax archives, and similar sources. If you're unsure, press Ctrl+F (Cmd+F on macOS) in your PDF viewer: if you can search and highlight text, the PDF already has a text layer.
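For readers curious how the one-at-a-time queue from the batch-size answer might look in code, here is a minimal sketch. It is not the tool's actual source: `processSequentially` and the `worker` callback are hypothetical names, and the worker stands in for whatever OCR step runs on each file.

```typescript
// Hypothetical helper (not taken from the tool): runs `worker` on each item
// strictly one after another and collects the results in input order.
async function processSequentially<T, R>(
  items: readonly T[],
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  let index = 0;
  for (const item of items) {
    // Awaiting inside the loop means the next file only starts after the
    // previous one has finished, so only one file is being worked on at a time.
    results.push(await worker(item, index));
    index += 1;
  }
  return results;
}
```

Compared with `Promise.all`, which would start every file at once, this trades total speed for a bounded memory footprint, which matches the reasoning given in the answer.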
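The invisible text layer mentioned earlier has to line up with the scanned words for highlighting to land in the right place. The sketch below shows one way that mapping could work, under stated assumptions: the OCR step reports word boxes in image pixels with the origin at the top-left (the `x0, y0, x1, y1` shape is modeled on Tesseract.js word boxes), the page was rasterized at a known dpi, and PDF coordinates are in points (72 per inch) with the origin at the bottom-left. `placeInvisibleWord` is a hypothetical name, and using the box height as the font size is a simplification.

```typescript
// Word box in image pixels, origin at the top-left of the rendered page.
interface PixelBox { x0: number; y0: number; x1: number; y1: number; }

// Where to draw the word in the PDF: points, origin at the bottom-left.
interface TextPlacement { x: number; y: number; size: number; opacity: number; }

// Convert a pixel length to PDF points (1 inch = 72 pt = `dpi` pixels).
function pxToPt(px: number, dpi: number): number {
  return (px * 72) / dpi;
}

// Hypothetical helper: maps one OCR word box to a text position on the page.
function placeInvisibleWord(box: PixelBox, dpi: number, pageHeightPt: number): TextPlacement {
  return {
    x: pxToPt(box.x0, dpi),
    // Flip the vertical axis: the bottom edge of the box becomes the baseline.
    y: pageHeightPt - pxToPt(box.y1, dpi),
    // Simplification: treat the box height as the font size.
    size: pxToPt(box.y1 - box.y0, dpi),
    // Opacity 0 keeps the text searchable but invisible.
    opacity: 0,
  };
}

// A 600x50 px word on a US Letter page (792 pt tall) rendered at 300 dpi:
const placement = placeInvisibleWord({ x0: 300, y0: 600, x1: 900, y1: 650 }, 300, 792);
// placement is { x: 72, y: 636, size: 12, opacity: 0 }
```

An object of this shape could then be passed along with the recognized word to a drawing call such as pdf-lib's `page.drawText`, whose options include `x`, `y`, `size`, and `opacity`.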
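The dpi figures in the accuracy and batch-size answers also translate into memory use, which is part of why files are handled one at a time. The arithmetic can be sketched as follows, assuming each page is rasterized to an uncompressed RGBA bitmap (4 bytes per pixel) before OCR; the source does not say how the tool renders pages, and `renderSize` is a hypothetical helper.

```typescript
interface RenderSize { widthPx: number; heightPx: number; rgbaBytes: number; }

// Hypothetical helper: pixel dimensions and raw bitmap size for a page whose
// size is given in PDF points (72 points per inch), rendered at `dpi`.
function renderSize(widthPt: number, heightPt: number, dpi: number): RenderSize {
  const widthPx = Math.round((widthPt * dpi) / 72);
  const heightPx = Math.round((heightPt * dpi) / 72);
  // Assumption: 4 bytes per pixel (RGBA), no compression.
  return { widthPx, heightPx, rgbaBytes: widthPx * heightPx * 4 };
}

// US Letter is 612 x 792 pt.
const at150 = renderSize(612, 792, 150); // 1275 x 1650 px, 8,415,000 bytes
const at300 = renderSize(612, 792, 300); // 2550 x 3300 px, 33,660,000 bytes
```

Doubling the dpi quadruples the pixel count, so the accuracy gain from 300 dpi comes at roughly four times the memory per page.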