- Will it work offline?
- On first use, the browser downloads Tesseract.js and its English language data (~10 MB) from a CDN. The language data is cached by your browser, so later uses in the same browser skip that download and start much faster. Once everything has loaded, the page keeps working offline for as long as you keep the tab open.
- What makes this PDF "searchable"?
- A standard image-only PDF is just a picture — you cannot select or search text in it. This tool adds an invisible text layer that exactly overlays the image. PDF readers (Adobe, Preview, Chrome PDF viewer, etc.) index that layer, so Cmd/Ctrl+F finds words, and you can drag-select and copy text out. The document still looks like your original photo; only the searchability changes.
- How accurate is the OCR?
- Tesseract.js uses Tesseract 4's LSTM neural network engine, which typically achieves around 90–97% character accuracy on clear, printed English text. Accuracy drops for very small fonts, handwriting, heavily shadowed photos, extreme angles, and non-Latin scripts (the tool defaults to English; see the language hint below the OCR card). The perspective-crop and shadow-removal steps exist to clean up the image before recognition runs, which improves Tesseract's accuracy.
- Can I process multiple pages into one PDF?
- Yes — add multiple photos in Step 1. Each photo becomes its own page. After OCR runs on a page, click "Next page" to move to the next. When all pages are done, click "Build PDF" to merge all pages into a single multi-page searchable PDF file.
- What happens to my photos and the text extracted?
- Your photos and the extracted text are never transmitted to any server. The images stay in your browser's memory (the JavaScript heap) for the duration of the session and are discarded when you close the tab. There are no analytics and no third-party tracking of the image content; the only network requests are standard CDN requests for the Tesseract.js library and its language data.
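
For readers curious how an invisible text layer can line up with the photo, here is a minimal TypeScript sketch. It is not this tool's actual code. It assumes a hypothetical `OcrWord` shape (text plus a pixel bounding box measured from the top-left of the image), a font resource named `/F1` already declared on the page, and a single `scale` factor from pixels to PDF points. It relies on the PDF text rendering mode `3 Tr`, which draws text with neither fill nor stroke, so the text is present for search and copy but never visible.

```typescript
// Hypothetical shape of one recognized word: its text plus a pixel
// bounding box measured from the top-left corner of the image.
interface OcrWord {
  text: string;
  x0: number; // left edge, px
  y0: number; // top edge, px
  x1: number; // right edge, px
  y1: number; // bottom edge, px
}

// Escape the characters that are special inside a PDF literal string.
function escapePdfString(s: string): string {
  return s.replace(/[\\()]/g, (c) => "\\" + c);
}

// Build PDF content-stream operators that place every word invisibly.
// `scale` converts image pixels to PDF points; `pageHeightPt` is needed to
// flip the y-axis, because PDF coordinates start at the bottom-left corner.
function invisibleTextLayer(
  words: OcrWord[],
  scale: number,
  pageHeightPt: number
): string {
  const ops: string[] = ["BT", "3 Tr"]; // 3 Tr = invisible rendering mode
  for (const w of words) {
    // Approximation: font size = box height, baseline = bottom of the box.
    const fontSize = (w.y1 - w.y0) * scale;
    const x = w.x0 * scale;
    const y = pageHeightPt - w.y1 * scale;
    ops.push(`/F1 ${fontSize.toFixed(2)} Tf`); // assumes a font named /F1
    ops.push(`1 0 0 1 ${x.toFixed(2)} ${y.toFixed(2)} Tm`); // absolute position
    ops.push(`(${escapePdfString(w.text)}) Tj`);
  }
  ops.push("ET");
  return ops.join("\n");
}

// Example: one word on a 100 pt tall page, with 2 image pixels per point.
const layer = invisibleTextLayer(
  [{ text: "Hi (x)", x0: 10, y0: 20, x1: 50, y1: 40 }],
  0.5,
  100
);
console.log(layer);
```

A real implementation would usually also stretch each word horizontally to match its box width, so that drag-selection highlights line up with the printed text; the sketch leaves that out for brevity.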