PDF Hidden Text Finder

Check whether a PDF contains text hidden behind redaction rectangles. The file is processed entirely in your browser — it is never uploaded anywhere.

Drop a PDF here or click to browse

Your file stays in the browser — no upload, no server.

⚠ This tool detects text that survives as extractable data in the PDF file even though it appears visually covered. It checks content streams and annotation layers — if the text was truly "burned in" as a bitmap, it will correctly show as clean.

How it works

When a document is redacted by simply drawing a black rectangle over sensitive text — without properly removing the underlying data — the original text remains in the PDF's content stream and can be extracted by any PDF reader or copy-paste action.
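To make this concrete, here is a minimal TypeScript sketch built around a toy, uncompressed content stream. The stream contents, the `shownStrings` helper, and its regular expression are illustrative assumptions and not part of this tool. Real PDFs usually compress their streams and may use font encodings that a simple pattern like this cannot decode.

```typescript
// Toy page content stream (assumed for illustration): the text is shown
// first, then a black rectangle is filled over the same area. Both
// operations remain in the file.
const contentStream = [
  "BT /F1 12 Tf 100 700 Td (SECRET-123) Tj ET", // draw the text
  "0 0 0 rg 95 695 90 18 re f",                 // paint a black box over it
].join("\n");

// Collect the literal strings passed to the Tj (show text) operator.
function shownStrings(stream: string): string[] {
  const pattern = /\(((?:\\.|[^\\()])*)\)\s*Tj/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(stream)) !== null) {
    found.push(match[1]);
  }
  return found;
}

const recovered = shownStrings(contentStream);
```

The rectangle changes what is painted on screen, but the string passed to `Tj` is still present for any parser to read.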

1. Render to canvas: Each page is rendered at 2× scale using pdf.js, giving pixel-accurate visual output without a plugin.
2. Detect dark rectangles: The canvas pixel data is scanned for contiguous solid-dark regions (very dark fill with large area) that match the visual signature of a redaction bar.
3. Extract text positions: pdf.js getTextContent() returns every text item with its bounding box in PDF user-space coordinates, which is then normalised to match the rendered canvas.
4. Intersect & flag: Any text item whose bounding box overlaps a detected dark rectangle is flagged. You see the page number, bounding-box coordinates, and the exposed text string.
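
Steps 3 and 4 above can be sketched in TypeScript roughly as follows. This is an illustration under stated assumptions, not the tool's actual source: the `TextItem` shape is a simplified subset of what pdf.js reports, the names `toCanvasBox`, `overlaps`, and `flagHiddenText` are hypothetical, and the coordinate conversion assumes an unrotated page whose user space starts at (0, 0). In real pdf.js code, the page viewport's `convertToViewportPoint` would be the safer way to handle rotation and offsets.

```typescript
// Axis-aligned box in canvas pixels; origin is the top-left corner.
interface Box { x: number; y: number; width: number; height: number; }

// Assumed subset of the fields pdf.js reports for each text item.
interface TextItem {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; (e, f) is the baseline origin
  width: number;       // PDF points
  height: number;      // PDF points
}

interface Finding { page: number; box: Box; text: string; }

// Map a text item into canvas space. PDF y grows upward from the bottom of
// the page, canvas y grows downward from the top, so the y axis is flipped.
function toCanvasBox(item: TextItem, pageHeightPt: number, scale: number): Box {
  const e = item.transform[4];
  const f = item.transform[5];
  return {
    x: e * scale,
    y: (pageHeightPt - f - item.height) * scale,
    width: item.width * scale,
    height: item.height * scale,
  };
}

// True when the two boxes share interior area (touching edges do not count).
function overlaps(a: Box, b: Box): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

// Flag every non-empty text item that overlaps at least one dark rectangle.
function flagHiddenText(
  page: number,
  items: TextItem[],
  darkRects: Box[],
  pageHeightPt: number,
  scale: number = 2, // the 2× render scale mentioned in step 1
): Finding[] {
  const findings: Finding[] = [];
  for (const item of items) {
    if (item.str.trim() === "") continue; // assumption: ignore whitespace-only items
    const box = toCanvasBox(item, pageHeightPt, scale);
    if (darkRects.some((rect) => overlaps(box, rect))) {
      findings.push({ page, box, text: item.str });
    }
  }
  return findings;
}
```

For a US Letter page (792 points tall) rendered at 2× scale, an item with baseline origin (100, 700), width 50, and height 12 maps to a canvas box at x = 200, y = 160, 100 px wide and 24 px tall.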

Proper redaction requires removing the covered text from the PDF itself (e.g. using Adobe Acrobat's Redact tool or PDF Redactor, which rewrite the content stream rather than drawing on top of it).

Frequently asked questions

Why does hidden text exist in redacted PDFs?
The most common cause is "paint over" redaction: an author draws a filled black rectangle (or uses an annotation) on top of the text rather than deleting the underlying content. A PDF page is a sequence of drawing instructions, and a rectangle painted later hides the text on screen without removing the instructions that draw it, so covering text visually does not remove it from the file. Any software that can parse PDF content streams (including pdf.js, pdftotext, or even a simple copy-paste) can retrieve the original text. This tool automates that cross-check in seconds.
What does it mean if the tool reports "No hidden text detected"?
It means no extractable text item was found to fall within a detected dark rectangle on any page. There are two reasons a redacted document might show as clean: the text was properly removed from the content stream during redaction (correct), or the page was scanned as an image before redaction so there was never a text layer to begin with (also no risk). The tool cannot detect hidden text in fully rasterised/scanned PDFs because there is no text layer to extract — that is expected and correct behaviour.
Does my PDF get sent to a server?
No. The entire analysis runs inside your browser using the pdf.js library loaded from a CDN. The PDF bytes are read from your local disk via the File API and processed entirely in JavaScript on your own machine. Nothing is uploaded. You can disconnect from the internet after the page loads and the tool will still work.
What counts as a "dark rectangle" for detection purposes?
The tool scans each rendered page for connected regions of pixels where the red, green, and blue channels are all below 60 (out of 255) — essentially black or very dark ink. It then filters for regions with a minimum area (≥ 400 square PDF points by default) to avoid flagging small dots, thin rule lines, or letter strokes. Regions that are tall and narrow (like a letter "I") are also excluded because they are unlikely to be redaction bars.
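A minimal TypeScript sketch of this kind of scan is shown below, assuming opaque RGBA pixel data such as that returned by a canvas `getImageData` call. The function name `findDarkRegions`, the option names, the 4-connected flood fill, and the `maxTallness` ratio are assumptions made for illustration; only the channel threshold of 60 comes from the description above. Because the minimum area is quoted in square PDF points, it has to be converted to pixels first (at 2× scale, 400 square points is 1600 pixels).

```typescript
interface Box { x: number; y: number; width: number; height: number; }

interface DetectOptions {
  darkBelow: number;   // R, G and B must all be below this (60 in the text)
  minAreaPx: number;   // minimum number of pixels in a region
  maxTallness: number; // hypothetical: reject when height / width exceeds this
}

// Find bounding boxes of connected dark regions in RGBA pixel data.
function findDarkRegions(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  opts: DetectOptions,
): Box[] {
  const visited = new Uint8Array(width * height);
  const stack: number[] = [];
  const boxes: Box[] = [];

  const isDark = (p: number): boolean => {
    const i = p * 4; // 4 bytes per pixel; alpha is ignored (assumed opaque)
    return rgba[i] < opts.darkBelow &&
           rgba[i + 1] < opts.darkBelow &&
           rgba[i + 2] < opts.darkBelow;
  };
  const visit = (p: number): void => {
    if (!visited[p] && isDark(p)) {
      visited[p] = 1;
      stack.push(p);
    }
  };

  for (let start = 0; start < width * height; start++) {
    visit(start);
    if (stack.length === 0) continue; // not dark, or already part of a region
    let minX = width, minY = height, maxX = -1, maxY = -1, count = 0;
    while (stack.length > 0) {
      const p = stack.pop() as number;
      const x = p % width;
      const y = (p - x) / width;
      count++;
      minX = Math.min(minX, x); maxX = Math.max(maxX, x);
      minY = Math.min(minY, y); maxY = Math.max(maxY, y);
      // Spread to the 4-connected neighbours.
      if (x > 0) visit(p - 1);
      if (x < width - 1) visit(p + 1);
      if (y > 0) visit(p - width);
      if (y < height - 1) visit(p + width);
    }
    const w = maxX - minX + 1;
    const h = maxY - minY + 1;
    if (count >= opts.minAreaPx && h / w <= opts.maxTallness) {
      boxes.push({ x: minX, y: minY, width: w, height: h });
    }
  }
  return boxes;
}
```

Counting pixels rather than using the bounding-box area keeps thin outlines and sparse shapes from passing the size filter, while the tallness check drops narrow vertical strokes.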
Can I use this to check court filings or legal documents?
Yes, this is one of the most common use cases. Journalists and attorneys have historically discovered sensitive information in improperly redacted court documents using exactly this technique (notably a January 2019 filing by Paul Manafort's lawyers, whose blacked-out passages could be read by copying the text out of the PDF, and several high-profile government leak cases). This tool automates the check. However, always verify flagged findings manually before drawing conclusions, and consult a legal professional before publishing or acting on the content.