- Why does hidden text exist in redacted PDFs?
- The most common cause is "paint over" redaction: an author draws a filled black rectangle (or adds an annotation) on top of the text instead of deleting the underlying content. In a PDF, text is drawn by text-showing operators in the page's content stream. A box painted later in the stream, or an annotation layered over the page, only changes what is rendered. The text operators and the characters they carry stay in the file. Any tool that reads the text layer can retrieve the original text, including pdf.js, pdftotext, or even a simple select-and-copy in a PDF viewer. This tool automates that cross-check in seconds.
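
To see why painting over text does not remove it, here is a toy, uncompressed content stream. It is an assumed example, not taken from any real document. The stream draws a line of text and then fills a black rectangle over the same spot. A naive TypeScript scan for the `Tj` text-showing operator still finds the string. Real content streams are usually Flate-compressed and use hex strings, `TJ` arrays, escapes and font encodings. The regular expressions below are only an illustration, not a substitute for a real parser such as pdf.js.

```typescript
// Toy content stream (assumed example): show a string, then paint a black box over it.
const pageContent = `
BT
/F1 12 Tf
72 700 Td
(Account number 4417-1234) Tj
ET
0 0 0 rg
70 695 180 18 re
f
`;

// Collect literal strings passed to the Tj operator. Ignores TJ arrays,
// hex strings and escaped parentheses; illustration only.
function extractShownStrings(stream: string): string[] {
  const out: string[] = [];
  const re = /\(([^()\\]*)\)\s*Tj/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(stream)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// True if the stream sets a black RGB fill colour and later fills a rectangle.
function paintsBlackBox(stream: string): boolean {
  return /(^|\s)0\s+0\s+0\s+rg\b[\s\S]*?\bre\s+f\b/.test(stream);
}

console.log(paintsBlackBox(pageContent)); // true
console.log(extractShownStrings(pageContent)); // the hidden string is still there
```

Later operators paint on top of earlier ones, so a viewer shows only the black box. The string, however, is still sitting a few bytes earlier in the same stream.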
- What does it mean if the tool reports "No hidden text detected"?
- It means no extractable text item was found inside a detected dark rectangle on any page. A redacted document usually shows as clean for one of two reasons. Either the text was properly removed from the content stream during redaction (the correct outcome), or the page is a scanned image with no text layer, so there is no hidden text for the tool to find. The tool cannot check image-only PDFs (rasterised or scanned, without OCR) because there is no text layer to extract. That is expected behaviour.
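
The "falls within a detected dark rectangle" test can be sketched as a point-in-rectangle check. This minimal version makes several assumptions:

- Text items are already in PDF points with the origin at the bottom-left.
- Rectangles found on a rendered canvas (top-left origin, y pointing down, `scale` pixels per point) are converted into that space first.
- An item counts as hidden when its centre lies inside a rectangle. The real tool's rule may differ.

With pdf.js, `x` and `y` could plausibly come from `transform[4]` and `transform[5]` of each item returned by `page.getTextContent()`. Because `transform[5]` is the baseline, the resulting box is approximate. The `TextItem` and `Rect` shapes and all function names here are hypothetical.

```typescript
// Hypothetical shapes: positions in PDF points, origin at the bottom-left.
interface TextItem {
  str: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert a rectangle found on a canvas rendered at `scale` pixels per point
// (origin top-left, y pointing down) into PDF points (origin bottom-left).
function canvasRectToPdf(px: Rect, scale: number, pageHeightPt: number): Rect {
  const width = px.width / scale;
  const height = px.height / scale;
  return { x: px.x / scale, y: pageHeightPt - px.y / scale - height, width, height };
}

// Assumed rule: an item is hidden when its centre point lies inside the rectangle.
function centreInside(item: TextItem, r: Rect): boolean {
  const cx = item.x + item.width / 2;
  const cy = item.y + item.height / 2;
  return cx >= r.x && cx <= r.x + r.width && cy >= r.y && cy <= r.y + r.height;
}

// Return every non-blank text item that sits under at least one dark rectangle.
function findHiddenText(items: TextItem[], rects: Rect[]): TextItem[] {
  return items.filter((it) => it.str.trim() !== "" && rects.some((r) => centreInside(it, r)));
}
```

A centre-point rule ignores items that only partly overlap a box. An overlap-fraction threshold would be a reasonable alternative if partial coverage matters.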
- Does my PDF get sent to a server?
- No. The entire analysis runs inside your browser using the pdf.js library loaded from a CDN. The PDF bytes are read from your local disk via the File API and processed entirely in JavaScript on your own machine. Nothing is uploaded. You can disconnect from the internet after the page loads and the tool will still work.
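
Reading the file locally needs nothing beyond the File API. The sketch below uses a hypothetical helper, `readPdfBytes`. It reads the bytes with `arrayBuffer()` (a method that `File` and `Blob` objects provide) and checks for the `%PDF-` header. It accepts any object with an `arrayBuffer()` method, so it can also run outside a browser. In a page, you would pass the `File` from an `<input type="file">` element and hand the result to pdf.js, for example via `pdfjsLib.getDocument({ data: bytes })`.

```typescript
// Anything with an arrayBuffer() method: browser File and Blob objects qualify.
interface ByteSource {
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical helper: read a user-selected file fully into memory and
// confirm it looks like a PDF. Nothing here touches the network.
async function readPdfBytes(file: ByteSource): Promise<Uint8Array> {
  const bytes = new Uint8Array(await file.arrayBuffer());
  // PDF files begin with "%PDF-". This check is strict; some readers
  // tolerate junk bytes before the header, which this sketch does not.
  const header = Array.from(bytes.subarray(0, 5), (b) => String.fromCharCode(b)).join("");
  if (header !== "%PDF-") {
    throw new Error("Not a PDF: missing %PDF- header");
  }
  return bytes;
}
```

No network call appears anywhere in this path. The bytes go from disk into memory and stay there.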
- What counts as a "dark rectangle" for detection purposes?
- The tool scans each rendered page for connected regions of pixels where the red, green, and blue channels are all below 60 (out of 255) — essentially black or very dark ink. It then filters for regions with a minimum area (≥ 400 square PDF points by default) to avoid flagging small dots, thin rule lines, or letter strokes. Regions that are tall and narrow (like a letter "I") are also excluded because they are unlikely to be redaction bars.
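
The detection rule described above can be sketched as a flood fill over the RGBA pixels of a rendered page, for instance the `data` of an `ImageData` obtained from a canvas. The 60 threshold and the 400 pt² minimum come from the description. The following are assumptions made for this sketch:

- 4-connectivity between pixels.
- Using the bounding-box area as the region's area.
- The tall-and-narrow cut-off, here height more than 5 times width.
- The `findDarkRects` name.

`scale` is the number of rendered pixels per PDF point and is used to convert pixel sizes to points.

```typescript
// 60 and 400 come from the text; MAX_TALL_RATIO is an assumed stand-in for "tall and narrow".
const DARK_MAX = 60;
const MIN_AREA_PT2 = 400;
const MAX_TALL_RATIO = 5;

interface Region { x: number; y: number; width: number; height: number; } // in pixels

function findDarkRects(
  rgba: Uint8ClampedArray, // RGBA bytes, e.g. ImageData.data from a canvas
  width: number,
  height: number,
  scale: number, // rendered pixels per PDF point
): Region[] {
  // A pixel is "dark" when R, G and B are all below the threshold.
  const isDark = (i: number): boolean =>
    rgba[4 * i] < DARK_MAX && rgba[4 * i + 1] < DARK_MAX && rgba[4 * i + 2] < DARK_MAX;
  const seen = new Uint8Array(width * height);
  const regions: Region[] = [];

  for (let start = 0; start < width * height; start++) {
    if (seen[start] || !isDark(start)) continue;
    // Flood-fill one 4-connected dark region, tracking its bounding box.
    let minX = width, minY = height, maxX = -1, maxY = -1;
    const stack = [start];
    seen[start] = 1;
    while (stack.length > 0) {
      const p = stack.pop()!;
      const x = p % width;
      const y = (p - x) / width;
      minX = Math.min(minX, x); maxX = Math.max(maxX, x);
      minY = Math.min(minY, y); maxY = Math.max(maxY, y);
      const neighbours = [
        x > 0 ? p - 1 : -1,
        x < width - 1 ? p + 1 : -1,
        y > 0 ? p - width : -1,
        y < height - 1 ? p + width : -1,
      ];
      for (const n of neighbours) {
        if (n >= 0 && !seen[n] && isDark(n)) {
          seen[n] = 1;
          stack.push(n);
        }
      }
    }
    // Filter by area in PDF points and drop tall, narrow shapes.
    const w = maxX - minX + 1;
    const h = maxY - minY + 1;
    const areaPt2 = (w / scale) * (h / scale);
    const tallAndNarrow = h / w > MAX_TALL_RATIO;
    if (areaPt2 >= MIN_AREA_PT2 && !tallAndNarrow) {
      regions.push({ x: minX, y: minY, width: w, height: h });
    }
  }
  return regions;
}
```

An explicit stack is used instead of recursion, so a large redaction bar cannot overflow the call stack. The regions are returned in pixel coordinates. They still need converting to PDF coordinates before they can be compared with text positions.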
- Can I use this to check court filings or legal documents?
- Yes, this is one of the most common use cases. Journalists and attorneys have repeatedly uncovered sensitive information in improperly redacted court filings and government documents using exactly this technique. A widely reported example is the 2019 filing by Paul Manafort's lawyers, whose blacked-out passages could be read by copying and pasting the text. This tool automates the check. However, always verify flagged findings manually before drawing conclusions, and consult a legal professional before publishing or acting on the content.