
Redact a PDF — permanently remove text

Remove sensitive text from a PDF using keywords or ready-made presets (emails, phones, credit cards, IBANs). Underlying text is permanently deleted, not just covered.

True redaction permanently removes underlying text. There is no undo.


How to use this tool

  1. Upload a PDF that has selectable text. If it's a scan, run PDF OCR first — otherwise there's no text layer to redact.
  2. Pick one or more presets (emails, phones, credit cards, IBANs) and/or type keywords — one per line. Keywords match case-insensitively across every page.
  3. Click 'Redact & Download'. Every match is covered with a black box and the underlying text is permanently removed from the PDF content stream.
  4. Keep the original file safe. The downloaded redacted PDF is a separate file — there is no undo.
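
The keyword handling in step 2 can be sketched roughly as follows. This is a minimal illustration, assuming keywords arrive as one block of text with one entry per line and are matched literally and case-insensitively. The function names `parse_keywords` and `compile_keywords` are hypothetical and are not taken from the tool.

```python
import re
from typing import Optional


def parse_keywords(raw: str) -> list[str]:
    """Split the input into keywords: one per line, blanks and duplicates dropped."""
    seen = set()
    keywords = []
    for line in raw.splitlines():
        kw = line.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            keywords.append(kw)
    return keywords


def compile_keywords(keywords: list[str]) -> Optional[re.Pattern]:
    """Build one case-insensitive pattern that matches any keyword literally."""
    if not keywords:
        return None
    # Longest first, so "John Smith" wins over "Smith" at the same position.
    ordered = sorted(keywords, key=len, reverse=True)
    # re.escape keeps characters such as "." literal instead of regex syntax.
    return re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)


raw_input = "Smith\nJohn Smith\n\nMr. Smith\n"
pattern = compile_keywords(parse_keywords(raw_input))
page_text = "Signed by JOHN SMITH. Copy to mr. smith and Ms. Smithers."
hits = [m.group(0) for m in pattern.finditer(page_text)]
print(hits)
```

In this sketch a literal keyword also matches inside longer words ("Smith" inside "Smithers"), which is worth keeping in mind when choosing short keywords.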

Presets (regex-based)

  • email: matches addresses like alice@example.com
  • phone: matches common phone formats with optional country code
  • credit-card: matches 13–19 digit card numbers (with spaces/dashes)
  • iban: matches IBANs (2 letters + 2 digits + 10–30 alphanumerics)
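
To make the preset descriptions concrete, here is a hedged sketch of what such patterns might look like. The expressions below are illustrative assumptions written to fit the descriptions above; they are not the tool's actual regexes, and the phone pattern only covers North American style numbers.

```python
import re

# Illustrative patterns only (assumptions, not the tool's real expressions).
PRESETS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Optional country code, then 3-3-4 digits with space, dot or dash separators.
    "phone": re.compile(
        r"(?<![\w+])(?:\+\d{1,3}[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    ),
    # 13 to 19 digits, each optionally preceded by a single space or dash.
    "credit-card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    # 2 letters + 2 digits + 10 to 30 alphanumerics.
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}


def find_preset_hits(text: str, presets: list[str]) -> dict[str, list[str]]:
    """Return every match for each selected preset."""
    return {name: [m.group(0) for m in PRESETS[name].finditer(text)] for name in presets}


sample = (
    "Mail alice@example.com or call +1-123-456-7890. "
    "Card 4111 1111 1111 1111, IBAN GB82WEST12345698765432."
)
found = find_preset_hits(sample, ["email", "phone", "credit-card", "iban"])
for name, matches in found.items():
    print(name, matches)
```

The word boundaries in the card pattern are a deliberate choice in this sketch: without them, the long digit run inside an IBAN could be picked up as a card number.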

Tips

  • Keywords are literal — add common variants (Mr. Smith / Smith, Mr / John Smith) to catch all occurrences.
  • Scanned PDFs won't redact properly — the visible ink is an image. Run PDF OCR, download, then redact the result.
  • If redaction says no hits were found, check that your PDF has selectable text (try Ctrl+F in a viewer).
  • Large PDFs (close to 50 MB) take longer. Processing is done on the server and the file is deleted after you download it.

Important: redaction is permanent

Unlike a black rectangle drawn on top of text, this tool uses PyMuPDF's apply_redactions to erase the underlying text from the PDF content stream. The redacted file cannot be converted back into the original. Always keep the original separately in case you need to reissue the document with different redactions.
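
For readers curious what this looks like in code, the following is a minimal sketch of keyword redaction with PyMuPDF. The wrapper `redact_keywords` and its parameters are hypothetical, and the tool's real pipeline may differ; only the PyMuPDF calls themselves (`search_for`, `add_redact_annot`, `apply_redactions`, `save`) are library functions.

```python
def redact_keywords(in_path: str, out_path: str, keywords: list[str]) -> int:
    """Redact each keyword occurrence and return the number of boxes applied.

    out_path must differ from in_path, so the original file is left untouched.
    """
    import fitz  # PyMuPDF; imported lazily so this module loads without it

    total = 0
    doc = fitz.open(in_path)
    try:
        for page in doc:
            marked = 0
            for kw in keywords:
                # search_for is case-insensitive and returns hit rectangles
                # (a hit that wraps across lines yields more than one).
                for rect in page.search_for(kw):
                    page.add_redact_annot(rect, fill=(0, 0, 0))  # black box
                    marked += 1
            if marked:
                # Removes the text under the marked rectangles from the page
                # content and paints the fill colour in its place.
                page.apply_redactions()
                total += marked
        # garbage/deflate rewrite the file so unreferenced objects are dropped.
        doc.save(out_path, garbage=4, deflate=True)
    finally:
        doc.close()
    return total


# Example call (file names are placeholders):
# redact_keywords("original.pdf", "redacted.pdf", ["alice@example.com"])
```

Writing to a separate output path mirrors the advice above: the original stays available if the document has to be reissued with different redactions.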

Processed server-side

This tool uses a server-side service for processing; uploaded files or requests are not kept for long-term storage.

About

Redaction is one of the few PDF operations where doing it badly can have real consequences, and the gap between what looks redacted and what actually is redacted is the source of nearly every public redaction failure that has made the news. The familiar pattern: someone receives a sensitive document, places a black rectangle over the words they want to hide, saves the PDF, and ships it to the world believing the contents are gone. They're not gone. The black rectangle is a graphical object on top of the page; the text underneath it is still part of the PDF's content stream, and anyone who copies the redacted region into another document, runs the PDF through a text extractor, or simply uses Acrobat's selection tool gets the original text back in seconds. This tool does not do that. It removes the text from the underlying content stream as well as drawing the visible black rectangle, so the redacted content cannot be recovered by any of the standard recovery techniques.

The categories of content that genuinely need redaction are surprisingly broad once you start looking. Legal discovery preparation requires redacting privileged communications, attorney work product, and information protected by court orders. FOIA-style government release packages need to redact personal information, certain national security details, and ongoing investigation specifics. Internal documents going to external reviewers (auditors, regulators, prospective acquirers in M&A) need to redact information the recipient isn't entitled to see. Medical records moving between systems need redaction of patient identifiers under HIPAA. Interview transcripts going into journalism need source names removed. Financial statements going public need to remove specific dollar amounts in some sections. Each of these has its own legal regime, but the underlying technical operation — making specific text genuinely disappear — is the same.

The three input methods cover different operational situations. Keyword-based redaction is the right approach when you know exactly what to remove (a specific name, a case number, a project codename) and want every occurrence struck from the document. Regex presets handle common PII categories where the pattern is predictable but the specific values vary: email addresses, phone numbers, credit card numbers, IBANs. The regex approach catches every email in a document without you having to know which addresses are present, which is invaluable in long discovery documents where the volume of personal information would be impractical to enumerate manually. Rectangle redaction is for cases where the content to remove isn't text-based (a signature image, a logo, a chart that shouldn't be visible) or where the surrounding text needs to remain intact while the rectangle's contents go.

Regex preset libraries are valuable specifically because they encode patterns that are easy to get subtly wrong if you write them yourself. A naive 'phone number' regex might match `(123) 456-7890` but miss `123.456.7890` or `+1-123-456-7890`; a naive 'email' regex might miss addresses with plus-signs, subdomains, or unusual TLDs. The presets here have been tested against real-world content and handle the formatting variations, which means they catch matches that hand-rolled patterns would miss. For categories where the legal exposure of missing a redaction is significant — privacy regulators, court sanctions, professional ethics complaints — using a preset is meaningfully safer than writing a one-off pattern under deadline pressure.
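
The phone example above can be checked directly. The sketch below compares a naive pattern with a somewhat broader one; both are illustrative assumptions written for this comparison, not the tool's presets, and the broader pattern still only targets North American style numbers.

```python
import re

# Naive: only the "(123) 456-7890" layout.
NAIVE_PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

# Broader: optional country code, optional parentheses, space/dot/dash separators.
BROADER_PHONE = re.compile(
    r"(?<![\w+])(?:\+\d{1,3}[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
)

samples = ["(123) 456-7890", "123.456.7890", "+1-123-456-7890"]
naive_hits = [s for s in samples if NAIVE_PHONE.fullmatch(s)]
broader_hits = [s for s in samples if BROADER_PHONE.fullmatch(s)]

print("naive:  ", naive_hits)
print("broader:", broader_hits)
```

The naive pattern accepts only the first sample, while the broader one accepts all three, which is the kind of gap a maintained preset is meant to close.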

Visual confirmation matters in redaction more than in most operations because the consequences of an error are large and the error mode is silent. The interface here shows the page with redactions applied before the file is finalised, so you can verify that the right content is being removed and nothing additional was caught by mistake. Reviewing the preview before downloading the redacted file is a habit worth picking up; the alternative is shipping a document and discovering after the fact that a regex caught something it shouldn't have, or that a rectangle covered too little to fully obscure the target. The cost of the preview check is seconds; the cost of getting it wrong can be career-defining.
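
The same review habit can be applied in code as a dry run that lists what would be removed before anything is changed. This is a minimal sketch, assuming page text has already been extracted into a list of strings. The helper `preview_hits` and the sample patterns are hypothetical and are not part of the tool.

```python
import re


def preview_hits(pages: list[str], patterns: dict[str, re.Pattern]) -> list[tuple]:
    """Return (page_number, label, matched_text) for every match, in page order."""
    report = []
    for number, text in enumerate(pages, start=1):
        for label, pattern in patterns.items():
            for m in pattern.finditer(text):
                report.append((number, label, m.group(0)))
    return report


patterns = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "keyword": re.compile(re.escape("Project Falcon"), re.IGNORECASE),
}
pages = ["Re: project falcon budget", "Ask bob@example.org about Project Falcon."]
report = preview_hits(pages, patterns)
for row in report:
    print(row)
```

Scanning a report like this for unexpected entries catches over-matching patterns before the redaction becomes permanent.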

There's a distinction worth understanding between visual redaction and what's sometimes called 'metadata redaction' — removing information embedded in the PDF's metadata fields rather than visible content. PDFs can carry author names, creation dates, software signatures, and even drafting history that doesn't appear on the visible page but can be extracted with metadata-reading tools. Comprehensive redaction handles both: visible content removed from the content stream, and metadata fields cleared from the document properties. This tool addresses both layers, but it's worth being explicit about the dual nature of the operation because shipping a 'redacted' file with intact metadata defeats some of the purpose.
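
As an illustration of the metadata layer, here is a hedged sketch using PyMuPDF. The wrapper `clear_metadata` is hypothetical and the tool's own handling is not documented here; `set_metadata` and `del_xml_metadata` are PyMuPDF document methods.

```python
def clear_metadata(in_path: str, out_path: str) -> None:
    """Write a copy of the PDF with document metadata cleared."""
    import fitz  # PyMuPDF; imported lazily so this module loads without it

    doc = fitz.open(in_path)
    try:
        doc.set_metadata({})    # clears the standard fields (author, title, dates, ...)
        doc.del_xml_metadata()  # drops the XMP metadata stream, if one exists
        # garbage=4 removes objects that are no longer referenced.
        doc.save(out_path, garbage=4, deflate=True)
    finally:
        doc.close()


# Example call (file names are placeholders):
# clear_metadata("redacted.pdf", "redacted-clean.pdf")
```

Opening the output in a viewer's document properties dialog is a quick way to confirm the fields are empty.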

Common failure modes in redaction workflows are educational because they show what to watch for. Black rectangles drawn in tools that don't strip the underlying text are the classic case: visible redaction without functional redaction. Text rendered as outlines rather than actual characters can look as if a regex pass handled it, while the vector strokes that draw the original letters remain unmodified. Form fields can keep autofill values that still show the actual input after the visible field is redacted. Comments and annotations can contain copies of the redacted content in a layer the redaction didn't touch. The approach here addresses all of these by working at the content-stream level rather than just drawing visual rectangles, but understanding why simpler approaches fail is the foundation for trusting that a redaction actually worked.

Operationally, the workflow is upload, identify content to redact, preview, finalise, download. Files are processed in temporary storage and links expire quickly; no signup is required, no watermark is added, and no per-day quota counts down in the background. Multiple PDFs can be run through one after another, which is useful when redacting a stack of documents for a single discovery production rather than a one-off file. Most files process in a few seconds; large multi-hundred-page documents with many regex matches take proportionally longer but still complete in a single pass. The output is a flattened PDF with no hidden layers: the redacted content is genuinely gone from the file, not just covered up.

There's a final operational note worth emphasising for anyone using this for legal or compliance work. The redacted PDF is a derivative document; the original PDF still exists wherever you stored it. Best practice in regulated environments is to keep the original in a controlled location with appropriate access logging, generate the redacted version on-demand for distribution, and never rely on the redacted file as the canonical record. This separates the 'master copy that contains everything' from the 'distribution copy that contains only what's authorised', and it ensures that an internal investigation or regulatory inquiry can always recover the full context without having to reverse-engineer the redactions. The tool produces the distribution copy; the original stays under your existing controls.

A small but practical addition: redaction-by-rectangle is occasionally needed in places where keyword and regex approaches don't fit cleanly. Typical examples are hand-drawn signatures, employee photos in HR documents, building floor plans with sensitive equipment locations, scientific data charts where specific points need obscuring, and contract clauses where the formatting makes regex matching unreliable. The rectangle interface for these cases is direct: click and drag on the page preview to define the area, repeat for additional areas, then finalise. Combined with keyword-based redaction in the same session, the two approaches cover essentially every redaction need that comes up in practice without requiring a switch between tools.

Use cases

  • Publish procurement documents with PII removed.
  • Share legal exhibits with client names redacted.
  • Anonymise bank statements before sharing with a lender.
  • Strip phone numbers from a contact-heavy PDF before uploading.

How it works

  1. Upload a PDF with selectable text (OCR first if scanned).
  2. Pick presets and/or enter keywords, one per line.
  3. Apply and download. Redactions are permanent.

FAQ

Is the underlying text really gone?

Yes. Redactions are applied with PyMuPDF's apply_redactions, which removes text from the page content stream — not just covers it with a box.

Can I undo redactions?

No. Keep the original file safe. The redacted download is a separate file without the removed text.

Does it work on scanned PDFs?

Only if they have a text layer. Run OCR first so the tool knows which regions to redact.

Are files stored?

No. Files are processed transiently and deleted after download.