Document Redaction: How to Redact a PDF
In an age when information is both a valuable asset and a potential liability, protecting sensitive data has become more critical than ever. Document redaction, the process of permanently removing sensitive information from documents, is a key strategy for ensuring data security and privacy.
In this Redaction Guide, we explore what document redaction is, why it’s essential and how to perform it effectively, with practical tips about Adobe functions and tools like Facit’s redaction software.
Redaction is needed to prevent data breaches in any medium, including video and audio; however, this guide focuses on redacting PDFs and other common business documents.
What is Document Redaction?
Document redaction refers to the process of concealing or permanently removing confidential or sensitive information from documents before sharing, storing or publishing them. Redaction is commonly applied to text, images and metadata that might inadvertently expose sensitive information.
For instance, in a legal contract you may need to redact specific client names, or in a medical report you might require the removal of patient identifiers. Effective redaction ensures that the information cannot be recovered or viewed by unauthorised individuals.
Why Would You Need to Redact a Document?
The need for document redaction arises in numerous fields, driven by privacy concerns, legal obligations and compliance with regulations such as GDPR, HIPAA and CCPA. Here are some common scenarios:
Legal Documents
Legal professionals often deal with sensitive contracts, evidence and case files. Document redaction helps protect client confidentiality by obscuring names, addresses or case details that shouldn’t be disclosed to opposing parties or the public.
Medical Records
Healthcare providers handle medical records containing protected health information (PHI). Redacting patient identifiers, such as Social Security numbers or contact details, ensures compliance with HIPAA and other privacy laws when medical data is shared for research or litigation.
Government Documents
Government agencies redact sensitive information from public records to protect national security, citizen privacy or ongoing investigations. Examples include redacting addresses, military details or classified intelligence.
Sensitive Business Information
Companies frequently share documents with external partners, vendors, clients or individuals such as data access requestors. To protect individuals, intellectual property or trade secrets, businesses redact personal data, financial data, proprietary methods and internal communications.
All companies hold data that must be protected and that is potentially at risk of being breached, notably when it leaves the company’s secure IT environment and is shared with third parties.
Accidents most often happen when businesses share information with third parties: if documents are not reliably redacted, sensitive data can be viewed by unauthorised people.
Subject Access Requests (SARs) have been on the rise for several years, and fulfilling a SAR is a notable occasion for redaction, because the personal data of everyone other than the data subject must be protected.
What is a PDF?
PDF stands for Portable Document Format. It’s a file format that lets users share and print documents with a fixed layout that is not easily modified. PDFs present documents consistently across different devices and operating systems, and are among the most commonly used formats for sharing information and supplying documents.
Accessibility of PDFs
PDFs can be created to meet accessibility standards, which makes them more usable for people with disabilities.
Archiving PDFs
PDFs can be used to archive important documents because they preserve the content, structure and appearance of the document over time.
PDF Extended Graphic Design Capabilities
PDFs are a standard file format for sending print-ready documents to clients or printing services.
Using PDFs for Data Collection
PDFs can be used to collect data from users, such as in online applications, feedback forms and customer surveys.
How Do You Redact a Document?
Understanding how to redact a document effectively is crucial to safeguarding sensitive information. The process varies depending on the format of the document, with PDFs being one of the most commonly redacted file types. Let’s break down the steps:
Steps to Redact a PDF
Open the PDF in a Redaction Tool
Use software like Adobe Acrobat, Facit’s redaction tool or other dedicated redaction software to access the file.
Select the Content to Redact
Identify the text, images or areas that need to be hidden. Most tools provide a selection tool to mark these regions.
Apply the Redaction
Once marked, apply the redaction to remove the content permanently.
Double-check the document to ensure all sensitive information is removed. Save the file securely and ensure that the redaction cannot be reversed.
Note
When redacting PDFs or other document formats in order to share information, do not redact source files or the original document, as this may be needed in full at a later date. Redact a copy of the original document to share with third parties.
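The steps above, together with the advice to work on a copy, can be sketched in a few lines. This is a minimal illustration that treats the document as a UTF-8 text file rather than a real PDF; the function names (`find_matches`, `apply_redactions`, `review`, `redact_copy`) are hypothetical and are not part of Adobe’s or Facit’s software.

```python
import re
from pathlib import Path

MARKER = "[REDACTED]"

def find_matches(text: str, terms: list[str]) -> list[tuple[int, int]]:
    """Select: return (start, end) offsets of every case-insensitive match."""
    if not terms:
        return []
    # Longest terms first, so "Jane Smith" wins over "Jane" if both are listed.
    pattern = re.compile(
        "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True)),
        re.IGNORECASE,
    )
    return [m.span() for m in pattern.finditer(text)]

def apply_redactions(text: str, spans: list[tuple[int, int]]) -> str:
    """Apply: splice in the marker, discarding the original characters."""
    # Work from the end of the text so earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + MARKER + text[end:]
    return text

def review(text: str, terms: list[str]) -> list[str]:
    """Review: return any term that can still be found in the text."""
    lowered = text.lower()
    return [t for t in terms if t.lower() in lowered]

def redact_copy(original: Path, copy: Path, terms: list[str]) -> None:
    """Read the original, then save the redacted result as a separate file."""
    text = original.read_text(encoding="utf-8")
    redacted = apply_redactions(text, find_matches(text, terms))
    leftovers = review(redacted, terms)
    if leftovers:
        raise ValueError(f"terms still present after redaction: {leftovers}")
    copy.write_text(redacted, encoding="utf-8")  # the original is never written to
```

In a real PDF tool the "apply" step must delete the underlying content, not just visible characters; the sketch only shows the order of operations and that only the copy is ever saved.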
The Invention of the PDF and Classic Adobe Redaction
The Portable Document Format (PDF) was invented by Dr. John Warnock, co-founder of Adobe, as part of Adobe's "Camelot Project" in the early 1990s.
Warnock's goal was to enable users to capture documents from any application, send them electronically and view and print them on any device.
The first version of Adobe Acrobat, the program that could read PDFs, was released in 1993.
The PDF subsequently made it easier to share documents across computer systems, which helped make the paperless office a realistic possibility.
The PDF is now a trusted file format used by businesses around the world, and we should therefore refer to Adobe’s own advice on PDF redaction.
Adobe Redaction Tools
With Adobe Acrobat Pro, you can redact documents to remove account numbers, home addresses and other personal details in digital form. Adobe says:
Redacting a PDF with Adobe’s redaction software does more than simply black out sensitive information. That’s because PDFs contain ‘metadata’, which is an extra layer of data beyond simple text and images.
Metadata can include: Name of the author of the document; document description; keywords; and dates and times of creation or modification.
PDFs can also contain formatting elements such as JavaScript that change how the document looks and is presented. You might want to keep every bit of this information private.
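To show how easily this metadata can be read, here is a rough sketch that pulls document information entries straight out of a PDF’s raw bytes. It assumes an uncompressed, unencrypted file whose entries are plain literal strings; `read_info_fields` is a hypothetical helper, and a proper PDF library should be used for real files.

```python
import re

# Standard keys of a PDF document information dictionary.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords", "Creator",
             "Producer", "CreationDate", "ModDate")

_INFO_RE = re.compile(
    rb"/(" + b"|".join(k.encode() for k in INFO_KEYS) + rb")\s*\(([^)]*)\)"
)

def read_info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Return info entries written as simple literal strings.

    Heuristic only: it misses hex strings, escaped parentheses,
    compressed object streams, encrypted files and XMP metadata.
    If a key appears more than once, the last occurrence wins.
    """
    fields = {}
    for key, value in _INFO_RE.findall(pdf_bytes):
        fields[key.decode("ascii")] = value.decode("latin-1")
    return fields
```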
Deleting Text and Images
Simply hiding content in a PDF by covering it with a black box is not sufficient. True redaction involves removing the underlying data. Use software that ensures the content is fully deleted and cannot be uncovered by manipulating the document.
The same applies to hiding content in other document formats, including popular formats such as MS Word and Excel documents, and unstructured data such as emails.
Data must be irreversibly removed from documents to ensure privacy compliance. While Adobe’s redaction tools may be effective for redacting PDFs, they are not suitable for redacting all document formats.
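The difference between covering content and removing it can be illustrated with a toy page model. This is a sketch only: `Page`, `TextSpan`, `mask` and `redact` are hypothetical, and real PDF pages store text in content streams rather than Python lists. The principle is the same, though: a box drawn on top changes what is seen, not what can be extracted.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)

@dataclass
class TextSpan:
    text: str
    bbox: Box

@dataclass
class Page:
    spans: list[TextSpan]
    overlays: list[Box] = field(default_factory=list)

    def extract_text(self) -> str:
        # Extraction (or copy and paste) reads the text layer and ignores drawn boxes.
        return " ".join(span.text for span in self.spans)

def _overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]

def mask(page: Page, box: Box) -> None:
    """Cosmetic 'redaction': draw a black box but leave the text underneath."""
    page.overlays.append(box)

def redact(page: Page, box: Box) -> None:
    """True redaction: delete every text span under the box, then draw the box."""
    page.spans = [s for s in page.spans if not _overlaps(s.bbox, box)]
    page.overlays.append(box)
```

For simplicity the sketch deletes whole spans that touch the box; real tools typically remove only the characters under it.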
Sanitising Hidden Content
Metadata, comments and revision histories can often contain sensitive information. Use redaction tools to sanitise the document by removing hidden content. For instance, Facit’s redaction software includes a feature to clean metadata and ensure complete security.
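As a generic illustration of metadata sanitisation (not Facit’s implementation), the sketch below blanks identifying core properties in a Word .docx file using only Python’s standard library. It assumes the standard Office Open XML layout, where these properties live in `docProps/core.xml`. Comments, tracked changes and embedded objects are not touched, and `scrub_core_properties` is a hypothetical helper.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep the usual prefixes when re-serialising

# Core properties that identify people or describe the content.
FIELDS = ["dc:creator", "cp:lastModifiedBy", "dc:title",
          "dc:subject", "dc:description", "cp:keywords"]

def scrub_core_properties(docx_bytes: bytes) -> bytes:
    """Return a copy of the package with identifying core properties emptied."""
    src = zipfile.ZipFile(io.BytesIO(docx_bytes))
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(data)
                for name in FIELDS:
                    el = root.find(name, NS)
                    if el is not None:
                        el.text = ""
                data = ET.tostring(root, xml_declaration=True, encoding="UTF-8")
            dst.writestr(item, data)  # every other part is copied unchanged
    return out.getvalue()
```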
Common Challenges with Document Redaction
While the process seems straightforward, there are pitfalls that can compromise your efforts. Here are some common challenges:
Hidden Information Left Behind
Redacted documents may still contain sensitive information in metadata or hidden layers. Failing to sanitise these elements can expose critical data. See the Appendix to this article on reversible metadata in PDFs and other document formats.
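One way content is left behind in PDFs is incremental saving: some editors append changes to the end of the file instead of rewriting it, so the earlier version survives as a prefix of the file’s bytes. The heuristic sketch below (`earlier_revisions` is a hypothetical helper) splits a file at each `%%EOF` marker. Linearised ("fast web view") files can also contain more than one marker, so treat a hit as a prompt to investigate, not as proof.

```python
def earlier_revisions(pdf_bytes: bytes) -> list[bytes]:
    """Return the file as it stood at every %%EOF marker except the last.

    With incremental updates, each returned prefix is a complete earlier
    version of the document, possibly including text later covered by a
    'redaction' box.
    """
    marker = b"%%EOF"
    revisions = []
    start = 0
    while True:
        idx = pdf_bytes.find(marker, start)
        if idx == -1:
            break
        end = idx + len(marker)
        revisions.append(pdf_bytes[:end])
        start = end
    return revisions[:-1]  # the last prefix is the current version
```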
Inconsistent Redaction Processes
Manually redacting documents like PDFs can lead to inconsistencies, where some sensitive data remains visible. Standardising the redaction process with automated tools helps to maintain thoroughness.
Manual Redaction Errors
Human error is a significant risk when redacting documents manually. Overlooking a single instance of sensitive information can result in legal or reputational damage.
How Facit’s Redaction Software Can Help
Facit’s redaction software is designed to streamline the redaction process by providing reliable and efficient solutions for safeguarding your documents, including PDFs.
Improve Accuracy and Efficiency
Facit’s software automates the redaction process, reduces the risk of human error and ensures consistency. It’s faster and more reliable than manual methods and competing redaction tools, even for large volumes of documents.
Redact Text, Images and Hidden Content
Facit’s redaction tool enables users to redact visible text and images while also sanitising hidden content like metadata, annotations and revision histories, to ensure comprehensive security.
Customisable Redaction Features
Facit’s redaction software offers customisable options, allowing you to tailor the redaction process to your specific needs. This includes bulk redactions, pattern-based redactions (e.g., searching for Social Security numbers or vehicle registration numbers) and other user-defined parameters.
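Pattern-based redaction of this kind is usually built on regular expressions. The sketch below is an assumption-laden illustration, not Facit’s rule set: it only recognises US Social Security numbers written as 123-45-6789 and UK vehicle registrations in the current two-letter, two-digit, three-letter format (for example, AB12 CDE), and `redact_patterns` is a hypothetical helper.

```python
import re

# Illustrative patterns only; real rule sets need many more formats.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "UK_VRN": re.compile(r"\b[A-Z]{2}\d{2}\s?[A-Z]{3}\b"),
}

def redact_patterns(text: str) -> tuple[str, dict[str, int]]:
    """Replace every match with a labelled marker and count replacements per pattern."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label} REDACTED]", text)
        counts[label] = n
    return text, counts
```

Returning the counts alongside the text gives a reviewer a quick check that each rule fired as often as expected.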
Secure and Compliant Workflows
Designed with security in mind, Facit’s software ensures compliance with data privacy regulations. It creates audit trails and secure workflows to meet the strictest standards.
Benefits of Using Automated Redaction Tools
Automated redaction tools like Facit’s provide significant advantages over manual methods. Here are some key benefits:
Saves Time and Reduces Errors
Automation significantly speeds up the redaction process, especially for large datasets. Automation also minimises the chance of human error and ensures that critical information is not accidentally left visible.
Ensures Sensitive Data is Fully Removed
Automated tools not only hide but completely remove sensitive information, which makes it impossible to recover the redacted content.
Simplifies Compliance with Data Privacy Regulations
Redaction tools help organisations comply with data privacy laws by providing secure, audit-ready solutions. They streamline adherence to GDPR, HIPAA and similar regulations.
Final Thoughts: Redaction, PDFs and All Document Formats
Protect Your Sensitive Information Today
Document redaction is a non-negotiable practice in today’s data-driven world. By understanding the importance of effective redaction and leveraging the right tools, organisations can protect their sensitive information and maintain trust.
Explore Facit’s Redaction Solutions
Facit’s redaction software simplifies and secures the redaction process, making it accessible to businesses of all sizes. Whether you’re handling PDFs, commercial business data, legal contracts, medical records or internal reports, Facit ensures your data is safe.
Appendix: FOIA, SARs and Metadata Risks
Apart from GDPR and HIPAA, there are a number of regulations that mandate how information is to be supplied to people requesting data.
The Freedom of Information Act (FOIA), for example, came into force in 2005 to provide access to information held by public bodies.
Making an FOIA Request
An FOIA data requester can stipulate how they want to receive information and can request to see a copy of the information, or request an opportunity to inspect the original. The requester can specify a preference for an electronic format, or a particular format such as PDF, or hard copy.
People are able to request any information a public body holds in a FOIA request, in any format: print, spreadsheets, images, audio recordings, email communications, even instant messages sent on work devices. The public body holding the data generally has 20 working days to respond to FOIA requests.
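A working-day limit is easy to miscount. The sketch below assumes the UK ICO’s counting rule, under which day one is the first working day after the request is received. Weekends are skipped automatically, but the code knows nothing about bank holidays, which must be passed in; `foia_deadline` is a hypothetical helper, not an official calculator.

```python
from datetime import date, timedelta

def foia_deadline(received: date,
                  holidays: frozenset[date] = frozenset(),
                  working_days: int = 20) -> date:
    """Return the date of the last working day allowed for a response."""
    day = received
    counted = 0
    while counted < working_days:
        day += timedelta(days=1)
        # Monday-Friday (weekday 0-4) count, unless listed as a holiday.
        if day.weekday() < 5 and day not in holidays:
            counted += 1
    return day
```

For a request received on Monday 4 March 2024, this gives 1 April 2024, or 3 April 2024 once Good Friday (29 March) and Easter Monday (1 April) are supplied as holidays.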
Preparing Information Correctly to Share with a Data Requester
Whether responding to an FOIA request or a GDPR Subject Access Request, the data-holding body is required to redact the personal data of anyone who is not the subject of interest.
Typical items that must be redacted are: social security numbers, taxpayer identification numbers, names of minors, dates of birth, financial account numbers, home addresses, passport numbers and vehicle registration details.
However, when preparing to release sensitive information, be aware that personally identifying information extends to distinguishing marks, name badges, location signs and information on computer screens, all of which are easy to miss when redacting business documents, PDFs, images, emails or video footage.
Information and Data Requests Volumes Rising
Subject Access Requests made under GDPR have the same redaction requirements as FOIA requests, though the fulfilment deadline is one month from receipt rather than 20 working days.
Significantly more requests are made to businesses, and there has been a surge in SARs in recent years as the public has become better informed about privacy rights.
The consequences of privacy breaches for businesses are fines, adverse publicity and reputational damage, hence redaction of documents such as PDFs and spreadsheets, and of sensitive information in communications such as emails, is vital.
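The one-month limit for SARs (often approximated as 30 days) can be worked out as the corresponding date in the following month. This sketch assumes the ICO’s guidance as we understand it: when the next month has no corresponding date (for example, 31 January), the deadline is the last day of that month. It does not move a deadline that falls on a weekend or bank holiday to the next working day, nor model the extension available for complex requests; `sar_deadline` is a hypothetical helper.

```python
import calendar
from datetime import date

def sar_deadline(received: date) -> date:
    """Corresponding date next month, clamped to that month's last day."""
    year = received.year + (received.month == 12)  # December rolls into next year
    month = received.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(received.day, last_day))
```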
The Hidden Risks That Make SAR Fulfilment Even Harder
In its guidance document ‘How to disclose information safely. Removing personal data from information requests and datasets’, the Information Commissioner’s Office (ICO) warns of the ways personal data can remain hidden in released information.
Metadata Risks Identified by the ICO
“Files rarely contain just the information entered by the author or just what is displayed on the screen. So-called metadata or ‘data about data’ is embedded within the file and can include information such as previous authors, changes made to previous versions, comments or annotations. Photographs taken with smartphones and tablets can contain the GPS coordinates of where the image was taken, time and date or information about the type of device used. Emails contain information about the sender and recipient as well as routing information about how the message was delivered.”
Be aware that electronic files, including PDFs, spreadsheets and images, usually contain metadata that is not obvious to the ‘naked eye’ but can be read by anyone with a little technical knowledge or extracted in several ways.
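The ICO’s point about email is easy to verify. The sketch below uses Python’s standard `email` package to list headers that reveal senders, recipients and delivery routes; the `REVEALING` set is an illustrative assumption rather than a complete inventory, and `revealing_headers` is a hypothetical helper.

```python
from email import policy
from email.parser import BytesParser

# Illustrative list: headers that often identify people, systems or delivery routes.
REVEALING = {"from", "to", "cc", "reply-to", "received",
             "x-originating-ip", "message-id"}

def revealing_headers(raw_email: bytes) -> list[tuple[str, str]]:
    """Return (name, value) pairs for headers that may need redacting before release."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_email)
    # Header names are case-insensitive, so compare in lower case.
    return [(name, str(value)) for name, value in msg.items()
            if name.lower() in REVEALING]
```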
12 Examples of ‘Hidden’ Information That Pose Privacy Risks
The following examples illustrate how not all data is clearly visible in common document types:
Reversible PDF Redaction
In 2021, the London Borough of Lambeth was involved in a court case after a data requester reversed the redaction on child protection documents held by the local authority. A parent was able to reverse the PDF redaction and identify members of his family who had made child protection referrals about his children.
Formatting Style Risks
When creating a document template, an author can ‘hide’ data by setting the font colour to be the same as the background (e.g., white on white). While hidden data would not be evident in a printed copy, it remains accessible in a digital copy.
Layered Content Risks
Pictures or objects may have been overlaid on other content, which remains in the file underneath.
Placed Outside the Area of Display
The author can place data at the edge of a document, outside the visible area. For example, spreadsheet software supports thousands of columns and a million rows of data. Other programs have off-page pasteboards on which data is not immediately obvious.
Hidden Rows and Columns
MS Excel includes a function to ‘hide’ rows or columns from view, and an ‘unhide’ option to reveal data.
Hidden Worksheets
MS Excel also allows an entire worksheet to be hidden from view.
Embedded Documents or Files
Files and documents can be inserted or pasted into other documents, which can create privacy protection issues. For example, you can add documents to a PDF by attaching or embedding them. Similarly, PDF files can be embedded in other files, such as MS Word documents. In both cases, to protect data, the containing file and the embedded file may require redaction.
Pivot Tables
The source data summarised within a Pivot table can be retrieved by double-clicking on the table, even if the original worksheet has been deleted or the Pivot table has been copied into a new workbook.
Charts
Charts, like Pivot tables, can contain an embedded copy of the source data.
Functions
Functions such as LOOKUP and VLOOKUP create and store a cache of the source data, which can potentially be retrieved even if copied into a new document.
The ‘Track Changes’ Feature in MS Word
Tracking can be turned on through the Review tab in MS Word, which marks up and shows any changes that anyone makes to the document. For example, deleted text is struck through but remains displayed until the change is accepted or rejected.
Failed Paper Redaction
Risks associated with paper redaction include ‘show through’ on the reverse of a document and degradation of the redaction substance.
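Several of the Office-format risks above can be detected automatically, because .docx and .xlsx files are ZIP packages of XML parts. The sketch below is a heuristic scanner written against the standard Office Open XML part names; `scan_ooxml` is a hypothetical helper. It does not cover PDFs, images or paper, and an empty result does not prove a file is safe to release.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
C = "{http://schemas.openxmlformats.org/drawingml/2006/chart}"

def scan_ooxml(data: bytes) -> list[str]:
    """Flag hidden content in a .docx or .xlsx package (heuristic, not exhaustive)."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        for name in z.namelist():
            if name.startswith(("word/embeddings/", "xl/embeddings/")):
                findings.append(f"embedded file: {name}")
            elif name.startswith("xl/pivotCache/pivotCacheRecords"):
                findings.append(f"pivot cache with source data: {name}")
            if not name.endswith(".xml"):
                continue
            root = ET.fromstring(z.read(name))
            if name == "xl/workbook.xml":
                for sheet in root.iter(S + "sheet"):
                    if sheet.get("state") in ("hidden", "veryHidden"):
                        findings.append(f"hidden worksheet: {sheet.get('name')}")
            elif name.startswith("xl/worksheets/"):
                for tag in ("row", "col"):
                    n = sum(1 for el in root.iter(S + tag)
                            if el.get("hidden") in ("1", "true"))
                    if n:
                        findings.append(f"{n} hidden {tag}(s) in {name}")
            elif name.startswith("xl/charts/"):
                if (root.find(f".//{C}numCache") is not None
                        or root.find(f".//{C}strCache") is not None):
                    findings.append(f"chart with cached source data: {name}")
            elif name == "word/document.xml":
                # Text deleted under Track Changes is kept in w:delText elements.
                deleted = "".join(t.text or "" for t in root.iter(W + "delText"))
                if deleted:
                    findings.append(f"tracked deletion: {deleted!r}")
                if root.find(f".//{W}ins") is not None:
                    findings.append("tracked insertions present")
                for color in root.iter(W + "color"):
                    if (color.get(W + "val") or "").upper() == "FFFFFF":
                        findings.append("white text (may be invisible on a white page)")
                        break
    return findings
```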
Lessons Learned about Privacy Redaction Risks
The key lesson to draw from this list of redaction hazards is that it is risky to share original documents. Even copying documents does not eliminate the risk of copying hidden metadata or ‘invisible’ information.
With Facit’s redaction tools, sensitive data is completely removed, not just masked.