Facit Data Systems

Metadata – what is it, and why is it a hazard when fulfilling data requests?

Here we explore the hidden perils of document redaction (masking, removing or blacking out data), which make an already difficult task much harder to fulfil.

A number of regulations stipulate when and how information is to be supplied by the data-holding body to the data requester.

Two laws that protect people’s rights to data access and personal privacy are the Freedom of Information Act (FOIA) and the General Data Protection Regulation (GDPR). The FOIA came fully into force in 2005 and provides public access to information held by public bodies, while the GDPR took effect in 2018 and applies to most UK businesses and organisations.

What can you ask for when making an FOIA request?

An FOIA data requester can stipulate how they want to receive information at the time of the request. The requester may ask to see a copy of the information, or to have an opportunity to inspect it. The requester can also express a preference for the information in a particular form, for example, electronic or hard copy. If the preference is for an electronic copy, the requester’s preference can extend to a particular electronic format.

People are able to request any information a public body holds. That means data in printed documents, spreadsheets, images, audio recordings, email communications and even instant messages sent on work devices can all fall within the scope of an FOIA request. The data-holding body generally has 20 working days to respond to an FOIA request.

Preparing information correctly to share with a data requester

The data-holding body is required to redact any personal data that does not relate to the person making the request before the information is sent to them, to avoid disclosing information about other people.

Typical items that must be redacted include social security numbers, taxpayer identification numbers, names of minors, dates of birth, financial account numbers, home addresses, passport numbers, and driver’s license numbers.

GDPR DSARs, required redaction and risks

Data subject access requests (DSARs) made under GDPR are subject to the same redaction requirements as FOIA requests. The fulfilment deadline is slightly longer, at one month, but businesses receive significantly more requests, and more complex ones. As recently reported by Facit, there has been a surge in both DSARs and ICO privacy breaches. For businesses, the consequences of a privacy breach can include very heavy fines and adverse publicity that damages brand reputation.

The hidden risks that make DSAR fulfilment even harder!

In its guidance document ‘How to disclose information safely. Removing personal data from information requests and datasets’, the ICO offers some cautionary insight into the presence of metadata in shared documents that can lead to privacy breaches.

Metadata risks identified by the ICO

“Files rarely contain just the information entered by the author or just what is displayed on the screen. So-called metadata or ‘data about data’ is embedded within the file and can include information such as previous authors, changes made to previous versions, comments or annotations. Photographs taken with smartphones and tablets can contain the GPS coordinates of where the image was taken, time and date or information about the type of device used. Emails contain information about the sender and recipient as well as routing information about how the message was delivered.”

The document’s metadata section concludes: “Publication of such complex file types in their raw form can contain an amount of metadata that may not be appropriate for disclosure.”
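To make the ICO’s point about previous authors concrete, here is a minimal sketch of how that metadata can be read back out of a modern Office file. It relies on the fact that .docx, .xlsx and .pptx files are ZIP packages that normally carry a docProps/core.xml part. The function name read_core_properties and the sample package are our own illustrative assumptions, not part of the ICO guidance or of any Facit product.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(package):
    """Return author-related metadata found in an Office Open XML package.

    `package` is a path or a binary file-like object (.docx, .xlsx, .pptx).
    """
    fields = {
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    }
    with zipfile.ZipFile(package) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/core.xml"))
    found = {}
    for key, tag in fields.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[key] = node.text
    return found


def make_sample_package():
    """Build a tiny in-memory package holding only core properties (illustrative)."""
    core = (
        '<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/'
        'package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:creator>A. Caseworker</dc:creator>"
        "<cp:lastModifiedBy>B. Reviewer</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", core)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_core_properties(make_sample_package()))
```

Neither name in the sample is visible anywhere on the page of a document, yet both travel with the file. The sketch covers only the core properties part; comments, revision history and image EXIF data are stored elsewhere and need their own checks.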

12 examples of ‘hidden’ information and metadata that pose privacy risks

Most data or information within a document or dataset will be clearly visible or identifiable. However, the following examples illustrate when this may not be the case in common document types:

1) Reversible PDF redaction: In 2021, the London Borough of Lambeth was involved in a protracted court case after a subject access requester reversed the redaction on child protection documents held by the local authority. The requester (a parent) reversed the PDF redaction and discovered that members of his own family had made child protection referrals about his children.

2) Hidden by formatting styles: When creating a template, the author may have chosen to ‘hide’ certain data by setting the font colour to match the background (e.g., white on white or black on black). While the hidden data would not be disclosed if the document were printed, it remains accessible within a digital copy.

3) Layered content: Pictures or objects may have been overlaid on other content, concealing it from view without removing it.

4) Placed outside the area of display: The author may have placed data at the end or edge of the document, outside the normal visible area. For example, Excel supports more than 16,000 columns and 1 million rows of data per worksheet, and other software programs have off-page pasteboards on which data may not be immediately obvious.

5) Hidden rows and columns: Excel includes a function to ‘hide’ rows or columns from view, with a reverse function to ‘unhide’ them.

6) Hidden worksheets: Excel also allows an entire worksheet to be hidden from view.

7) Embedded documents or files: Files and documents can be inserted or pasted into other documents.

8) Pivot tables: The source data summarised within a Pivot table can be retrieved by double-clicking on the table, even if the original worksheet has been deleted or the Pivot table has been copied into a new workbook.

9) Charts: Charts, like Pivot tables, can contain an embedded copy of the source data.

10) Functions: Functions such as LOOKUP and VLOOKUP also create and store a cache of the source data, which can potentially be retrieved even if the function is copied into a new document.

11) The ‘Track Changes’ feature in Word: Tracking is turned on through the Review tab and marks up any changes that anyone makes to the document. For example, deleted text is retained but displayed as struck through until the change is accepted or rejected.

12) Failed paper redaction: Risks associated with paper redaction include ‘show through’ on the reverse of a document and degradation of the redaction substance.
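Several of the items above (hidden worksheets, hidden rows and columns, and tracked changes) leave recognisable traces inside the file itself, so they can be looked for before a document is released. The sketch below is a rough illustration using only the Python standard library. It assumes the files are in the modern .xlsx and .docx formats, and the function names and the stripped-down sample packages are hypothetical. It does not cover the other items in the list, such as white-on-white text, layered objects or cached chart data.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

SHEET_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
WORD_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_true(value):
    # Boolean attributes in these formats are written as "1" or "true".
    return value in ("1", "true")


def hidden_spreadsheet_items(xlsx):
    """List hidden worksheets, rows and columns found in an .xlsx package."""
    findings = []
    with zipfile.ZipFile(xlsx) as zf:
        workbook = ET.fromstring(zf.read("xl/workbook.xml"))
        for sheet in workbook.iter(SHEET_NS + "sheet"):
            state = sheet.get("state", "visible")  # may be hidden or veryHidden
            if state != "visible":
                findings.append(f"sheet '{sheet.get('name')}' is {state}")
        for name in zf.namelist():
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            for row in root.iter(SHEET_NS + "row"):
                if _is_true(row.get("hidden")):
                    findings.append(f"{name}: row {row.get('r')} is hidden")
            for col in root.iter(SHEET_NS + "col"):
                if _is_true(col.get("hidden")):
                    findings.append(
                        f"{name}: columns {col.get('min')}-{col.get('max')} are hidden"
                    )
    return findings


def tracked_deletions(docx):
    """Return text still stored in a .docx as tracked (unaccepted) deletions."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [node.text for node in root.iter(WORD_NS + "delText") if node.text]


def build_package(parts):
    """Zip a {part name: XML string} mapping into an in-memory package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, xml in parts.items():
            zf.writestr(name, xml)
    buf.seek(0)
    return buf


# Stripped-down sample packages: just enough XML to exercise the checks.
SAMPLE_XLSX = build_package({
    "xl/workbook.xml": (
        '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        '<sheets><sheet name="Summary" sheetId="1"/>'
        '<sheet name="Raw data" sheetId="2" state="hidden"/></sheets></workbook>'
    ),
    "xl/worksheets/sheet1.xml": (
        '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        '<cols><col min="3" max="4" hidden="1"/></cols>'
        '<sheetData><row r="1"/><row r="2" hidden="1"/></sheetData></worksheet>'
    ),
})

SAMPLE_DOCX = build_package({
    "word/document.xml": (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body><w:p><w:r><w:t>Referral made by </w:t></w:r>"
        '<w:del w:id="1" w:author="Reviewer"><w:r><w:delText>Jane Doe</w:delText></w:r></w:del>'
        "</w:p></w:body></w:document>"
    ),
})

if __name__ == "__main__":
    for finding in hidden_spreadsheet_items(SAMPLE_XLSX):
        print(finding)
    print(tracked_deletions(SAMPLE_DOCX))
```

In the Word sample, the name was ‘deleted’ by an editor but is still stored in the file because the change was never accepted. A check like this can flag a file for manual review, but a clean result does not prove that the file is safe to disclose.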

Lessons learned about privacy redaction metadata risks

The key lesson to draw from the redaction risks listed is that it is extremely risky to share original documents. Even copying a document does not eliminate the risk, because hidden metadata and ‘invisible’ information can be copied along with the visible content.

Facit recommends that, once the redaction process has been completed, a new copy of the information is created and checked to ensure that all the redacted information has been completely removed, not just masked.
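One way to approach that final check is to search every part of the new copy for the terms that were supposed to be removed, not just the visible page. The following is a minimal sketch under stated assumptions: the copy is a ZIP-based Office file, the function name find_leaks and the sample data are hypothetical, and a plain text search is good enough for a first pass. It is not a description of how Facit’s own tools work.

```python
import io
import zipfile


def find_leaks(package, redacted_terms):
    """Return (part name, term) pairs where a redacted term still appears.

    Every part of the ZIP package is searched, including comments and
    properties, not only the main document body.
    """
    leaks = []
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            text = zf.read(name).decode("utf-8", errors="ignore").lower()
            for term in redacted_terms:
                if term.lower() in text:
                    leaks.append((name, term))
    return leaks


def make_sample_copy():
    """Build an in-memory 'redacted' copy that still leaks a name in a comment."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<doc>Referral made by [REDACTED]</doc>")
        zf.writestr("word/comments.xml", "<c>Check with Jane Doe before release</c>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for part, term in find_leaks(make_sample_copy(), ["Jane Doe", "AB123456C"]):
        print(f"'{term}' is still present in {part}")
```

The visible body of the sample has been masked correctly, but the name survives in a reviewer comment. A search like this can miss text that Word has split across several formatting runs, and it does not look inside embedded files, images or PDFs, so it complements a manual check of the new copy without replacing it.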