09/04/2025

By

guldmann

Right To Be Forgotten (RTBF)

The Right to Be Forgotten (RTBF), mandated by regulations like GDPR and CCPA, requires organizations to permanently delete personally identifiable information (PII) upon request within a defined timeframe.

RTBF in a data lake poses a unique set of challenges. A data lake is traditionally designed to store all data indefinitely, acting as an immutable storage layer where raw, unaltered data is kept for future analysis. This contradicts RTBF, under which some data must, under certain conditions, be deleted permanently and not just made invisible.

Personally Identifiable Information

PII (Personally Identifiable Information) refers to any data that can be used to uniquely identify an individual. It includes direct identifiers (like names or Social Security numbers) and indirect identifiers (like IP addresses or location data) that, when combined, can reveal a person’s identity.

Sensitive PII, Direct Identifiers

This is information that explicitly identifies a person and that, if exposed, could pose a risk of identity theft or fraud.

Full name
Social Security Number (SSN) / National Identification Number like the Danish CPR
Passport number
Driver’s license number
Credit card number
Personal phone number
Personal email address
Home address
Biometric data (fingerprints, retina scans, facial recognition)

Quasi PII, Indirect Identifiers

This is information that does not pose a risk on its own but that, when combined, could uniquely identify a person.

Date of birth
ZIP code
IP address
Device identifiers (IMEI, MAC address)
Browsing history
Geolocation data
Employment details

PII vs. Sensitive Personal Data

PII is often confused with Sensitive Personal Data. Under GDPR, sensitive personal data is subject to stricter processing rules than general PII. It covers categories such as:

Racial/ethnic origin
Political opinions
Religious beliefs
Health records
Sexual orientation
Genetic/biometric data (if used for identification)
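The three categories above can be captured as a simple lookup that tags each column with its classification. This is a minimal sketch: the column names and the `classify` helper are hypothetical and only illustrate the idea, and a real catalog would likely flag unknown columns for review instead of treating them as non-PII.

```python
from enum import Enum


class Sensitivity(Enum):
    SENSITIVE_PII = "sensitive_pii"  # direct identifiers
    QUASI_PII = "quasi_pii"          # indirect identifiers
    SENSITIVE = "sensitive"          # sensitive personal data
    NON_PII = "non_pii"              # everything else


# Hypothetical column names, chosen to mirror the lists above.
COLUMN_CLASSES = {
    "full_name": Sensitivity.SENSITIVE_PII,
    "email": Sensitivity.SENSITIVE_PII,
    "passport_number": Sensitivity.SENSITIVE_PII,
    "date_of_birth": Sensitivity.QUASI_PII,
    "zip_code": Sensitivity.QUASI_PII,
    "ip_address": Sensitivity.QUASI_PII,
    "health_record": Sensitivity.SENSITIVE,
    "religious_belief": Sensitivity.SENSITIVE,
}


def classify(column: str) -> Sensitivity:
    """Return the classification of a column name.

    Assumption: unknown columns default to NON_PII to keep the sketch short.
    """
    return COLUMN_CLASSES.get(column.lower(), Sensitivity.NON_PII)
```

Calling `classify("email")` gives `Sensitivity.SENSITIVE_PII`, while a column that is not in the map, such as `order_total`, falls back to `Sensitivity.NON_PII`.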

RTBF Conditions

Now for the tricky part, since some of it quickly becomes a judgement call: when must an RTBF request be honored?

The data is no longer necessary for the original purpose it was collected.
The individual withdraws consent, and there is no other legal basis to retain it.
The individual objects to processing, and there are no overriding legitimate grounds.
The data was processed unlawfully (e.g., collected without proper consent).
The data must be erased due to a legal obligation (e.g., regulatory requirements).

But it's also important to understand when RTBF does not have to be honored.

Legal or regulatory compliance requires retention (e.g., financial, tax, health records).
The data is needed for public interest (e.g., research, scientific studies).
The data is required for the exercise of free expression or journalism.
It is necessary for legal claims (e.g., defending against lawsuits).
The request is excessive or unreasonable (e.g., repeated requests without justification).
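The two lists can be read as a first-pass check: a request needs at least one valid ground and no applicable exemption. The sketch below encodes only that reading. The enum members and the `must_honor` function are hypothetical names, and real requests usually need legal review, for instance when an exemption covers only part of the data.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Ground(Enum):
    NO_LONGER_NECESSARY = auto()
    CONSENT_WITHDRAWN = auto()
    OBJECTION = auto()
    UNLAWFUL_PROCESSING = auto()
    LEGAL_OBLIGATION_TO_ERASE = auto()


class Exemption(Enum):
    LEGAL_RETENTION = auto()
    PUBLIC_INTEREST = auto()
    FREE_EXPRESSION = auto()
    LEGAL_CLAIMS = auto()
    EXCESSIVE_REQUEST = auto()


@dataclass
class ErasureRequest:
    subject_id: str
    grounds: set[Ground] = field(default_factory=set)
    exemptions: set[Exemption] = field(default_factory=set)


def must_honor(request: ErasureRequest) -> bool:
    """First-pass rule: at least one ground and no exemption applies."""
    return bool(request.grounds) and not request.exemptions
```

A request with `Ground.CONSENT_WITHDRAWN` and no exemptions passes the check; adding `Exemption.LEGAL_RETENTION` makes it fail and would route it to manual review in most setups.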

There are certain deadlines to be met in order to be compliant.

GDPR: Must process the request within one month (can extend to 3 months in complex cases).
CCPA: Must process the request within 45 days (can extend by another 45 days if necessary).
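The deadlines above are easy to compute when a request is logged. The following is a small sketch using only the standard library. The function names are hypothetical, and clamping to the last day of a shorter month is an assumption of this example, so the exact counting rules should be confirmed with legal counsel.

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def rtbf_deadline(received: date, regulation: str, extended: bool = False) -> date:
    """Return the latest completion date for a request received on `received`."""
    if regulation == "GDPR":
        # One month, or three months in total for complex cases.
        return add_months(received, 3 if extended else 1)
    if regulation == "CCPA":
        # 45 days, or 90 days in total with the extension.
        return received + timedelta(days=90 if extended else 45)
    raise ValueError(f"Unknown regulation: {regulation}")
```

For a request received on 2025-01-31, `rtbf_deadline(date(2025, 1, 31), "GDPR")` returns 2025-02-28 under the clamping assumption, and `rtbf_deadline(date(2025, 1, 1), "CCPA")` returns 2025-02-15.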

Data ingestion and segregation

The first step is to implement segregation in the ingestion layers for the different types of data (PII, Sensitive PII, Quasi-PII, Sensitive, and non-PII data elements) to enforce security, compliance, and governance. The five layers (the four sensitive classes plus non-PII data) should be linkable using a surrogate key, allowing downstream systems to consume the fully combined dataset or, in RTBF scenarios, access only the anonymized portions of the data. By clustering PII, Sensitive PII, Quasi-PII, and Sensitive data into separate layers, we can delete data from these layers without jeopardizing analytical processes, as the anonymized portions of the data remain intact. It also enables the enforcement of special access controls on sensitive data, preventing it from being included in general information modules.
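To make the idea concrete, here is a minimal in-memory sketch of segregation at ingestion. It is not how a lakehouse would physically store the data: the `LAYER_OF` mapping, the layer names, and the `ingest`, `read_combined`, and `forget` functions are all hypothetical, and plain dictionaries stand in for tables.

```python
import uuid
from collections import defaultdict

# Hypothetical column-to-layer mapping; unmapped columns go to "non_pii".
LAYER_OF = {
    "full_name": "sensitive_pii",
    "email": "sensitive_pii",
    "zip_code": "quasi_pii",
    "health_record": "sensitive",
}

# layer name -> {surrogate_key: {column: value}}
stores = defaultdict(dict)


def ingest(record: dict) -> str:
    """Split a raw record across layers and link the parts by a surrogate key."""
    key = str(uuid.uuid4())  # random, so the key itself carries no PII
    parts = defaultdict(dict)
    for column, value in record.items():
        parts[LAYER_OF.get(column, "non_pii")][column] = value
    for layer, fields in parts.items():
        stores[layer][key] = fields
    return key


def read_combined(key: str) -> dict:
    """Join whatever still exists for this key across all layers."""
    combined = {}
    for rows in stores.values():
        combined.update(rows.get(key, {}))
    return combined


def forget(key: str) -> None:
    """RTBF: delete the key from every layer except the non-PII layer."""
    for layer in list(stores):
        if layer != "non_pii":
            stores[layer].pop(key, None)
```

After `forget(key)`, `read_combined(key)` returns only the non-PII fields, which is the behaviour described above: analytics keep working on the part of the data that remains.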

Deletion of data based on a specialized rule engine, enabling us to stay compliant in the Bronze layer.
Pseudonymization (tokenized data for privacy and retention): instead of storing raw PII, store tokens or hashed values in the data lake, for example a hashed email. The benefit is that RTBF compliance can be achieved without fully deleting or exposing the data, so analytics can still use the anonymized data.
Assignment of a surrogate key, making data relatable across the 5 different data layers.
Governance, enabling us to enforce table- and column-level security (Databricks Unity Catalog / table ACLs).
Segregation, splitting our raw data into 5 buckets depending on their sensitivity or PII classification.
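The hashed email mentioned in the list above could look like the sketch below. Note that it swaps a plain hash for a keyed hash (HMAC-SHA256), because an unkeyed hash of an email address can be reversed by hashing guessed addresses. The function name, the normalization step, and the key handling are assumptions for illustration. Also keep in mind that GDPR still treats pseudonymized data as personal data while re-identification remains possible.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a stable token for `value` using HMAC-SHA256.

    Assumption: values are trimmed and lower-cased first so that the same
    email always maps to the same token. The secret key must be stored
    outside the data lake.
    """
    normalized = value.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same input and key always yield the same 64-character token, so joins and counts still work on the tokenized column, while a different key produces unrelated tokens.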


The Golden Hour
