So indeed: not everything stored together, not everything readable, and the data not linkable without further steps.
There are techniques that support all of this, but you have to design the system for them and take the security measures that fit each technique.
Summary
The goal is:
- If one database is stolen → attacker cannot reconstruct a full identity.
- If multiple databases are stolen → attacker still cannot easily combine records.
- The company → can reliably retrieve and combine the correct data when authorized.
This is a classic privacy-by-design and pseudonymization architecture problem.
Below is a secure and practical architecture used in finance, healthcare, and government systems.
Core Principle: Never Use a Real Identifier as the Link
Do NOT use:
- Name
- Passport number
- SSN
- Bank account
- Email
as a cross-database key.
Instead use:
Random, non-derivable internal identifiers
Recommended Architecture: Tokenized Identity Vault Model
Step 1 — Generate a Random Master Client ID
When a client is created:
- Generate a cryptographically secure random ID. Example: 256-bit random value (UUIDv4 is OK, but longer is better)
- This becomes the Internal Master ID (IMID)
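As a minimal sketch of this step, assuming Python and its standard `secrets` module (the function name `new_imid` is hypothetical), a 256-bit identifier could be generated like this:

```python
import secrets

def new_imid() -> str:
    """Return a fresh 256-bit identifier as 64 hex characters."""
    # secrets draws from the operating system's CSPRNG, so the value
    # is not derived from any client attribute.
    return secrets.token_hex(32)

imid = new_imid()
```

The value carries no information about the client, so it reveals nothing on its own if it leaks.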
This ID:
Step 2 — Physically Separate Data by Sensitivity
Example structure:
Database A — Identity Data
- IMID
- Name
- Date of birth
- Address
Database B — Government Documents
- DocToken (NOT IMID)
- Passport number
- Passport photo
- SSN
Database C — Financial Data
- FinanceToken (NOT IMID)
- Bank account number
- IBAN
- Payment history
Critical Security Layer: Use Different Tokens Per Database
Instead of storing IMID everywhere:
For each database:
Token = HMAC(IMID, SecretKeyForThatDatabase)
Each database has:
- Its own secret key
- Its own token format
So:
- DB A uses Token_A
- DB B uses Token_B
- DB C uses Token_C
If DB A is stolen:
- Attacker only sees Token_A
- Cannot compute Token_B or Token_C
- Cannot correlate records across databases
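The per-database derivation above can be sketched as follows. This assumes HMAC-SHA256 from Python's standard library; the database names and the in-memory `DB_SECRETS` dictionary are placeholders for illustration only, since real keys would sit in an HSM or key vault:

```python
import hashlib
import hmac
import secrets

# Hypothetical per-database secrets, generated here only so the sketch runs.
DB_SECRETS = {
    "identity": secrets.token_bytes(32),
    "documents": secrets.token_bytes(32),
    "finance": secrets.token_bytes(32),
}

def derive_token(imid: str, database: str) -> str:
    """Token = HMAC(IMID, SecretKeyForThatDatabase), hex encoded."""
    key = DB_SECRETS[database]
    return hmac.new(key, imid.encode("utf-8"), hashlib.sha256).hexdigest()

imid = secrets.token_hex(32)
token_a = derive_token(imid, "identity")
token_b = derive_token(imid, "documents")
```

The same IMID always yields the same token for a given database, but without the other database's key there is no practical way to turn `token_a` into `token_b`.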
Even if DB A + DB B are stolen:
Add Field-Level Encryption
For highly sensitive fields:
- Passport number
- SSN
- Bank account number
Use:
- Strong symmetric encryption (AES-256-GCM)
- Separate encryption keys per data category
- Keys stored in a Hardware Security Module (HSM) or key vault
Never store encryption keys in the same database.
Introduce a Secure Identity Broker Service
Only one internal service is allowed to:
Other services:
This creates a single controlled correlation point.
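A rough sketch of such a broker, under stated assumptions: the class name `IdentityBroker`, the caller names, and the permission table are all hypothetical, and the secrets are held in memory only to keep the example self-contained.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timezone

class IdentityBroker:
    """Single place that turns an IMID into a per-database token."""

    def __init__(self, db_secrets, permissions):
        self._db_secrets = db_secrets      # database name -> secret key bytes
        self._permissions = permissions    # caller name -> set of database names
        self.audit_log = []                # every attempt is recorded

    def token_for(self, caller: str, imid: str, database: str) -> str:
        allowed = database in self._permissions.get(caller, set())
        # Log the attempt without writing the IMID itself into the log.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "caller": caller,
            "database": database,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{caller} may not resolve tokens for {database}")
        key = self._db_secrets[database]
        return hmac.new(key, imid.encode("utf-8"), hashlib.sha256).hexdigest()

broker = IdentityBroker(
    db_secrets={"documents": secrets.token_bytes(32), "finance": secrets.token_bytes(32)},
    permissions={"billing-service": {"finance"}},
)
imid = secrets.token_hex(32)
finance_token = broker.token_for("billing-service", imid, "finance")
```

Because every lookup passes through `token_for`, denied and granted correlations end up in one audit trail.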
Even Better: Zero-Trust Segmentation
Each database should:
- Be on a different network segment
- Have separate credentials
- Use different DB engines if possible
- Be hosted on different machines / clusters
- Have strict access control lists
An attacker compromising one server should not automatically reach others.
Additional Hardening
Salted Hash Indexing
For searchable fields (like passport number):
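One common way to keep such a field searchable without storing it in the clear is a keyed hash index, sometimes called a blind index. The sketch below uses HMAC with a secret index key rather than a plain salted hash, because a random per-record salt would make equality lookups impossible and an unkeyed hash of a short document number can be brute-forced. The names `INDEX_KEY` and `search_index` and the normalization rule are assumptions for illustration:

```python
import hashlib
import hmac
import secrets

# Hypothetical index key; in practice it would be held in the HSM or key vault.
INDEX_KEY = secrets.token_bytes(32)

def search_index(value: str) -> str:
    """Return a deterministic keyed hash used only for equality lookups."""
    # Normalize so that "ab 123 456" and "AB123456" map to the same index value.
    normalized = "".join(value.split()).upper()
    return hmac.new(INDEX_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

stored_index = search_index("AB123456")
```

A lookup recomputes `search_index` on the query value and compares it with the stored column; the passport number itself stays in the encrypted field.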
Access Logging + Monitoring
Log:
Rate limiting
Prevent mass extraction.
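A simple sliding-window limiter illustrates the idea. This is a sketch, not a production design: the class name and the limits are hypothetical, and state is kept in process memory.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most max_requests per caller within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # caller -> timestamps of accepted requests

    def allow(self, caller: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[caller]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
```

Placed in front of the lookup path, a limiter like this slows down bulk extraction even when the caller holds valid credentials.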
What Happens in a Breach?
If one database is stolen:
If two databases are stolen:
If application server is stolen:
- Without HSM / key vault → encrypted data useless.
What You Should NOT Do
Don’t just “split tables” in one database.
Don’t use the same client ID everywhere.
Don’t rely only on encryption at rest.
Don’t derive tokens without secret keys.
Don’t store mapping tables in plain form.
Real-World Pattern Name
This architecture is often called:
- Tokenization architecture
- Pseudonymization with keyed derivation
- Data vault with identity broker
- Privacy-preserving data segmentation
It’s used by:
If You Want Maximum Protection
You can go one step further:
Use format-preserving encryption (FPE) or
Use secure multi-party computation for highly sensitive analytics
But for most companies, the tokenized vault model is sufficient.
Summary Architecture Diagram (Conceptual)
                 [ Identity Broker ]
                          |
        -------------------------------------
        |                 |                 |
      DB A              DB B              DB C
   (Identity)     (Government Docs)    (Financial)
     Token_A           Token_B           Token_C
Only the broker knows how to convert:
IMID → Token_A / Token_B / Token_C
Below is a GDPR-compliant data architecture blueprint tailored for a telecom / ISP provider handling:
This aligns with the General Data Protection Regulation (GDPR) and telecom-specific rules such as the ePrivacy Directive.
Regulatory Reality for Telecoms
Telecoms are high-risk under GDPR because they process:
- Direct identifiers (name, address)
- Government IDs
- Payment data
- Location data
- Traffic metadata (who contacted whom, when, IP logs)
- Sometimes lawful interception data
Under GDPR this typically requires:
- Art. 6 lawful basis (contract, legal obligation)
- Art. 9 safeguards (if special categories apply)
- Art. 25 data protection by design
- Art. 32 security of processing
- DPIA (Art. 35) mandatory in most telecom cases
High-Level GDPR-Compliant Architecture
Core Principle:
Separate identification, service data, and traffic data so no single breach reveals a full subscriber profile.
Data Domain Segmentation Model
Domain 1 — Identity Vault (Highest Sensitivity)
Contains:
- Legal name
- Address
- Date of birth
- Passport / ID number
- SSN (if collected)
Key design:
- Random Master Subscriber ID (MSID)
- Encrypted fields (AES-GCM)
- Separate key vault (HSM-backed)
- Extremely restricted access
Access is limited to:
- KYC service
- Legal compliance team
The billing system has no direct access.
Domain 2 — Billing & Financial
Contains:
Uses:
BillingToken = HMAC(MSID, BillingSecret)
No names stored here.
Domain 3 — Network Provisioning
Contains:
- SIM number (ICCID)
- IMSI
- Device identifiers
- IP assignments
- Service plan
Uses:
NetworkToken = HMAC(MSID, NetworkSecret)
No direct identity data.
Domain 4 — Traffic & Metadata (Extremely Sensitive)
Contains:
Uses:
TrafficToken = HMAC(MSID, TrafficSecret)
Traffic database:
- Physically isolated
- Short retention window
- Strict audit trail
This separation is critical under the ePrivacy Directive.
Identity Broker Service (Controlled Correlation Point)
Only one internal service may:
All correlation:
- Logged
- Justified
- Audited
- Rate limited
This satisfies:
Data Minimization & Purpose Limitation
For telecom:
Example:
Provisioning system:
- Does NOT need passport number.
Traffic system:
Billing:
- Does NOT need location logs.
Strict API-based access:
Each system only receives:
- The minimum data required.
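One way to enforce this at the API boundary is a per-system allow-list of fields. The sketch below is illustrative only: the system names, field names, and the `minimal_view` helper are hypothetical.

```python
# Hypothetical allow-lists: each consuming system names the fields it may receive.
ALLOWED_FIELDS = {
    "provisioning": {"service_plan", "iccid"},
    "billing": {"invoice_total", "payment_status"},
}

def minimal_view(system: str, record: dict) -> dict:
    """Return only the fields the given system is allowed to see."""
    allowed = ALLOWED_FIELDS.get(system, set())  # unknown systems get nothing
    return {field: value for field, value in record.items() if field in allowed}

record = {
    "service_plan": "fiber-500",
    "iccid": "8931000000000000000",
    "passport_number": "AB123456",
    "invoice_total": 42.50,
}
provisioning_view = minimal_view("provisioning", record)
```

Here the provisioning view never contains the passport number, which matches the example above.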
Retention Segmentation
Telecom providers must separate retention clocks.
Example:
| Data Type | Retention |
| --- | --- |
| Billing records | 7–10 years (tax law) |
| Traffic metadata | 6–12 months (national law dependent) |
| KYC documents | Duration of contract + X years |
| Marketing consent | Until withdrawn |
Automate deletion by domain.
Do not centralize retention logic.
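A per-domain deletion job could look roughly like the sketch below. The 183-day window is an illustrative value inside the 6–12 month range from the table, not a legal recommendation, and the record shape (`created_at`) and function names are assumptions:

```python
from datetime import datetime, timedelta, timezone

# Illustrative value only; the actual period depends on national law.
TRAFFIC_RETENTION = timedelta(days=183)

def purge_expired(records, retention, now):
    """Return only the records that are still inside the retention window."""
    cutoff = now - retention
    return [r for r in records if r["created_at"] >= cutoff]

def run_traffic_retention_job(traffic_records, now=None):
    # This job belongs to the traffic domain, so its retention clock
    # runs independently of billing or KYC.
    now = now or datetime.now(timezone.utc)
    return purge_expired(traffic_records, TRAFFIC_RETENTION, now)
```

Each domain would carry its own constant and its own job, so changing the traffic window never touches billing retention.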
Encryption & Key Strategy
- Separate encryption keys per domain
- Keys stored in HSM or managed key service
- Key rotation policy
- Key access logging
- No key material in app servers
If database is stolen:
Access Control Model (Telecom-Specific)
Use:
- Attribute-Based Access Control (ABAC)
- Role-Based Access Control (RBAC)
- Just-in-time privilege escalation
- Strong MFA for sensitive systems
Example:
- Customer service → limited identity view
- Finance team → billing domain only
- Network engineers → provisioning only
- Security team → traffic only (logged)
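The role-to-domain mapping in the example can be expressed as a small check that combines a role rule (RBAC) with one attribute, here whether MFA was completed (ABAC). The role keys, domain names, and the choice of which domains require MFA are assumptions made for this sketch:

```python
# Role -> domains, following the example above.
ROLE_DOMAINS = {
    "customer_service": {"identity_limited"},
    "finance": {"billing"},
    "network_engineer": {"provisioning"},
    "security": {"traffic"},
}

# Assumption for illustration: these domains count as "sensitive systems".
MFA_REQUIRED = {"identity_limited", "traffic"}

def can_access(role: str, domain: str, mfa_verified: bool) -> bool:
    """Grant access only if the role covers the domain and MFA rules are met."""
    if domain not in ROLE_DOMAINS.get(role, set()):
        return False
    if domain in MFA_REQUIRED and not mfa_verified:
        return False
    return True
```

For instance, `can_access("finance", "traffic", True)` is refused because the finance role does not include the traffic domain, regardless of MFA.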
Handling Lawful Interception (Very Sensitive)
Telecoms must sometimes support lawful intercept.
Best practice:
- Dedicated lawful intercept gateway
- Separate legal authorization workflow
- No direct database querying
- All requests digitally signed & archived
Do NOT allow engineers to manually combine identity + traffic data.
Data Subject Rights Architecture
Under GDPR Art. 15–22:
You must support:
Your Identity Broker should:
- Retrieve domain data
- Assemble report
- Log disclosure event
- Apply redaction where required
Never expose raw database structure.
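The broker steps listed above (retrieve, assemble, log, redact) might be sketched as follows. The function name, the `REDACTED_FIELDS` set, and the shape of `domain_data` are hypothetical; retrieval from the individual domains is assumed to have happened already.

```python
from datetime import datetime, timezone

# Hypothetical fields that must not appear in a subject access report.
REDACTED_FIELDS = {"internal_risk_score"}

def build_access_report(subject_ref: str, domain_data: dict, disclosure_log: list) -> dict:
    """Assemble a per-domain report, redact restricted fields, and log the disclosure."""
    report = {}
    for domain, record in domain_data.items():
        report[domain] = {k: v for k, v in record.items() if k not in REDACTED_FIELDS}
    # Record which domains were disclosed and when, not the data itself.
    disclosure_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "subject_ref": subject_ref,
        "domains": sorted(report),
    })
    return report

log = []
report = build_access_report(
    "request-0001",
    {"billing": {"invoice_total": 42.5, "internal_risk_score": 0.9}},
    log,
)
```

The report is keyed by business domain rather than by table or column layout, which keeps the raw database structure out of what the data subject receives.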

Breach Impact Reduction Strategy
If:
Identity vault breached:
No traffic or financial linkage possible.
Traffic DB breached:
Attacker sees pseudonymous tokens only.
Billing breached:
No names.
App server breached:
Encrypted fields + no token secrets.
This can reduce GDPR fine exposure by demonstrating:
- Privacy by design
- Technical safeguards
- Compartmentalization
Supervisory authorities take such technical and organisational measures into account when deciding on fines (Art. 83(2)(d)).

Additional Telecom Safeguards
Pseudonymized analytics
Analytics must use traffic tokens only.
Differential privacy for statistics
SIEM monitoring across domains
DPIA documentation
Mandatory for telecom.
Vendor segmentation
If cloud hosted:
- Separate VPC per domain
- No shared credentials

Organisational Layer (Often Overlooked)
Technical compliance alone is insufficient.
You also need:
- Appointed DPO (usually mandatory for telecoms)
- Processing register (Art. 30)
- Vendor DPAs
- Incident response plan (72-hour breach rule)
- Regular penetration tests

Conceptual GDPR-Compliant Layout
                      [ Identity Broker ]
                               |
       -------------------------------------------------
       |               |               |               |
Identity Vault    Billing DB      Network DB      Traffic DB
    (MSID)       (BillingTok)    (NetworkTok)    (TrafficTok)
Each token derived with separate secret keys.

Key GDPR Articles You Satisfy With This Design
- Art. 5: Data minimization
- Art. 6: Lawful basis
- Art. 25: Privacy by design
- Art. 30: Records of processing
- Art. 32: Security of processing
- Art. 35: DPIA
- Art. 33: Breach notification readiness
Final Advice
For a telecom/ISP:
If you only encrypt but do not compartmentalize → high regulatory risk.
If you compartmentalize but don’t control correlation → still high risk.
If you centralize everything in one “customer 360” database → very high risk.
The combination of encryption, compartmentalization, and controlled correlation is the safest GDPR-aligned structure.