
ADR-002: Security, Privacy, and Pseudonymization Strategy

Status

Accepted

Context

FintraOS handles highly sensitive financial data. We must comply with global privacy regulations (GDPR, CCPA) while retaining data persistently for AI training. Key constraints:
  • "Store Everything": We need long-term history for models.
  • Right to be Forgotten: We must be able to "delete" a user on request.
  • Identity Agnostic: We care about behaviour (transactions), not legal identity (names/addresses).
  • Multi-Tenant: Data must be strictly segregated.

Decision

1. Identity Model: The "Financial Passport"

  • Decision: FintraOS will store core identity attributes (Name, Surname, DOB, Email, Phone, Country, National ID) to enable cross-platform portability.
  • Rationale: To fulfill the promise of "Bring your financial history with you," we must reliably match users across different applications (Tenants). Hashing alone is insufficient for high-confidence matching in a global context.
  • Implementation:
      • PII Storage: Identity fields are stored in a dedicated, strictly access-controlled IdentityStore.
      • Encryption: All PII is encrypted at rest using the Tenant's Key (for isolation) AND the User's Key (for deletion).
      • Usage: This data is only used for resolution and matching. It is never exposed to downstream models (Brain/Forecast), which operate on pseudonymised IDs.
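The ADR does not specify how the pseudonymised IDs consumed by Brain/Forecast are derived. A minimal sketch, assuming a keyed HMAC-SHA256 over the tenant and profile IDs: unlike a plain hash, the pseudonym cannot be recomputed by hashing a list of candidate IDs unless the attacker also holds the key. The key constant, the `DIRECT_IDENTIFIERS` field names, and the `pseudonymise`/`to_model_record` helpers are hypothetical.

```python
import hashlib
import hmac

# HYPOTHETICAL secret; in practice it would be held in FintraOS Guard (Vault/KMS), not in code.
PSEUDONYM_KEY = b"replace-with-a-kms-managed-secret"

# Hypothetical field names for direct identifiers that must not reach downstream models.
DIRECT_IDENTIFIERS = {"tenant_id", "profile_id", "name", "surname", "email", "phone", "national_id"}


def pseudonymise(tenant_id: str, profile_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Keyed, deterministic pseudonym: stable across calls, not reversible without the key."""
    # Length-prefix the tenant ID so ("a:b", "c") and ("a", "b:c") cannot collide.
    message = f"{len(tenant_id)}:{tenant_id}:{profile_id}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def to_model_record(event: dict) -> dict:
    """Copy an event for Brain/Forecast with direct identifiers replaced by a pseudonym."""
    record = {k: v for k, v in event.items() if k not in DIRECT_IDENTIFIERS}
    record["subject_id"] = pseudonymise(event["tenant_id"], event["profile_id"])
    return record
```

Because the pseudonym is deterministic, a model can still group a profile's events over time without ever seeing who the profile belongs to.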

2. Deletion Strategy: Crypto-Shredding

  • Problem: "Hard deleting" events from an immutable Event Log (Kafka/EventStore) is technically impossible or extremely expensive.
  • Solution: Crypto-Shredding.
      • Every Profile has a unique Data Encryption Key (DEK) stored in FintraOS Guard (Vault/KMS).
      • All sensitive fields (transaction amounts, balances, merchant names) in the Event Store are encrypted with this DEK.
      • To "delete" a user: we delete their DEK from the Key Management System.
      • Result: The historical data remains in the log, but without the DEK it is ciphertext that can no longer be decrypted. This is how we satisfy the GDPR "Right to Erasure" (Article 17).
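The crypto-shredding flow can be sketched in a few lines. This is an illustration under stated assumptions, not the FintraOS Guard API: `KeyStore` is a hypothetical in-memory stand-in for Vault/KMS, and because the sketch sticks to the Python standard library, the field cipher is a toy encrypt-then-MAC construction (HMAC-SHA256 keystream plus HMAC tag). A real deployment would use a vetted AEAD such as AES-GCM.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class KeyStore:
    """In-memory stand-in for FintraOS Guard (Vault/KMS); hypothetical interface."""

    def __init__(self) -> None:
        self._deks: Dict[str, bytes] = {}

    def create_dek(self, profile_id: str) -> None:
        self._deks[profile_id] = secrets.token_bytes(32)

    def get_dek(self, profile_id: str) -> Optional[bytes]:
        return self._deks.get(profile_id)  # None once the profile is shredded

    def delete_dek(self, profile_id: str) -> None:
        self._deks.pop(profile_id, None)  # the entire "delete user" operation


def _subkey(dek: bytes, purpose: bytes) -> bytes:
    # Separate keys for encryption and authentication, both derived from the DEK.
    return hmac.new(dek, purpose, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode used as a keystream generator.
    blocks = [hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
              for i in range((length + 31) // 32)]
    return b"".join(blocks)[:length]


def encrypt_field(dek: bytes, plaintext: str) -> bytes:
    # TOY encrypt-then-MAC so the sketch runs on the stdlib alone; use AES-GCM in production.
    nonce = secrets.token_bytes(16)
    data = plaintext.encode("utf-8")
    stream = _keystream(_subkey(dek, b"enc"), nonce, len(data))
    ciphertext = bytes(a ^ b for a, b in zip(data, stream))
    tag = hmac.new(_subkey(dek, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt_field(dek: bytes, blob: bytes) -> str:
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(dek, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext failed authentication")
    stream = _keystream(_subkey(dek, b"enc"), nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream)).decode("utf-8")


def read_amount(keys: KeyStore, event: dict) -> Optional[str]:
    dek = keys.get_dek(event["profile_id"])
    if dek is None:
        return None  # shredded: the event is still in the log, but unreadable
    return decrypt_field(dek, event["amount"])


if __name__ == "__main__":
    keys = KeyStore()
    keys.create_dek("profile-123")
    # The event log is append-only: events are written once and never rewritten.
    event_log = [{"profile_id": "profile-123",
                  "amount": encrypt_field(keys.get_dek("profile-123"), "-5.00")}]
    print(read_amount(keys, event_log[0]))  # -5.00
    keys.delete_dek("profile-123")          # right-to-erasure request
    print(read_amount(keys, event_log[0]))  # None
```

Deleting the DEK is the whole erasure step: the appended event is never touched, yet `read_amount` can no longer recover it.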

3. Cross-Tenant Identity Resolution (Definite Data Strategy)

  • Problem: A single human user may use multiple apps (Tenants) powered by FintraOS. We need to link their profiles to build a richer model, without storing a central "Master User Record" of PII.
  • Solution: Definite Data Matching. We prioritize strong, legally verified identifiers over probabilistic signals.
      • Strong Match (Auto-Merge): Exact match on National ID / Passport Number + Country.
      • Medium Match (Prompt User): Exact match on Full Name + DOB + Email. This triggers a "Found existing profile" prompt.
      • Weak Match (Ignored): Partial matches (e.g. Phone only) are ignored to prevent false positives (family plans, recycled numbers).
  • Rationale: Financial data merging requires 100% certainty. Best-guess matching is unacceptable for credit or wealth use cases.
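The three tiers map onto a small classification function. This is a sketch assuming an `Identity` record holding the fields listed in the Identity Model, and assuming "exact match" means equality after trimming, collapsing whitespace, and case-folding; the ADR does not define normalisation, and all names here are hypothetical. A missing field never counts as a match, so two records that both lack a National ID cannot auto-merge.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MatchTier(Enum):
    STRONG = "auto_merge"   # National ID / Passport + Country
    MEDIUM = "prompt_user"  # Full Name + DOB + Email
    NONE = "ignore"         # weak or no signal, e.g. phone only


@dataclass
class Identity:
    full_name: str
    dob: str  # ISO 8601 date, e.g. "1990-04-01" (assumed format)
    email: str
    phone: Optional[str] = None
    country: Optional[str] = None
    national_id: Optional[str] = None


def _norm(value: Optional[str]) -> Optional[str]:
    # Assumed normalisation before "exact" comparison: trim, collapse spaces, case-fold.
    return " ".join(value.split()).casefold() if value else None


def _same(a: Optional[str], b: Optional[str]) -> bool:
    # Both present and equal after normalisation; a missing or empty value never matches.
    na, nb = _norm(a), _norm(b)
    return na is not None and na == nb


def classify_match(a: Identity, b: Identity) -> MatchTier:
    if _same(a.national_id, b.national_id) and _same(a.country, b.country):
        return MatchTier.STRONG
    if _same(a.full_name, b.full_name) and _same(a.dob, b.dob) and _same(a.email, b.email):
        return MatchTier.MEDIUM
    # Phone alone (or any other partial overlap) is deliberately ignored.
    return MatchTier.NONE
```

Note that `phone` is carried on the record but never consulted, which is exactly the Weak Match rule.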

4. Feedback Loop & Privacy

  • Decision: User corrections (e.g., categorising a transaction) are logged for manual review to improve the global model.
  • Privacy Control: Before any data enters the "Global Training Set," it is stripped of TenantID and ProfileID. It becomes a generic feature vector: [Amount: -5.00, Merchant: "Tesco", Time: 12:00] -> Label: "Groceries".
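A sketch of the Privacy Control step, assuming a correction arrives as a dict with hypothetical keys `amount`, `merchant`, `time`, and `corrected_category`. It whitelists the feature fields rather than deleting TenantID and ProfileID by name, so an identifier field added to corrections later is dropped by default instead of leaking into the Global Training Set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    """One row of the Global Training Set: features plus label, no identifiers."""
    amount: float
    merchant: str
    time: str   # local time of day, "HH:MM" (assumed format)
    label: str  # the category the user corrected the transaction to


# Hypothetical keys on a logged correction; only these are ever copied.
FEATURE_FIELDS = ("amount", "merchant", "time")


def to_training_example(correction: dict) -> TrainingExample:
    """Whitelist features + label; TenantID, ProfileID and any unknown keys are dropped."""
    amount, merchant, time_of_day = (correction[f] for f in FEATURE_FIELDS)
    return TrainingExample(
        amount=float(amount),
        merchant=str(merchant),
        time=str(time_of_day),
        label=str(correction["corrected_category"]),
    )
```

For the ADR's own example, a correction carrying `tenant_id`, `profile_id`, `amount="-5.00"`, `merchant="Tesco"`, `time="12:00"` and `corrected_category="Groceries"` yields `TrainingExample(amount=-5.0, merchant='Tesco', time='12:00', label='Groceries')`.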

5. AI & LLM Governance (Context Scoping)

  • Problem: Generative AI models (LLMs) can "hallucinate" or leak data if not strictly bounded.
  • Decision: AI Context Scoping.
  • Implementation:
      • RAG Enforcement: The vector database (Qdrant) is strictly partitioned by TenantID and ProfileID.
      • Query Injection: Module-Guard automatically injects TenantID and ProfileID filter predicates into every LLM retrieval request.
      • Output Scanning: All LLM responses are scanned for PII before being returned to the client.
  • Rationale: Prevents "Cross-Tenant Leakage" (User A asking about User B's data) and ensures compliance with data residency laws.
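Filter injection and output scanning might look like the following sketch. The filter uses Qdrant's JSON filter shape (a `must` list of `{"key": ..., "match": {"value": ...}}` conditions), but the payload keys `tenant_id`/`profile_id`, the request layout, and both helpers are assumptions, not Module-Guard's actual interface. The two regexes are a deliberately crude stand-in for a real PII detector.

```python
import copy
import re

# Crude illustrative patterns; a production scanner would use a dedicated PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 9 to 15 digits, optionally separated by single spaces or hyphens (phone-number-like).
PHONE_RE = re.compile(r"\+?\d(?:[\s-]?\d){8,14}")


def scope_request(request: dict, tenant_id: str, profile_id: str) -> dict:
    """Return a copy of a retrieval request with tenant/profile conditions forced in."""
    scoped = copy.deepcopy(request)  # never mutate the caller's request
    flt = scoped.get("filter") or {}
    existing = flt.get("must") or []
    must = [existing] if isinstance(existing, dict) else list(existing)
    must.append({"key": "tenant_id", "match": {"value": tenant_id}})
    must.append({"key": "profile_id", "match": {"value": profile_id}})
    flt["must"] = must
    scoped["filter"] = flt
    return scoped


def redact_pii(text: str) -> str:
    """Mask email addresses and phone-like digit runs in an LLM response."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE]", text)
```

Appending the scope conditions to `must` means they are ANDed with whatever the caller supplied: a caller can narrow the search further but cannot widen it to another tenant or profile.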

Consequences

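  • Performance: Every read or write of encrypted data needs the profile's DEK from the KMS. We need aggressive in-memory caching of DEKs with a strict TTL, because a cached DEK keeps a shredded user's data readable until the entry expires.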
  • Search: We cannot easily search encrypted fields (e.g., "Find all users who shopped at Tesco"). We will need a separate, ephemeral "Search Index" (Elasticsearch) that can be rebuilt or purged easily.
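A minimal TTL cache for DEKs, assuming a `fetch` callable that wraps the KMS lookup and returns `None` for a shredded profile (the class and parameter names are hypothetical). Entries expire a fixed time after they were fetched and are not refreshed on read, so after a DEK is deleted in the KMS a node can keep decrypting for at most one TTL; `evict` lets the deletion path close that window immediately on the local node.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DekCache:
    """In-memory DEK cache with a hard TTL in front of a KMS lookup (hypothetical names)."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # wraps the KMS call; returns None for a shredded profile
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # profile_id -> (dek, expires_at)

    def get(self, profile_id: str) -> Optional[bytes]:
        now = self._clock()
        entry = self._entries.get(profile_id)
        if entry is not None and now < entry[1]:
            return entry[0]  # hit: no KMS round-trip
        dek = self._fetch(profile_id)  # miss or expired: go back to the KMS
        if dek is None:
            self._entries.pop(profile_id, None)
            return None
        self._entries[profile_id] = (dek, now + self._ttl)  # expiry is not extended on hits
        return dek

    def evict(self, profile_id: str) -> None:
        """Drop a key immediately, e.g. when this node processes a deletion request."""
        self._entries.pop(profile_id, None)
```

The TTL is therefore a direct trade-off between KMS load and how long a deleted user's data stays readable on a node that has not seen the eviction.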

Related: Dashboard | [[ADR-001]] | Module-Guard