Phase 1 established what data the enterprise holds. Phase 2 establishes whether the enterprise has the right to use it. Every processing activity — storage, transformation, analytics, model training, third-party sharing — must be traceable to a lawful basis. Under GDPR, that basis is one of six enumerated grounds (Article 6). Under the DPDP Act, it is either consent or a legitimate use (Section 4 and Section 7). Under CCPA/CPRA, it is disclosure with an opt-out right. Without a consent and legal basis engine, every downstream governance control operates on a foundation of unverified legality.
The Four Components
A consent and legal basis architecture comprises four distinct systems. Each has a single responsibility. No system carries another system's function.
┌────────────────────────────────────────┐
│ 1. CONSENT MANAGEMENT SYSTEM           │
│    Source of truth for consent state   │
├────────────────────────────────────────┤
│ 2. PURPOSE TAXONOMY                    │
│    Canonical register of all purposes  │
├────────────────────────────────────────┤
│ 3. LEGAL BASIS MAPPING                 │
│    Activity → basis per jurisdiction   │
├────────────────────────────────────────┤
│ 4. CONSENT GATE                        │
│    Stateless enforcement at pipeline   │
│    boundary                            │
└────────────────────────────────────────┘
Component 1: Consent Management System
The Consent Management System (CMS) is the single source of truth for every consent decision made by every Data Principal (the DPDP Act's term for the individual whose data is processed). It is not a cookie banner. It is a transactional system that records, versions, and serves consent state.
For each Data Principal, the CMS stores:
CONSENT RECORD:
principal_id: user_12345
consent_version: v3.2
notice_version: notice_2025_03
jurisdiction: IN
purposes_granted: [service_delivery,
service_improvement,
marketing_email]
purposes_denied: [third_party_sharing,
model_training]
collection_method: explicit_opt_in
timestamp_granted: 2025-03-01T10:23:00Z
timestamp_updated: 2025-06-15T14:07:00Z
withdrawal_log: [{purpose: marketing_email,
withdrawn: 2025-06-15T14:07:00Z}]
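The record above can be modeled as an immutable snapshot: a preference change produces a new version rather than mutating the old one. A minimal Python sketch, using the field names from the example (the `withdraw` helper and its shape are illustrative assumptions, not a prescribed schema):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ConsentRecord:
    """One immutable snapshot of a Data Principal's consent state."""
    principal_id: str
    consent_version: str
    notice_version: str
    jurisdiction: str
    purposes_granted: tuple
    purposes_denied: tuple
    collection_method: str
    timestamp_granted: str
    timestamp_updated: str
    withdrawal_log: tuple = ()

def withdraw(record: ConsentRecord, purpose: str, ts: str) -> ConsentRecord:
    """Produce a NEW record; the prior version is retained for audit."""
    return replace(
        record,
        purposes_granted=tuple(p for p in record.purposes_granted if p != purpose),
        purposes_denied=record.purposes_denied + (purpose,),
        timestamp_updated=ts,
        withdrawal_log=record.withdrawal_log
        + ({"purpose": purpose, "withdrawn": ts},),
    )
```

Because the dataclass is frozen and `withdraw` returns a copy, the January, March, and June states of a user's consent all survive as distinct objects; persisting each version gives the audit history described below.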
Critical design properties of the CMS:
Versioned. Every consent interaction is immutably logged. When a user modifies their preferences, the prior version is retained. This is non-negotiable for audit — a regulator may ask what consent was in effect on a specific date for a specific user.
Notice-linked. Consent is valid only in the context of the notice that was displayed when consent was collected. If the notice changes (new purposes added, language modified), prior consent does not automatically extend to the new notice. The DPDP Rules require that consent be based on a notice that is understandable without reference to any other document. The CMS must link every consent record to the exact notice version the user saw.
Granular. Consent is recorded per purpose, not as a single binary. A user may consent to service delivery but deny model training. The system must store, serve, and enforce this granularity. Bundled consent — "agree to all or use nothing" — is non-compliant under both GDPR (Recital 43) and the DPDP Act (Section 6, which requires specific and informed consent).
Queryable in real time. Downstream systems — pipelines, feature stores, model training jobs — must be able to query a user's consent state at the point of data consumption. This requires sub-100ms query latency on the CMS, which means it must be backed by a low-latency datastore (Redis, DynamoDB, or equivalent), not a relational database used for batch reporting.
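The real-time requirement is usually met with a read-through cache in front of the transactional CMS store. A sketch of the pattern, with a plain dict standing in for Redis/DynamoDB (the class, TTL policy, and invalidation hook are all assumptions):

```python
import time

class ConsentCache:
    """Read-through cache: serve consent state from a low-latency store,
    falling back to the authoritative CMS on a miss."""

    def __init__(self, cms_lookup, ttl_seconds=60):
        self._cms_lookup = cms_lookup  # callable: principal_id -> consent dict
        self._ttl = ttl_seconds
        self._store = {}               # stand-in for Redis/DynamoDB

    def purposes_granted(self, principal_id: str) -> set:
        entry = self._store.get(principal_id)
        if entry is None or entry["expires"] < time.monotonic():
            record = self._cms_lookup(principal_id)  # slow path: hit the CMS
            entry = {"record": record,
                     "expires": time.monotonic() + self._ttl}
            self._store[principal_id] = entry
        return set(entry["record"]["purposes_granted"])

    def invalidate(self, principal_id: str) -> None:
        """Must be called on every consent change, or the gate enforces
        stale state for up to ttl_seconds."""
        self._store.pop(principal_id, None)
```

The design trade-off: a TTL bounds how long a stale grant can leak through, while explicit invalidation on every consent write keeps withdrawals near-instant.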
Component 2: Purpose Taxonomy
A Purpose Taxonomy is a canonical, machine-readable register of every reason the enterprise processes personal data. It is not a free-text field. It is an enumerated set, centrally governed, and referenced by every consent record, every processing activity, and every pipeline.
Example taxonomy for a financial services enterprise:
PURPOSE TAXONOMY:
  service_delivery     → Core product functionality
  service_improvement  → Bug fixes, UX optimization
  personalization      → Content and offer targeting
  analytics_aggregate  → Non-individual statistical analysis
  analytics_individual → Individual-level behavioral analysis
  model_training       → ML/AI model training
  model_inference      → Real-time model predictions
  fraud_detection      → Transaction monitoring, AML/CTF
  marketing_email      → Promotional email communication
  marketing_push       → Push notification marketing
  third_party_sharing  → Data shared with external partners
  regulatory_reporting → Mandated regulatory submissions
  legal_obligation     → Processing required by law
Each purpose code is immutable once published. If a purpose changes scope, a new code is created and the old code is deprecated — never modified. This ensures that historical consent records referencing the old code remain accurate.
The taxonomy is the contract between the legal team (who defines what each purpose means), the product team (who decides which purposes to present to users), and the engineering team (who enforces purpose restrictions in pipelines). Without a shared taxonomy, consent is ambiguous and enforcement is impossible.
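One way to make that contract machine-readable is a shared enumeration that every consent record and pipeline config must resolve against, so free-text purposes fail loudly. A sketch using the codes from the example taxonomy (the `resolve` helper and deprecation convention are illustrative assumptions):

```python
from enum import Enum

class Purpose(str, Enum):
    """Canonical purpose codes. Published codes are never modified;
    a scope change means a new code plus deprecation of the old one."""
    SERVICE_DELIVERY = "service_delivery"
    SERVICE_IMPROVEMENT = "service_improvement"
    PERSONALIZATION = "personalization"
    ANALYTICS_AGGREGATE = "analytics_aggregate"
    ANALYTICS_INDIVIDUAL = "analytics_individual"
    MODEL_TRAINING = "model_training"
    MODEL_INFERENCE = "model_inference"
    FRAUD_DETECTION = "fraud_detection"
    MARKETING_EMAIL = "marketing_email"
    MARKETING_PUSH = "marketing_push"
    THIRD_PARTY_SHARING = "third_party_sharing"
    REGULATORY_REPORTING = "regulatory_reporting"
    LEGAL_OBLIGATION = "legal_obligation"

# Deprecated codes stay resolvable so historical consent records still
# parse, but new pipeline configs are forbidden from referencing them.
DEPRECATED: set = set()

def resolve(code: str) -> Purpose:
    """Fail loudly on unknown codes instead of silently passing free text."""
    return Purpose(code)
```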
Component 3: Legal Basis Mapping
Not all data processing requires consent. GDPR provides six legal bases: consent, contract, legal obligation, vital interests, public task, and legitimate interests. DPDP Act provides two primary bases: consent (Section 6) and legitimate uses (Section 7, which covers state functions, legal obligations, medical emergencies, and employment). CCPA/CPRA operates on a disclosure-plus-opt-out model rather than affirmative consent for most processing.
A Legal Basis Mapping connects every processing activity (from the Phase 1 Processing Activity Register) to its lawful basis, per jurisdiction:
LEGAL BASIS MAP:
┌─────────────────────┬──────────┬────────────┬───────────┐
│ Processing Activity │ GDPR     │ DPDP Act   │ CCPA      │
├─────────────────────┼──────────┼────────────┼───────────┤
│ service_delivery    │ Contract │ Legit Use  │ Disclosed │
│ fraud_detection     │ Legit Int│ Legit Use  │ Disclosed │
│ model_training      │ Consent  │ Consent    │ Opt-out   │
│ marketing_email     │ Consent  │ Consent    │ Opt-out   │
│ regulatory_report   │ Legal Obl│ Legit Use  │ Exempt    │
│ analytics_individ   │ Consent  │ Consent    │ Opt-out   │
│ analytics_aggreg    │ Legit Int│ Legit Use  │ Disclosed │
│ third_party_share   │ Consent  │ Consent    │ Opt-out   │
└─────────────────────┴──────────┴────────────┴───────────┘
This mapping determines enforcement behavior. If the basis for model_training is "Consent" under DPDP Act, the Consent Gate must validate that the user's consent record includes purpose=model_training before their data enters the training pipeline. If the basis for fraud_detection is "Legitimate Use" under DPDP Act, no consent check is needed — but the processing activity must still be documented in the register.
For enterprises operating across jurisdictions simultaneously, the mapping must resolve the strictest applicable basis per user. A user located in India is governed by DPDP Act. A user in the EU is governed by GDPR. A user in California is governed by CCPA/CPRA. The Consent Gate must know the user's jurisdiction and apply the correct basis map.
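In code, the map plus the jurisdiction rule reduces to a two-key lookup. A sketch using rows from the table above (the jurisdiction codes and the fail-closed fallback are assumptions about one reasonable design):

```python
# (activity, jurisdiction) -> required legal basis, from the map above.
LEGAL_BASIS = {
    ("service_delivery", "EU"): "contract",
    ("service_delivery", "IN"): "legitimate_use",
    ("service_delivery", "US-CA"): "disclosed",
    ("fraud_detection", "EU"): "legitimate_interest",
    ("fraud_detection", "IN"): "legitimate_use",
    ("fraud_detection", "US-CA"): "disclosed",
    ("model_training", "EU"): "consent",
    ("model_training", "IN"): "consent",
    ("model_training", "US-CA"): "opt_out",
}

def required_basis(activity: str, jurisdiction: str) -> str:
    """Resolve the basis for one activity in one jurisdiction.
    An unmapped pair is a governance gap, so fail closed."""
    try:
        return LEGAL_BASIS[(activity, jurisdiction)]
    except KeyError:
        raise LookupError(
            f"No legal basis mapped for {activity!r} in {jurisdiction!r}; "
            "processing must be blocked until the map is updated."
        )
```

Failing closed on an unmapped pair is the key design choice: a missing row means the legal review has not happened, which is exactly the situation the engine exists to prevent.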
Component 4: The Consent Gate
The Consent Gate is a stateless validation layer that sits at the boundary between raw data and any downstream consumer — analytics pipelines, feature engineering, model training, third-party data shares. It is the enforcement mechanism of the entire consent architecture.
DATA FLOWS FROM SOURCE SYSTEMS
│
▼
┌──────────────────────────────────────┐
│ CONSENT GATE │
│ │
│ FOR each record: │
│ 1. Resolve user jurisdiction │
│ 2. Look up required legal basis │
│ for this pipeline's purpose │
│ 3. IF basis = consent: │
│ Query CMS for user's │
│ consent on this purpose │
│ IF NOT granted → BLOCK │
│ IF basis = legitimate_use: │
│ PASS (but log) │
│ IF basis = contract: │
│ Verify active contract │
│ IF NOT active → BLOCK │
│ 4. Log decision: PASS or BLOCK │
│ with timestamp, basis, user_id │
│ │
│ STATELESS: stores nothing │
│ Queries CMS and Legal Basis Map │
│ in real time │
└──────────────┬───────────────────────┘
│
│ Only lawfully processable
│ data passes through
▼
DOWNSTREAM CONSUMERS
├─ Analytics Pipeline
├─ Feature Engineering
├─ Model Training
├─ Third-Party Share
└─ Reporting
The gate is stateless — it stores no data itself. It queries the CMS for consent state, queries the Legal Basis Map for the required basis, evaluates, and passes or blocks. Every decision is logged with the user identifier, the purpose, the legal basis applied, the jurisdiction, and the outcome (pass or block). This log is the audit trail that proves enforcement.
The gate must sit at every pipeline boundary. Not at ingestion alone — at every point where data moves from one processing context to another. Data ingested for service_delivery cannot silently flow into model_training without passing through a gate that validates the user's consent for model_training. This is where most implementations fail: the gate exists at the front door but not at the internal boundaries.
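The decision logic in the diagram can be sketched as a pure function: resolve jurisdiction, look up the required basis, check consent or contract, log the verdict. All helper callables here are assumptions standing in for the CMS, the Legal Basis Map, and a contract service:

```python
import logging

logger = logging.getLogger("consent_gate")

def gate(record: dict, purpose: str,
         basis_map, cms_lookup, has_active_contract) -> bool:
    """Return True (PASS) or False (BLOCK) for one record entering a
    pipeline that serves `purpose`. Stateless: every input is queried live."""
    jurisdiction = record["jurisdiction"]        # step 1: resolve jurisdiction
    basis = basis_map(purpose, jurisdiction)     # step 2: required basis

    if basis == "consent":                       # step 3: evaluate basis
        allowed = purpose in cms_lookup(record["principal_id"])
    elif basis == "contract":
        allowed = has_active_contract(record["principal_id"])
    elif basis in ("legitimate_use", "legitimate_interest", "legal_obligation"):
        allowed = True                           # pass, but still logged
    else:
        allowed = False                          # unknown basis: fail closed

    # step 4: the log line that becomes the audit trail proving enforcement
    logger.info("decision=%s principal=%s purpose=%s basis=%s jurisdiction=%s",
                "PASS" if allowed else "BLOCK",
                record["principal_id"], purpose, basis, jurisdiction)
    return allowed
```

Because the function holds no state, it can be replicated at every internal boundary, ingestion, feature engineering, training, sharing, without any replica drifting from the others.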
Consent Withdrawal and Cascade
Consent is not permanent. A Data Principal can withdraw consent at any time, and the DPDP Act requires that withdrawing consent be as easy as giving it. When withdrawal occurs, the system must cascade:
USER WITHDRAWS CONSENT FOR model_training
│
▼
┌──────────────────────────────────┐
│ CONSENT MANAGEMENT SYSTEM │
│ Updates consent record: │
│ purposes_granted removes │
│ model_training │
│ withdrawal_log appended │
└──────────────┬───────────────────┘
│
▼
┌──────────────────────────────────┐
│ DOWNSTREAM CASCADE │
│ │
│ 1. Feature Store: │
│ Query for all features │
│ derived from this user's │
│ data (via lineage keys) │
│ → Delete or restrict │
│ │
│ 2. Training Datasets: │
│ Flag any dataset that │
│ included this user's data │
│ → Mark as tainted │
│ │
│ 3. Model Registry: │
│ Identify models trained on │
│ tainted datasets │
│ → Flag for review │
│ → Retrain decision │
│ │
│ 4. Third-Party Shares: │
│ Notify any third party that │
│ received this user's data │
│ under model_training purpose │
│ → Request deletion │
│ │
│ 5. Audit Log: │
│ Record every cascade action │
│ with timestamp │
└──────────────────────────────────┘
Withdrawal cascade is the hardest operational problem in consent management. It requires lineage keys in the Feature Store (established in Phase 1's asset inventory), versioned training datasets in the Model Registry, and contractual deletion clauses with third parties. Without these prerequisites, withdrawal is acknowledged but not actioned — a compliance fiction.
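Given those prerequisites, the cascade itself is an ordered fan-out that records every action it takes. A sketch under the assumption that each downstream system exposes the listed operations (every interface name here is hypothetical):

```python
def cascade_withdrawal(principal_id, purpose, feature_store, datasets,
                       registry, partners, audit_log):
    """Propagate a consent withdrawal through every downstream system.
    Each step appends to the audit log so the cascade itself is provable."""
    actions = []

    # 1. Feature store: follow lineage keys back to this user's derived rows.
    removed = feature_store.delete_derived(principal_id)
    actions.append(("feature_store_deleted", removed))

    # 2. Training datasets: any set containing this user is now tainted.
    tainted = datasets.mark_tainted(principal_id, purpose)
    actions.append(("datasets_tainted", tainted))

    # 3. Model registry: models trained on tainted sets need a retrain decision.
    flagged = registry.flag_models(tainted)
    actions.append(("models_flagged", flagged))

    # 4. Third parties: request deletion under the withdrawn purpose.
    notified = partners.request_deletion(principal_id, purpose)
    actions.append(("partners_notified", notified))

    # 5. Audit: record every cascade action taken above.
    for action, detail in actions:
        audit_log.append({"principal": principal_id, "purpose": purpose,
                          "action": action, "detail": detail})
    return actions
```

Note that the audit step logs what each system reported back, not merely that a request was sent; a cascade whose steps cannot be individually evidenced is the "compliance fiction" described above.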
The critical constraint: you cannot untrain a model. If a model was trained on data from a user who later withdraws consent, the options are: retrain the model excluding that user's data, assess whether the individual's contribution to the model is material (often negligible in large datasets), or document the gap and the remediation timeline. The GDPR's Article 17 (right to erasure) and DPDP Act's Section 12 (right to erasure) both require erasure of personal data, but neither explicitly addresses data embedded in model weights. This is an active area of regulatory interpretation. The defensible position is: document, assess materiality, retrain where feasible, and log the decision.
The Complete Consent Architecture
┌────────────────────┐ ┌──────────────────┐
│ PRODUCT SURFACE │ │ PURPOSE │
│ (App, Web, SDK) │ │ TAXONOMY │
│ Collects consent │ │ Canonical list │
│ per purpose │ │ of all purposes │
└────────┬───────────┘ └────────┬─────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────┐
│ CONSENT MANAGEMENT SYSTEM │
│ Stores: principal_id, purposes, version, │
│ jurisdiction, notice_version, timestamps │
│ Queryable: sub-100ms │
└────────────────────┬────────────────────────┘
│
┌───────────┼───────────────┐
▼ ▼ ▼
┌──────────────┐ ┌────────────┐ ┌──────────────┐
│ LEGAL BASIS │ │ CONSENT │ │ WITHDRAWAL │
│ MAP │ │ GATE │ │ CASCADE │
│ Activity → │ │ Validates │ │ ENGINE │
│ basis per │ │ at every │ │ Propagates │
│ jurisdiction │ │ pipeline │ │ to feature │
│ │ │ boundary │ │ store,model │
│ │ │ │ │ reg,3rd pty │
└──────────────┘ └─────┬──────┘ └──────────────┘
│
▼
GOVERNED DATA
FLOWS TO:
├─ Analytics
├─ Feature Store
├─ Model Training
├─ Third Parties
└─ Reporting
What Done Looks Like
Phase 2 is complete when every processing activity from the Phase 1 register has a documented legal basis per jurisdiction. A Purpose Taxonomy is published, machine-readable, and referenced by all consent records and all pipeline configurations. The Consent Management System stores granular, versioned, notice-linked consent for every Data Principal. Consent Gates are deployed at every pipeline boundary — not just ingestion but every internal handoff where data moves to a new processing context. Withdrawal triggers an automated cascade to the Feature Store, Training Datasets, Model Registry, and third-party processors, with every action logged. The audit trail can reconstruct: for any user, at any point in time, what consent was in effect, what data was processed, under what basis, and what happened when consent changed.
Without Phase 2, Phase 3 (Training Data Governance) has no mechanism to validate that training data was lawfully collected for the purpose of model training. You are training models on data whose legal basis you cannot prove.
Next: Article 3 — Training Data Governance
Appendix: Key Terms in Plain Language
Data Principal — The individual whose personal data is being processed. Called "Data Subject" under GDPR. The person behind the data.
Data Fiduciary — The organization that determines why and how personal data is processed. Called "Data Controller" under GDPR. The entity accountable for compliance.
Legal Basis — The lawful justification for processing personal data. Without one, the processing is illegal regardless of how well the data is secured.
Consent — The Data Principal's explicit, informed, specific agreement to have their data processed for a stated purpose. Must be freely given, not bundled, and as easy to withdraw as it was to grant.
Legitimate Use — Under DPDP Act, processing permitted without consent for specific purposes: state functions, legal obligations, medical emergencies, employment. The equivalent of GDPR's "legitimate interests" but more narrowly defined.
Purpose Taxonomy — A controlled, enumerated list of every reason the organization processes personal data. Not free text. A fixed vocabulary that the legal, product, and engineering teams share.
Consent Gate — A validation checkpoint that sits between data sources and data consumers. It checks whether the user consented to the specific purpose this pipeline serves. If not, the record is blocked. The gate stores nothing — it only checks and passes or blocks.
Notice — The document shown to the user that explains what data is collected, why, and how. Under DPDP Act, the notice must be understandable on its own, provide an itemised description of personal data and purposes, and include a means to withdraw consent.
Consent Versioning — Keeping an immutable history of every consent change. If a user consented in January, modified in March, and withdrew in June, all three states are recorded. A regulator can ask what was in effect on any date.
Withdrawal Cascade — The chain of actions triggered when a user withdraws consent. Data must be deleted or restricted in every downstream system that processed it under the withdrawn purpose. This includes feature stores, training datasets, model registries, and third-party recipients.
Stateless — A system that does not store data between requests. The Consent Gate is stateless — it queries the CMS for each decision, makes the decision, logs it elsewhere, and forgets. This makes it simple, fast, and horizontally scalable.
Lineage Keys — Pointers stored in downstream systems (like a feature store) that reference which source records produced each derived record. These enable withdrawal cascades — when a user withdraws, the system follows lineage keys backward to find and remove all derived data.
Model Weights — The internal numerical parameters of a trained machine learning model. When a model is trained on user data, that data's statistical patterns are encoded into the weights. You cannot extract or delete an individual's data from the weights without retraining the model.
GDPR Article 6 — The provision listing the six lawful bases for processing personal data: consent, contract, legal obligation, vital interests, public task, and legitimate interests.
DPDP Section 6 — The DPDP Act provision governing consent: must be free, specific, informed, unconditional, and unambiguous, with clear affirmative action by the Data Principal.
DPDP Section 7 — The DPDP Act provision defining legitimate uses: processing permitted without consent for specified state, legal, medical, and employment purposes.
CCPA/CPRA Opt-Out — California's model where most processing is permitted by default if disclosed, but users have the right to opt out of sale or sharing of their data and certain profiling activities.