Written by Sumeshwar Pandey


Handling Legacy Data Collected Before DPDP

What Indian leadership teams need to do with years of historical personal data under the DPDP Act and Rules.
Key takeaways
  • The DPDP Act does not apply retrospectively to how legacy data was collected, but it fully governs any ongoing processing of that data after commencement.
  • Section 5(2) and the DPDP Rules create a transition regime: historic consents can continue until withdrawn, but you must provide DPDP-grade notices for legacy data as soon as reasonably practicable.
  • A credible legacy data strategy treats old datasets as a portfolio, classifying them by lawful basis, sensitivity, business value, and sectoral retention duties before deciding to retain, re-notice, anonymise, or delete.
  • Most enterprises will need a 12–18 month, phased programme covering data inventory, consent and notice strategy, execution and tooling, and ongoing monitoring and reporting.
  • Governance is as important as technology: boards, data protection leads, CISOs, CIOs, and business owners all need clear roles, metrics, and escalation paths for DPDP-era legacy data decisions.

Legacy data under DPDP: why it is now a board-level issue

Imagine your next board or audit committee meeting. Management reports that DPDP-compliant notices are now live on apps and websites, new consent flows have been redesigned, and a Data Protection Officer has been appointed. A director then asks a simple question: what about the eight years of CRM records, marketing lists, employee files, transaction logs, and backups sitting across on-premise servers and cloud accounts? In most Indian enterprises, that is where the real exposure lies.
The Digital Personal Data Protection Act, 2023 does not reopen the question of whether older collection practices were lawful at the time. But from the date the Act and DPDP Rules commence for you, any ongoing processing of digital personal data – including data digitised from paper forms – is governed by the new regime. That means legacy data is only historical in age, not in regulatory relevance. If you continue to store, analyse, enrich, or share it, you are now doing so under DPDP obligations.[1]
Unmanaged legacy datasets concentrate regulatory, security, and reputational risk. They are often sprawling, poorly documented, and full of unnecessary or outdated details, including high-risk attributes such as financial and health information. They expand the impact radius of any breach, make it hard to honour access and erasure requests, and increase the chances that you are using data for purposes that no longer have a defensible consent or legal basis. With the DPDP Act providing for significant rupee-denominated penalties per type of contravention, and with the Data Protection Board empowered to examine your governance posture, boards can no longer treat legacy data as a purely operational clean-up issue.[1]

How the DPDP Act and Rules treat data collected before commencement

For leadership, the first clarity point is scope. The DPDP Act applies to the processing of digital personal data in India, regardless of when that data was collected. What it does not do is retrospectively punish collection that took place under older laws. The line the Act draws is between past collection and present or future processing. If your teams are still using an old marketing list today, running analytics on transaction histories, or holding years of HR archives in active systems, those activities must now comply with DPDP requirements.
Section 5 sets out the rule that consent-based processing must be preceded by a clear, itemised notice. Section 5(2) then creates a transition framework for legacy consents: if a Data Principal consented to the processing of their personal data before the Act commenced, you may continue that processing until they withdraw consent, but you must provide a DPDP-grade notice as soon as reasonably practicable, in the form and manner the DPDP Rules prescribe. Current readings of the notified Rules point to phased commencement, with most substantive obligations taking effect up to eighteen months after notification; any use of legacy datasets beyond that window without updated notices or a different lawful basis will be difficult to defend.[3]
The broader rights and duties framework also applies to old data. Data Principals can seek access, correction, and erasure, and can withdraw consent, regardless of when their data was first captured. As a Data Fiduciary, you must implement reasonable security safeguards, notify personal data breaches to the Data Protection Board and to affected Data Principals as required, and maintain a functioning grievance redress mechanism. At the same time, DPDP recognises that you may need to retain data to comply with other laws or defend legal claims, so retention that is necessary for compliance with tax, employment, financial-sector, or telecom record-keeping obligations can have its own lawful basis; what it does not justify is open-ended reuse of that data for new marketing, profiling, or analytics beyond those purposes.[1]

Designing a legacy data strategy: classify, decide, and prioritise

The most practical way to approach legacy data is to treat it as a portfolio that needs to be shrunk and rebalanced, not as a monolith to be kept or deleted wholesale. That portfolio needs to be classified along four main dimensions: purpose, lawful basis, risk, and business value. Board-level sponsorship is critical here, because the answers cut across legal, technology, operations, and commercial strategy.
Start with purpose and lawful basis. For each major dataset, ask what you are actually doing with it today and what you plan to do over the next few years. Some purposes are likely anchored in consent, such as direct marketing based on email or phone lists. Others may fall within legitimate uses recognised by the Act, such as processing needed to comply with tax law, labour regulations, or anti-fraud obligations. In parallel, assess risk: does the dataset include high-risk attributes like financial transactions, health details, biometric identifiers, or information about children; is it linked to identity, or aggregated and pseudonymised; how many individuals are covered; and how widely is it shared, including with processors and partners. Finally, evaluate business value: is the data actively used for decisions that matter to revenue, risk models, product performance, or customer support, or is it primarily a convenience archive or a “just in case” store of records.
Once you have these dimensions, treatment options become clearer. For a legacy customer marketing list where contact details were captured years ago via broad consent language, business value may be modest and risk high, especially if many contacts are dormant. A rational outcome is often to send a DPDP-compliant notice, seek fresh, specific marketing consent from active contacts, and systematically delete or anonymise segments with low engagement or unreachable addresses. For product or app telemetry logs tied to user identifiers, the analytics value may be significant, but so is privacy and security risk. Here, shortening retention periods, aggregating or anonymising older data, and ensuring a clear notice that explains service-improvement uses can help you preserve insight while reducing exposure. HR archives sit in a different bucket: laws on employment, provident fund, gratuity, and taxation often require defined minimum retention periods. In such cases, the right answer is usually to retain only what is legally and operationally necessary, restrict access tightly, repurpose as little as possible for other uses, and implement clear retention schedules that trigger deletion after statutory windows close. For system logs and backups, value beyond a few months is often limited to forensic investigations, while risk remains high; this is a strong candidate for aggressive minimisation, encryption, and time-bound retention aligned with security and sectoral guidance.
Prioritisation follows from this classification. High-risk, low-value datasets – such as old marketing databases with poor data quality and uncertain consents – are prime candidates for early remediation through deletion or anonymisation. High-value, high-risk datasets warrant deeper design work to tighten purpose, notices, and retention while protecting their analytical benefits. Low-risk, low-value datasets can often be handled through straightforward retention rules. Mapping these categories into waves of work over a 12–18 month horizon helps your team align legal deadlines with realistic delivery capacity, instead of attempting an undifferentiated clean-up that stalls under its own weight.
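For teams that want to make this triage repeatable, the quadrant logic described above can be written down as a small rule table. The sketch below is a minimal illustration in Python, assuming each dataset has already been rated simply as high or low on risk and on business value. The `LegacyDataset` class, the `plan` function, and the wave numbers are hypothetical names and choices, not anything prescribed by the Act or the Rules. The low-risk, high-value case is not covered in the discussion above, so its treatment here is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class LegacyDataset:
    name: str
    risk: Level   # regulatory and security risk rating
    value: Level  # current business value rating


def plan(dataset: LegacyDataset) -> tuple[int, str]:
    """Return (wave, treatment); a lower wave number means earlier work."""
    if dataset.risk is Level.HIGH and dataset.value is Level.LOW:
        return 1, "delete or anonymise"
    if dataset.risk is Level.HIGH and dataset.value is Level.HIGH:
        return 2, "redesign purpose, notices, and retention"
    if dataset.risk is Level.LOW and dataset.value is Level.LOW:
        return 3, "apply standard retention rules"
    # Low risk, high value: not discussed in the article; assumed default.
    return 3, "retain under existing controls and review periodically"


# Hypothetical portfolio used only to show the ordering.
portfolio = [
    LegacyDataset("transaction histories", Level.HIGH, Level.HIGH),
    LegacyDataset("old marketing database", Level.HIGH, Level.LOW),
    LegacyDataset("archived brochure requests", Level.LOW, Level.LOW),
]

for ds in sorted(portfolio, key=lambda d: plan(d)[0]):
    wave, treatment = plan(ds)
    print(f"Wave {wave}: {ds.name} -> {treatment}")
```

In practice the ratings themselves are the hard part and need legal and business input; the value of encoding the rule is that every dataset owner gets the same answer for the same ratings.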

Operationalising the clean-up: a phased programme for enterprises

At board level, what you need is not a laundry list of tasks but a clear, time-bound programme that someone owns. A four-phase structure usually works: discovery, design, execution, and monitoring, each with a small set of metrics that show whether legacy data risk is actually coming down.
  1. Discovery: build a credible inventory
    Over the first few months, your data, technology, and business teams should identify where personal data sits across core systems, data warehouses, SaaS tools, file servers, and backups, including shadow IT. The output should be a register of major datasets, tagged by source, purpose, volume, risk indicators, and any known consents or legal obligations, with an accountable owner named for each.
  2. Design: set policy, lawful basis, and retention
    Design turns the inventory into policy. Legal and compliance teams, working with business owners, determine the lawful basis for each class of data, set retention periods with explicit references to sectoral rules where they apply, decide which datasets will be re-noticed, where fresh consent is needed, what will be anonymised, and what can be safely deleted. External counsel is often needed to resolve tensions between DPDP expectations and sectoral record-keeping mandates.
  3. Execution: remediate datasets and update tooling
    Execution is where technology and operations carry the load. This phase includes building or enhancing capabilities for data discovery and classification, implementing or upgrading consent and notice management so that you can log when and how each Data Principal was informed, putting in place deletion and anonymisation workflows that are auditable, and running the actual re-notice and clean-up campaigns across legacy datasets.
  4. Monitoring and reporting: move into business-as-usual
    In the final phase, responsibilities and metrics move into business-as-usual. Periodic reviews of the data inventory, dashboards showing coverage of DPDP-grade notices across your base, trends in data deletion and anonymisation, time taken to close grievances and rights requests, and regular testing of breach response plans all become standard reporting. When the Data Protection Board asks how you manage legacy data, being able to evidence this ongoing operation matters as much as the initial clean-up design.
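The register produced in the discovery phase can start as a very simple structure. The following Python sketch assumes one entry per major dataset, carrying the tags described in phase 1 (source, purpose, volume, risk indicators, known consents or legal obligations, and an accountable owner). The field names, the `gaps` check, and the two sample entries are hypothetical and are meant only to show how missing owners or purposes can be surfaced before the design phase begins.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegisterEntry:
    dataset: str
    source: str
    purpose: str
    approx_records: int
    risk_indicators: list[str] = field(default_factory=list)
    consents_or_legal_obligations: list[str] = field(default_factory=list)
    owner: Optional[str] = None  # accountable business owner, if named


def gaps(entry: RegisterEntry) -> list[str]:
    """List the tags that still need to be filled in for this entry."""
    missing = []
    if not entry.owner:
        missing.append("owner")
    if not entry.purpose.strip():
        missing.append("purpose")
    if not entry.consents_or_legal_obligations:
        missing.append("consents or legal obligations")
    return missing


# Illustrative entries only; real registers come from system discovery.
register = [
    RegisterEntry("CRM contacts", "web sign-up forms", "direct marketing", 1_200_000,
                  risk_indicators=["contact details"],
                  consents_or_legal_obligations=["pre-DPDP marketing consent"],
                  owner="Head of Marketing"),
    RegisterEntry("File-server HR scans", "digitised paper forms", "", 40_000,
                  risk_indicators=["identifiers", "salary"]),
]

for entry in register:
    print(entry.dataset, "->", gaps(entry) or "complete")
```

A count of entries with open gaps is a natural early metric for the discovery phase, since it shows whether the inventory is credible enough to support design decisions.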

Managing trade-offs between data value and compliance risk

Legacy data decisions are ultimately trade-offs between the incremental value of retaining a dataset and the cost, risk, and complexity of keeping it. In many boardrooms, the default reflex has been to retain everything because storage is cheap and someone might need the data later. Under DPDP, that reflex becomes expensive: every additional year of retention adds rights-handling workload, increases the impact of a potential breach, and raises the probability that you are processing data with an unclear lawful basis.
A simple way to structure the trade-off is to look at three variables for each major dataset – business value, regulatory and security risk, and effort to remediate – and then decide whether to keep, minimise, or retire it. The examples below illustrate how this plays out for common categories of legacy data.
Illustrative trade-offs for typical legacy datasets by value, risk, remediation effort, and preferred treatment.

Customer marketing lists (5+ years old)
  • Current business value: Low to moderate. Many contacts are dormant; impact on current revenue is limited.
  • Regulatory and security risk: Moderate to high. Consent quality is uncertain, data may be outdated, and volumes are often large.
  • Effort to remediate: Moderate. Requires DPDP-grade notices, handling opt-outs, and cleaning up unreachable or unresponsive contacts.
  • Preferred treatment and priority: Re-notice active contacts and seek fresh, specific marketing consent; delete or anonymise dormant and unreachable segments as an early-wave action.

Transaction histories used for fraud, credit, or core analytics
  • Current business value: High. Critical for fraud detection, credit models, and understanding long-term customer behaviour.
  • Regulatory and security risk: High. Often includes sensitive financial and behavioural data; regulators expect strong safeguards and clear purpose limitation.
  • Effort to remediate: High. Requires careful design of retention windows, aggregation or anonymisation for older data, and updated notices explaining these uses.
  • Preferred treatment and priority: Retain a defined window of identifiable data for risk and analytics; aggregate or anonymise older records, with tight access controls and governance. Prioritise in design, not in deletion.

HR and payroll archives
  • Current business value: Moderate. Needed for employment history, disputes, audits, and compliance with labour and tax laws.
  • Regulatory and security risk: High. Contains identifiers, salary information, performance data, and sometimes health-related information on employees and dependants.
  • Effort to remediate: Moderate. You must align with statutory retention periods and introduce segregation and access controls, but wholesale deletion is often not an option.
  • Preferred treatment and priority: Retain only fields necessary for legal and operational purposes, segregate and lock down access, avoid repurposing for secondary analytics, and delete after statutory windows close.

System logs and backups containing personal data
  • Current business value: Low to moderate. Valuable for troubleshooting and forensic investigations, but limited day-to-day business value beyond a defined period.
  • Regulatory and security risk: High. May contain full records, credentials, and sensitive fields, often replicated across multiple environments and storage locations.
  • Effort to remediate: Moderate. Requires tuning retention policies, encrypting archives, reducing redundant copies, and building deletion or anonymisation into backup lifecycles.
  • Preferred treatment and priority: Apply aggressive minimisation and short retention windows aligned to security and sectoral guidance, with strong encryption and a focus on early clean-up of aged backups.

Cost of inaction is an important lens. If you defer all hard decisions until just before a regulatory deadline or until after a breach, you will be making choices under pressure, in public, and under the scrutiny of the Data Protection Board. That often leads to abrupt, disruptive cuts that damage analytics and operations more than a planned, staged clean-up would have. By treating legacy datasets as a portfolio today – deliberately retiring low-value, high-risk data and investing governance effort in the limited number of high-value, high-risk datasets – you not only reduce penalty and breach exposure, you also create a leaner, better understood data estate that is easier to audit, secure, and explain to customers, regulators, and investors.

Governance, accountability, and reporting for legacy data

Legacy data remediation is not an IT side project. Under DPDP, accountability sits with the Data Fiduciary’s leadership, and regulators will expect to see governance that mirrors the seriousness of financial, cyber, and operational risks. For larger organisations, that often means board-level oversight through a risk, audit, or technology committee, with DPDP compliance – including legacy data – as a standing agenda item.[2]
A workable ownership model typically combines a senior executive sponsor, a data protection lead or Data Protection Officer, and clear roles for the CISO, CIO or CTO, legal and compliance, and business unit heads. The sponsor, often a COO, CRO, or CIO, is accountable for delivery of the legacy data programme. The data protection lead designs the framework, interprets DPDP obligations with legal support, and coordinates implementation. The CISO ensures that security controls over legacy systems and archives meet the standard of reasonable security safeguards, and that breach response plans incorporate DPDP-era notification duties. The CIO or CTO is responsible for the tooling and architecture needed for data inventory, consent logging, and deletion and anonymisation. Business leaders act as data owners: they sign off on purpose, lawful basis, and retention for datasets they rely on, and they are answerable if teams continue using legacy data in ways that diverge from agreed policies.
Effective reporting gives the board and senior management a way to judge progress without getting lost in technical detail. Useful metrics include the proportion of high-risk systems covered in the data inventory; the percentage of legacy records for which DPDP-compliant notices have been delivered; the share of legacy data volumes that have been deleted or anonymised; the number and age profile of unresolved grievances; average response times for access and erasure requests; and the number and severity of personal data breaches involving legacy datasets. These indicators should sit alongside more traditional risk metrics, and they should explicitly cover alignment with sectoral record-keeping rules: for example, the percentage of statutory-retention datasets that have an up-to-date legal citation and a documented plan for what happens when the mandated retention period ends. When the Data Protection Board, sectoral regulators, or auditors ask how you manage legacy data, being able to point to this governance structure and evidence-based reporting can significantly influence their assessment of your diligence.
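Several of these indicators are simple ratios once the underlying records are logged consistently. The Python sketch below shows one way the notice-coverage, deletion-share, and response-time figures could be computed. The `LegacyRecord` and `RightsRequest` structures and the function names are hypothetical, and measuring notice coverage only over records that are still retained is an assumption of this sketch rather than a rule from the Act.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LegacyRecord:
    notice_delivered: bool
    deleted_or_anonymised: bool


@dataclass(frozen=True)
class RightsRequest:
    received: date
    closed: Optional[date] = None  # None while the request is still open


def percent(part: int, whole: int) -> float:
    """Percentage to one decimal place; 0.0 when there is nothing to measure."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def notice_coverage(records: list[LegacyRecord]) -> float:
    # Assumption: only records still held need a DPDP-grade notice.
    retained = [r for r in records if not r.deleted_or_anonymised]
    return percent(sum(r.notice_delivered for r in retained), len(retained))


def retired_share(records: list[LegacyRecord]) -> float:
    """Share of legacy records that have been deleted or anonymised."""
    return percent(sum(r.deleted_or_anonymised for r in records), len(records))


def average_days_to_close(requests: list[RightsRequest]) -> Optional[float]:
    """Mean days from receipt to closure, over closed requests only."""
    durations = [(q.closed - q.received).days for q in requests if q.closed]
    return sum(durations) / len(durations) if durations else None
```

Open requests are excluded from the average here, so the number and age of unresolved requests should be reported separately to avoid an average that looks better than the backlog justifies.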

Executive checklist and common questions on DPDP legacy data

At the end of any internal review, you need a quick way to test whether your organisation’s approach to legacy data is credible. A practical method is to pose a short set of yes-or-no questions and insist on evidence-backed answers for every “yes”.
  • Do we have a documented inventory of our major legacy datasets, including those in shadow IT and backups, with accountable owners named for each?
  • Have we classified these datasets by purpose, lawful basis, risk, and business value, rather than just by system name?
  • Has the board or a designated committee reviewed and approved a legacy data strategy that explicitly covers DPDP obligations, sectoral retention rules, and a 12–18 month remediation roadmap?
  • For each significant dataset, do we know whether we intend to retain with notice, seek fresh consent, anonymise, or delete – and by when?
  • Can we show, for legacy consents, when and how DPDP-grade notices will be or have been provided?
  • Have we agreed which classes of data we will not attempt to re-use because reaching Data Principals is impractical, and instead plan to anonymise or retire them?
  • Can we respond to a representative set of access and erasure requests across legacy systems within the statutory timelines?
  • Do we understand how vendors and processors fit into this picture, including whether our contracts require them to support deletion, anonymisation, and rights handling for legacy data they hold on our behalf?
  • Is there a budgeted, staffed programme with clear milestones and reporting to leadership, rather than a loose collection of projects and policy documents?
If the honest answer to several of these questions is no or uncertain, the risk is not only theoretical. As notices and compliance deadlines under the DPDP Rules take effect, individuals will increasingly expect to manage their data, sectoral regulators will sharpen their own privacy expectations, and any breach involving unmanaged legacy datasets will attract tougher scrutiny. Leadership teams that treat legacy data as a strategic, time-bound portfolio decision now will be better placed to defend their position before the Data Protection Board and other regulators, while also avoiding hurried, value-destructive decisions later. A helpful next step is often to commission a cross-functional diagnostic focused specifically on legacy data and to align its findings with advice from counsel who track both DPDP developments and the evolving guidance of regulators in your sector.
FAQs

Does the DPDP Act apply to personal data collected before it commenced?

The DPDP Act applies to the processing of digital personal data after its commencement, regardless of when that data was collected. In practice, this means that if you continue to store, use, analyse, or share personal data that was collected years ago, those activities must now comply with DPDP requirements. The Act is not retrospective in the sense of penalising you simply for having collected data under earlier laws, but it does govern all ongoing processing. Section 5(2) provides a transition for legacy consents: if a Data Principal consented to the processing of their personal data before commencement, you may continue that processing until they withdraw consent, provided you give them a DPDP-compliant notice as soon as reasonably practicable and in line with the Rules. From a risk perspective, you should assume that everything you are doing today with personal data – old or new – may be examined under the DPDP lens if there is a complaint, breach, or investigation.[1]

Do we have to delete all historical personal data to comply with DPDP?

No. DPDP does not require blanket deletion of all historical personal data. What it demands is that any personal data you retain has a clear, current lawful basis, is used for specified and legitimate purposes, and is not kept longer than necessary for those purposes or for compliance with law. In many cases, you will need to continue holding certain records to meet tax, corporate, employment, financial sector, or other statutory obligations. For high-value datasets that support fraud prevention, risk management, or essential analytics, you may be able to justify retention under DPDP when combined with appropriate notices, access controls, and defined retention windows. The real target for deletion or strong anonymisation is low-value, high-risk legacy data: old marketing lists with poor consent trails, redundant copies of customer records, unstructured archives that are rarely used, or aged logs that no longer serve a clear operational or legal purpose. Where it is impossible or impractical to reach Data Principals to provide DPDP-grade notices – for example, because contact details are obsolete – you should be cautious about continuing to use that data for discretionary purposes such as marketing, and should seriously consider anonymising or retiring it instead.

How long do we have to bring legacy data into compliance, and what happens if we miss the window?

Current readings of the notified DPDP Rules indicate phased commencement, with most substantive obligations taking effect up to eighteen months after notification. Within that window, Section 5(2) and the Rules expect Data Fiduciaries to provide DPDP-compliant notices to individuals whose data was collected earlier and whose consent is being carried forward. In practice, a medium to large enterprise will often need the full 12–18 months to inventory, classify, and remediate legacy datasets in a structured way. Missing the window does not mean that all your processing becomes automatically unlawful overnight, but it does increase your exposure: if you have not provided required notices, cannot demonstrate a lawful basis for ongoing use of legacy data, or are unable to respond to rights requests involving older datasets, you make it easier for the Data Protection Board to conclude that you have failed to meet your obligations, raising the likelihood of monetary penalties and binding directions to change or stop processing.[3]

Which DPDP contraventions and penalties is legacy data most exposed to?

Legacy data is particularly exposed to several DPDP contraventions. Failure to provide proper notices for historic consents or to ensure that ongoing processing is covered by a valid lawful basis can amount to non-compliance with foundational obligations on notice and consent. Because legacy datasets are often large and widely replicated, they are common sources of security incidents; inadequate safeguards or delayed breach notification can attract separate penalties. If your systems cannot locate and act on a Data Principal’s access, correction, or erasure request across legacy archives, you risk violating their rights. Unmanaged data held by processors and vendors on your behalf remains your exposure, because a Data Fiduciary stays responsible for processing carried out on its behalf, and weak contracts and controls make that exposure harder to manage. The Act’s Schedule provides for monetary penalties per type of contravention, running up to ₹250 crore for failure to take reasonable security safeguards, with the Data Protection Board expected to consider factors such as the nature and gravity of the breach, the volume and sensitivity of data affected, the duration of non-compliance, and the steps you took to prevent and remedy harm.[1]

How does DPDP interact with sectoral laws that require us to retain records?

DPDP operates alongside sectoral and general laws that impose record-keeping duties, and it permits retention that is necessary to comply with them. If RBI, SEBI, IRDAI, TRAI, or tax statutes require you to retain certain records for a defined number of years, you generally must continue to hold that data even if a Data Principal withdraws consent or asks for erasure, to the extent necessary to meet those legal obligations. Under DPDP, this can operate as a lawful basis under the category of compliance with law or legitimate uses. However, that does not give you a free licence to use the same data for unrelated purposes, such as marketing or broad profiling, once consent is withdrawn or the original purpose has been served. The practical approach is to build and maintain a retention schedule that clearly maps each category of data to its governing laws and minimum retention periods, ensure that fields kept solely for statutory purposes are segregated and access-controlled, and delete or anonymise non-essential fields when they are no longer needed. Where sectoral regulators issue their own privacy or cyber guidelines, align your legacy data programme with those expectations as well, so that you can show both the Data Protection Board and your primary regulator that you have reconciled the two regimes in a thoughtful, documented way.[2]
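A retention schedule of the kind described here can be kept as structured data, so that erasure requests are checked against it consistently. The Python sketch below is a minimal illustration under stated assumptions: the `RetentionRule` structure, the function names, and the outcome labels are hypothetical, and any retention period you load into it must come from counsel, not from this example. It models only the statutory-retention check; other grounds for keeping data, such as defending legal claims, would need their own handling.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    category: str
    legal_reference: str      # citation to be supplied by counsel
    min_retention_years: int  # placeholder value, not a statement of any law


def add_years(start: date, years: int) -> date:
    """Shift a date forward by whole years, mapping 29 February to 28 February."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return start.replace(year=start.year + years, day=28)


def erasure_outcome(rule: Optional[RetentionRule],
                    record_closed_on: date,
                    today: date) -> str:
    """Decide how to treat a record when erasure is requested or consent is withdrawn."""
    if rule is not None and today < add_years(record_closed_on, rule.min_retention_years):
        # Statutory window still open: keep, but only for that purpose.
        return "retain for statutory purpose only"
    return "delete or anonymise"
```

For example, with a hypothetical five-year rule and a record closed on 1 March 2020, a request received in June 2024 would return "retain for statutory purpose only", while the same request on or after 1 March 2025 would return "delete or anonymise".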

Sources
  1. Digital Personal Data Protection Act, 2023 - Ministry of Electronics and Information Technology, Government of India
  2. DPDP Rules, 2025 Notified – A Citizen-Centric Framework for Privacy Protection and Responsible Data Use - Press Information Bureau, Government of India
  3. Decrypting India’s New Data Protection Law: Key Insights and Lessons Learned - Bird & Bird
  4. Top 10 operational impacts of India’s DPDPA – Individual rights - International Association of Privacy Professionals (IAPP)
  5. Summary of DSCI’s submission to MeitY on Draft Digital Personal Data Protection Rules, 2025 - Data Security Council of India