Data protection by design

Some of the principles of the General Data Protection Regulation (GDPR) look good on paper but can be hard to implement.

The principle of "data minimisation", for instance, states that personal data must be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed" (Art. 5(1)(c) GDPR). It is a principle that applies at every stage of the lifecycle of personal data: only collect the data that you need, only analyse or use the data that you need, only store the data that you need – and only as long as you actually need it (the sister principle of "storage limitation" has strong ties with data minimisation). But how can you identify what you actually need?

Likewise, the principle of "integrity and confidentiality" states that the processing must ensure "appropriate security of the personal data" (Art. 5(1)(f) GDPR), and the GDPR further mentions "the pseudonymisation and encryption of personal data" as possible measures to comply with this principle (Art. 32 GDPR). So then: what is appropriate security?

All of those data protection principles come together in the general obligation for controllers to implement "data protection by design" and "data protection by default" (Art. 25 GDPR). While the legal obligation only applies to controllers, nothing prevents them from contractually requiring their processors to comply with these principles as well. So again: what does this actually mean?

To give an indication of what these concepts mean and how to implement them, we will look at guidance from the EDPB (the European Data Protection Board), the Norwegian Data Protection Authority and ENISA (the EU Agency for Cybersecurity).

NOTE: some of the guidance discussed below is technical and requires some knowledge of information security or software development/management. We have therefore summarised the guidance itself in general terms, and have included technical notes at the bottom of the article that delve further into the technical content of the guidance in question. If these areas are not your strength, we suggest sharing those sections with your information security or IT colleagues to determine how you as an organisation can make the most of the guidance.

1. EDPB guidelines on data protection by design & by default: almost practical

The EDPB has been working on new Guidelines on data protection by design & by default ("DPbDD" in their lingo), and has published them in the context of a public consultation. Here are some of the key takeaways of these guidelines:

Organisations need to bear the cost of implementing DPbDD ("incapacity to bear the costs is no excuse for non-compliance"), but effectiveness of solutions is relevant in determining what cost is indeed "necessary" (the guidelines state that low-cost solutions can sometimes be just as or even more effective than expensive ones);
A risk assessment is required as part of the process for determining the relevant measures (controllers must take into account "the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing"). This "risk-based approach" is consistent across Articles 24 [responsibility of the controller], 25 [DPbDD], 32 [security] and 35 [DPIAs] of the GDPR, so DPIA risk assessments are relevant for assessing risk in the context of DPbDD. The guidelines warn, however, that any toolbox used should take into account the processing at hand (i.e. you cannot simply copy over a generic risk assessment);
DPbDD is a continuous obligation, starting at the very beginning, and regular re-evaluation of measures and safeguards is required (also at the level of processors – the controller should regularly review and assess its processors' operations);
"Data protection by default" means limiting staff's access to data and minimising processing "out of the box";
Retention of data "should be objectively justifiable and demonstrable";
While anonymisation helps limit the risk in relation to certain processing (and is a good application of both DPbDD and the security measures under Art. 32 GDPR), it should not be viewed as the end of the story in terms of compliance, as re-identification might still be a risk as techniques evolve and other datasets are created – therefore, a regular assessment of the likelihood and severity of risk (including the risk of re-identification) is still required;
The guidelines contain useful input on how to implement key principles. For instance, universal design and accessibility are mentioned as examples of key design and default elements (with an explicit reference to machine-readable languages) in relation to the principle of transparency. Non-discrimination is mentioned as an example of a key design and default element for the principle of fairness. Drop-down menus are also mentioned as a way to improve the accuracy of personal data;
DPbDD is a factor to be taken into account by supervisory authorities in determining the level of fines.
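
For technical readers, the "out of the box" idea behind data protection by default can be illustrated with a minimal Python sketch. It is not taken from the EDPB guidelines: the setting names, the roles, the 30-day retention period and the sample record are all hypothetical, and a real implementation would depend on the processing at hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacySettings:
    """Hypothetical per-user settings: all optional processing is off by default."""
    analytics_tracking: bool = False
    marketing_emails: bool = False
    profile_public: bool = False
    retention_days: int = 30  # assumed value; must be justified for the actual purpose


# Hypothetical mapping of staff roles to the only fields each role needs.
ROLE_FIELDS = {
    "support": {"name", "email"},
    "billing": {"name", "billing_address"},
}


def view_for_role(record: dict, role: str) -> dict:
    """Return only the fields a role may see; unknown roles see nothing."""
    allowed = ROLE_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}


# Illustrative data only.
defaults = PrivacySettings()
record = {
    "name": "Alice",
    "email": "alice@example.com",
    "billing_address": "1 Main Street",
    "date_of_birth": "1990-01-01",
}
support_view = view_for_role(record, "support")
unknown_view = view_for_role(record, "intern")
```

In this sketch a new user starts with every optional processing switched off, and a staff role that has not been explicitly configured sees no personal data at all; widening either requires a deliberate decision rather than being the default.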

The guidelines provide a useful framework for organisations to understand how the principles of DPbDD work, but there is still little in the guidelines that is immediately actionable.

2. Norwegian Data Protection Authority guidance on data protection by design: actionable checklists

While Norway is not part of the European Union, it is part of the European Economic Area (EEA) and has been subject to the rules of the GDPR since July 2018 by virtue of the EEA Agreement. In this context, the Norwegian Data Protection Authority, Datatilsynet, is a member of the EDPB, although it does not have voting rights within the EDPB.

Over the past few years, the Norwegian DPA has maintained and regularly updated a guide on software development with data protection by design and by default. The guide, which it prepared in cooperation with security experts and software developers, is available online in English.

While it focusses on software development, the Norwegian DPA's guidance is relevant in many non-software circumstances. In addition, it anticipates many of the recommendations contained in the EDPB's DPbDD guidelines, making it a useful baseline for the practical implementation of the principles of data protection by design and data protection by default.

The guidance covers seven stages or activities (training, requirements, design, coding, testing, release and maintenance), and for each of these activities the guidance includes a practical and actionable checklist.

We have included at the bottom of this article a summary of what the checklists cover in relation to each of these topics (see Technical Note 1).

As a result, the guidance can serve as a technical baseline for teams working on implementing the principles of data protection by design and by default, e.g. with the checklists as an annex to a more general (and less technical) "data protection by design policy".

3. ENISA recommendations on pseudonymisation techniques and best practices

On 3 December 2019, ENISA published a new report, "Pseudonymisation techniques and best practices – recommendations on shaping technology according to data protection and privacy provisions".

This report starts with a discussion of a number of pseudonymisation scenarios and analyses various adversarial models and attacking techniques used against pseudonymisation (e.g. brute force attack, dictionary search, guesswork). It also presents the main pseudonymisation techniques (e.g. counter, random number generator, cryptographic hash function, message authentication code and encryption) and pseudonymisation policies (e.g. deterministic, document-randomised and fully randomised pseudonymisation) available today.
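
For technical readers, the difference between the counter and random number generator techniques can be shown in a minimal Python sketch. It is our own illustration, not code from the ENISA report: the class names, the "P000001" pseudonym format and the sample addresses are hypothetical. Both classes apply a deterministic policy, meaning that the same identifier always receives the same pseudonym.

```python
import itertools
import secrets


class CounterPseudonymiser:
    """Assigns sequential pseudonyms; simple, but the sequence is predictable."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        # identifier -> pseudonym; this table allows re-identification,
        # so it should be stored separately from the pseudonymised data.
        self.mapping: dict = {}

    def pseudonymise(self, identifier: str) -> str:
        if identifier not in self.mapping:
            self.mapping[identifier] = f"P{next(self._counter):06d}"
        return self.mapping[identifier]


class RandomPseudonymiser:
    """Assigns random pseudonyms; reveals nothing about the order of assignment."""

    def __init__(self) -> None:
        self.mapping: dict = {}
        self._used: set = set()

    def pseudonymise(self, identifier: str) -> str:
        if identifier not in self.mapping:
            pseudonym = secrets.token_hex(8)
            while pseudonym in self._used:  # extremely unlikely, but keeps pseudonyms unique
                pseudonym = secrets.token_hex(8)
            self._used.add(pseudonym)
            self.mapping[identifier] = pseudonym
        return self.mapping[identifier]


# Illustrative data only.
emails = ["alice@example.com", "bob@example.com", "alice@example.com"]

counter = CounterPseudonymiser()
counter_out = [counter.pseudonymise(e) for e in emails]
# counter_out == ["P000001", "P000002", "P000001"]

rng = RandomPseudonymiser()
rng_out = [rng.pseudonymise(e) for e in emails]
```

The counter output shows that "bob@example.com" was registered right after "alice@example.com", which is the kind of inference a sequential scheme allows; the random pseudonyms carry no such ordering.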

The practical significance of this report lies, however, in its examination of how these pseudonymisation techniques work in practice in specific scenarios, in particular e-mail address pseudonymisation.

As with the guidance of the Norwegian DPA, we have included a summary of this ENISA guidance at the bottom of this article, due to its technical nature (see Technical Note 2).

The report concludes that the best approach to pseudonymisation involves applying pseudonymisation to all data values, taking the whole dataset into account and ensuring that the resulting dataset keeps only the type of utility necessary for the purpose of processing.

The report is a useful addition to the toolset of the teams involved in assessing whether and how to deploy pseudonymisation techniques, and can assist organisations in determining how best to minimise their processing of personal data while improving the security of the personal data in question.

4. Conclusion

What then does data protection by design and by default mean? In non-technical language, it means embedding data protection into the culture of the organisation and its product development process. In practice, the best way forward for technical teams involves drawing up a list of requirements for every stage of development, and the checklists of the Norwegian DPA and the pseudonymisation guidance of ENISA can serve as a useful baseline for those requirements. These requirements must also be discussed with other roles within the organisation (business, operations, legal, data protection, …) to ensure all appropriate controls are in place.

Whatever you decide, document it properly. After all, data protection by design and by default also includes the principle of accountability, and proper documentation will make it easier for you to demonstrate that you are indeed compliant.

TECHNICAL NOTE 1: summary of the checklists of the Norwegian Data Protection Authority:

Training: The guidance recommends training on the GDPR itself, on related legislation (e.g. e-Privacy), on information security frameworks (e.g. ISO 27001), on the framework for software development (e.g. Microsoft Security Development Lifecycle), on security testing (e.g. OWASP Top 10) and on threat and risk assessment documentation requirements (e.g. Microsoft Threat Modelling Tool). It moreover recommends differentiated training based on individuals' roles: a basic understanding of privacy and information security is crucial for all employees, while developers must be competent in e.g. the topic of secure coding. The checklist for training even includes a reference to metaphors and mnemonics, such as XKCD's "Little Bobby Tables" cartoon.
Requirements: Organisations should define the data protection and information security requirements for any given project. The checklist for requirements contains an impressively detailed (but non-exhaustive) list of action items on e.g. what needs to be done before the requirements are set, requirements for meeting the principles of data protection, requirements to protect the rights of data subjects etc. In relation to security in general, the checklist mentions five security principles: confidentiality, integrity, accessibility, resilience and traceability (C, I, A, R, T). The specific security requirements will then typically be linked to one or more of those security principles (e.g. identification of users in the context of access control = T; strong password requirements = C, I, A). The checklist mentions the OWASP Application Security Verification Standard as a useful illustration of security requirements for use in software development, as well as ISO 27034 as an example of how to find an acceptable level of risk.
Design: The design-related checklist refers to the subdivision introduced by ENISA (in its 2014 report on privacy and data protection by design) between data-oriented design requirements ("minimise and limit", "hide and protect", "separate", "aggregate", "data protection by default") and process-oriented design requirements ("inform", "control", "enforce", "demonstrate"), with practical implementation examples. In addition, the checklist recommends (i) analysing and reducing the attack surface of the software under development and (ii) threat modelling, with notably a reference to the STRIDE (spoofing, tampering, repudiation, information disclosure, denial of service and elevation of privilege) and DREAD (damage, reproducibility, exploitability, affected users and discoverability) methodologies.
Coding: The coding checklist focusses on four main areas: (i) the use of approved tools and libraries, (ii) scanning dependencies for known vulnerabilities or outdated versions, (iii) manual code review and (iv) static code analysis with security rules. The checklist includes useful recommendations on e.g. what to include in a list of tools and libraries, as well as examples of tools for static code analysis.
Testing: At the testing stage, the checklist includes general test recommendations as well as specific guidance on security testing (dynamic testing, fuzz testing, penetration testing or vulnerability analysis; testing in multiple instances; automatic execution of test sets before release). In addition, the checklist stresses the importance of reviewing the attack surface of the software under development.
Release: At the release stage, the focus should lie on (i) an incident response plan, (ii) a full security review of the software and (iii) a process involving approval of release and archiving. In relation to the incident response plan, the checklist sets out detailed recommendations on the life cycle of deviations and related procedures for detecting, analysing and verifying, reporting and handling incidents, followed by the need for normalising (restoring management, operation and maintenance to their normal state).
Maintenance: In relation to maintenance, the key recommendation relates to incident response (the previous checklist expands upon this point in further detail). Beyond that, the checklist mentions topics such as continuous assessment of vulnerability detection measures, metrics etc.
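
The link between security requirements and the five security principles (C, I, A, R, T) mentioned in the requirements checklist can be kept in a simple register. The following Python sketch is our own illustration and not part of the Norwegian DPA's guidance: the class and function names are hypothetical, and the two requirements are the examples cited above. It tags each requirement with the principles it supports and reports which principles are not yet covered by any requirement.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Principle(Flag):
    """The five security principles named in the requirements checklist."""
    CONFIDENTIALITY = auto()
    INTEGRITY = auto()
    ACCESSIBILITY = auto()
    RESILIENCE = auto()
    TRACEABILITY = auto()


@dataclass(frozen=True)
class Requirement:
    description: str
    principles: Principle  # one or more principles combined with "|"


# The two example requirements cited in the checklist summary.
REQUIREMENTS = [
    Requirement(
        "Identify users in the context of access control",
        Principle.TRACEABILITY,
    ),
    Requirement(
        "Enforce strong password requirements",
        Principle.CONFIDENTIALITY | Principle.INTEGRITY | Principle.ACCESSIBILITY,
    ),
]


def uncovered_principles(requirements: list) -> list:
    """Return the principles that no requirement in the register supports yet."""
    covered = Principle(0)
    for requirement in requirements:
        covered |= requirement.principles
    return [principle for principle in Principle if not (principle & covered)]


gaps = uncovered_principles(REQUIREMENTS)
# gaps == [Principle.RESILIENCE]
```

With only these two requirements in the register, resilience is flagged as a gap, which is the kind of prompt a team could use when reviewing whether its requirements list is complete.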

TECHNICAL NOTE 2: summary of the takeaways of the ENISA report on pseudonymisation:

Counter and random number generator techniques are considered strong as long as the mapping table is secured and stored separately from the pseudonymised data, but counter techniques are deemed weaker than random number generator techniques as they allow for predictions (due to their sequential nature);
Cryptographic hash functions are considered to be a weak technique for e-mail address pseudonymisation ("EAP" below) because a dictionary attack is trivial (due to the number of e-mail addresses used today);
Message authentication code (MAC) techniques: compared to hashing, MAC also presents significant data protection advantages for EAP, as long as the secret key is securely stored, and as long as no recovery is needed (i.e. de-pseudonymising data – which is difficult in the case of MAC). MAC is also suggested as a possible technique for interest-based display advertising, where unique pseudonyms are used but advertisers do not need to know the user's original identity;
Asymmetric (public key) encryption is not recommended for EAP, because of the availability of the public key and because of the possibility of dictionary attacks;
Format preserving encryption allows pseudonymised data to retain some utility (which is notably useful for EAP), but it is important to avoid the emergence of patterns (and therefore to be careful in the configuration of the format preserving encryption mechanism);
The best approach to pseudonymisation involves applying pseudonymisation to all data values, taking the whole dataset into account and ensuring that the resulting dataset keeps only the type of utility necessary for the purpose of processing.
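
To make the contrast between hashing and MAC concrete for e-mail address pseudonymisation, here is a minimal Python sketch using the standard hashlib and hmac modules. It is our own illustration rather than code from the ENISA report: the function names, the choice of SHA-256 and the sample addresses are assumptions made for the example.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def hash_pseudonym(email: str) -> str:
    """Unkeyed SHA-256 pseudonym: anyone can recompute it from the address."""
    return hashlib.sha256(email.lower().encode("utf-8")).hexdigest()


def mac_pseudonym(email: str, key: bytes) -> str:
    """Keyed HMAC-SHA256 pseudonym: recomputing it requires the secret key."""
    return hmac.new(key, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(target: str, candidates: list) -> Optional[str]:
    """Try to reverse a pseudonym by hashing a list of known addresses."""
    for candidate in candidates:
        if hash_pseudonym(candidate) == target:
            return candidate
    return None


# Illustrative data only.
secret_key = secrets.token_bytes(32)  # must be stored securely, apart from the data
known_addresses = ["alice@example.com", "bob@example.com"]

hashed = hash_pseudonym("bob@example.com")
keyed = mac_pseudonym("bob@example.com", secret_key)

recovered_from_hash = dictionary_attack(hashed, known_addresses)
recovered_from_mac = dictionary_attack(keyed, known_addresses)
```

An attacker holding a list of plausible addresses recovers "bob@example.com" from the plain hash, whereas the same attack against the MAC pseudonym finds nothing unless the secret key is also compromised. The MAC pseudonym remains stable for a given key, so records can still be linked without revealing the address.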