GDPR – A Practical Guide For Developers

You’ve probably heard about GDPR, the new European data protection regulation that applies to practically everyone. If you work in a big company, there is most likely already a process for bringing your systems into compliance with the regulation.

The regulation is basically a law that must be followed in all EU countries. It also applies to companies that are not registered in Europe but have European customers. So that’s most companies. I will not write yet another “12 facts about GDPR” or “7 myths about GDPR” post/whitepaper, as those are often aimed at managers or legal people. Instead, I’ll focus on what GDPR means for developers.

Why am I qualified to do that? A few reasons – I was an advisor to the deputy prime minister of an EU country, and because of that I’ve both been exposed to legislation and written some myself. I’m familiar with the “legalese” and how the regulatory framework operates in general. I’m also a privacy advocate and I’ve written about GDPR-related stuff in the past, i.e. “before it was cool” (protecting sensitive data, the right to be forgotten). And finally, I’m currently working on a project that (among other things) aims to help with covering some GDPR aspects (namely – secure audit logs).

I’ll try to be a bit more comprehensive this time and cover as many aspects of the regulation that concern developers as I can. And while developers will mostly be concerned about how the systems they are working on have to change, it’s not unlikely that a less informed manager storms in in late spring, realizing GDPR is going to be in force tomorrow, asking “what should we do to get our system/website compliant”.

The rights of the user/client (referred to as “data subject” in the regulation) that I think are relevant for developers are: the right to erasure (the right to be forgotten/deleted from the system), right to restriction of processing (you still keep the data, but mark it as “restricted” and don’t touch it without further consent by the user), the right to data portability (the ability to export one’s data in a machine-readable format), the right to rectification (the ability to get personal data fixed), the right to be informed (getting human-readable information, rather than long terms and conditions), the right of access (the user should be able to see all the data you have about them).

Additionally, the relevant basic principles are: data minimization (one should not collect more data than necessary), integrity and confidentiality (all security measures to protect data that you can think of + measures to guarantee that the data has not been inappropriately modified).

Even further, the regulation requires certain processes to be in place within an organization (of more than 250 employees or if a significant amount of data is processed), and those include keeping a record of all types of processing activities carried out, including transfers to processors (3rd parties), which includes cloud service providers. None of the other requirements of the regulation have an exception depending on the organization size, so “I’m small, GDPR does not concern me” is a myth.

It is important to know what “personal data” is. Basically, it’s every piece of data that can be used to uniquely identify a person or data that is about an already identified person. It’s data that the user has explicitly provided, but also data that you have collected about them from either 3rd parties or based on their activities on the site (what they’ve been looking at, what they’ve purchased, etc.)

Having said that, I’ll list a number of features that will have to be implemented and some hints on how to do that, followed by some do’s and don’t’s. Note that (as pointed out in each feature) they don’t necessarily have to be automated – you could just have a manual process in place. But for bigger systems it would be much better to have them automated.

  • “Forget me” – you should have a method that takes a userId and deletes all personal data about that user (provided the data was collected on the basis of consent or the legitimate interests of the controller (see more below), and not due to contract enforcement or legal obligation). That feature is actually useful for integration tests as well (to clean up after the test), but it may be hard to implement depending on the data model. In a regular data model, deleting a record may be easy, but some foreign keys may be violated. That means you have two options – either make sure you allow nullable foreign keys (for example, an order usually has a reference to the user that made it, but when the user requests that their data be deleted, you can set the userId to null), or make sure you delete all related data (e.g. via cascades). Deleting related data may not be desirable, e.g. if the order is used to track available quantities or for accounting purposes. It’s a bit trickier for event-sourcing data models, or in extreme cases, ones that include some sort of blockchain/hash chain/tamper-evident data structure. With event sourcing you should be able to remove a past event and re-generate intermediate snapshots. For blockchain-like structures – be careful what you put in there and avoid putting personal data of users. There is an option to use a chameleon hash function, but that’s suboptimal. Overall, you must constantly think of how you can delete the personal data. And “our data model doesn’t allow it” isn’t an excuse. What about backups? Ideally, you should keep a separate table of forgotten user IDs, so that each time you restore a backup, you re-forget the forgotten users. This means the table should be in a separate database or have a separate backup/restore process.
  • Notify 3rd parties of erasure – deleting things from your system may be one thing, but you are also obligated to inform all third parties that you have pushed that data to. So if you have sent personal data to, say, Salesforce, Hubspot, Twitter, or any cloud service provider, you should call an API of theirs that allows for the deletion of personal data. If you are such a provider, obviously, your “forget me” endpoint should be exposed. Calling the 3rd party APIs to remove data is not the full story, though. You also have to make sure the information does not appear in search results. Now, that’s tricky, as Google doesn’t have an API for removal, only a manual process. Fortunately, it’s only about public profile pages that are crawlable by Google (and other search engines, okay…), but you still have to take measures. Ideally, you should make the personal data page return a 404 HTTP status, so that it can be removed from the search index.
  • Restrict processing – in your admin panel where there’s a list of users, there should be a button “restrict processing”. (The user settings page may also have that button with a dropdown to select from the Article 18(1) options.) When clicked (after reading the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to the backoffice staff, or publicly. You can implement that with a simple “restricted” flag in the users table and a few if-clauses here and there.
  • Export data – there should be another button – “export data”. When clicked, the user should receive all the data that you hold about them. What exactly that data is depends on the particular use case. Usually it’s at least the data that you delete with the “forget me” functionality, but it may include additional data (e.g. the orders the user has made may not be deleted, but should be included in the dump). The structure of the dump is not strictly defined, but my recommendation would be to reuse schema.org definitions as much as possible, for either JSON or XML. If the data is simple enough, a CSV/XLS export would also be fine. Sometimes data export can take a long time, so the button can trigger a background process, which would then notify the user via email when their data is ready (Twitter, for example, does that already – you can request all your tweets and you get them after a while). You don’t need to implement an automated export, although it would be nice. It’s sufficient to have a process in place to allow users to request their data, which can be a manual database-querying process.
  • Allow users to edit their profile – this seems an obvious rule, but it isn’t always followed. Users must be able to fix all data about them, including data that you have collected from other sources (e.g. using a “login with Facebook” you may have fetched their name and address). Rule of thumb – all the fields in your “users” table should be editable via the UI. Technically, rectification can be done via a manual support process, but that’s normally more expensive for a business than just having the form to do it. There is one other scenario, however, when you’ve obtained the data from other sources (i.e. the user hasn’t provided their details to you directly). In that case there should still be a page where they can identify themselves somehow (via email and/or SMS confirmation) and get access to the data about them.
  • Consent checkboxes (or yes/no options) – “I accept the terms and conditions” would no longer be sufficient to claim that the user has given their consent for processing their data. So, for each particular processing activity there should be a separate checkbox on the registration (or user profile) screen; or clear yes/no buttons. You should keep these consent checkboxes/buttons in separate columns in the database, and let the users withdraw their consent (by unchecking these checkboxes from their profile page – see the previous point). Ideally, these checkboxes should come directly from the register of processing activities (if you keep one). Note that the checkboxes should not be preselected, as this does not count as “consent”. Another important thing here is machine learning/AI. If you are going to use the user’s data to train your ML models, you should get consent for that as well (unless it’s for scientific purposes, which have special treatment in the regulation). Note here the so-called “legitimate interest”. It is for the legal team to decide what a legitimate interest is, but direct marketing is included in that category, as well as any common-sense processing relating to the business activity – e.g. if you collect addresses for shipping, it’s obviously a legitimate interest. So not all processing activities need consent checkboxes.
  • Re-request consent – if the consent users have given was not clear (e.g. if they simply agreed to terms & conditions), you’d have to re-obtain that consent. So prepare functionality for mass-emailing your users to ask them to go to their profile page and check all the checkboxes for the personal data processing activities that you have. Update: since we’ve been swarmed with useless consent and privacy policy emails: this is ONLY needed if your previous consent was not clearly given. In many cases it has been, so don’t overdo it.
  • “See all my data” – this is very similar to the “Export” button, except data should be displayed in the regular UI of the application rather than in an XML/JSON format. I wouldn’t say this is mandatory, and you can leave it as a “desirable” feature – for example, Google Maps shows you your location history – all the places that you’ve been to. It is a good implementation of the right of access. (Though Google is very far from perfect when privacy is concerned.) That is not all there is to the right of access – you have to let unregistered users ask whether you have data about them, but that would be a more manual process. The ideal minimum would be to have a “check by email” feature, where you check if you have data about a particular email. You also need to tell the user in what ways you are processing their data. You can simply print all the records in your data processing register to which the user has consented.
  • Age checks – you should ask for the user’s age, and if the user is a child (below 16), you should ask for parent permission. There’s no clear way to do that, but my suggestion is to introduce a flow where the child specifies the email of a parent, who can then confirm. Obviously, children will just cheat with their birthdate, or provide a fake parent email, but you will most likely have done your job according to the regulation (this is one of the “wishful thinking” aspects of the regulation).
  • Keeping data for no longer than necessary – if you’ve collected the data for a specific purpose (e.g. shipping a product), you have to delete it/anonymize it as soon as you don’t need it. Many e-commerce sites offer “purchase without registration”, in which case the consent goes only for the particular order. So you need a scheduled job/cron to periodically go through the data and anonymize it (delete names and addresses), but only after a certain condition is met – e.g. the product is confirmed as delivered. You can have a database field for storing the deadline after which the data should be gone, and that deadline can be extended in case of a delivery problem.
  • Cookies – cookies are the subject of a different regulation (a Directive that will soon become a Regulation). However, GDPR still changes things where tracking cookies are concerned. I’ve outlined my opinion on tracking cookies in a separate post.
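
To make the “forget me” point above a bit more concrete, here is a minimal sketch in Python with SQLite. It assumes the nullable-foreign-key option and a list of forgotten user IDs kept in a separate database, so that the list survives a restore of the main one. The table names, column names and function names (`create_schema`, `forget_user`, `reforget_after_restore`) are made up for illustration and do not come from any particular system.

```python
import sqlite3


def create_schema(app_db, forgotten_db):
    # Hypothetical schema: orders.user_id is nullable so an order can
    # outlive the user record it once pointed to.
    app_db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id),
            item TEXT
        );
    """)
    # Kept in a separate database so it has its own backup/restore cycle.
    forgotten_db.execute(
        "CREATE TABLE forgotten_users (user_id INTEGER PRIMARY KEY)"
    )
    forgotten_db.commit()


def forget_user(app_db, forgotten_db, user_id):
    # Record the request first, so it can be replayed after a restore.
    with forgotten_db:
        forgotten_db.execute(
            "INSERT OR IGNORE INTO forgotten_users (user_id) VALUES (?)",
            (user_id,),
        )
    # Detach the orders from the user, then delete the user row,
    # both in one transaction.
    with app_db:
        app_db.execute(
            "UPDATE orders SET user_id = NULL WHERE user_id = ?", (user_id,)
        )
        app_db.execute("DELETE FROM users WHERE id = ?", (user_id,))


def reforget_after_restore(app_db, forgotten_db):
    # Re-apply every recorded erasure to a freshly restored database.
    rows = forgotten_db.execute("SELECT user_id FROM forgotten_users")
    ids = [row[0] for row in rows]
    for user_id in ids:
        forget_user(app_db, forgotten_db, user_id)
    return len(ids)
```

After restoring a backup you would call `reforget_after_restore(app_db, forgotten_db)` before putting the system back into service. The orders keep their rows (so stock tracking and accounting still work), but they no longer point at a person. A real system would also have to cover every other table that holds personal data.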

Now some “do’s”, which are mostly about the technical measures needed to protect personal data (outlined in article 32). They may be more “ops” than “dev”, but often the application also has to be extended to support them. I’ve listed most of what I could think of in a previous post. An important note here is that this is not mandated by the regulation, but it’s a good practice anyway and helps with protecting personal data.

  • Encrypt the data in transit. That means that communication between your application layer and your database (or your message queue, or whatever component you have) should be over TLS. The certificates could be self-signed (and possibly pinned), or you could have an internal CA. Different databases have different configurations, just google “X encrypted connections”. Some databases need gossiping among the nodes – that should also be configured to use encryption.
  • Encrypt the data at rest – this again depends on the database (some offer table-level encryption), but can also be done on machine-level. E.g. using LUKS. The private key can be stored in your infrastructure, or in some cloud service like AWS KMS.
  • Encrypt your backups – kind of obvious
  • Implement pseudonymisation – the most obvious use case is when you want to use production data for the test/staging servers. You should change the personal data to some “pseudonym”, so that the people cannot be identified. When you push data for machine learning purposes (to third parties or not), you can also do that. Technically, that could mean that your User object can have a “pseudonymize” method which applies hash+salt/bcrypt/PBKDF2 to some of the data that can be used to identify a person. Pseudonyms could be reversible or not, depending on the use case (the definition in the regulation implies reversibility based on secret information, but in the case of test/staging data it might not be needed). Some databases have such features built-in, e.g. Oracle.
  • Protect data integrity – this is a very broad thing, and could simply mean “have authentication mechanisms for modifying data”. But you can do something more, even as simple as a checksum, or a more complicated solution (like the one I’m working on). It depends on the stakes, on the way data is accessed, on the particular system, etc. The checksum can be in the form of a hash of all the data in a given database record, which should be updated each time the record is updated through the application. It isn’t a strong guarantee, but it is at least something.
  • Have your GDPR register of processing activities in something other than Excel – Article 30 says that you should keep a record of all the types of activities that you use personal data for. That sounds like bureaucracy, but it may be useful – you will be able to link certain aspects of your application with that register (e.g. the consent checkboxes, or your audit trail records). It wouldn’t take much time to implement a simple register, but the business requirements for that should come from whoever is responsible for the GDPR compliance. But you can advise them that having it in Excel won’t make it easy for you as a developer (imagine having to fetch the Excel file internally, so that you can parse it and implement a feature). Such a register could be a microservice/small application deployed separately in your infrastructure.
  • Log access to personal data – every read operation on a personal data record should be logged, so that you know who accessed what and for what purpose. This does not follow directly from the provisions of the regulation, but it is kinda implied by the accountability principles. What about search results (or lists) that contain personal data about multiple subjects? My hunch is that simply logging “user X did a search for criteria Y” would suffice. But don’t display too much personal data in lists – for example, see how Facebook makes you jump through some hoops to get a person’s birthday. Note: some have treated article 30 as a requirement to keep an audit log. I don’t think it is saying that – instead it requires 250+ companies (or companies processing data regularly) to keep a register of the types of processing activities (i.e. what you use the data for). There are other articles in the regulation that imply that keeping an audit log is a best practice (for protecting the integrity of the data as well as to make sure it hasn’t been processed without a valid reason).
  • Register all API consumers – you shouldn’t allow anonymous API access to personal data. I’d say you should request the organization name and contact person for each API user upon registration, and add those to the data processing register.
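
Here is a rough sketch of what such a “pseudonymize” method could look like in Python, for the test/staging use case. Instead of the salted hash, bcrypt or PBKDF2 options mentioned above, it uses a keyed hash (HMAC-SHA-256) from the standard library, which gives the same pseudonym for the same input as long as the key stays the same. The field list, the function names and the 16-character truncation are my assumptions for the example.

```python
import hashlib
import hmac

# Hypothetical list of fields that can identify a person.
PERSONAL_FIELDS = ("name", "email", "phone")


def pseudonymize_value(value: str, secret_key: bytes) -> str:
    # Keyed hash: without the key, the pseudonym cannot be recomputed
    # from a guessed original value.
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; this is an arbitrary choice.
    return digest.hexdigest()[:16]


def pseudonymize_record(record: dict, secret_key: bytes) -> dict:
    # Work on a copy so the original record is left untouched.
    result = dict(record)
    for field in PERSONAL_FIELDS:
        if result.get(field) is not None:
            result[field] = pseudonymize_value(str(result[field]), secret_key)
    return result
```

Because the mapping is deterministic, records for the same person still line up across tables after pseudonymisation. It is not reversible on its own. If you need to get back to the original value, you would have to keep a separate, protected lookup table. Keep in mind that short values such as phone numbers can be brute-forced by anyone who holds the key, so the key should not travel with the test data.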

Finally, some “don’t’s”.

  • Don’t use data for purposes that the user hasn’t agreed with – that’s supposed to be the spirit of the regulation. If you want to expose a new API to a new type of clients, or you want to use the data for some machine learning, or you decide to add ads to your site based on users’ behaviour, or sell your database to a 3rd party – think twice. I would imagine your register of processing activities could have a button to send notification emails to users to ask them for permission when a new processing activity is added (or if you use a 3rd party register, it should probably give you an API). So upon adding a new processing activity (and adding that to your register), mass email all users from whom you’d like consent. Note here that additional legitimate interests of the controller might be added dynamically.
  • Don’t log personal data – getting rid of the personal data from log files (especially if they are shipped to a 3rd party service) can be tedious or even impossible. So log just identifiers if needed. And make sure old log files are cleaned up, just in case.
  • Don’t put fields on the registration/profile form that you don’t need – it’s always tempting to just throw as many fields as the usability person/designer agrees on, but unless you absolutely need the data for delivering your service, you shouldn’t collect it. Names you should probably always collect, but unless you are delivering something, a home address or phone is unnecessary.
  • Don’t assume 3rd parties are compliant – you are responsible if there’s a data breach in one of the 3rd parties (e.g. “processors”) to which you send personal data. So before you send data via an API to another service, make sure they have at least a basic level of data protection. If they don’t, raise a flag with management.
  • Don’t assume having ISO XXX makes you compliant – information security standards and even personal data standards are a good start and they will probably cover 70% of what the regulation requires, but they are not sufficient – most of the things listed above are not covered in any of those standards.
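
For the “don’t log personal data” point above, one possible safety net is a filter in the logging layer that scrubs anything that looks like an email address before it reaches a file or a 3rd party service. This is a minimal sketch using Python’s standard `logging` module. The class name, the placeholder text and the (deliberately simple) regular expression are assumptions for the example, and it only catches emails, not names or addresses.

```python
import logging
import re

# Simple pattern for illustration; it will not match every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class RedactEmailsFilter(logging.Filter):
    """Replaces email addresses in log messages with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message with its arguments first, then redact,
        # so emails passed as arguments are caught as well.
        message = record.getMessage()
        record.msg = EMAIL_PATTERN.sub("[redacted-email]", message)
        record.args = ()
        # Returning True keeps the (now redacted) record.
        return True
```

You would attach it to each handler with `handler.addFilter(RedactEmailsFilter())`. It is a backstop rather than a replacement for the rule itself: the better fix is still to log the user ID and not the email in the first place.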

Overall, the purpose of the regulation is to make you take conscious decisions when processing personal data. It imposes best practices in a legal way. If you follow the above advice and design your data model, storage, data flow, and API calls with data protection in mind, then you shouldn’t worry about the huge fines that the regulation prescribes – they are for extreme cases, like Equifax for example. Regulators (data protection authorities) will most likely have some checklists into which you’d have to somehow fit, but if you follow best practices, that shouldn’t be an issue.

I think all of the above features can be implemented in a few weeks by a small team. Be suspicious when a big vendor offers you a generic plug-and-play “GDPR compliance” solution. GDPR is not just about the technical aspects listed above – it does have organizational/process implications and many questions to be answered. But also be suspicious if a consultant claims GDPR is complicated. It’s not – it relies on a few basic principles that are in fact best practices anyway. Just don’t ignore them.

84 thoughts on “GDPR – A Practical Guide For Developers”

  1. Hi Bozho,
    Excellent article. I was wondering what your thoughts were on how to handle historical backups when implementing “Forget me”. Would every backup containing data on the subject need to be restored in order to delete the relevant data and then subsequently backed up again? This could be a nightmare scenario for a large company with a lot of data and a lot of forget me requests.
    Kind Regards,
    Darren

  2. I really like your article! I have just one comment which I think is worth mentioning: you do not have to implement everything. If you can assume with high probability that, for example, the right to portability will be used very rarely, you could define a manual process for extracting personal data from the database and use it when needed. I think GDPR puts a requirement on the data controller to make this possible; the way to do it is up to the controller.

    Kind regards,
    Albert

  3. Further to Darren’s comment/question about right-to-erasure and backups: A similar problem also occurs when using the event sourcing architecture, if personal data is stored in an immutable event log. One option for these scenarios is to use cryptographic erasure: encrypt personal data fields upfront, with a key specific to the data subject, and delete the key when needed to enforce deletion of the data. This is something we’ve implemented for Java. More info here: https://axoniq.io/events/2017/11/gdpr-webinar.html

  4. Excellent article! Is it possible to have backlinks/references to the official GDPR PDF? Like the article/paragraph for each recommendation?

    Thanks a lot.

  5. @Darren I added a little more about backups. Basically, you keep a list of forgotten user IDs and re-delete them on restore.

    @Frans yes, that’s a good approach. In some cases events (in event sourcing) can be deleted or modified/anonymized without affecting anything else, so that’s also an option (slightly easier, but potentially breaking)

    @Albert – that’s right. It better be automated, but it doesn’t have to be. I’ll add a clarification
    @Dawn – yup

  6. Nice to read article, but I don’t think it is as easy as this.

    Basically you assume, that you already have perfect data quality and have identified all persons with some account id. But the regulation never mentions some id, it requires to identify natural persons, not accounts.

    An example from my real-life experience, which we have seen at almost every customer company: you have a contract with an ISP for your internet and another contract with the same ISP for your mobile phone. What we have seen is that most companies create TWO separate accounts for this and don’t connect the data, especially if there have been company mergers or there are just different departments.
    The result is currently, that you might get two ad mails for a new product of the ISP.

    For the GDPR it would NOT be sufficient to add some buttons behind the account login if the natural person has two accounts. You have to find ALL data regarding this one natural person. So the buttons are good, but if you don’t control your data quality you could get into trouble.

    So, you are right, GDPR is not THAT complicated, but it isn’t THAT easy as you say. The basic implementation for some features might only require some weeks, but only if you already have solved some very hard problems. Maybe it is quite easy for small or “new” companies, which only have ONE (at most two) database, but for most companies we talk to, this is not the reality.

    Just my thoughts (I work at a company with heavy experience with data(-quality))
    Greetings Marcel

  7. Well, yes, data deduplication is something I do expect to have happened already. But I can add it to the list of best practices.

  8. Hello Bozho,
    first of all, thank you for a comprehensive and exhaustive article.
    I have a bit of an obscure question.
    How about third parties who generate user interaction data which is used for ROI, conversion and similar measurements?
    Especially where they don’t explicitly or implicitly know the user?
    Do those 3rd parties need to provide data export for the specific user?
    I am asking because in order to offer an export, they’d need to be able to bind the actual app user to their user-agnostic tracking system.

  9. If they can’t deduce the user, they cannot do any of the above.
    However, they should follow the e-privacy directive and the upcoming e-privacy regulation which defines how cookies and other tracking mechanisms are used

  10. Thanks for this very nice article!
    I have only one question: is this an exhaustive list of GDPR development work, or do you think there is more to do?
    Stephan

  11. I think it covers the most common use cases. There will certainly be edge cases depending on the business needs that are not covered above, though. The other day we got such a question – “what to do in case we get the data of the user and their consent over the phone”. Seems like the proper thing is to just mark the consent in a CRM on behalf of the user, but it is not yet clear – maybe some call archiving will be needed in case of sensitive data? Can’t say at this point without consulting with legal experts.

  12. Thanks for sharing your analysis.
    I spent some days in 2017 going through official resources, including the original GDPR text, and for some points I came to a slightly different conclusion.
    Basically, almost every time you write “must” (encrypt database content; provide a data download button; allow direct personal data editing; etc.), I came to the conclusion that this is an option, not a requirement.
    What is required is to grant each individual access to their personal data; the how (is it automated or manually) is not enforced. Thus, a snail mail process would meet the requirement.
    Regarding encryption, the text states “shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk”.
    The notion of “level of security appropriate to the risk” is key here : whether data are usual ecommerce data (postal address) or personal insurance data (history of failures, …) does matter, and measures are to be adapted.
    By the way, it is not only a technical point, but also a process point: what about the developer who codes the encryption of the data? How do you ensure that they will not be able to access/decipher all the data?

  13. Hi Bozho

    Thanks for a great article.

    I was curious if GDPR only applies to client data or if it also applies to employee/admin user data as well.

    For example with event sourcing or access logs, would have something like “Employee X changed Customer Y’s address on 01/01/2018”. Can the employee/admin ask for their data to be forgotten? (eg when employee leaves company)

    What would you recommend for people who are both customers and employees?

    Thanks

  14. It applies to employee/admin data as well, yes, BUT it is based on contract, rather than consent. So the employee can’t ask to be forgotten. You just have to define a data retention period for that kind of audit data (it shouldn’t be “forever”)

  15. You are correct. It is more fuzzy than “must” vs “must not”. I’ve listed the general good practices that would make you safer, but whether a compliance audit will absolutely require them – depends on many factors.

  16. Great article. It helps our developers really grasp the concepts I’ve been trying to get across on our implementation journey. We are now more focused. Our real challenge is in implementing a solution for data at rest that avoids having to encrypt the whole database.

    Any advice on that will be appreciated.

  17. Great article, I’m just in the process of ensuring GDPR compliance in our reporting databases, and one issue that we’re having is with our main Datawarehouse, in which we’ve got data from about 5 legacy systems combining, we’re finding that realistically we need to obfuscate the personal details identically for the systems, so that I’ve got the same fake name and postcode in each system (for instance) to be able to match or throw up anomalies in an exception report.

    Ideally, the best approach would be to start afresh with empty systems and populate each with specific test data but getting the diversity, volume and historical issues would mean the data was hugely unrealistic and at the point of going live we’d hit new unforeseen issues that the clean, sensible test data didn’t expose.

    Another challenge is when updating the data with live deltas, the obfuscation needs to be similarly consistent – so, for example, we’ve got me, Mr Smith, first appearing in our CRM system as a lead, so that creates a record in the warehouse and after anonymisation, I’m ‘Mr Jones’ (along with obfuscated email, phone, address etc) – then I sign up as a customer in the sales system, we have to go back to the CRM system and find my pseudonym and use that, whereas if I’ve just appeared directly in the Sales system, they’ll need to come up with a new pseudonym, randomly generated, and then later, if I appear in the CRM system (if they did a customer mailout for instance) they’ve got to do the same. Essentially, the first time I appear in any system I’m given my pseudonymous values, and appearances in subsequent systems must tie back to that first appearance. Also this needs to be done on a field by field basis, as not all systems have all the same fields – (one might have email, another not).

  18. Hi Bozho,
    I am not a developer but a “manager” 🙂 However, I would like to ask a developer-type question. Is it possible for access to data within a database to be granted via an API that ensures you should have access? That way developers cannot use PHP embedded in the webpage to see all data without their access being logged.
    Thanks
    James

  19. Dear Bozho,

    For me there is a contradiction between the “forget me” functionality and what you said about restoring the database from a backup and then erasing those users’ personal data.
    In my understanding this should not be enough to comply with the regulation. The same goes for encrypting your backup, I just fail to see how that is compatible with GDPR.
    My understanding is that you have to delete EVERY piece of personal information you are storing about the specific person in your system. It doesn’t matter if it’s in a log file, a database, or happens to be in a backup file.

    But please correct me if I’m wrong.

  20. Yes, but that just shifts the responsibility to the developers of the API. Ultimately someone will have to write queries

  21. “Yes, but that just shifts the responsibility to the developers of the API. Ultimately someone will have to write queries”

    Well, yes and no. The company can get audited, so it’s not really the responsibility of the API’s developer. It’s the whole company’s, and the developer has to make the required effort to comply with the regulation. And again, I don’t think leaving users’ personal data in backups is compatible with GDPR.
    Although I don’t know a better solution either, since every company has incremental backups, which makes removing personal data from those backups close to impossible.

  22. About the API – from organizational point of view it is of course better to limit the number of people (and applications) that have direct access to the database. No doubt about it.

    As for backups – since eventually old backups are discarded (even in the case of incremental backups, full backups are performed), then I think you are fine with having an encrypted backup + a separate table with forgotten users. Apart from that, I agree, you can’t delete personal data from backups. It’s sufficient to acknowledge that, to protect the backups (encrypt, limit access to them), and have them expire. I guess..
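    A rough sketch of the "separate table with forgotten users" approach mentioned above, assuming the list of forgotten user ids is kept outside the backup being restored. The `users` table, its columns and the function name `reapply_erasures` are hypothetical and only illustrate the step that would run right after a restore.

```python
import sqlite3


def reapply_erasures(restored, forgotten_ids):
    """Blank the personal fields of users who were forgotten before the restore.

    `restored` is a connection to the freshly restored database;
    `forgotten_ids` comes from a store that is not part of that backup.
    """
    with restored:  # commits on success, rolls back on error
        restored.executemany(
            "UPDATE users SET name = NULL, email = NULL, forgotten = 1 WHERE id = ?",
            [(user_id,) for user_id in forgotten_ids],
        )
```

    Keeping the primary key and a flag, rather than deleting the row, leaves foreign keys elsewhere intact.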

  23. Hi! Thanks for the article. I wonder about the need to add a checkbox to express consent vs. showing the text “By submitting, I accept…”. I’ve been doing some research and Recital 32 of the EU GDPR says:

    > This could include ticking a box when visiting an internet website, choosing technical settings for information society services or another statement or conduct which clearly indicates in this context the data subject’s acceptance of the proposed processing of his or her personal data.

    > Silence, pre-ticked boxes or inactivity should not therefore constitute consent.

    By “statement or conduct which clearly indicates in this context the data subject’s acceptance of the proposed processing” I understand that the current legal solution “By submitting, I accept the terms and conditions…” works.

    What do you think?

  24. Hi Bozho,

    we’re developing a complementary product to your LogSentinel: we’re scanning arbitrary storages (local or remote) for personal and sensitive data, at pii-tools.com.

    Drop me an email if interested in joint opportunities for enterprise clients.

  25. I think the spirit of the regulation is that consent should be clearly given. “By clicking I accept” would be frowned upon to say the least.

  26. Thanks again for this article.

    How about pictures of people? If a person wants us to delete all his personal data, would we be required to delete all pictures that person is part of?

  27. Hi Bozhidar, thanks for the tips. I have a question, you said :

    Restrict processing – […] That means it should no longer be visible to the backoffice staff, or publicly. […]

    But if our staff is no longer able to see (for instance in an e-commerce environment) previous orders, addresses, phone numbers, e-mails or even registered issues, we will not be able to offer our services. If that would be the case, could we block that user data processing action to make it mandatory?

    Thanks!

  28. The user has to be clearly informed that he won’t be able to get the service. Also, if there are any pending orders, restriction of processing can be delayed until they are fulfilled

  29. Are IP addresses considered publicly identifiable information? If we store IPs in usage logs, etc, can they be retained?

  30. IPs are personally identifiable information, however you can store them in logs as long as you have some rotation policy. They are unstructured information used for diagnostics, and I believe they fall under the “legitimate interest” of the controller. So just mention in your privacy notice that you collect the IP and it should be fine.

  31. Do you have to delete the User record? Or is it OK to just nullify all of the identifying fields (or all of the fields apart from the primary key and a ‘This user is deleted’ flag if you want to be sure)?

    The user may remain identifiable by pattern. “This is a diabetic in North London who likes size 38 pink trousers” – based on all the Orders that have the same User ID?

    Alternatively delete the User record and switch foreign keys to a special “The Forgotten User” instead?

  32. The definition of personal data is not clear to me:
    “every piece of data that can be used to uniquely identify a person or data that is about an already identified person.”
    Does that mean that an IP address, a MAC address or a customer_id are personal data?
    Is it allowed to cross-reference data from different systems using these fields? For statistics on services, for example?

  33. If we delete the contact info and/or anonymize the user, how do we prove in an audit that we adhered to the request? How do we link back to the data subject and the request?

  34. Does any of this also apply to emails that a company has received from and sent to its customers?

    Also, what’s to stop this from becoming the next way for the scum of the world to prey on companies, like patent trolls and ransomware authors already do? Imagine a new industry of ill repute, where bad guys intentionally interact with company websites and apps, and take other actions to get their personally identifiable information into the company’s systems. Then the bad guy invokes some provision of the GDPR to test whether said company is in compliance, and when they see it isn’t (e.g. one piece of information they know they provided wasn’t reported in response to a request for their stored information)… lawsuit! Rinse and repeat for infinite profit.

  35. Both may be okay. You’d have to assess how likely it is to be able to identify a particular person (and not just one, accidentally, but many) by the data. Probably not very likely, especially if you replace full addresses with a larger area.

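  36. IP addresses are explicitly listed in the regulation as online identifiers (Recital 30), and MAC addresses are device identifiers of the same kind, so treat both as personal data. customer_id is an identifier and is personal data only in combination with the rest of the personal data.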
    You can have statistical data.

  37. You can store an identifier and even a tiny piece of personal data (e.g. email) in a log in order to prove you have followed the request for erasure. This is in line with the accountability principle.

  38. I think the regulators are not supposed to allow such trolling behaviour and will be mild to first-time offender companies.

    (Not sure about emails – my hunch is “yes”, but there might be an exception)

  39. While deduplication might be important for having correct data, it might also go against the data minimization principle. Suppose a customer orders twice, with separate addresses, and you are allowed to store the data because you still owe the customer a warranty obligation: there is no need – and often no right – to link both purchases to the same id.
    If users can have multiple accounts, you do not have to link them together in order to comply – rather the contrary. Linking them can be against the data minimization requirement of the GDPR.
    If a user called “Jack Smith” wants to know what data you hold about him, you are not required to use a unique id for this Jack Smith across your whole company. You can ask the user to provide his prior addresses in order to fulfil his information or deletion request.
    Art. 15-18 are no excuse to join datasets that do not otherwise have to be linked together.
    The same is true for large companies. If there is a request from an individual, rather than centrally collecting the data in one place, you should send the request to all relevant systems so they can individually comply with it.

  40. Thank you for your effort.
    Let me ask common questions, but with a concrete scenario.
    To pose this very question, I was asked to provide my email address. Will the email address alone be “personal data” subject to GDPR?
    I suppose that WordPress here has sent you an email notification reporting my question and my email address.
    So, if the email address were subject to GDPR and I should ask you to delete it, should you remove the notification email from your mailbox? Should you also make sure that WordPress and your email provider do the same on their archives?

  41. Yes, the email is personal data. (You don’t have to give consent, because I’m processing it for a legitimate interest – to verify you and to notify you of follow-ups.)
    As for erasure – if you send me an email “delete all my comments”, I’d have to. Whether I’ll have to delete my emails is a good question. I think so (can’t think of a reason not to)

  42. Excellent article, thanks a lot for sharing!

    We are a Spain-based company and would like to use Contactually CRM. I have not found anything about GDPR (maybe they are marketing mainly to the US). I found this in their privacy policy statement. Does this mean we could not use Contactually as our CRM? Thanks a lot, Ramon

    “Consent To Processing In The United States. By providing any Personal Information and/or Content to Contactually, all users, including, without limitation, users in Canada and the member states of the European Union, fully understand and unambiguously consent to this Privacy Policy and to the transfer of such Personal Information across international borders in accordance with Contactually’s standard operations, including the collection, storage, and processing of such information in the United States of America or other countries in which our employees and contractors may be located.”

  43. @Ramon you can ask them directly if they are GDPR compliant and keep their response in your documentation to present to a regulator in case of an inspection.

  44. Hi Bozho,

    this article was one of the reasons why we started our work on GDPR SDK – I’m a developer, and it’s easier for me to understand something looking into a source code than into legal stuff.

    We’ve developed and recently published open source client SDK, which can be used as a starting point to make application GDPR compliant. Some of the things covered in this SDK are data subject rights (inform, object, rectification, erasure, …). Since there is no one universal solution for the GDPR, our approach was to create and document interfaces (with explanations and links to specific GDPR articles) and default implementations. It’s on a developer to implement actual code for deletion of personal data or code for rectification, etc. – but – I believe that guidance that this SDK provides can be helpful.

    The project is in BETA and in C#, we’re working on other languages. Url is: https://github.com/gdprhq/GdprHq.Io.ClientSdk

    Hopefully, other developers will find this useful too 🙂

    Nino

  45. Great article Bozho! Re: backup strategy: our current thinking (this is a brand new application) is to store any personal data in a PersonalData table, and to have a trigger-updated copy of it called PersonalDataBackup that is automatically updated whenever the first changes. In this latter table all actual personal data is stored encrypted; a few fields, mostly FKs, are not encrypted. The encryption key is stored in a separate PersonalDataKey table, and a process (a stored procedure) allows the data to be restored to the PersonalData table if you have the key. Now, PersonalData will never be backed up. PersonalDataKey will be backed up only for a short moving window, e.g. two weeks. PersonalDataBackup will be backed up with the rest of the database under standard backup policies. When a “forget me” request arrives, we will delete the personal data part of the PersonalData record and the key part of the PersonalDataKey record. This way it is guaranteed that within two weeks we will completely forget the personal data, and we still maintain our PKs that may be FKs elsewhere. What do you think?

  46. What about database transaction logs? When you delete or null a field that obviously doesn’t “delete” the data, it is still in the transaction logs from the create/insert/update statements.

  47. It’s important to think about how this new regulation can be abused.

    The “export data” feature can be abused by someone with illegitimate access to the subject’s account. Controllers should require additional authentication before exporting personal data in bulk. A validation code/link sent by e-mail is a great idea, even when background processing is not necessary.

    Deleting shipping information too quickly after delivery can be abused for mail fraud. A data subject can dispute charges up to six months after accepting delivery, depending on the card issuer’s jurisdiction.
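    The "validation code sent by e-mail" suggestion above could look roughly like the following sketch. The class `ExportConfirmations`, the 15-minute lifetime and the in-memory storage are assumptions for illustration only; sending the e-mail and building the export itself are left out.

```python
import hashlib
import hmac
import secrets
import time


class ExportConfirmations:
    """Hypothetical one-time codes that must be confirmed before a bulk export."""

    def __init__(self, ttl_seconds=900):
        self._ttl = ttl_seconds
        # user id -> (SHA-256 of the code, expiry timestamp)
        self._pending = {}

    def issue(self, user_id, now=None):
        now = time.time() if now is None else now
        code = secrets.token_urlsafe(32)
        # Store only a hash, so a leaked table does not reveal usable codes.
        digest = hashlib.sha256(code.encode()).hexdigest()
        self._pending[user_id] = (digest, now + self._ttl)
        return code  # to be e-mailed to the address on the account

    def confirm(self, user_id, code, now=None):
        now = time.time() if now is None else now
        # pop() makes every code single-use, even after a failed attempt.
        entry = self._pending.pop(user_id, None)
        if entry is None:
            return False
        digest, expires = entry
        supplied = hashlib.sha256(code.encode()).hexdigest()
        return now <= expires and hmac.compare_digest(digest, supplied)
```

    The export would only start once `confirm` returns `True`; a wrong or expired code means the user has to request a new one.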

  48. What makes you think that transaction logs do not have to be cleaned? Art 17:1 sounds pretty straightforward:
    “The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay where one of the following grounds applies”
    “Mark as deleted in database” is not the same as “erasure”, AFAIK. What text would you say support the view that transaction logs do not have to be cleaned? (and I really truly do hope you are right on this one, to be clear!)

  49. Cleaning up transaction logs is disproportionate and technically infeasible (and that’s a principle in the regulation).
    How long do you keep transaction logs? 30 days? a year? The transaction log will eventually get rotated and the data will be gone, right?

  50. Hello, Bozho! Thank you for the comprehensive article! I have a question about the Export data part – we have invoices and warranty policies attached to customers. With the Export data functionality, should we implement exporting the whole document (as a PDF, for example), or is it enough to export just a CSV file with a small amount of information (the invoice number and the number of the invoice’s order, for example)? My team and I are worried that CSV files are not well known to users and that they will not be satisfied with this kind of export. Thank you in advance and keep up the excellent work!

  51. PDFs are generally not considered “machine readable format”, and the requirement for data portability is about machine readability. The point of the right is to be able to export it to other controllers. The user doesn’t necessarily need to be able to make sense of the exported data.

  52. Thank you for this article!

    Question: How do we reconcile “Don’t log personal data” with “Log access to personal data” via the “User X did a search for criteria Y” example provided? If the search is by full name, Y would possibly contain personal data.

    Is this seen as an OK exception since it’s just a factual record of a text search someone performed?

  53. Hi Bozho, on the matter of the right to be forgotten, do you think it’s OK to keep an audit log of accounts that have been deleted? At the moment we use the user’s email address as the key to their account details. We were thinking that, as requests for deletion of accounts come in, we would keep a deletion audit log to record as and when an account has been deleted. In the audit log we would still use the user’s email address and a timestamp, and of course nothing else of their details.

    We are basically thinking that at some time in the future we may need to prove/confirm when an account was requested to be deleted. But somehow we need to store a tiny bit of the user’s data to be able to identify them so that we can confirm this action took place?

    Thanks Nick

  54. @BrianC – First, it’s a different log – the audit log vs the regular debug/info/system log. There are two aspects here. The first is, you log employees data when they use the system, and that stems from your contract with them, so it’s fine (and you shouldn’t be able to forget them). The second aspect is “defence of legal claims”. You could store a minimal information in order to be able to prove at a later point what exactly happened in case there’s litigation. Obviously, don’t log all the PII about a user, but “user X did Y” is completely fine.

  55. @Nick Teagle absolutely correct – you should keep the deletion audit log with a minimal amount of data that’s able to identify the deleted person.

  56. Great article. Do you know anything about how long to keep an audit log according to GDPR? As mentioned in comments, keeping it forever doesn’t make sense, but wonder if GDPR mentions anything about this.

  57. I deleted them. But btw GDPR doesn’t institute a process of confirmation. I have to comply with your request, but notification is optional.

  58. One of the principles of the GDPR is that data is “accurate and, where necessary, kept up to date…”.
    As organisations are unaware of a change in personal information until it is offered (such as a change in marital status), is it then a requirement for an organisation to send out reminders asking whether personal data has changed since the last reminder, or is it more about accurately processing a change in the customer’s information when the customer offers it to the organisation?

    I.e. Do we need to pro-actively ask customers if their information has changed or is it about processing a change of status when the customer tells us that it has changed?

    Thanks,

    Patrick

  59. @Patrick no, unless it is necessary for your business model or for compliance with other legislation.

  60. If I write an SDK that allows other developers to upload files to a remote location (FTP, Amazon S3, etc.) but I never see the data personally am I still a Processor?

    No data is saved by my SDK, and no information about the data transmitted by the SDK is recorded. There is no information to report or retrieve about personal data. It’s possible personal data could flow through my SDK but I would never know it or record any details about it. What are my responsibilities?

    Thank you very much for your time.

  61. You don’t have responsibilities. You (as a legal entity) don’t process data on behalf of controllers. Otherwise Apache and Nginx would be processors of every company in the world 🙂

  62. Nice to see another practical guide, @bozho. Working out the dev changes needed for existing apps, after putting in some basic frontend changes, still feels like the hard work (after training staff and customers to understand the changes), as does judging how much is enough. It will be a rolling process, I feel.

  63. Is it true that you can’t have Google Analytics enabled by default, and that you need approval from the visitor? In that case, GA (and a lot of other tools) is useless.

  64. I’m not sure I agree that IP addresses are personally identifiable pieces of information. Most end users are going to be using NAT, so multiple users will share an external IP. AIUI this is a defence often used in copyright infringement cases, i.e. IP != Person, you can’t prove who it was.

    Also it’s a given that most ISPs use DHCP to allocate and reallocate addresses as they see fit.

    We only store data required to process an (online) order, “not due to contract enforcement or legal obligation” suggests that “The Tax Man” is adequate reason not to delete order processed data, as any inspection would require the presentation of all the order information on request for validation purposes.

    Do you think this is the case?

    There seems to be a little conflict. If I walk into a shop and pay cash, they record the sale, but not who I am and I suspect “The Tax Man” doesn’t care who it was.

  65. Does any mention of the person’s name have to be expunged from all of your systems? What about ‘notes’ fields where someone is able to type anything they like, including names? Are these supposed to be searched for any incidence of the name to be removed and the name replaced? If so what formats of the name? (Mr Smith, J Smith, Mrs S Smith)

  66. Thank you for the detailed explanation.
    Question: How about an “anonymous” forum, similar to the one below your articles, but without email field. There is a nickname field, which might contain real name in some cases (maybe less than 5%). Should we change the form because of GDPR? Should we honor delete requests and how can we check if they are legit?

  67. Thank you very much for sharing this precious information!
    I would like to ask some other questions, focused on developers working for e-commerce companies.

    1. Status, liability and data breach
    a) What is our status regarding GDPR? Are we a subcontractor of the e-commerce company?

    b) Who is liable if a developer is working for an e-commerce platform but is provided through another company (an umbrella company)?

    c) If we maintain a server for an e-shop, and this server gets hacked, which results in lost data, who is liable for the loss of my client’s personal data and potentially of the personal data of my client’s customers? Me, the server host or the client?

    3. Privacy by design
    GDPR tells us not to keep personal data longer than necessary, and clients can ask for the removal of their personal data from our systems, but how is that manageable in practice with potentially hundreds of backups of customer data and the need to keep invoices, for instance?

    4. Risk analysis compliance
    GDPR asks us to analyse processing and identify risks to rights and freedoms, but how do we measure this without any experience of it? Should we imagine different scenarios that are likely to happen?

    Thanks again !

  68. Hello again, Bozho! Thank you for the first quick answer! Now me and my team have run into another issue. We are working for an online shop, so we are storing user’s invoices, warranties, insurances. What should we do with these documents once someone wants to be forgotten? Would it be okay if we keep them for a specific period of time since for example we need invoices to do our accounting? Thank you in advance!

  69. @Gabriela – yes, you should keep those. They are in most cases required by specific legislation (either accounting legislation or consumer protection legislation for warranties). But define a retention period and delete them once the warranty or the accounting retention period expires.

  70. Super interesting article that has allowed me to get to know better what GDPR is about without using much legal wording, but a more technical approach, which is what I understand best as a Web Developer.

    Thanks!!

  71. Hi Bozho! Thanks for the great writeup. Very easily consumed and actionable.

    Two quick questions about personal data:

    1. If you have less structured fields (like a notes area) and a user enters their name or address in notes, how do you know? How do you test for that? Do you need to?

    2. What about data that might not be about the user, but is PII on another person? What if I enter a list of my friends’ email addresses and names? Is it possible for my friend, Jose, jose@jose.jose, to come to a site, while not being a “user” of the site, and say “Do you have personal information about me?” Since I (a user) just put Jose’s name and email into the site, there is personal information about Jose there.

    How could the site handle that?

    Thanks!

  72. Very good questions.

    1. No, you don’t have to do much. It would be considered a good practice to scan for SSNs and email addresses and warn the user about that, but the fact that it’s unstructured means it’s not prone to easy large-scale processing.

    2. You have to inform the non-registered user that you have data about them. This would only apply if the information is provided in a structured way, e.g. a field where the user enters the emails one by one/comma separated and then you store them in separate fields. In that case, if Jose comes, you have to reply “yes, we have your email and we got it from user X”. This being an unlikely scenario (especially if you don’t use the email for “naughty things”), it can be a manual process.
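    For point 1, a scan of a free-text field might be sketched like this. The patterns are deliberately simple assumptions (a loose e-mail match and the US `NNN-NN-NNNN` SSN layout) and will produce both false positives and misses; the function name `find_personal_data` is hypothetical.

```python
import re

# Loose patterns, good enough for a warning but not for validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_personal_data(text):
    """Return the kinds of personal data that appear to be present in `text`."""
    findings = []
    if EMAIL_RE.search(text):
        findings.append("email address")
    if US_SSN_RE.search(text):
        findings.append("SSN-like number")
    return findings
```

    A notes form could call this on submit and show a warning when the returned list is not empty, without blocking the save.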

  73. @Bozho Thanks for the reply.

    From a technical perspective, if you have private data about a non-registered user, how would you validate it’s the right person?

    If we know it’s Jose with the email jose@jose.jose, then we can implement email confirmation to make sure it’s the right person before we send that data out.

    But what if instead all we have is Jose’s name, SSN, and political affiliation? (As an example. I chose that data because it’s obviously well protected by GDPR.) How do we validate you’re the real Jose before telling you what data we have about you?

    Also, can Jose request to be forgotten? We have data about Jose, but that is data another user has entered as notes. If we delete a registered user’s notes, that seems likely to make them very unhappy.

    (Aside: let’s assume for all these questions that we aren’t doing anything naughty with the data. It’s there for the registered user solely as a notepad or contact list / address book.)

    Thanks!

  74. I am having a tough time understanding where you came up with some of these.

    I looked at https://gdpr-info.eu/chapter-3/ (is that where you got most of your technical requirements from?)

    Where does it say shipping information needs to be removed after it’s shipped? Also, where does it say you need to mass-mail existing users to check consent boxes? Confusing.

  75. @Tom indeed, I assumed the case where you have an email and are thus able to identify the subject. If you don’t have an email, you can use other means (have them send their SSN, but only for comparison – don’t store it). SSN is bad for authentication, as we know, but it’s something. As for deleting data that other people have entered – you don’t have to do that automatically. You have to balance the consequences. If the note is public and contains Jose’s SSN – go ahead and delete it 🙂 If it’s just some useful information for his friend – keep it.

    @Dark web – it doesn’t say you have to mass-mail them, but the recitals say that consent should be re-requested in accordance with GDPR requirements. Which practically means either mass-emailing, or waiting for their next login and marking them as inactive until then. I didn’t get the “shipping” part. You don’t have to delete data after exporting it. Only if the user requests so.
