“Forget me” and Tests
Your users have profiles on your web application. And normally you should give them a way to delete their profiles (at least that’s what the European Court has decided).
That “simply” means you need to have a /forget-me endpoint which deletes every piece of data for the current user: from the database, from the file storage, from the search engine, etc. Apart from giving your users at least partial control over their own data (whether you can have it or not is their decision), it is also a benefit for developers.
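To make the "every piece of data" part concrete, here is a minimal sketch of what the logic behind such an endpoint might look like. The Stores class and its three dictionaries are hypothetical in-memory stand-ins for the database, the file storage and the search engine; they are assumptions of this example, not part of any real library.

```python
from dataclasses import dataclass, field


@dataclass
class Stores:
    """Hypothetical in-memory stand-ins for the real back ends."""
    db_rows: dict = field(default_factory=dict)       # user_id -> profile row
    files: dict = field(default_factory=dict)         # file path -> owner user_id
    search_index: dict = field(default_factory=dict)  # doc id -> owner user_id


def forget_me(stores: Stores, user_id: str) -> dict:
    """Delete everything owned by user_id and report what was removed."""
    removed = {"db_rows": 0, "files": 0, "search_docs": 0}

    # Database: drop the profile row, if there is one.
    if stores.db_rows.pop(user_id, None) is not None:
        removed["db_rows"] = 1

    # File storage: collect the matching paths first, then delete them,
    # so the dictionary is not modified while it is being iterated.
    for path in [p for p, owner in stores.files.items() if owner == user_id]:
        del stores.files[path]
        removed["files"] += 1

    # Search engine: same approach for indexed documents.
    for doc_id in [d for d, owner in stores.search_index.items() if owner == user_id]:
        del stores.search_index[doc_id]
        removed["search_docs"] += 1

    return removed
```

Returning a count per store is just one possible design choice; it makes it easy for a caller (or a test) to see that each store was actually touched, and calling the function a second time for the same user simply reports zeros.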
Apart from your isolated unit tests, containing a lot of mocks, you have other sorts of tests: integration tests, acceptance tests, Selenium tests. All of these need a way to leave the database in the same state it was in before they were executed. In some cases you can use transactions that are rolled back at the end of each test (e.g. spring-test does that by default for transactional tests), or you can use an in-memory database and hope it will work the same way as your production one, or you can drop the database and recreate it on each run. But these are partial solutions with some additional complexity.
The best way, I think, is to just reuse the “forget me” functionality. From your acceptance/Selenium tests you can call the /forget-me endpoint at the end of the test (tearDown), and for your integration tests y. If you distribute client-side APIs (or a third party is building them against a test deployment of your system), you can again call the forget-me endpoint.
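As a rough illustration of the tearDown idea, the sketch below starts a toy HTTP application on localhost and has each test clean up after itself by calling /forget-me. Everything except the /forget-me path is an assumption made for the example: the /register endpoint, the X-User header used to identify the current user, and the USERS dictionary standing in for real storage.

```python
import threading
import unittest
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stand-in for the application's user data, keyed by user id.
USERS: dict = {}

# Bypass any system proxy: the sketch only ever talks to localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class AppHandler(BaseHTTPRequestHandler):
    """Toy application with POST /register and POST /forget-me."""

    def do_POST(self):
        # The current user comes from a header here; a real application
        # would use its session or token (an assumption of this sketch).
        user = self.headers.get("X-User", "")
        if self.path == "/register":
            USERS[user] = {"name": user}
            self.send_response(201)
        elif self.path == "/forget-me":
            USERS.pop(user, None)
            self.send_response(204)
        else:
            self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the test output quiet


def post(base_url: str, path: str, user: str) -> int:
    """Send an empty POST as the given user and return the status code."""
    request = urllib.request.Request(
        base_url + path, data=b"", method="POST", headers={"X-User": user}
    )
    with _opener.open(request) as response:
        return response.status


class ProfileAcceptanceTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Port 0 lets the operating system pick a free port.
        cls.server = ThreadingHTTPServer(("127.0.0.1", 0), AppHandler)
        threading.Thread(target=cls.server.serve_forever, daemon=True).start()
        cls.base_url = f"http://127.0.0.1:{cls.server.server_address[1]}"

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()

    def setUp(self):
        self.user = "test-user-1"
        post(self.base_url, "/register", self.user)

    def tearDown(self):
        # Reuse the production deletion path instead of resetting the database.
        self.assertEqual(post(self.base_url, "/forget-me", self.user), 204)
        # Checked in-process here only because the toy app shares our memory.
        self.assertNotIn(self.user, USERS)

    def test_profile_exists_after_registration(self):
        self.assertIn(self.user, USERS)
```

Saved as a module, this can be run with python -m unittest. Against a real deployment the in-process check in tearDown would not be available; a follow-up request that expects the profile to be gone would play the same role.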
That, of course, doesn’t cover non-user-related data that you need in the database. If you have such data (apart from enumerations and data that should always be there), you have to take care of it separately.
Doesn’t that bring some additional complexity as well, and the constant need to update your forget-me functionality? Isn’t having rolled-back transactions, or a shell script that recreates the database after each run, simpler to support? Assuming that you need to have a properly working forget-me functionality anyway, the answer is no. It’s better to reuse it. That would also make sure the endpoint is indeed working properly, and your users can be fully forgotten.
Uhh, you seem to be forgetting literally the most important thing about “/forget-me”.
You have to also delete a user’s photos.
(or docs, or whatever content they add to your site).
Delete the photos from your hard drive. Also, from the CDN. And the distributed mirrors of the CDN. And any caches.
(Also, from the distributed mirrors/slaves of the DB, but that’s another can of worms).
Also, you need to delete the user’s info from any shared content that identifies them (“You were tagged in X by Y”).
…And the mirrors, and the caches…
You need to leave any dependent content in a usable state. E.g. replies to your own replies in a reddit thread.
And so on, and so on.
The real reason why “/forget-me” doesn’t exist is that in any product of real complexity and scale it’s actually very, very hard to build and maintain such a feature.
Case in point:
About 1 in every 100 images uploaded to imgur is an image that’s uploaded via their Selenium tests. It’s literally easier to add 1% overhead to your entire infrastructure, than to implement easy deletion of stuff.
Second case in point, Google does this too:
https://www.youtube.com/channel/UCsLiV4WJfkTEHH0b9PmRklw/videos
It’s literally easier to have a “super creepy youtube channel that uploads encrypted videos” than to “delete the sample encoding videos of your test script”.
Caches and CDNs expire. The “Forget me” feature doesn’t have to be immediate. For everything else, you have pretty good control and it can be done.