Making TYPO3 GDPR-ready


(Fedir RYKHTIK) #1

Discussion Topic

Hi! As you all know, on 25 May 2018 the new European regulation on data privacy, the GDPR, comes into force ( https://en.wikipedia.org/wiki/General_Data_Protection_Regulation ). Every site that targets European citizens has to comply with it; otherwise, site owners could face heavy financial penalties.

The GDPR considers any data that can be used to identify an individual as personal data. It includes, for the first time, things such as genetic, mental, cultural, economic or social information.

So TYPO3 CMS should be ready for GDPR.

Sites built with TYPO3 will have to respect EU citizens’ rights and the related obligations, which include:

  • The possibility for them to view the data you collected on them
  • The possibility to rectify some data concerning them
  • The possibility to delete their data
  • The possibility to export their data
  • Tools to help to notify the local data protection authority of a data breach

Impact

There is a lot to do:

  • IP Anonymization
  • Personal data listing by user
  • Personal data export by user
  • Personal data deletion by user
  • Implementation of expiration limits for personal data
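To make the first item concrete, here is a minimal sketch of IP anonymization, assuming that truncating the host part of the address is an acceptable approach for a given site. The function name and the prefix lengths (/24 for IPv4, /48 for IPv6) are illustrative assumptions, not anything prescribed by the GDPR or implemented in TYPO3.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host part of an IP address so it no longer points to one client.

    The prefix lengths are assumptions; stricter setups may keep fewer bits.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("192.168.17.42"))
    print(anonymize_ip("2001:db8:abcd:1234::1"))
```

Truncation keeps coarse information (for example for statistics) while dropping the bits that single out one visitor. Whether the remaining prefix still counts as personal data is a legal question this sketch does not answer.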

A personal data management framework for the Core and for extension owners could be created.

I propose to create a working group and organize dedicated TYPO3 Code Sprints for GDPR preparation, so that the results can be included in TYPO3 9 LTS.

Pro

  • TYPO3 will be GDPR compliant, so it can continue to be used for any site aimed at European citizens.

Con

  • Serious, well-organized work is required from the Core team and extension owners to make TYPO3 GDPR-ready by design.

Organizational

Topic Initiator: Fedir
Topic Mentor: ?


(Riccardo De Contardi) #2

I could be wrong, but I think that it is about FE users and not BE users, the name of a BE user does not identify him/her. I mean, I could name my user “Mathias Schreiber” or “Matteo Renzi” or “Sebastian Vettel” and none of these identify that person :wink:
We are in the region of the relationship between company and employees, I think.
The example was given of a fired employee who had backend access and who could demand an extract of all his private data and personal information from the site.
Well, the only “private data” of a backend user are name and email… and, when you’re fired your account could be removed (or renamed). So… what should we extract?
The list of the content you modified or the list of actions you took inside TYPO3 while you used the account cannot (IMO) be considered “private”, “personal” or “sensitive” data. But I am not a lawyer and maybe the implications of the new law are beyond my grasp.

As far as I remember the log already tracks the IP address of BE users that log in. Am I wrong?

About FE users, that is a different topic and I agree it should be investigated.


(Markus Klein) #3

I attended quite a lot of presentations from lawyers and so on in the last months regarding this very topic.

TLDR: The whole thing has to be taken seriously, but its definitions are way too loose for most use cases (except medical stuff and other really sensitive areas).

I agree on the IP masking, which should be possible.
Regarding the core tool to extract personal data, all I can possibly imagine is some kind of simple API in the core, where extensions can register too, where a UI is available for searching “a name” and it returns all content from all extensions registered to the API.
Keep in mind that this is maybe a nice idea, but it could easily fail in practice, as the use cases of websites differ vastly as well.
Simplest problem: It could be so much data that it exceeds the PHP time limit.
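The registration idea above could look roughly like the following sketch. Everything here is hypothetical: `PersonalDataRegistry`, the provider signature and the `demo_news` extension key are made up for illustration and are not an existing TYPO3 API. Results are yielded lazily, so a caller can paginate or stop early instead of running into a time limit.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Record = Dict[str, str]
# A provider takes a search term and returns the matching records it owns.
Provider = Callable[[str], Iterable[Record]]


class PersonalDataRegistry:
    """Hypothetical registry where extensions announce their personal data."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, extension_key: str, provider: Provider) -> None:
        self._providers[extension_key] = provider

    def search(self, term: str) -> Iterator[Tuple[str, Record]]:
        # Generator: nothing is collected in memory, results stream out
        # one by one, tagged with the extension they came from.
        for extension_key, provider in self._providers.items():
            for record in provider(term):
                yield extension_key, record


def demo_news_comments(term: str) -> Iterable[Record]:
    """Stand-in for an extension's lookup; a real one would query its tables."""
    rows = [
        {"author": "Jane Doe", "comment": "Nice post"},
        {"author": "John Roe", "comment": "Thanks"},
    ]
    return (row for row in rows if term.lower() in row["author"].lower())


if __name__ == "__main__":
    registry = PersonalDataRegistry()
    registry.register("demo_news", demo_news_comments)
    for extension_key, record in registry.search("jane"):
        print(extension_key, record)
```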

I’m honestly very undecided whether we should carry out “anticipatory obedience” or rather watch how the GDPR is actually treated in practice. Remember that this law is also new for the government agencies.


(Fedir RYKHTIK) #4

A nice, concrete article: https://techbeacon.com/15-steps-developing-eu-privacy-policy-compliant-apps lists 15 steps, based on the OWASP Top Ten Privacy guidelines (https://www.owasp.org/index.php/OWASP_Top_10_Privacy_Risks_Project).

I would like to propose the following steps, to be implemented by default / with the help of the TYPO3 Core:

  1. Encrypt all personal data by default
  2. Destroy sessions and cookies after logout
  3. Encrypt logs by default
  4. View, export, delete data of users on a single request: for FE users only at the moment. Whether it should also be implemented for BE users is open for discussion.

(Sebastian Michaelsen) #5

While this might often be the case I have seen multiple installations where BE users are customers of the company running the site. So I wouldn’t rule out that Backend Users could be affected by the new ruling.


(Mathias Schreiber) #6

I need some clearing up on the term “encrypt”.
The law states that decryption should only be possible from outside of the system that gathers the data.
Thus a TYPO3 installation may not be able to decrypt (aka display) any personal data.
The cases of showing such data within TYPO3 that I can think of off the top of my head:

  • BE Users Module
  • List Module of BE Users
  • History Module
  • Top Bar

I’m sure there are a couple more.
So since the system may not be able to decrypt the data on its own, those places will only display the encrypted data, correct?


(Fedir RYKHTIK) #7

I suppose you are speaking about BE users. OK, let’s say we will not process them at the moment.

But for FE users it will be more complicated, yes. Private data of FE users should not be easily viewable or exportable by any TYPO3 editor who has access to the Backend. Private data should stay private, like passwords, but with the possibility for its owner to decrypt it with a private key/password. Together we need to define a common strategy for handling such private data.


(Mathias Schreiber) #8

My question is not about FE or BE users.
I’ll create an exaggerated example to try and get my point across:

All personal data of a frontend user needs to be encrypted.
The law clearly states that the name of a person qualifies as personal data, hence has to be encrypted.
After logging into the system, the FE Login Extension shows the user’s name in the profile view.
Since the system may not decrypt the data on its own, what do we show instead of the user’s name on the profile page?


(Riccardo De Contardi) #9

I can’t say I have understood what this means…


(Mathias Schreiber) #10

When you get full access to the TYPO3 installation and grab all the data you should not be able to decrypt the data.
Every person who is able to decrypt the data using a decryption key (which may not be present on the TYPO3 server) has to be named explicitly.

So in (ideal) practice you need to have a hardware decryption key in your desk, I need to have one in my desk, and whenever somebody wants to know the name of a user that used TYPO3 as an editor, either you or I need to manually decrypt that data on a per-case basis.


(Riccardo De Contardi) #11

The point of all this is that an administrator should not be able to read (or export) the fe_users table with all data in clear text, I guess.
I still can’t believe that it could be applied to be_users, where an editor could just be created with the name “editor”.


(Markus Klein) #12

IMHO Point 2 at https://techbeacon.com/15-steps-developing-eu-privacy-policy-compliant-apps is not based on an explicit law. The lines written there make no sense at all.

“the data should be encrypted with proper and strong encryption algorithms, including hashing.”

If you hash data (uni-directional mathematical function) there is NO way to retrieve the data again. This makes absolutely no sense for user data except a password. “I stored my address at that website, but since they hash it, I can’t get it out anymore”.
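A small sketch of that point, using only the Python standard library. The salt and the iteration count are arbitrary demo values: a hash lets you check a candidate value against the stored digest, but there is no function that turns the digest back into the address.

```python
import hashlib
import hmac


def hash_value(value: str, salt: bytes) -> str:
    """One-way derivation; the original value cannot be computed from the result."""
    return hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, 100_000).hex()


def matches(candidate: str, salt: bytes, stored: str) -> bool:
    """The only supported operation: re-hash a candidate and compare."""
    return hmac.compare_digest(hash_value(candidate, salt), stored)


if __name__ == "__main__":
    salt = b"demo-salt"  # arbitrary value for the example
    stored = hash_value("Main Street 1, Vienna", salt)
    print(stored)                                          # opaque hex digest
    print(matches("Main Street 1, Vienna", salt, stored))  # True
    print(matches("Other Street 2", salt, stored))         # False
```

This is why hashing fits passwords, where only the comparison is needed, and does not fit profile data that has to be displayed again.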


(Rachel Foucard) #13

Hello everybody,

I just read the various posts and I think there are some things that need to be clarified:

There is absolutely no need to anonymize personal data in production platforms by default. Personal data may of course be needed “in plain text” in a production site, but:

the site owner must make a list of the personal data he has stored in the platform, why, and for how long (no longer than necessary). He must also secure this data and allow users to access and delete it.

When the storage period for a piece of personal data is over, it is necessary to:

  • remove it
  • or archive it (i.e. outside the website)
  • or anonymize it

There is also another reason to anonymize the personal data of a website: its use on testing or development platforms.
We can:

  • ask the customer to provide an anonymized dataset
  • or anonymize the personal data ourselves

Personal data can be found in any table of the database (it really depends on each project) and also in files, located in different places of the fileadmin.

What might be interesting is perhaps an extension that would make it possible to define the tables, fields and folders containing personal data, and assign them a storage period, as well as the chosen treatment at the end of that period (deletion, archiving, anonymization).

For the data of the fe_users, if it is used by the user himself, for example to display his personal space, the retention period may be the unsubscription date + n months, for example. Mails sent via a contact form, which may be stored in a database, may have a retention period of 1 month, for example (this really depends on each organization and on the purpose of the data).
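As a rough illustration of such a mapping, here is a minimal sketch. The rule structure, the 180-day and 30-day periods and the table name `tx_demo_contact_mail` are assumptions chosen for the example; real periods depend on each organization.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Tuple


class Treatment(Enum):
    DELETE = "delete"
    ARCHIVE = "archive"
    ANONYMIZE = "anonymize"


@dataclass(frozen=True)
class RetentionRule:
    """Declares where personal data lives, how long it is kept, and what follows."""

    table: str
    fields: Tuple[str, ...]
    keep_days: int
    treatment: Treatment

    def is_due(self, reference_date: datetime, now: datetime) -> bool:
        # reference_date is e.g. the unsubscription date or the mail's creation date.
        return now - reference_date > timedelta(days=self.keep_days)


# Illustrative rules only; tx_demo_contact_mail is a hypothetical table.
RULES = [
    RetentionRule("fe_users", ("name", "email", "telephone"), 180, Treatment.ANONYMIZE),
    RetentionRule("tx_demo_contact_mail", ("sender", "message"), 30, Treatment.DELETE),
]

if __name__ == "__main__":
    now = datetime(2018, 5, 25)
    for rule in RULES:
        print(rule.table, rule.treatment.value, rule.is_due(datetime(2018, 4, 1), now))
```

A scheduler task could walk over such rules and apply the treatment to every record that is due.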

I hope that answers some questions.

Cheers,
Rachel


(Markus Klein) #14

Thanks for the great summary @rakel

This is a great example of how complicated that matter actually is:
The contact form messages (no matter if stored in the DB or sent via email) may only be stored as long as necessary.
Curiously, the necessity is not necessarily defined by the inquiry itself. Consider a request to adjust some invoice you sent to a customer. In Austria the financial laws require you to store any business-related communication for 7 years(!).
Hence, the mailform request is not necessary anymore after the invoice has been adjusted (case closed), but the law actually requires longer storage.
Conclusion: Defining a generic time span for storage is impossible, since it solely depends on the content of the message and the thereby triggered processes.


(Rachel Foucard) #15

Yes, you’re absolutely right @liayn. The more I think about an automated solution, the more I realize that this regulation leads to specific treatment for each organization.

Perhaps it is reasonable to imagine a back-end module that would allow defining tables, fields and folders containing personal data in order to provide a “global mapping” of this data on a TYPO3 platform.

A second, more complex but maybe very interesting tool would be a database dump generator for web agencies that would exclude this personal data (by emptying the fields) thanks to this global mapping. Same thing for a fileadmin/ or uploads/ tgz.
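A minimal sketch of the "empty the mapped fields" step, using SQLite from the Python standard library as a stand-in for the real database. The mapping content and the function name are assumptions, and the function is meant to run on a copy of the database, never on the live one.

```python
import sqlite3
from typing import Dict, List

# Hypothetical "global mapping": table name -> columns holding personal data.
PERSONAL_DATA_MAP: Dict[str, List[str]] = {
    "fe_users": ["name", "email", "telephone"],
}


def blank_personal_fields(conn: sqlite3.Connection, mapping: Dict[str, List[str]]) -> None:
    """Empty every mapped column. Run this on a COPY of the database before dumping."""
    for table, columns in mapping.items():
        # Identifiers cannot be bound as SQL parameters, so validate them first.
        for identifier in [table, *columns]:
            if not identifier.isidentifier():
                raise ValueError(f"Suspicious identifier: {identifier!r}")
        assignments = ", ".join(f'"{column}" = ?' for column in columns)
        conn.execute(f'UPDATE "{table}" SET {assignments}', [""] * len(columns))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fe_users (uid INTEGER, name TEXT, email TEXT, telephone TEXT, usergroup TEXT)")
    conn.execute("INSERT INTO fe_users VALUES (1, 'Jane Doe', 'jane@example.org', '+43 1 2345678', '2')")
    blank_personal_fields(conn, PERSONAL_DATA_MAP)
    print(conn.execute("SELECT * FROM fe_users").fetchall())
```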

In any case, anonymization can’t simply encrypt the data, because if we need to use this data on the development platform, a telephone number must keep its format, just like an address or a date of birth. I prefer to replace the data with an anonymous dataset; why not with a generator like this one: https://www.mockaroo.com/ ?
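A minimal sketch of format-preserving replacement, assuming random substitution per character is good enough for test data. This is not what Mockaroo does; it is just the simplest way to keep a phone number looking like a phone number. The function name is hypothetical.

```python
import random
import string


def mask_preserving_format(value: str, rng: random.Random) -> str:
    """Replace digits with random digits and letters with random letters.

    Spaces, punctuation and letter case stay in place, so the result
    still passes simple format checks on a development platform.
    """
    masked = []
    for char in value:
        if char.isdigit():
            masked.append(rng.choice(string.digits))
        elif char.isalpha():
            pool = string.ascii_uppercase if char.isupper() else string.ascii_lowercase
            masked.append(rng.choice(pool))
        else:
            masked.append(char)
    return "".join(masked)


if __name__ == "__main__":
    rng = random.Random()
    print(mask_preserving_format("+43 (0)1 234-5678", rng))
    print(mask_preserving_format("1984-07-23", rng))
```

Note that a masked date of birth may no longer be a valid calendar date; fields with semantic constraints need a dedicated generator.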

cheers,
Rachel


(Bernd Wilke) #16

Just for clarification:
the complete regulation can be found at http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2016.119.01.0001.01.ENG (with the option of translation into other languages, even side by side).
After 173 points of information, the regulation itself follows with 99 articles in 11 chapters.

There is also a directive which “[…] lays down the rules relating to the protection of natural persons with regard to the processing of personal data by competent authorities for the purposes of the prevention, investigation, detection or prosecution of criminal offences or the execution of criminal penalties, including the safeguarding against and the prevention of threats to public security.” (http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2016.119.01.0089.01.ENG, 107 points of information, 65 articles in 10 chapters)

I have tried to get an overview (it’s a lot of text) and can’t find an absolute necessity to encrypt data, let alone to encrypt it so that decryption is only possible on other systems.
This might be a wrong conclusion drawn from the suggestion(!) to protect data against unauthorized access by encryption.
The only protection against an intruder who gets access to the complete server is an encryption which can be reversed only outside the system. But that would prevent any processing (or displaying) of data on the system, unless you transfer all data to another system each time you need it (decrypt outside). But then the decrypted data must be imported (transferred) into your system again, which might open another vulnerability.

What I can find is the need to get an overview of all stored data for any person. Then there is the option to delete all data for a person on request. And the option to transfer all data to another system.
Aside from the transfer, these should have been available for years, AFAIK. An example in the explanations (http://ec.europa.eu/justice/data-protection/reform/index_en.htm) states that deletion in particular was not available and unnecessary histories were stored for years.

Two features could help anyone when realized in TYPO3:

  • encryption and decryption of data (single fields) on the fly, so no one can access the data just by a dump of the database
  • pseudonymisation (Article 4, definition 5), which could make testing of applications easier, as you can easily generate data in the volume of the live system without the need to use live data for testing.
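For the second feature, a minimal sketch of keyed pseudonymisation with the Python standard library. Using HMAC-SHA256 and truncating to 16 hex characters are assumptions of this example, not something the regulation prescribes. The idea is that the key is the "additional information" that has to be kept separately from the data.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes, length: int = 16) -> str:
    """Map a value to a stable pseudonym that cannot be linked back without the key.

    The same input and key always give the same pseudonym, so relations
    between records survive; the key must be stored outside the database.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    key = b"demo-key-kept-outside-the-database"  # hypothetical key for the example
    print(pseudonymize("jane@example.org", key))
    print(pseudonymize("jane@example.org", key))  # same pseudonym again
    print(pseudonymize("john@example.org", key))  # different pseudonym
```

Unlike the on-the-fly encryption in the first bullet, this is one-way: it suits test datasets and statistics, not data that has to be shown to the user again.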

(Felix Althaus) #17

“encryption and decryption of data (single fields) on the fly. so noone can access the data just by a dump of the database”

I did this for a customer once who had a policy to not save unencrypted personal data in the database. My solution was an extension that does on-the-fly en- and decryption with a key outside the document root. In a composer-based setup you would have a hard time decrypting the data even with admin backend access (without direct file access), right? But this still doesn’t meet the requirements of GDPR, does it?


(Felix Althaus) #18

I can’t imagine that

“decryption should only be possible from outside of the system”

also applies to data that is needed to fulfill what you promised to your (customer’s) customer. That would render all websites with personal profiles useless.

Anyone around who speaks legalese? I mean, I’m not a law pro, so I think we should hire a lawyer to get things straight unless we want to get lost in the fog. I’m in with a hundred bucks :slight_smile:. We should provide a guideline for extension developers, too.

When it comes to data that isn’t needed in frontend requests or CLI/cronjobs/scheduler tasks (e.g. incoming contact form messages), that’s a whole new ballgame. Certificate-based backend login could help here.


(Fedir RYKHTIK) #19

A solution could work the same way the Install Tool does.

Basically, there is a standalone key, which encrypts / decrypts personal data.

So even if the database is stolen or exposed via SQL injection, it will not be possible to use that data without also stealing the file with the standalone encryption key. That makes the system harder to attack.


(Felix Althaus) #20

I’m totally aware of this possibility and implemented such a solution in the past (see here). Still, this does not meet the requirement I quoted (leaving aside whether it’s a strict requirement for all personal data, as mentioned).