Making TYPO3 GDPR-ready


(Mathias Schreiber) #6

I need some clearing up on the term “encrypt”.
The law states that decryption should only be possible from outside of the system that gathers the data.
Thus a TYPO3 installation may not be able to decrypt (aka display) any personal data.
The cases of showing such data within TYPO3 that I can think of off the top of my head:

  • BE Users Module
  • List Module of BE Users
  • History Module
  • Top Bar

I’m sure there are a couple more.
So since the system may not be able to decrypt the data on its own, those places will only display the encrypted data, correct?


(Fedir RYKHTIK) #7

I suppose you are speaking about BE Users. OK, let’s say we will not process them at the moment.

But for FE Users it will be more complicated, yes. The private data of FE Users should not be easily viewable or exportable by any TYPO3 editor who has access to the Backend. Private data should stay private, like passwords, but with the possibility for the owner to decrypt that data with their private key/password. We need to define a common strategy together for how to handle such private data.


(Mathias Schreiber) #8

My question is not about FE or BE users.
I’ll create an exaggerated example to try and get my point across:

All personal data of a frontend user needs to be encrypted.
The law clearly states that the name of a person qualifies as personal data, hence has to be encrypted.
After logging into the system, the FE Login Extension shows the user’s name in the profile view.
Since the system may not decrypt the data on its own, what do we show instead of the user’s name on the profile page?


(Riccardo De Contardi) #9

I can’t say I have understood what this means…


(Mathias Schreiber) #10

When you get full access to the TYPO3 installation and grab all the data you should not be able to decrypt the data.
Every person who is able to decrypt the data using a decryption key (which may not be present on the TYPO3 server) has to be named explicitly.

So in (ideal) practice you need to have a hardware decryption key in your desk, I need to have one in my desk, and whenever somebody wants to know the name of a user that used TYPO3 as an editor, either you or I need to manually decrypt that data on a per-case basis.


(Riccardo De Contardi) #11

The point of it all is that an administrator should not be able to read (or export) the fe_users table with all data in the clear, I guess.
I still can’t believe that it could apply to be_users, where an editor could just be created with the name “editor”.


(Markus Klein) #12

IMHO Point 2 at https://techbeacon.com/15-steps-developing-eu-privacy-policy-compliant-apps is not based on an explicit law. The lines written there make no sense at all.

the data should be encrypted with proper and strong encryption algorithms, including hashing.

If you hash data (uni-directional mathematical function) there is NO way to retrieve the data again. This makes absolutely no sense for user data except a password. “I stored my address at that website, but since they hash it, I can’t get it out anymore”.


(Rachel Foucard) #13

Hello everybody,

I just read the various posts and I think there are some things that need to be clarified:

There is absolutely no need to anonymize personal data in production platforms by default. Personal data may of course be needed “in plain text” on a production site, but:

the site owner must make a list of the personal data he has stored in the platform, why, and for how long (no longer than necessary). He must also secure this data and allow users to access and delete it.

When the storage period for a piece of personal data is over, it is necessary to:

  • remove it
  • or archive it (i.e. outside the website)
  • or anonymize it

There is also another reason to anonymize the personal data of a website: on testing or development platforms.
We can:

  • ask the customer to provide an anonymized dataset
  • or anonymize the personal data

Personal data can be found in any table of the database (it really depends on each project), and also in files located in different places of the fileadmin.

What might be interesting is perhaps an extension that would make it possible to define the tables, fields and folders containing personal data, and assign them a storage period, as well as the chosen treatment at the end of that period (deletion, archiving, anonymization).

For the data of the fe_users, if they are used by the user himself, for example to display his personal space, the storage period may be unsubscription + n months, for example. Mails sent via a contact form, which can be stored in a database, may have a storage period of 1 month, for example (this really depends on each organization and the purposes of the data).
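
Such a mapping could be sketched roughly as follows. This is only an illustration in Python (not PHP, and not an existing TYPO3 API); the class names, the table `tx_contactform_mail` and the periods of 180 and 30 days are assumptions made up for the example, and the real periods depend on each organization.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Treatment(Enum):
    DELETE = "delete"
    ARCHIVE = "archive"
    ANONYMIZE = "anonymize"


@dataclass(frozen=True)
class RetentionRule:
    table: str                # table holding personal data
    fields: tuple[str, ...]   # columns considered personal
    keep_days: int            # storage period, counted from a reference date
    treatment: Treatment      # what to do once the period is over

    def is_expired(self, reference_date: date, today: date) -> bool:
        # Expired once "today" lies strictly after reference date + period.
        return today > reference_date + timedelta(days=self.keep_days)


# Hypothetical rules; table names and periods are examples only.
RULES = [
    RetentionRule("fe_users", ("name", "email", "telephone"), 180, Treatment.ANONYMIZE),
    RetentionRule("tx_contactform_mail", ("sender", "message"), 30, Treatment.DELETE),
]


def due_rules(rules, reference_dates, today):
    """Return the rules whose period has run out.

    reference_dates maps a table name to the date the period started,
    for instance the day a user unsubscribed.
    """
    return [
        rule
        for rule in rules
        if rule.table in reference_dates
        and rule.is_expired(reference_dates[rule.table], today)
    ]
```

A scheduler task could call `due_rules(...)` once a day and apply the listed treatment to the affected records.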

I hope that answers some questions.

Cheers,
Rachel


(Markus Klein) #14

Thanks for the great summary @rakel

This is a great example of how complicated the matter actually is:
The contact form messages (no matter whether stored in the DB or sent via email) may only be stored as long as necessary.
Curiously, the necessity is not necessarily defined by the inquiry itself. Consider a request to adjust some invoice you sent to a customer. In Austria the financial laws require you to store any business-related communication for 7 years(!).
Hence, the mailform request is not necessary anymore after the invoice has been adjusted (case closed), but the law actually requires longer storage.
Conclusion: Defining a generic time span for storage is impossible, since it solely depends on the content of the message and the thereby triggered processes.


(Rachel Foucard) #15

Yes, you’re absolutely right @liayn. The more I think about an automated solution, the more I realize that this regulation leads to specific treatment for each organization.

Perhaps it is reasonable to imagine a back-end module that would allow defining tables, fields and folders containing personal data in order to provide a “global mapping” of this data on a TYPO3 platform.

A second, more complex but maybe very interesting tool would be a database dump generator for web agencies that would exclude this personal data (by emptying the fields) thanks to this global mapping. Same thing for a tgz of fileadmin/ or uploads/.

Anonymization in any case can’t simply encrypt the data, because if we need to use this data on the development platform, a telephone number must keep its format, just like an address or a date of birth. I prefer to replace the data with an anonymous dataset, why not with a generator like this one: https://www.mockaroo.com/ ?
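
As a rough illustration of the dump idea (a Python sketch, not TYPO3 code; the mapping `PERSONAL_FIELDS` and the function name are hypothetical), rows could be passed through a filter that empties every column listed in the global mapping before they are written to the dump:

```python
# Hypothetical "global mapping": table name -> columns holding personal data.
PERSONAL_FIELDS = {
    "fe_users": {"name", "email", "telephone"},
}


def scrub_rows(table, rows, mapping=PERSONAL_FIELDS):
    """Return copies of the rows with all mapped personal columns emptied."""
    personal = mapping.get(table, set())
    return [
        {column: ("" if column in personal else value) for column, value in row.items()}
        for row in rows
    ]
```

Tables that are not in the mapping pass through unchanged, so the mapping has to be complete for the dump to be safe.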
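
A generator like that is one option. A much simpler approach, sketched here in Python under the assumption that only the "shape" of a value matters on the development platform, is to swap every digit for a random digit and every letter for a random letter while leaving separators untouched. The function name is made up for the example.

```python
import random
import string


def mask_keep_format(value, rng):
    """Replace letters and digits with random ones, keep everything else."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            # Preserve upper/lower case so names still look like names.
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # spaces, "+", "-", "." and so on stay in place
    return "".join(out)


# A seeded generator makes test datasets reproducible.
print(mask_keep_format("+43 660 1234567", random.Random(42)))
```

Note that this keeps the format but not the validity: a masked date of birth may contain a month such as 77, so fields with semantic constraints need a dedicated generator.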

cheers,
Rachel


(Bernd Wilke) #16

just for clarification:
the complete regulation can be found at http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2016.119.01.0001.01.ENG (with the option of translation into other languages, even side by side)
after 173 points of information the regulation starts with 99 articles in 11 chapters

There also is a directive which “[…] lays down the rules relating to the protection of natural persons with regard to the processing of personal data by competent authorities for the purposes of the prevention, investigation, detection or prosecution of criminal offences or the execution of criminal penalties, including the safeguarding against and the prevention of threats to public security.” (http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2016.119.01.0089.01.ENG, 107 points of information, 65 articles in 10 chapters)

I have tried to get an overview (it’s a lot of text) and can’t find an absolute necessity to encrypt data, let alone to encrypt it so that decryption is only possible on other systems.
This might be a wrong conclusion from the suggestion(!) to protect data against unauthorized access by encryption.
The only protection against access if an intruder gets access to the complete server is an encryption which can be reversed only outside the system. But that would prevent any processing (or displaying) of data on the system, unless you transfer all data to another system each time you need it (decrypt outside). But then the decrypted data must be imported (transferred) into your system again, which might open another vulnerability.

What I can find is the need to get an overview about all stored data for any person. Then there is the option to delete all data for a person on request. And the option to transfer all data to another system.
Aside from the transfer, these should have been available for years AFAIK. An example in the explanations (http://ec.europa.eu/justice/data-protection/reform/index_en.htm) states that deletion in particular was not available and unnecessary histories were stored for years.

Two features could help anyone when realized in TYPO3:

  • encryption and decryption of data (single fields) on the fly, so no one can access the data just by a dump of the database
  • pseudonymisation (Article 4, definition 5) could make testing of applications easier, as you can easily generate data in the amount of the live system, without the need to use live data for testing.
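
For the second point, Article 4(5) describes pseudonymisation as processing after which the data can no longer be attributed to a person without additional information that is kept separately. One common way to get there is a keyed hash, where the key is that separately kept information. The following Python sketch only illustrates the principle; the function name, the `user_` prefix and the 16-character truncation are arbitrary choices for the example.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a value using HMAC-SHA256.

    The same value and key always give the same pseudonym, so relations
    between records survive. Without the key the pseudonym cannot be
    recomputed from a guessed value.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:16]


# The key must live outside the system that receives the test data.
key = b"example-key-kept-elsewhere"
print(pseudonymize("jane@example.org", key))
```

Like any hash this is one-way, so it suits test data and statistics, not fields that must be shown to the user again.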

(Felix Althaus) #17

encryption and decryption of data (single fields) on the fly, so no one can access the data just by a dump of the database

I did this for a customer once who had a policy to not save unencrypted personal data in the database. My solution was an extension that does on-the-fly en- and decryption with a key outside document root. In a composer-based setup you should have a hard time decrypting the data even with admin backend access (without direct file access), right? But this still doesn’t meet the requirements of GDPR, does it?


(Felix Althaus) #18

I can’t imagine that

decryption should only be possible from outside of the system

also applies to data that is needed to fulfill what you promised to your (customer’s) customer. That would render all websites with personal profiles useless.

Anyone around who speaks legalese? I mean I’m not a law pro, thus I think we should hire a lawyer to get things straight unless we want to get lost in the fog. I’m in with a hundred bucks :). We should provide a guideline for extension developers, too.

When it comes to data that isn’t needed in frontend requests or CLI/cronjobs/scheduler tasks (e.g. incoming contact form) then that’s a whole new ballgame. Certificate-based backend login could help here.


(Fedir RYKHTIK) #19

It’s possible to find a solution that works the same way the Install Tool does.

Basically, there is a standalone key, which encrypts / decrypts personal data.

So even if the database is stolen or exposed via SQL injection, it will not be possible to use that data without also stealing the file with the standalone encryption key. So it will harden the system.


(Felix Althaus) #20

I’m totally aware of this possibility and implemented such a solution in the past (see here). Still, this does not meet the requirement I quoted (regardless of whether it’s a strict requirement for all personal data, as mentioned).


(Fedir RYKHTIK) #21

Here’s an article about CMS & GDPR.

The author shows 4 Ways a CMS can support site owner GDPR efforts :

  1. Managing consents
  2. Records of given consents and processing activities
  3. Data portability
  4. Handling ‘right to be forgotten’

(Dan Untenzu) #22

Besides all possible extra tools and loops, like encryption or generic archives, there is one main concept, which is summarized perfectly by Rachel:

the site owner must make a list of the personal data he has stored in the platform, why, and for how long (no longer than necessary).

When the storage period for a piece of personal data is over, it is necessary to:

  • remove it
  • or archive it (i. e. outside the website)
  • or anonymize it

We therefore should start by showing integrators where TYPO3 stores data. This is not obvious at the moment. For example: When I install TYPO3 I just don’t know where TYPO3 stores IP addresses. The »sys_log« table may be obvious, but where else? Without this information I can’t provide a proper privacy statement.

Next step: Create tools to remove/anonymize the data. It is for example okay to temporarily store session data, such as the IP address of frontend users, which is stored in the database subsequently. When the session ends, all session data should be destroyed and removed from the database. This is already the case, but not documented in some privacy note.

Third step: Create a guideline on how to handle user data and urge developers to follow this guideline. It should be part of the coding guidelines, that extension developers instruct integrators whether an extension stores personal data, why and for how long.
The core could support this with tools as well, for example by providing a core method to retrieve and store an IP-address. Example: Extension »acme-news« uses a core method to retrieve and store an IP-address, by using this method the TYPO3 core automatically knows about this data usage & processing and could list this extension in an overview. We could also extend the »Table garbage collection« scheduler task, and provide an easy but required method to set an expiry date for records or fields with personal data.
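
To make the idea of such a core method more concrete, here is a rough Python sketch (TYPO3 itself is PHP, and nothing like this exists in the core as far as this thread shows). The class `PersonalDataRegistry`, its methods and the table `tx_acmenews_comment` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalDataRegistry:
    """Hypothetical stand-in for a core service that stores IP addresses."""

    usages: list = field(default_factory=list)

    def store_ip(self, extension, table, record, ip, purpose, keep_days):
        # Storing goes through the registry, so the usage is recorded
        # as a side effect and cannot be forgotten by the extension.
        record["ip"] = ip
        self.usages.append(
            {
                "extension": extension,
                "table": table,
                "field": "ip",
                "purpose": purpose,
                "keep_days": keep_days,
            }
        )
        return record

    def overview(self):
        """Distinct (extension, table, field, purpose, keep_days) entries."""
        seen = {
            (u["extension"], u["table"], u["field"], u["purpose"], u["keep_days"])
            for u in self.usages
        }
        return sorted(seen)


registry = PersonalDataRegistry()
registry.store_ip("acme-news", "tx_acmenews_comment", {"uid": 1},
                  "203.0.113.7", "spam protection", 30)
print(registry.overview())
```

The `keep_days` value recorded here is the kind of expiry information a garbage collection task could later act on.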

In general I propose that the CMS itself faces all challenges of the GDPR and provides solutions and best practices for developers and agencies, instead of the other way around. I support Fedir’s proposal to create a working group and organize a dedicated TYPO3 Code Sprint related to the preparation to GDPR (and I would like to join this group).


(Dan Untenzu) #23

Regarding the issue that I don’t know where TYPO3 stores IP addresses and want to identify all the possible places, and maybe even the code which is responsible for this: in preparation for the GDPR I assigned this task to a student of mine.

TYPO3 CMS:

  • field »log_data« in table »sys_log«
  • field »IP« in table »index_stat_search«
  • field »ses_iplock« in table »be_sessions«
  • field »ses_iplock« in table »fe_sessions«

As mentioned above, the session data in the user tables are okay, since this field is cleared as soon as the user is logged out. EXT:indexed_search stores all search words, and which users typed them, in a table (why?). The »sys_log« is used by various methods to log exceptions, login actions, content editing etc… It is not quite obvious which parts of the core or extensions write into this table. So an explanation for the user of why his IP address is stored in the log is not conclusive yet. This would be the next step. But as explained above, it is an expensive task to reverse-engineer all the places. Instead the CMS should provide an interface for this (which needs a concept, guideline, etc…).

Extensions:

  • field »IP« in table »tx_formhandler_log«

The formhandler extension is one example where integrators should be made aware that they store personal data in the database. It has a logger controller, which is active by default. This stores the IP address of the user filling out a form. It also offers a custom method to cut off parts of the IP address. This is not active by default, however.
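
Cutting off parts of an IP address usually means zeroing the host part before storing it. The following Python sketch shows the general technique with the standard `ipaddress` module; it is not formhandler’s implementation, and the prefix lengths of 24 bits (IPv4) and 48 bits (IPv6) are assumptions chosen for the example.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Keep only the network part of an address and zero the rest."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("192.168.12.34"))        # 192.168.12.0
print(anonymize_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::
```

Whether a truncated address still counts as personal data depends on how much is cut off, so the prefix lengths should be a deliberate choice.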


(Markus Klein) #24

I agree with what @pbtypo3 said.

