Generate DB scheme from TCA

Generate DB scheme from TCA

Drop ext_tables.sql and generate the DB scheme solely from the information provided by TCA. This will make the TCA array the source of truth for all DB related information.

Currently the TCA contains the following information:

  • Table name
  • Columns
  • Relations, including table names for MM tables and related fields
  • Partial information about types, allowed values, default values, nullable, etc

Missing but in the ext_tables.sql file currently:

  • Keys, including primary key (usually uid)
  • Autoincrement, column length, collate, character set
  • Engine

Proposed structure

The structure is open for suggestions.

Introduce [table][database] and [table][columns][*][database].

The former includes all table related information, including keys. The later contains all column related information.

The definitions should stay close to the features provided by the Doctrine scheme service, which is already in use by the core. The configuration keys and values should ideally match directly to the doctrine scheme object properties.

Impact

  • ext_tables.sql will be deprecated and remove in future versions
  • Database compare requires reading of TCA files

Possible Migrations

  • Automatic enriching of TCA by definitions from the ext_tables.sql file (maybe as third party/non-core script or extension)

Pro

  • One source of truth
  • Proper deprecation and migration of DB related constructs (e.g. not matching column definitions) via the already know TCA migration service
  • Remove error prone SQL scheme parsing
  • Road towards adding constraints to the database
  • Migration SQL statements can be added on-the-fly during TCA processing

Con

  • Increases TCA files (required for each run of TYPO3)
  • Adds more information to TCA instead of reducing it (I have another idea on how to handle this)
  • Some developers might do strange stuff with PHP to dynamical change DB related configuration (Already possible with hooks/signals)
  • Not easily possible to convert an existing table into TYPO3 extensions/schemes (would required TCA generation anyway)
  • Not possible to add non-TYPO3 tables and fields (developers will need to use hooks/signals or create TCA information)

Remarks and notes

The long term goal is to introduce proper database constraints to the TYPO3 SQL scheme automatically. For this to work, we need unified column definitions across tables.

Organizational

Topic Initiator: Philipp Gampe
Topic Mentor: T.B.A.

Hey Philipp,

sounds interesting. Some questions arise:

  • Should other tables (sys_log etc) then have all TCA-required and a TCA field as well?
  • When do we do “automagic” field migrations / additions? Just via install tool as we do right now? Or some TCA migration logic?

A semi yes. For this proposel only: yes. However it would be better to directly create the needed signals to add non TCA tables via the signal as objects. Preferably, the hook or signal would directly accept an doctrine table scheme as object structure.
Thinking about this, this would be needed for some extensions anyway. The separation would make it also clear, that the table is not managed by TYPO3 in any way. All tables managed by TYPO3 would need a TCA configuration. Core only tables would be an exception of course.

For now I would stick to the install tool. However the core could change the default definitions of certain fields (e.g. signed or unsigned, the list of allowed values, rename fields on the fly). Those kind of changes could be done via TCA migrations and also add the needed SQL statements for DB migrations on the fly. However I would not invest into big magic here. Rather I suggest to bring real table migrations (and code migrations) of some sort to the core, but that is another topic.

This will force everyone to define, in addition to the “type” of field in TCA, how the data should be stored physically in the database (field type), with size, … which could differ from restrictions in TCA, thus bloating TCA even more.

In addition, this will be necessary for fields which are currently processed by some user function (could be serialized data or something similar).

I don’t see the purpose of removing that file, it’s extremely simple to handle and in fact, this will have the other side effect of requiring every table to be defined in TCA, even if this is not needed since the table may be pure custom business logic.

For all those reasons, I’m quite reluctant to try to “simplify” things (removing one easy-to-handle file) with additional complexity in TCA and many new use cases to handle.

1 Like

The type field is for representation in BE forms only. The database part would reflect the real data type, which can be totally different.

In that case, you could always set set database|type => null to indicate to the API that this field is virtual and not represented as database field. That configuration would make it explicit then, rather than implicit.

IMHO, once a table is handled by the TYPO3 API, it should be described in full.

The semi-SQL file is not easy to handle. The Install Tool even fails if you format it the wrong way, not to speak of the many different SQL features which are not supported.
What kind of new use cases do you see that would need handling?

One source of truth would finally bring in the fun. Waiting for this for almost 15 years.

There is more in it. I think of auto-generation of value checks both browser side and server side. If that can be auto-generated, boring work reduces dramatically, errors go away and many, many security holes would vanish.

It’s done for the BE anyway. Needs to be extended for FE javascript. Best if BE and FE can share the same library for this.

Finally I would welcome the option to write the TCA files as YML, which would maybe require additional caching and integrity checks of the TCA itself.

Sure it’s not easy, but it’s worth to face the challenges. One of the challenges you didn’t address is the neat integration with Extbase.

@t3elmar non-php tca files would be very nice, but is not the scope of this rfc.

I do not think that extbase will need changes because of this.

My goal is to automatically create DB constraints to strength DB integrity in order to be more reliable on concurrent requests. A single source of truth is kind of an requirement for this.

Can’t we use Doctrine YAML schema files and somehow convince a library to do DB migrations from those?
That would mean we:

  • have platform independent DB definitions
  • get rid of SQL parsing
  • follow the latest fashion of using YAML
  • can define tables that are not in TCA
  • don’t bloat TCA (it’s in memory!!)

For me TCA is the definition for the Form Engine. There is never a single source of truth for a table structure. We could extract information from the Model (and add extra annotations), but then you’ll lose the option to define fields that are not directly mapped in a model. By using only TCA we lose the flexibility to use tables that are not editable in the Form Engine.

3 Likes

This is definitely not the case. TCA is a hybrid defining BE configuration (Form Engine) and general table information (eg used by DataHandler). This is for sure not optimal. In general I would say that we must actually split this information.
Record handling via BE is just one perspective of looking into data and the “model” for the FormEngine shall be described in a dedicated place.

Example: The m:n relation of two tables shall be described in a general configuration. The way I want to edit this relation (group field, IRRE, you name it) in the BE is part of a dedicated BE configuration. The general information is also useful for FE or other means, whereas the dedicated BE configuration is not interesting at all outside of FormEngine.

“A model for the ForEngline” sounds like doubling the information, meaning double maintenance and errors.

You could suggest an additional layer, adding ForEngline specific information on top of a common base. Would it hold that much additional information to justify splitting?

As Jigal pointed out, TCA is meant for FormEngine. In addition, it’s introducing another level of abstraction, and limiting instead of extending the capability of a SQL definition.

What I mean is that I agree that we currently only support a subset of table definitions but SQL is the language of choice for describing the structure of a SQL database. By moving it to PHP we bloat the definition in memory when it is not needed at all and we have to have another level of abstraction for describing the structure.

I’m fine moving to some Doctrine YAML-based definition since we moved to Doctrine, but what I really miss at the moment is the definition of views in ext_tables.sql. If Doctrine YAML allows that, then I’d be more than happy to switch to this other format.

However, and even if I might get the point of having only “one source of truth”, the fact is that in many of my extensions I actually don’t need this since source of truth but instead need multiple facets of truth for the same database. One is describing the physical requirements of my persistence layer, this is the db definition, and (one or more) others is configuring the Backend to show an edit form for my records.

Adding a new field! In SQL, in TCA, in the extbase model, in the HTML form, in JavaScript checks. Did I miss a layer?

Five times! Five times name, type, length, the same informations over and over again. I am always happy to do extensions, that don’t have forms, and get the data from non-relational sources.

A single source of truth shrinks the work to add a new field to 20%. And yes, there are 5% of cases, where a single source of truth is limited. What is the right conclusion?

A. Staying with fivefold work forever.
B. Having defaults from a single source and flexibility for the remaining 5% of cases.

1 Like

Fully agreement!

I would in general agree that SQL is description language for data storage, but it unfortunately lacks quite some details which are relevant for the business logic to serve as the single source. (eg range limits, list of allowed values, limitation of allowed related rows in a relation, etc)
This makes me believe that a source that hold more precise information, allows for generating the more general SQL from this very source. Of course this source shall rather be superset of the SQL features, but may may not provide everything.

I would just like to mention one use case which we have very commonly. fe_users is used as a storage for multiple usages like sales-contact for news records as well as login-users for customer center. Therefore we have different models on the very same table. The fe_users records sometimes are located at the same sysfolder, sometimes they are stored independently. For such cases we usually modify FormEngine via pageTSconfig to show only the relevant fields for the actual use case, which obviously either depends on a type field, or a storage folder.
A possible solution could therefore be to make the whole FormEngine configuration part of pageTSconfig (which is for BE mostly anyways) and strip that part off TCA.
TCA could then serve as the single source for general data definition.

I like the general idea to remove ext_tables.sql. But I don’t think that TCA is the right place for this kind of information. One of the biggest problems currently is the lack of proper database migrations. For example renaming a field/table is currently not possible, which blocks you from doing proper OO and DDD when the knowledge of the domain expands or the domain changes.

As we’re using already Doctrine DBAL in CMS 8 now, we should explore Doctrine Migrations. Maybe they solve the problem of missing DB Migrations. To simplify the use of migrations we might add a developer helper that generates the necessary migration classes between TCA changes. Those files can then be adjusted to needs. This way also database plattform specific migrations would be possible.

6 Likes

Just for the reference, this is how symfony handles it:
http://symfony.com/doc/current/doctrine.html#add-mapping-information (click yaml tab)
http://docs.doctrine-project.org/projects/doctrine-dbal/en/latest/reference/schema-representation.html

The thing is, we need to handle also not ORM tables, therefore we cannot directly use an ORM mapping like symfony. But it might be worth using yaml files with proper relationship annotations. The question then remains, what do we do with the TCA? Reduce it to just form engine definitions? What about datahandler than?

A proper (normalized) SQL data structure can handle all of this, but limitation of allowed related rows (you need checks for this).

I didn’t mention ORM here as the only currently usable ORM in CMS is Extbase ORM which relies fully upon TCA. So I wouldn’t care about ORM at all in the first place and propose to create a “Migration Generator” based upon TCA only. Most of the values could be derived from it. All SQL definitions would be completely defined through Database Migration Classes. No YAML or whatever needed.

1 Like

The use case is convincing, that a configuration layer on the page tree level is a must have. It’s in full compliance with TYPO3 philosophy.

The first symphony link also addresses how it is related to php annotations. In Extbase it’s about validations.

“A bundle can accept only one metadata definition format. For example, it’s not possible to mix YAML metadata definitions with annotated PHP entity class definitions.”

Hi. Christian Kuhn asked me, few weeks ago, if I could start with the described functionality. I wanted to start this week, and found this thread now^^

In a first step just the CTRL part of TCA should get checked. So fields like “hidden” or “deleted” can get created automatically in DB Schema, with current TCA structure.

Real fields in TCA (like header, bodytext, etc.) should not be implemented in this first step. So my task would not make ext_tables.sql obsolete. It would just reduce its contents.

So if I should not start, please tell me!