Introduce translation of files

Handling of files and language variants

Currently, there is no possibility to provide variants of a file in TYPO3. The often cited example is ‘I have an image with text in it’. Now this image should be used in several language variants, only the text differing. Another example is the PDF document featuring the same content, but being available in different languages.
Editors currently work around this by creating CEs in free mode (without relation to the translation parent) and relating the corresponding file in the requested language.

To improve this, files and their corresponding metadata records can be handled and related in a different way that enables editors to stick to actual translation of records and respects file variants for the current language.

Here is the proposal:
Please note, this approach will not follow the usual overlay approach the core follows, but will use a copy mechanism.

  1. introduce a new database field named file_source into both sys_file and sys_file_metadata tables. Additionally, sys_file will receive the field sys_language_uid.
  2. translating sys_file_metadata is already possible in the filelist module.
  3. creating a new file record (by upload or import) will set the sys_language_uid to 0, thus default.
  4. translating the sys_file_metadata record will present a new upload field. Here the editor may upload the language variant to be used in this specific language. Transparently in the background, the original file’s uid will be saved into the variant’s file_source field. The same data will be saved for the sys_file_metadata record. So if a language variant file exists, the file field of sys_file_metadata will point to this variant, not to the original file anymore.
  5. the editor creates a CE with FAL field in default language and references some files. The selection will only present files that are in default language as well, no variants are displayed.
  6. then a translation of that CE is initiated, copying the references in the background. Overlaying fields from sys_file_metadata (of that language) are still editable.
  7. rendering of the language variant in frontend will look for a difference between the file_source and file fields of the sys_file_metadata record of the referenced file. If one is detected, the language variant of the file is used, otherwise the original.
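
Step 7 can be sketched roughly as follows. This is only an illustration in Python, not core code: the FileMetadata class and resolve_file_uid are hypothetical names, and the sketch assumes that every sys_file_metadata row carries the original file's uid in file_source, which the proposal does not spell out for rows without a variant.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileMetadata:
    """Simplified stand-in for a sys_file_metadata row (hypothetical)."""
    uid: int
    file: int              # uid of the sys_file this row points to
    file_source: int       # assumed: uid of the original (default language) file
    sys_language_uid: int


def resolve_file_uid(referenced_file_uid: int, language_uid: int,
                     metadata_rows: List[FileMetadata]) -> int:
    """Return the uid of the file to render for the requested language."""
    for row in metadata_rows:
        if (row.file_source == referenced_file_uid
                and row.sys_language_uid == language_uid):
            # file and file_source differ only if a variant was uploaded.
            if row.file != row.file_source:
                return row.file
            break
    # No metadata translation or no variant: fall back to the original file.
    return referenced_file_uid


rows = [
    FileMetadata(uid=1, file=10, file_source=10, sys_language_uid=0),
    FileMetadata(uid=2, file=11, file_source=10, sys_language_uid=1),  # variant uploaded
    FileMetadata(uid=3, file=10, file_source=10, sys_language_uid=2),  # metadata only
]
print(resolve_file_uid(10, 1, rows))
print(resolve_file_uid(10, 2, rows))
```

The content element keeps referencing the default language file in all languages; only the lookup at rendering time decides which physical file is delivered.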

Impact

  • Usage of free mode for creating translations of CEs that differ only in their related files can be avoided.
  • File variants become an option for editors, working experience will be enhanced.

Possible Migrations

  • database migration takes care of copying data into the newly introduced database field.
  • a switch will preserve the current behaviour for existing installations
  • backend module will support resolving free mode translations into translated records.

Pro

  • more consistent behaviour of translated record
  • enhanced user experience for editors

Con

  • behaviour change
  • existing instances need to be migrated
  • manual action might become necessary for existing instances which wish to use the new behaviour

Remarks and notes

A possible migration step would be a switch for enabling the current behaviour, even with the feature in place. That is needed during transition, until the current situation using free mode and independent file records is resolved and the relation between files is established. This cannot be done automatically; the user has to provide information.
This work should be supported by a backend module the core provides. Here is how it should work:

  1. Walk all pages and find free mode CEs in different languages that contain different sys_file_reference records, and display them side by side.
  2. Provide a selection mechanism to relate the original files to the language variant.
  3. In the background, the sys_file records will be updated to the structure mentioned above.
  4. the CEs will be converted into regular translations again, no longer in free mode.
  5. A CE can be waived from display if its free mode should stay as is.
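
Step 1 of that module could look roughly like the following Python sketch. It only collects candidates per page; relating the files to each other stays a manual decision, as described above. The function name free_mode_candidates is hypothetical, rows are plain dicts, and the key "file" is a simplified stand-in for the file pointer of a sys_file_reference row. Free mode is assumed to mean sys_language_uid > 0 with l10n_parent = 0.

```python
from collections import defaultdict


def free_mode_candidates(content_rows, reference_rows):
    """Group default language CEs and free mode CEs with files by page."""
    files_by_ce = defaultdict(set)
    for ref in reference_rows:
        files_by_ce[ref["uid_foreign"]].add(ref["file"])

    pages = defaultdict(lambda: {"default": [], "free": []})
    for ce in content_rows:
        files = files_by_ce.get(ce["uid"])
        if not files:
            continue  # CEs without file references are not relevant here
        if ce["sys_language_uid"] == 0:
            pages[ce["pid"]]["default"].append((ce["uid"], sorted(files)))
        elif ce["l10n_parent"] == 0:
            # Translated language but no translation parent: free mode.
            pages[ce["pid"]]["free"].append(
                (ce["uid"], ce["sys_language_uid"], sorted(files))
            )

    # Only pages that have both sides can be shown side by side.
    return {pid: group for pid, group in pages.items()
            if group["default"] and group["free"]}
```

The result maps each page uid to the default language CEs and the free mode CEs found on it, which is the data a side-by-side view would need.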

Organizational

Topic Initiator: Anja Leichsenring
Topic Mentor: Anja Leichsenring

So this would only be a new feature and would not affect existing installations in any way?
If so, why would we need a switch to preserve existing behavior?

to ease the upgrade path. If you need to switch from old behavior to the new one, it is easier to do it independently from the upgrade itself. It should be no blocker.
Manual work is included, no matter how precisely the migration detects possible fields to change.

I still fail to see where the behavior would differ? As far as I understand the proposal, it would introduce new possibilities, but not block you from sticking to the old scheme.
Please clarify where the switch would kick in.

Please look at the description of the migration step including the BE module. Here CEs can be related into their proper language relation. As long as not all file variants are in place, the old behavior should be preserved, otherwise the FE output for that particular CE would change. It would display the default language’s files instead of the overridden ones.

This switch is a very minor implementation detail; it might turn out during development that it is not needed. I would actually prefer to discuss the behavior change of file handling as a whole instead of supporting details that might or might not become needed during transition.

This whole story reminds me of my remarks about language handling of relations in general.
See https://notes.typo3.org/p/Notes_on_translation_docs the first few paragraphs, which is about free vs connected mode on records in general.

I’m not sure about the concept presented here. I would rather say that the correct solution would be like described in the note above that sys_file_reference must be translatable (connected mode), hence also a different file may be assigned.
So, if a tt_content record is translated (connected) then also a translation for the sys_file_refs must be generated. The sys_file_refs need to show a UI element to select the file associated with the reference.
If a tt_content is copied, then also the refs must be simply copied.

As I mention in the note, the whole translation process MUST be able to cope with different types of records/tables. A ProjectStatus (active, archived, stalled,…) record may never be run in free mode but only in connected mode. It makes no sense for such records to have independent values in different languages (free mode). So the Core must prevent the creation of a (language-) copy by all means. Furthermore, a relation to such a record (i.e. the status column of a Project) may only refer to the default language, and on project translation (which can be free or connected) the ProjectStatus may of course not be copied or anything else.

So it heavily depends on the meaning (semantics) of a table, how translation has to be done. In consequence the translation behaviour has to be defined in TCA per table or even per type of record (TCA type-field).
The same concept must be applied to sys_file_reference. The correct settings in TCA could look like:

sys_file_reference: Translate if referring record (eg tt_content) is translated as well, copy if referring record is copied as well
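
To make that a bit more tangible, here is a rough Python sketch of such a per-table setting. None of this exists in TCA today: the keys on_translate and on_copy, the ChildAction values and the table name project_status are made up for illustration only.

```python
from enum import Enum


class ChildAction(Enum):
    TRANSLATE = "translate"  # create a connected translation of the related record
    COPY = "copy"            # create an independent copy of the related record
    KEEP = "keep"            # keep pointing at the default language record


# Hypothetical per-table settings, standing in for a TCA option.
RELATION_BEHAVIOUR = {
    "sys_file_reference": {
        "on_translate": ChildAction.TRANSLATE,
        "on_copy": ChildAction.COPY,
    },
    # Records like ProjectStatus are never copied or translated implicitly.
    "project_status": {
        "on_translate": ChildAction.KEEP,
        "on_copy": ChildAction.KEEP,
    },
}


def child_action(table: str, parent_operation: str) -> ChildAction:
    """Look up what happens to a related record when its parent is handled.

    parent_operation is either "translate" (connected mode) or "copy".
    """
    return RELATION_BEHAVIOUR[table]["on_" + parent_operation]


print(child_action("sys_file_reference", "translate"))
print(child_action("project_status", "copy"))
```

The point is only that the decision is looked up per table (or per record type) instead of being hard-coded for all relations.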

I have a question to understand the scope here, I hope you can clear things up for me.

From what I understood so far:

  1. You are referring to a label, not a record. In that case it’s meta information dealt with XLIFF
  2. You are referring to a record referenced by the ProjectStatus property of, say, a content element.
    If 2 is the case, then my question would be whether the core should not create a copy or whether the core could create a copy as long as it is 110% sure the data is kept in sync?
    As a personal note: as long as my data is consistent content-wise, duplicating data is no issue.

No, no label at all. Not talking about XLIFF by any means. I’m using ProjectStatus as a normal table as an example, which is 1:n to Project.
So Project has a “status” field, which is an int(10) referring to a single record in the ProjectStatus table.

I am trying to explain the high-level data structure, where I strictly distinguish translate/free mode.
I’m a big fan of actually copying the data in translate mode, so at low level the data is actually “unrolled” to ease data selections. This has to be transparent and data must of course be in sync then.
But on the high-level and concept-wise we need to really think about any kind of record that might have any kind of relation (1:n, m:m, 1:1) and what possible usecases we have with translation in either mode (free or translate).

In the case of FAL we have a m:m (file_reference), where one side (sys_file) has another 1:1 (sys_file_metadata), we start with translating the other side (tt_content or anything else).
We have to be able to define (depending on the “other side table” or even more fine-grained, e.g. CE type) how translation has to happen. For instance a custom CE (for a company logo) might disable translation for the assoc. file completely, while we might have a file collection which contains PDFs with localized text.

So in that regard we must actually make sys_file translatable, where I can define a translated file for a sys_file (metadata being translated then as well of course).
Having this dataset then allows us to deal with tt_content being translated, where the file_reference is translated as well (for metadata), but the sys_file and the associated tt_content are always the default-language ones.
This is for the connected-mode and does NOT allow to define extra relations on a translation of a tt_content.

This is what Anja wants to do, with a slightly different angle.
I try to rephrase since the topic is really hard to grasp… at least for me :slight_smile:
What I take from your last post is that we need to define some sort of “this is the master-table” for the FAL usecase.
From what I’ve thought of so far is that FAL is the only case where we have two reference tables for the same thing.

Let’s say we’d move all metadata into sys_file.
Then sys_file would behave exactly like any other table within TYPO3, right?
You have a base-record (parent, master, default-language… the term is interchangeable here) and then overlay certain fields.
So if sys_file would be able to not only overlay title, description etc. but also the folder and filename, that would solve the issue with overlaying. The base-record would always be references and the ability to overlay parts of the information would still be given.

I guess Anja wants to achieve the same goal, but by using sys_file_metadata as that base record.
Which one we choose (sys_file or sys_file_metadata) is pretty much irrelevant (although sys_file might be more straightforward, eliminating another DB table in the process).

What do you think?

In general, it is very hard not to overlook any detail in this topic, I fully agree.

Using sys_file and sys_file_metadata interchangeably is not quite correct currently. It looks like it is a straight 1:1 relation, but taking the overlay-ability of sys_file_metadata into account we actually have a 1:n relation here.
Making sys_file translatable would indeed make this a true 1:1 relation again and we could use them interchangeably.

I’m not sure if we need the additional field in both tables, actually. Rephrase file_source of sys_file to l10n_parent and we have all we need.

Here is an example DB state covering the use cases we have, so that we are talking about the same thing. This is how I would imagine those to be stored.

  • menu_en.pdf:

    • sys_file: uid=123, sys_language_uid=0, l10n_parent=0
    • sys_file_metadata: uid=678, file=123, sys_language_uid=0, l10n_parent=0
  • menu_de.pdf:

    • sys_file: uid=145, sys_language_uid=1, l10n_parent=123
    • sys_file_metadata: uid=687, file=145, sys_language_uid=1, l10n_parent=678
  • steuernAT.pdf:

    • sys_file: uid=146, sys_language_uid=1, l10n_parent=0
    • sys_file_metadata: uid=88, file=146, sys_language_uid=1, l10n_parent=0
  • bossAtWork.jpg:

    • sys_file: uid=150, sys_language_uid=0, l10n_parent=0
    • sys_file_metadata: uid=54, file=150, sys_language_uid=0, l10n_parent=0, title=“boss at work”
    • sys_file_metadata: uid=55, file=150, sys_language_uid=1, l10n_parent=54, title=“Chef in seinem Element”
  • tt_content EN:

    • tt_content: uid=932, sys_language_uid=0, l10n_parent=0, header=“This week’s menu”
    • sys_file_reference: uid=256, sys_language_uid=0, l10n_parent=0, sys_file=123, uid_foreign=932
  • tt_content DE (connected mode):

    • tt_content: uid=933, sys_language_uid=1, l10n_parent=932, header=“Menü der Woche”
    • sys_file_reference: uid=257, sys_language_uid=1, l10n_parent=256, sys_file=145, uid_foreign=933
  • tt_content DE (free mode):

    • tt_content: uid=943, sys_language_uid=1, l10n_parent=0, header=“Steuern in AT”
    • sys_file_reference: uid=259, sys_language_uid=1, l10n_parent=0, sys_file=146, uid_foreign=943
  • tt_content EN:

    • tt_content: uid=952, sys_language_uid=0, l10n_parent=0, header=“Boss at work”
    • sys_file_reference: uid=286, sys_language_uid=0, l10n_parent=0, sys_file=150, uid_foreign=952
  • tt_content DE (connected mode):

    • tt_content: uid=953, sys_language_uid=1, l10n_parent=952, header=“Chef in seinem Element”
    • sys_file_reference: none (= fallback to english file above, but with metadata translated)
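
Reading that state back for the frontend could be sketched like this in Python. The rows are copied from the example above (only the columns needed); the functions files_for and metadata_uid are hypothetical, and the fallback order (own references first, then the references of the l10n_parent; metadata in the CE's language first, then default language) is my reading of the example, not settled behaviour.

```python
from typing import List, Optional, Tuple

CONTENT = {
    932: {"sys_language_uid": 0, "l10n_parent": 0},
    933: {"sys_language_uid": 1, "l10n_parent": 932},
    943: {"sys_language_uid": 1, "l10n_parent": 0},
    952: {"sys_language_uid": 0, "l10n_parent": 0},
    953: {"sys_language_uid": 1, "l10n_parent": 952},
}
REFERENCES = [
    {"uid": 256, "sys_file": 123, "uid_foreign": 932},
    {"uid": 257, "sys_file": 145, "uid_foreign": 933},
    {"uid": 259, "sys_file": 146, "uid_foreign": 943},
    {"uid": 286, "sys_file": 150, "uid_foreign": 952},
]
METADATA = [
    {"uid": 678, "file": 123, "sys_language_uid": 0},
    {"uid": 687, "file": 145, "sys_language_uid": 1},
    {"uid": 88, "file": 146, "sys_language_uid": 1},
    {"uid": 54, "file": 150, "sys_language_uid": 0},
    {"uid": 55, "file": 150, "sys_language_uid": 1},
]


def metadata_uid(file_uid: int, language_uid: int) -> Optional[int]:
    """Prefer metadata in the requested language, then default language."""
    for wanted in (language_uid, 0):
        for row in METADATA:
            if row["file"] == file_uid and row["sys_language_uid"] == wanted:
                return row["uid"]
    return None


def files_for(content_uid: int) -> List[Tuple[int, Optional[int]]]:
    """Return (sys_file uid, sys_file_metadata uid) pairs for a CE."""
    ce = CONTENT[content_uid]
    refs = [r for r in REFERENCES if r["uid_foreign"] == content_uid]
    if not refs and ce["l10n_parent"]:
        # Connected translation without own references: use the parent's.
        refs = [r for r in REFERENCES if r["uid_foreign"] == ce["l10n_parent"]]
    return [(r["sys_file"], metadata_uid(r["sys_file"], ce["sys_language_uid"]))
            for r in refs]


print(files_for(933))  # menu_de.pdf with its own metadata
print(files_for(953))  # bossAtWork.jpg with the German metadata
```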

Another con:

  • Significantly increased complexity in an area that is not well streamlined and complex already.

As far as I can see, it is also left open where the language variant files are actually stored. Are they prepended with something?

How would the indexer deal with those? Are changes needed in this area, too?

I cannot really judge this proposal with my current limited knowledge of language internal details, especially in the inline / FAL scenarios.

This whole thing is a big dilemma. IMO we desperately need translatable files, like last year. But I would really appreciate it if we could get record relation handling - specifically in combination with localization - right first, before we use this concept for FAL.

I guess we have to find a good compromise on how to implement this, but IMO this is a must-have, no matter how breaking this is.

I have the same feeling here, that we should streamline the generic handling first. Last week I tried to visualize the dependencies between entities - but failed since I could not define for 100% the correct behavior for all TCA types for both localization modes (connected & free).
This topic is continued inside relations in FlexForms and their sections…

To me personally the point of the discussion should be

  • Identify problems
  • Figure out solutions together

I have to admit that I have a hard time dealing with the no-movement-attitude which basically just says “don’t touch it”. This will not get us anywhere.
All approaches listed here have been tried earlier with no outcome.
Listing them again will just leave us in a stale state not bringing anything new to the table.

Let’s try to be solution oriented rather than trying to find yet another reason not to touch things.

My point is: please don’t introduce new stuff that potentially causes inconsistencies which have to be cleared up later on. We still have to deal with separating connected mode from free mode in the core (interpreting l10n_parent, sys_language_uid and a potential new field l10n_source) one year after that had been introduced, and that kind of situation should be avoided.

I’ll try to collect the missing parts for this very topic during this week.

First of all, thanks Anja for picking up this long-neglected topic. I also like the concept, although I don’t agree with some details.
Actually, I had a concept for file localization back during the initial FAL development, that I discussed with several people and also presented e.g. at the camp in Stuttgart. I did not manage to get anything done code-wise, for a number of reasons, one of them being my general fa(l)tigue after the 6.0 release.

What I don’t get about the current concept: why do we need new fields in both sys_file and sys_file_metadata? IMHO it should be enough to change the file pointer in the metadata record of the translation. This would happen automatically behind the scenes when uploading a new file in a translated metadata record.
In all other cases, if an editor chooses a different file (uploaded and indexed before) for a translated metadata record, the metadata records of that very file should be thrown away.

IMHO this is also much more consistent with our general approach of separating (physical) file-related and content-related data into the two tables sys_file and sys_file_metadata. (Alas, this distinction has already been diluted by the width and height fields added to _metadata, although these are properties of the physical file).

Another big obstacle I saw back then (and still do now) was with multi-tree sites, where each individual language tree uses sys_language 0. If we still use sys_language records for all of the file languages, how do we match them to the individual trees? FE rendering only uses sys_language 0 in this case.

As far as I see, there would be no changes required to the storage. The files are still stored as regular files.

Indexing the files however might be a bit tricky, as we would have to have a mechanism to determine the language of the file on indexing either automatically or by a user choice. Even if done during indexing, a user would also need to be able to change that later on again. As you point out, this seems to be totally unclear concept-wise.

+1

tl;dr: I like the concept, but I think it can be radically simplified.

From my understanding these are necessary to keep multiple translated files “together” somehow.
Because you want to keep the “original language’s” file references with a content element and have the frontend overlay these as necessary.

I get that, but still: why isn’t the combination of metadata:file and metadata:l10n_parent sufficient here?

If file and metadata became more like a 1:1 relation with the metadata.file pointer being able to be NULL then that could work… or we drop the metadata table altogether :smiley:
I guess Anja’s original intent was to make the change in behavior more explicit to ease the transition period.

Regarding the question of “where do we store the files”, in the end it doesn’t matter, right?
So we could put the “translated” files into the same folder/bucket next to their originals, as long as we don’t show them anywhere in TYPO3’s interface.
So either an explicit flag Hide in User Interface or implicitly by checking l10n_parent would work fine for me - in the end it’s just another “enablefield”

To fully get the intended approach of Anja I just started to write some stuff down that came to my mind. And during this process I came to the conclusion that the basic approach makes sense but there are only a few small implementation details that need some adjustment IMHO :slight_smile:

So here my write-up:

The idea behind the File Abstraction Layer “currently” implemented in TYPO3 is that for every physical file present in the storage there is a sys_file record, and that record basically only holds a small set of data to identify the file, with this info being the same regardless of the storage type the file is in. So basically, for every file in your fileadmin folder a record is present in the sys_file table.

The sys_file_metadata record is there to add/keep some extra info about the file. And as these are mostly text values, there can be multiple versions of the metadata, one for each available language (sys_language_uid). But this sys_language_uid doesn’t say anything about the file itself. It just tells us what language the metadata is in.

So IMO we shouldn’t touch the metadata translations. It’s just some extra descriptions of a file where these descriptions can be available in every sys_language_uid available.

So now we have to find a way to bind a sys_file record to a certain language saying this file is in language x (because some title is in the picture or because it is really a text document in that language) so basically some extra property of sys_file_metadata. Let’s call it contents_language. This is a value that shouldn’t be translatable just like the file dimensions etc.

Next we need a way to group files together.

You cannot use the translated sys_file_metadata record for this, as this is only a translation of the original metadata belonging to the real file (sys_file record). The file with a different contents_language again has its own metadata and translated metadata. Because who says that the metadata of both files is the same? Probably both have different authors etc. So both need to have their own metadata and translated metadata. Hiding these files in the filelist would just make it harder for the editors.

The UX part could work, having the option to upload a different file when editing the metadata, but that should only be an entry point to easily upload a file and link them together. You should also be able to just select a file that’s already present to link them.

To link them together the file_source field could make sense here. But then again just like contents_language only on sys_file_metadata and not on sys_file. Maybe we need to name file_source a little different so we know it’s a language variant relation.
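
A rough Python sketch of how I picture the lookup with these two fields. FileInfo and pick_variant are made-up names; contents_language and file_source are the proposed (not existing) fields, assumed here to sit on the default language metadata of each file, with file_source = 0 meaning "not a variant of anything".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileInfo:
    """Non-translatable, per-file part of the metadata (hypothetical)."""
    file_uid: int
    contents_language: int  # language the file's content itself is in
    file_source: int        # uid of the file this one is a variant of, 0 if none


def pick_variant(referenced_uid: int, language_uid: int,
                 files: List[FileInfo]) -> int:
    """Return the variant linked to the referenced file for a language."""
    for info in files:
        if (info.file_source == referenced_uid
                and info.contents_language == language_uid):
            return info.file_uid
    # No variant in that language: keep the referenced file.
    return referenced_uid


files = [
    FileInfo(file_uid=123, contents_language=0, file_source=0),    # menu_en.pdf
    FileInfo(file_uid=145, contents_language=1, file_source=123),  # menu_de.pdf
]
print(pick_variant(123, 1, files))
print(pick_variant(123, 2, files))
```

Both files stay fully visible in the filelist with their own metadata; the link is only used to swap the file when a variant for the requested language exists.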

Impact:

  • filelist and filepicker need to be adjusted so you can see in what language the file is (contents_language).
  • FE file selection needs to be adjusted so you get the correct file variant when present (this will be very hard in extbase… but there it already doesn’t work…)

Indexing doesn’t need to get changed for this.
