In another thread, I have been putting forward the idea of using a serialization library to transition the global TYPO3_CONF_VARS structure into injectable objects. The benefits of that are many, but mainly boil down to “PHP syntax does 90% of the work for us, eliminating a huge host of potential bugs and ugly code that tries to account for those potential bugs.” It also allows for configuration objects to be injected, rather than read via globals, which makes anything configurable more unit testable.
One of the points raised in that thread is that while such an improvement is all well and good, the overall configuration story of TYPO3 is a bit of a mess anyway and such a change should take place in the context of a broader redesign of the concept of configuration. Fair enough, and I can’t disagree with that position.
To that end, I want to take a survey of the current configuration story in TYPO3, as of v11.5. That will help us get our collective heads around the status quo. This survey may have missed some bits, but I have tried to be as complete as possible. Thanks to Benni Mack and Helmut Hummel for their input and review.
I warn you, this is a bit long, because there are that many different configuration systems to worry about today…
TYPO3 has, by my count, six distinct and only partially overlapping configuration systems.
The global conf-vars array is TYPO3’s main configuration mechanism. It consists of one gigantic global, mutable array, and goes through several stages to get built up. It contains all sorts of different kinds of configuration, some of which should be environment-dependent and some not. The following is a simplified, high-level picture of how it works.
First, sysext/core/Configuration/DefaultConfiguration.php is loaded, which provides a default definition for most configuration.
- It returns an array, which becomes
- Users SHOULD NOT edit this file, ever.
- This happens on every request.
Second, an install-specific file, typo3conf/LocalConfiguration.php, is loaded. It is populated on a new install by copying the
- This file also returns a deeply nested array of the same structure as in DefaultConfiguration.php. Some sites implement custom utilities that get used in this file.
- The system does a deep merge of this file with $GLOBALS['TYPO3_CONF_VARS'], resulting in a single combined array.
- Users SHOULD edit this file to suit their needs.
- This file SHOULD be stored in Git.
- This happens on every request.
Third, an install-specific file, typo3conf/AdditionalConfiguration.php, is loaded. It is populated by the installer, and by default contains environment-specific overrides.
- This file does not return anything, but modifies $GLOBALS['TYPO3_CONF_VARS'] directly. Some sites implement custom utilities that get used in this file.
- Users SHOULD edit this file to suit their needs.
- This file SHOULD NOT be stored in Git.
- This happens on every request.
Fourth, extensions MAY declare an
- These files get concatenated together by the system once at build time (give or take some tweaking to make that concatenation possible).
- These files do not return anything, but modify $GLOBALS['TYPO3_CONF_VARS'] directly. Some sites implement custom utilities that get used in these files.
- The combined file is loaded on every request.
- Users SHOULD NOT edit these files to suit their needs.
- These SHOULD be stored in Git, or downloaded along with the extension from TER or Packagist.
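A rough sketch of that layering in plain PHP may help. The array contents are invented for illustration, and PHP's built-in array_replace_recursive() stands in for the deep merge; the core's own merge routine may differ in edge cases.

```php
<?php
declare(strict_types=1);

// Invented stand-in for DefaultConfiguration.php: the full set of defaults.
$defaultConfiguration = [
    'SYS' => ['sitename' => 'New TYPO3 site', 'displayErrors' => -1],
    'DB'  => ['Connections' => ['Default' => ['host' => '127.0.0.1']]],
];

// Invented stand-in for LocalConfiguration.php: only the keys this install changes.
$localConfiguration = [
    'SYS' => ['sitename' => 'Example Ltd.'],
];

// Stage 1: the returned defaults become the global array.
$GLOBALS['TYPO3_CONF_VARS'] = $defaultConfiguration;

// Stage 2: deep merge; later values win, untouched defaults survive.
$GLOBALS['TYPO3_CONF_VARS'] = array_replace_recursive(
    $GLOBALS['TYPO3_CONF_VARS'],
    $localConfiguration
);

// Stage 3: AdditionalConfiguration.php returns nothing and writes to the global.
$GLOBALS['TYPO3_CONF_VARS']['DB']['Connections']['Default']['host'] = 'db.internal';

// Stage 4: every extension's file does the same, in whatever order it is loaded.
```

Note that array_replace_recursive() replaces numerically keyed lists index by index instead of appending to them, which is exactly the kind of edge case every layer in such a scheme has to reason about.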
The TYPO3_CONF_VARS array is not self-describing. Arrays are inherently not type safe, and keys are frequently non-obvious in intent. Documentation is provided out-of-band via the sysext/core/Configuration/DefaultConfigurationDescription.yaml file, which is kept in sync with the code manually (or not).
One tricky part of the TYPO3_CONF_VARS array is that, because it contains a variety of different types of information, some parts of it are required for loading later parts. Therefore, portions of the bootstrap process rely on it being already available and still mutable. Examples include the error handling configuration and the logging configuration.
Extension code reads from this array directly, and is responsible for its own default and type safety handling at each call site.
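To make that call-site burden concrete, here is what a defensive read tends to look like. The key path, the setting name, and the default of 10 are hypothetical; the point is that every reader has to repeat some version of this logic.

```php
<?php
declare(strict_types=1);

// Hypothetical extension key and setting; note the value arrives as a string.
$GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['my_extension'] = ['itemsPerPage' => '25'];

function itemsPerPage(): int
{
    // Guard against a missing key (any level of the path) with ??.
    $raw = $GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['my_extension']['itemsPerPage'] ?? null;

    // Guard against the wrong type, and pick a default locally.
    if (!is_numeric($raw)) {
        return 10;
    }

    // Guard against an out-of-range value.
    $value = (int)$raw;
    return $value > 0 ? $value : 10;
}

echo itemsPerPage(), PHP_EOL; // 25
```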
There is no built-in environment-specific “switch” for dev/test/prod or similar for configuration that does need to vary in that regard. Different extensions roll their own in various ways.
Some of the configuration specified here consists of callbacks or other executables that get invoked from arbitrary places in the code. This includes the old “hooks” system, but is technically not limited to it.
Configuration here is mostly not editable from the UI, except for some basic settings in the Install Tool section. Those require the install tool writing back to the
Pros:
- Both extensions and site admins get “free rein” to adjust the system configuration as they see fit.
- There’s a single, relatively simple mechanism to consider. (Simple in the naive sense, not in the resulting usage sense.)
Cons:
- It’s extremely hard to document.
- It’s extremely hard to learn; the developer has no built-in way to “learn as they do” through, e.g., IDE autocomplete or inline documentation.
- It’s not self-documenting or self-enforcing, meaning data in it is never guaranteed to be even remotely close to the structure or type that is expected.
- The previous point means error handling must be implemented at every single read-point. (The majority of PHP 8.0 compatibility issues were caused by this point.)
- Because it mostly happens at runtime, the cost of building the array is borne on every request.
- Because it mostly happens at runtime, the memory overhead of a giant global array is borne on every request.
- Environment-specific information (DB credentials, API keys, error reporting, etc.) is commingled with environment-agnostic information (backend configuration, site name, password hashing, form engine configuration, etc.)
- Information that most sites will want to customize (site name, log configuration, etc.) is commingled with information extremely few sites will want to customize (form engine configuration, HTTP clickjacking protection, etc.)
- Global variables hinder testing.
- Global variables reduce flexibility by making it impossible to have two instances of a service with the same code but different configuration.
- Because the global array is mutable, nothing prevents an extension from altering the global array at any time, making all data in it unreliable. The system is built to enable “Spooky action at a distance” (SAAAD).
This file is an alternative way for extensions to provide their own extension-specific configuration. Each extension MAY define this file, which is in a proprietary schema format (see the docs). It allows an extension to define a single logical configuration object, consisting of one or more properties grouped into one or more categories.
This schema is used to auto-generate an admin form for editing the configuration object in the UI.
The resulting configuration object is actually an array, which gets written into $GLOBALS['TYPO3_CONF_VARS']. When that happens, the LocalConfiguration.php file is regenerated.
Extensions can read their own configuration through
Pros:
- Extensions get a simple way to define their needed configuration.
- Extensions get an admin UI for free.
Cons:
- Everything described above for
- Proprietary, one-off schema format syntax.
- Writing back to LocalConfiguration.php means that file has to be mutable at runtime, which is incompatible with many cloud-based hosts and with good deployment practices.
The TCA serves several parallel purposes. Primarily, its purpose is to define the entire formal data model of the system. It does so at multiple different levels of abstraction, all in a single global array blob.
The TCA definition is used for:
- Extracting an SQL table definition to auto-create SQL tables.
- Defining what fields in those tables have special meaning for the system (e.g., they’re used for human-readable labels, they’re the primary key, etc.)
- Defining what fields in those tables carry data.
- Defining what fields in those tables should have additional validation restrictions beyond what SQL provides (e.g., ranges, restricted values, etc.)
- Defining processing rules on those fields (trimming strings, etc.)
- Defining how a given field should be displayed in the admin UI.
- Defining complex UI structures (i.e., palettes).
- Opting in to advanced functionality like workspaces.
- And probably other things I am forgetting.
TCA definition is handled by a series of files, and the specifics have varied over time, with some older APIs still remaining. The current version works as follows:
First, a TCA table is defined in Configuration/TCA/<tablename>.php within an extension, as a giant nested array that the file returns. This file is loaded on every request. Sometimes utility functions are used that pull from the TYPO3_CONF_VARS configuration to define portions of the array dynamically. (This creates some interesting dependency challenges.)
Second, extensions MAY define a Configuration/TCA/Overrides/<tablename>.php file. This file is loaded if and only if the corresponding table is defined elsewhere. It does not return anything, but is expected to modify the TCA array directly. Usually it does not access the global itself but relies on a series of static methods on the ExtensionManagementUtility class, which in turn modify the global array. This happens on every request.
There is also an ext_tables.php file that can modify the TCA directly; using it for that purpose should be considered deprecated and vestigial.
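A heavily abbreviated illustration of the two steps follows, using a hypothetical table tx_example_item and invented fields. The ctrl/columns/types keys follow the general shape of TCA, but a real definition carries far more, and a real Overrides file would normally call ExtensionManagementUtility methods rather than writing to the global by hand.

```php
<?php
declare(strict_types=1);

// Step 1: what a hypothetical Configuration/TCA/tx_example_item.php would return.
$tableDefinition = [
    'ctrl' => [
        'title' => 'Example item',
        'label' => 'title',   // column used as the human-readable record label
    ],
    'columns' => [
        'title' => [
            'label'  => 'Title',
            'config' => ['type' => 'input', 'eval' => 'trim,required'],
        ],
    ],
    'types' => ['0' => ['showitem' => 'title']],
];
$GLOBALS['TCA']['tx_example_item'] = $tableDefinition;

// Step 2: an Overrides file from *another* extension mutates the same array,
// adding a column and splicing it into the form layout string.
$GLOBALS['TCA']['tx_example_item']['columns']['rating'] = [
    'label'  => 'Rating',
    'config' => ['type' => 'input', 'eval' => 'int', 'range' => ['lower' => 1, 'upper' => 5]],
];
$GLOBALS['TCA']['tx_example_item']['types']['0']['showitem'] .= ', rating';
```

Nothing in step 2 verifies that the structure it appends to still looks the way it expects, which is the kind of coupling behind several of the TCA drawbacks.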
The TCA data is used for most automation to dynamically build database tables, admin forms, and so forth.
Pros:
- An abstracted way to define the data model has a number of advantages, in particular the ability to auto-generate storage and forms.
- Allowing extensions to enhance each other’s tables offers a horizontal extension mechanism, which traditional ORMs (such as Doctrine) cannot easily support.
Cons:
- Most of the drawbacks of TYPO3_CONF_VARS apply here as well: untyped, hard to document, not self-documenting or self-teaching, no meaningful error handling, etc.
- Direct mapping of TCA to database tables and fields is both too flexible and too limiting. It makes any kind of table refactoring an automatic API break. At the same time, because the meaning of all fields is dynamic, writing raw SQL becomes inherently unreliable in many to most cases.
- While horizontal extensibility of data objects at a logical level is a useful feature, doing so at the raw storage level is very brittle. Upgrading data types may sometimes break the data in exciting ways.
- Uninstalling extensions may leave data flotsam around the database in other extensions’ tables.
- Information on the data model, the admin form, and the front-end display is all mixed in together with little organization or regard to how it would logically group. That makes the self-documenting problem even worse.
- Because rendering and admin form information is stored in a single array, there is no way to provide multiple alternate formats for different contexts.
In TYPO3 v11, site configuration is handled by a one-off YAML file. That YAML file is editable through the UI via a custom editing page, but also hand-editable on disk. As it does not use the TYPO3_CONF_VARS array, it is insulated against arbitrary extensions altering it. However, the fact that it is editable both from the UI and in code creates potential synchronization problems, especially with cloud-based hosts. (More on that later.)
Extensions are free to set up their own tables and write to them themselves if they wish, and some do. The details vary, so it’s difficult to say anything more specific.
In particular, extensions MAY ship an ext_tables.sql file. This file contains MySQL-specific syntax for table creation (and only table creation). It is not executed directly, however, but custom-parsed and converted into Doctrine table definitions, which are then used to populate the database. The Doctrine table representation is used for automatic migrations and schema updates where possible.
If a table is mentioned in the TCA array and also in ext_tables.sql, a series of additional control columns will be added to the table definition by TCA. (See the docs for the full list.)
Extensions may also include an ext_tables_static+adt.sql file, which is a raw SQL dump that will be imported directly without processing.
Pros:
- Extensions are free to do whatever works for them.
- Doctrine table updates make some (but not all) extension updates straightforward and often no-effort in simple cases.
Cons:
- Setting up custom tables, forms, and workflows can be a lot of work that extension authors shouldn’t need to spend.
- Extension authors need to create a lot of files to create or extend database tables (either managed via TCA or not), which need to be kept in sync (
- Configuration stored in the database is not deployable from dev/staging to production.
- This information is entirely opaque from any common automation mechanisms.
Both TYPO3 core and any extension may include a Services.yaml file, which is used by the Symfony Dependency Injection Container to configure the container. This is a different form of configuration than the others listed here, but is still technically a form of configuration. In some systems (such as the Symfony framework itself), the main configuration system feeds into the container configuration as additional constructor arguments to various services.
Pros:
- Because the container is compiled, configuration injected through it has practically no runtime overhead.
- A cleanly injected system is, by definition, highly reconfigurable. It also encourages highly-testable code.
Cons:
- The “inject into container” approach has scaling limitations when done with primitive values.
- Reconfiguring the container, outside of specific injection points as noted above, can be a very cumbersome and error-prone process, especially if multiple extensions try to manipulate the same definition.
The above status quo is highly suboptimal. From people I’ve spoken to about it there doesn’t seem to be much disagreement about that. How to turn it into something more optimal is a more interesting question. It starts, of course, with determining what would be more optimal.
In abstract terms, I want to lay out the different categories of configuration, as they apply to any system. As one would expect, there is some overlap between different categories because software development is hard. (This list is heavily based on my earlier talk on “Building a Cloud-Friendly Application” (slides, video) and the experience of watching Drupal build a configuration system from the ground up out of a very similarly disjointed status quo.)
Application configuration is configuration that applies to a given install, and to all copies of that install. Examples include the site name, the template engine in use, the form engine configuration, etc.
Depending on the system, sometimes this information is configured by a site administrator via code, and other times via a UI of some kind. Both have their pros and cons, but the major difference is that code-based configuration is very easy to deploy via Git, and UI-based configuration is extremely hard to deploy via Git. “Deploy via Git” in this case includes cloud-based web hosts, which are an increasingly significant part of the market, so compatibility with them is a must-have for any modern system.
While there are ways to “have your cake and eat it too” in this regard, they tend to be highly complex. Drupal, for instance, has an elaborate system of importing and exporting configuration between a database key/value blob and YAML files. The YAML files are Git-deployable, and then synchronized with the database blobs. At runtime, the code reads from the database blobs only. The code to manage that is highly non-trivial.
This class of configuration has most of the same properties as global configuration, but may appear an arbitrary number of times. For example, the site name is a global configuration value, as it appears only once. Configuration of a particular language is an instantiated configuration, as there may be any number of languages defined on a particular install, each with its own unique identifier and settings.
All deployment questions that apply to global application configuration also apply to instantiated configuration just the same.
I would argue that content type definitions should be in this category, although currently TCA is effectively in the previous category.
Environment-specific configuration is what it says on the tin: configuration that is specific to a given environment, and thus MUST change from production to staging to local-laptop environments. Examples here include database credentials, API keys, search server or cache server credentials, etc.
Environment configuration that lives in Git is an error. Always. Doing so hinders compatibility with cloud-based hosts, as in those situations the number and configuration of different environments is dynamic. The industry-standard way to manage such configuration is via .env files, for which there are ample existing libraries. The .env file is not committed to Git but used on development environments only. In test, stage, and production environments the Unix environment variables are read directly instead.
(There are a few systems that work by storing the .env file in Git, and providing an additional .env.local or similar file to override it, because the .env file contains some non-env-specific configuration as well. Symfony is one such system. Such systems are wrong.)
A popular alternative way to provide environment-specific information is a dedicated executable file (PHP in our case), which is not committed to Git and can either contain hard-coded values or bridge to env vars, as appropriate. That is effectively what AdditionalConfiguration.php is today, although somewhat clunkily implemented.
I would argue that such an executable override file is mandatory for dealing with cloud-based hosts. Cloud hosts often have their own environment variable format, so some degree of glue code to shuffle that into the format the application expects is necessary.
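A minimal sketch of that glue, written as a pair of plain functions. The HOSTING_DB_* variable names are invented stand-ins for whatever a particular host provides, and the returned keys are illustrative rather than TYPO3’s actual database settings. In development, a .env loader would populate the same variables; in production, the host sets them.

```php
<?php
declare(strict_types=1);

// Fail loudly on a missing required variable instead of silently using ''.
function envOrFail(string $name): string
{
    $value = getenv($name);
    if ($value === false || $value === '') {
        throw new RuntimeException("Missing environment variable: $name");
    }
    return $value;
}

// Translate the (hypothetical) host-specific names into application keys.
function databaseSettingsFromEnvironment(): array
{
    return [
        'host'     => envOrFail('HOSTING_DB_HOST'),
        'dbname'   => envOrFail('HOSTING_DB_NAME'),
        'user'     => envOrFail('HOSTING_DB_USER'),
        'password' => envOrFail('HOSTING_DB_PASSWORD'),
        // Optional variable with a default; cast to the type the app expects.
        'port'     => (int)(getenv('HOSTING_DB_PORT') ?: 3306),
    ];
}
```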
An interesting intersection here is environment-specific instantiated configuration. For example, site configuration includes a domain name or path root for each site. However, that information necessarily varies from one environment to another.
The information currently exposed by the Environment class falls into this category, but covers only a subset of it.
This isn’t really a category in its own right. There are many cases where application-level configuration should also vary by environment, but not because of details external to the environment (e.g., connected services). Examples include:
- Disabling caching in dev but enabling it in staging/prod.
- Appending “DEVELOPMENT” to the site name in non-prod environments to remind you of which site you’re on (so you don’t do something on prod you don’t intend to).
- Adding debugging symbols to generated output, such as templates.
- Changing error reporting from very verbose (dev) to log-only (prod).
- Enabling deprecation warnings (dev) or not (prod).
The most common approach here is to have configuration defined in “chunks”, and allow each chunk to be overridden per-environment with extra config files. The chunk size can vary from an individual property to a large portion of the configuration depending on the system. Symfony is the most commonly recognized example here, with a common configuration directory of YAML files and then separate prod directories that can override individual properties. The env-directory values override those in the base configuration.
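The base-plus-overlay approach can be sketched with nested arrays. The keys, values, and the three environment type names are hypothetical, and a real system would load each overlay from its own file rather than from a constant.

```php
<?php
declare(strict_types=1);

// Base configuration shared by every environment.
const BASE_CONFIG = [
    'site'   => ['name' => 'Example Ltd.'],
    'cache'  => ['enabled' => true],
    'errors' => ['display' => false, 'deprecations' => false],
];

// Per-environment-type overlays; each lists only what differs.
const ENVIRONMENT_OVERRIDES = [
    'dev' => [
        'site'   => ['name' => 'Example Ltd. [DEVELOPMENT]'],
        'cache'  => ['enabled' => false],
        'errors' => ['display' => true, 'deprecations' => true],
    ],
    'staging' => [
        'site' => ['name' => 'Example Ltd. [STAGING]'],
    ],
    'prod' => [],
];

function configFor(string $environmentType): array
{
    if (!array_key_exists($environmentType, ENVIRONMENT_OVERRIDES)) {
        throw new InvalidArgumentException("Unknown environment type: $environmentType");
    }
    // Overlay values win over the base, property by property.
    return array_replace_recursive(BASE_CONFIG, ENVIRONMENT_OVERRIDES[$environmentType]);
}
```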
This is not really configuration, but is sometimes conflated with it so I mention it for completeness. It is common to have non-content state data that is distinct from configuration, and is specific to a given copy of an application. The most natural example here is flood control, where event data has to be tracked in a given instance at reasonably high speed, but that information should absolutely not be replicated to other instances (dev, staging, etc.), and deployment is irrelevant.
Putting all of that together, then, we end up with a series of permutations of configuration.
- GUI editable, Admin editable
- Git deployable, not Git deployable
- Environment-specific, Environment-type-specific, Environment-global
- Instantiated, global
That is 2x2x3x2 or 24 possible categories of configuration. Supporting all of them is, I would argue, both unwise and unnecessary. Some combinations are just impractical to implement, and others are possible, but unnecessary in certain types of systems.
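The count is simply the Cartesian product of the four dimensions. Enumerating it explicitly, with the labels taken from the list above:

```php
<?php
declare(strict_types=1);

$dimensions = [
    'editing'     => ['GUI editable', 'Admin editable'],
    'deployment'  => ['Git deployable', 'not Git deployable'],
    'environment' => ['Environment-specific', 'Environment-type-specific', 'Environment-global'],
    'cardinality' => ['Instantiated', 'global'],
];

// Build the Cartesian product one dimension at a time.
$combinations = [[]];
foreach ($dimensions as $dimension => $values) {
    $next = [];
    foreach ($combinations as $partial) {
        foreach ($values as $value) {
            $next[] = $partial + [$dimension => $value];
        }
    }
    $combinations = $next;
}

echo count($combinations), PHP_EOL; // 24
```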
In a perfect world, we would be able to have all of the following attributes in a configuration system:
As a general rule, the more validation you can push to the language syntax the less code you need and the fewer opportunities there are for bugs to appear. The difference between an integer and an array is quite significant, and random nulls popping up in unexpected places is the source of a huge number of bugs.
Ideally, by the time you are reading a piece of configuration, your code should be able to rely on it existing, being of the right type, and having a valid value, even if that value is a default.
By necessity, this requires explicitly pre-defining the structure of configuration.
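A small sketch of what pre-defining structure can look like in PHP 8.1+: a configuration class whose typed, readonly properties carry defaults and whose constructor rejects invalid values. The class and its properties are hypothetical.

```php
<?php
declare(strict_types=1);

final class PaginationConfiguration
{
    public function __construct(
        // Type and default are declared once, here, not at every read site.
        public readonly int $itemsPerPage = 10,
        public readonly bool $showPageNumbers = true,
    ) {
        // Range validation happens at construction time.
        if ($itemsPerPage < 1) {
            throw new InvalidArgumentException('itemsPerPage must be at least 1.');
        }
    }
}

$config = new PaginationConfiguration(itemsPerPage: 25);
// Readers can rely on $config->itemsPerPage existing and being a positive int.
```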
In addition to type safety, ideally configuration structures are self-documenting. That applies both for those setting configuration and those reading configuration (i.e., code).
On the read side, explicit typing covers this fairly well. Combined with any reasonable IDE, explicit types with a few well-placed code comments make this a solved problem.
On the authoring side, if configuration is done through the GUI, the GUI is responsible for providing good documentation. If configuration is done through files, it becomes a more difficult question as most config file formats are not particularly self-documenting. XML is least-bad here because it can have a schema. YAML is the worst as it has no official schema, although JSON Schema can sometimes be used in a pinch. Not many IDEs read that, however. (There is https://www.schemastore.org/, which is used by VS Code and PhpStorm, but it’s still less than ideal.)
Configuration, by design, changes rarely. Therefore its write speed is not particularly important. Read speed, however, is critical, as whatever the read cost is will be borne on every request. (This is more of an issue in PHP than in other languages thanks to its shared-nothing design.)
That means while configuration may be editable in some human-friendly format like YAML, at runtime it needs to be read from something faster. Hard-coded PHP itself is the fastest option, if that can be pulled off.
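One common way to get there is a compile step that dumps already-parsed configuration as a PHP file returning a literal array, so a runtime read is just an include. The sketch below assumes the parsing from YAML (or wherever) has already happened; the file location and contents are illustrative. With opcache enabled, such a literal array can typically be served from shared memory rather than rebuilt per request.

```php
<?php
declare(strict_types=1);

function compileConfiguration(array $parsed, string $targetFile): void
{
    $code = '<?php' . PHP_EOL . 'return ' . var_export($parsed, true) . ';' . PHP_EOL;
    // Write to a temporary file, then rename, so readers never see a partial file.
    $tmp = $targetFile . '.' . bin2hex(random_bytes(4)) . '.tmp';
    file_put_contents($tmp, $code, LOCK_EX);
    rename($tmp, $targetFile);
}

function loadCompiledConfiguration(string $file): array
{
    // The compiled file simply returns its array.
    return require $file;
}

$target = sys_get_temp_dir() . '/compiled_config_example.php';
compileConfiguration(['site' => ['name' => 'Example Ltd.'], 'cache' => ['enabled' => true]], $target);
$config = loadCompiledConfiguration($target);
```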
Configuration data is data that takes up memory in the process. The more of it you have, the more memory it consumes. Memory usage should be kept as low as possible, but not to the point that it hinders other factors.
Of note here, objects are substantially more memory efficient than PHP arrays (about twice as efficient, in fact). Arrays stored completely statically in code are even more memory efficient, as their cost is borne only once rather than per-process, if and only if their data is never duplicated into process memory. (That is easy to do accidentally.)
While configuration state is to an extent naturally global, that is undesirable from a code perspective. A given code component (class or otherwise) should never be reading from global values directly. Instead, it should have meaningful values passed to it, dependency-injection style. That has two benefits:
- It’s trivial to create two or more copies of a class with different configuration, even if only one of them comes from the global configuration definition.
- It’s trivial to test a given component under a variety of configuration settings.
This point holds regardless of whether the data is clustered into objects or passed as primitives.
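In code, the injection point amounts to little more than a constructor parameter. Mailer and MailerConfiguration are hypothetical names chosen for illustration (PHP 8.1+ for readonly properties).

```php
<?php
declare(strict_types=1);

final class MailerConfiguration
{
    public function __construct(
        public readonly string $fromAddress,
        public readonly bool $sendForReal,
    ) {}
}

final class Mailer
{
    // Configuration arrives through the constructor; no global is read.
    public function __construct(private readonly MailerConfiguration $config) {}

    public function describe(string $to): string
    {
        $mode = $this->config->sendForReal ? 'send' : 'log';
        return sprintf('%s mail from %s to %s', $mode, $this->config->fromAddress, $to);
    }
}

// Two instances of the same class with different configuration, side by side.
$newsletter = new Mailer(new MailerConfiguration('news@example.com', true));
$testing    = new Mailer(new MailerConfiguration('noreply@example.com', false));
```

A unit test can construct a Mailer with whatever MailerConfiguration it needs, without touching any global state.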
Spooky action at a distance is something to avoid, generally speaking. That means it’s best to avoid allowing a particular piece of code at runtime to modify the configuration for its request only. That breaks all sense of encapsulation, memoization, caching, and basic predictability. While this is rarely done, it’s something that should not be allowed as it can create all kinds of exciting race conditions.
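One language-level way to rule this out in PHP 8.1+ is readonly properties. Code that tries to flip a setting at runtime gets an Error rather than silently changing behavior elsewhere, and a legitimate variant is derived as a new object. CacheConfiguration is a hypothetical example class.

```php
<?php
declare(strict_types=1);

final class CacheConfiguration
{
    public function __construct(public readonly bool $enabled) {}

    // Derive a variant as a *new* object; the shared instance stays untouched.
    public function withEnabled(bool $enabled): self
    {
        return new self($enabled);
    }
}

$shared = new CacheConfiguration(enabled: true);

try {
    $shared->enabled = false;   // some distant code tries to flip the setting
} catch (Error $e) {
    echo 'Rejected: ', $e->getMessage(), PHP_EOL;
}
```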
The easiest way to change settings is through a GUI, with nice user-friendly messages, documentation, and validation.
If configuration is edited via files on disk, this becomes trivial. There may be a compile step to turn an editable file into a fast-read format (e.g., container configuration, perhaps even code generation), but as long as that can be reproduced from the sources on demand, that’s a minor issue.
As noted above, this attribute is, usually, mutually exclusive with “Easily editable.”
If you’ve actually read this far, you have my thanks and appreciation. It’s been a ride. My intent is to ensure that everyone is on the same page and speaking about the same problem space the same way.
As this is already rather, um, long, I am going to pause here. In a day or two I’ll post another follow up with possible directions we can take and my recommendations around them. Stay tuned.
(In the meantime, if there’s nuance or detail you can add to the above, or wacky things you do in extensions that you’d like a better solution for, this is the place to mention it!)