TYPO3 caching framework migration to PSR-6 and Doctrine caches


(Claus Due) #1

This thread is to gather feedback and advice on the ongoing process of converting TYPO3’s caching framework to use a PSR-6 implementation, and of improving the caching framework API and behavior. The current status of this work can be seen at https://forge.typo3.org/issues/81432. It is divided into neat little sub-tasks, some of which can be, and are being, done separately and benefit us regardless of the PSR-6 migration.

The article is rendered as a Gist over here, so please read it there. Feel free to comment on the Gist, here, or both, as you wish; I will of course read both.

What I’m looking for in this thread is:

  • Possible flaws in the vision and strategy (not theoretical ones please, argue concrete use cases)
  • Any experience you may have and want to add about real-life PSR-6 implementations.
  • Your thoughts pro/con about the suggested strategy for deprecation
  • Your thoughts on the chosen strategy for replacement (preserve frontend public contract, replace backends, use bridges)
  • Advice you might have about the concrete implementation of the yielded features: standards we could adopt, interfaces that could make sense, API that I did not think of, and so on.

What I would like to avoid in this thread is:

  • Turning this into design by committee (please don’t argue “I think it should be so and so instead” unless you have a real-life use case that conflicts with the described choices)
  • Focusing on specific details instead of how the bigger picture fits together (please consider all the yielded features along with the described choices for migration etc.)
  • Edge case objections that are not based on real use cases
  • Going into details about semantics in provided code examples or implementation details - please reserve this for the code review :wink:

The mic is yours! :slight_smile:


(Benni Mack) #2

Hey Claus,

sorry for the late reply. Here are my thoughts on your approach. They do focus on specific details, which you said you wanted to avoid in this thread, but I could not help myself:

TLDR
In general, I’m for using PSR-based libraries, which lets us focus on content management rather than on framework-specific implementations like cache backends ;). OTOH our caching framework has proven to be reliable and really stable, but I do miss proper two-level caching mechanisms, and having several different “cache frontends” seems strange.

File-based caching by default
Although you’ve run your stats on several machines (dev and prod), I’m not sure about the impact of having file-based caching by default. I also have some NFS-based installations lying around. Setting up cache_rootline on the FS by default in particular would be a pain, I’d say. A classic example I’ve seen in several companies: on each deployment you flush all caches, and when the first page with 200 links on it is requested, the rootline cache has to be rebuilt, which takes a lot of time in an installation with 20K pages. I actually have some special cache warmup scripts running to overcome this issue, and those are already needed with DB caching alone, but I’ve also seen really bad FS behaviour not too long ago (see below). I would need to test this case with FS-based caching, using Blackfire on a new DO server with the latest master.

This case hits especially hard when content management is going on (BE work, which permanently clears caches when changing or moving pages) while high-traffic FE requests come in at the same time, not to mention distributed systems. I’m sure @liayn has some things to add here.

So, you asked for a real-life example. Helmut and I spent several weeks getting rid of the file-based “class loader cache” in TYPO3 6.2.8, which did a lot of FS work that, IMHO, needlessly bloated any server. After clearing the system cache, a new FE request took 10 seconds because of the disk I/O needed to write cache entries for the several hundred classes it required. Of course, we don’t have the class loader cache anymore, but going FS-first is IMHO not feasible (except maybe for cache_pages and cache_pagesection). So, you claim you have the horror stories; I claim that I did and still do, too. What now? :confused:

The main benefits I see in going FS-first would be reducing the DB load and keeping DB dumps small :wink:, but I’d rather not trade performance for that by default.

Default Configuration For All Caches
Going further, the “default configuration” approach is exactly the opposite of what I recommend to most of the projects I consult for: think about how big each individual cache will be and how much data will go into it, then see what fits and what you have available:

  1. Use php-apc for small but fast caches (with an FS backup when doing custom second-level caching); the same should go for Fluid caches IMHO.
  2. Use redis for larger caches, thinking in memory (which is still faster than FS or DB today, as far as I know), and fall back to DB instead; that’s kind of tricky to handle. However, I could see something like “presets” being available, which could then be applied to the cacheConfigurations.
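The two-level setup described above (a small, fast level such as APCu in front of a larger one such as Redis or the DB) can be sketched independently of any backend. The following is a minimal Python illustration, not PHP and not TYPO3 or library API; `DictLevel` and `TwoLevelCache` are hypothetical names, and plain dicts stand in for the real stores.

```python
from typing import Any, Dict, Optional


class DictLevel:
    """Hypothetical stand-in for one cache level (e.g. APCu, Redis, DB)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class TwoLevelCache:
    """Reads try the fast level first and fall back to the slow level,
    backfilling the fast level on a slow-level hit. Writes and deletes
    go to both levels. None is treated as a miss in this sketch."""

    def __init__(self, fast: DictLevel, slow: DictLevel) -> None:
        self.fast = fast
        self.slow = slow

    def get(self, key: str) -> Optional[Any]:
        value = self.fast.get(key)
        if value is not None:
            return value
        value = self.slow.get(key)
        if value is not None:
            # Backfill so the next read is served from the fast level.
            self.fast.set(key, value)
        return value

    def set(self, key: str, value: Any) -> None:
        self.slow.set(key, value)
        self.fast.set(key, value)

    def delete(self, key: str) -> None:
        self.fast.delete(key)
        self.slow.delete(key)
```

If the fast level is flushed on its own, it refills from the slow level on demand instead of forcing a full rebuild of the data.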

I’m still unsure how to properly override everything without side effects. Example:

  • A site owner can configure a default configuration in LocalConfiguration.php
  • An extension (not a TER extension, but a project-specific one) brings an additional cache with it (e.g. “cache_benni”), which is configured to be stored in the DB
  • How would the site owner then override this configuration?

Can we deal with that “by design” to ensure the desired behaviour?

Re-reading your Gist document, I actually worry about the performance of “FS first”, and I do have real-life examples at hand, with either distributed systems or local file systems, mostly running TYPO3 v7. Maybe we just need good detection in the install tool of what would fit best (“TYPO3’s best guess for your performance”). Also, let’s share the “horror stories” of DB caches over a beer or two :wink:

Cache Frontends
I revisited the part about unifying the cache frontends, and your approach seems more than logical to me. IMHO this is actually the most confusing part when explaining the current caching framework concept. However, we could go a more drastic way and have a “Cache” object that replaces the VariableFrontend, so we could simply deprecate the old frontends and use a new Cache class instead. Would that make sense? That way we would drop the content of the “frontends” completely (see below for “PSR-6”), with a proper deprecation strategy, of course.

Tags
Dropping the getByTag() method does make sense, and I’m sure we could find a better way to replace this functionality (which is only used by the AdminPanel in extGetNumberOfCachedPages()); I just don’t have a good idea at hand.

Locking
Personally, I would not put locking into caching itself; it’s a separate topic. I do understand your use case, but I’ve never seen locking mixed with caching within a library (as of 2017). Separation of concerns could and should happen in the caller’s code, where it is needed. Writing 500 entries into cache_rootline in one request, for example, would become really slow with a lock/release for each call. I’m missing a use case apart from “page is being generated”, for which we already do locking, and we could use libraries like https://github.com/symfony/lock.

Then there are questions like: what happens if I call lock() but get a connection timeout? Will it ever be unlocked? What is the state then? And some backends would not support locking properly (FS-based caching with locking on NFS: enjoy). What would the approach be there?
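The position above (locking belongs in the caller, not in the cache) can be illustrated with a process-local Python sketch: only the expensive "compute and store" path is locked, per key, and the cache is re-checked after acquiring the lock so that concurrent callers compute a value only once. `KeyedLocks` and `get_or_compute` are hypothetical names; a multi-process or multi-server TYPO3 setup would need a cross-process lock (such as the symfony/lock library linked above), which `threading.Lock` does not provide.

```python
import threading
from typing import Any, Callable, Dict


class KeyedLocks:
    """Hands out one lock per key; the registry itself is guarded."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[str, threading.Lock] = {}

    def for_key(self, key: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


def get_or_compute(cache: Dict[str, Any], locks: KeyedLocks, key: str,
                   compute: Callable[[], Any]) -> Any:
    # Fast path: no lock is taken when the entry already exists.
    if key in cache:
        return cache[key]
    with locks.for_key(key):
        # Re-check: another thread may have filled the entry while we waited.
        if key not in cache:
            cache[key] = compute()
        return cache[key]
```

Bulk writes such as the 500 cache_rootline entries mentioned above can bypass this helper and write to the cache directly, so they pay no per-entry lock cost.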

PSR-6 vs. whatever implementation we use
Now, I also need to clarify the PSR-6 topic itself. This is actually crucial, and I was under a misunderstanding about it for quite some time after reading your docs:

  • The PSR-6 standard does not support tags; it is NOT intended to (see http://www.php-fig.org/psr/psr-6/meta/#non-goals), yet everything we’d do with libraries would need tagging support! I did not study the library you chose, PHP-Cache, in depth (you did), and if it supports tagging, that is something other than “let’s use PSR-6 and we can plug in everything”. That said, I do see the possibility of using a third-party library with tagging support.

  • PSR-6 is quite complex, with CacheItemInterface instances that are intended to be objects. Basically, our “Cache” / “CacheFrontend” object would need to wrap our existing key/value pairs in CacheItems and hand them to the CachePools, and the same goes for retrieving data. PSR-16 came to life because of that rather complex PSR-6 setup. Nowadays, frameworks offer adapters for both PSR-6 and PSR-16, which makes sense IMHO. PSR-16 is, however, closer to our current caching framework (no ->commit() method etc.).
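To make the PSR-6/PSR-16 difference concrete, here is a Python model (not PHP). Its method names are snake_case transliterations of PSR-6's `getItem`/`hasItem`/`isHit`/`save`/`saveDeferred`/`commit` and PSR-16's `get`/`set`/`has`/`delete`; expiry, key validation and the multi-key methods are deliberately omitted, and `ArrayPool` and `SimpleCacheAdapter` are hypothetical names.

```python
from typing import Any, Dict, List


class CacheItem:
    """Models PSR-6 CacheItemInterface: a key plus a value and a hit flag."""

    def __init__(self, key: str, value: Any = None, hit: bool = False) -> None:
        self._key = key
        self._value = value
        self._hit = hit

    def get_key(self) -> str:
        return self._key

    def get(self) -> Any:
        # PSR-6: on a miss, get() returns null (None here).
        return self._value if self._hit else None

    def is_hit(self) -> bool:
        return self._hit

    def set(self, value: Any) -> "CacheItem":
        self._value = value
        return self


class ArrayPool:
    """Models PSR-6 CacheItemPoolInterface backed by a dict."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self._deferred: List[CacheItem] = []

    def get_item(self, key: str) -> CacheItem:
        if key in self._store:
            return CacheItem(key, self._store[key], hit=True)
        return CacheItem(key)

    def has_item(self, key: str) -> bool:
        return key in self._store

    def save(self, item: CacheItem) -> bool:
        self._store[item.get_key()] = item._value  # module-internal access
        return True

    def save_deferred(self, item: CacheItem) -> bool:
        # Queued only; nothing is persisted until commit().
        self._deferred.append(item)
        return True

    def commit(self) -> bool:
        for item in self._deferred:
            self.save(item)
        self._deferred.clear()
        return True

    def delete_item(self, key: str) -> bool:
        self._store.pop(key, None)
        return True


class SimpleCacheAdapter:
    """Models a PSR-16 style get/set facade on top of a PSR-6 pool."""

    def __init__(self, pool: ArrayPool) -> None:
        self._pool = pool

    def get(self, key: str, default: Any = None) -> Any:
        item = self._pool.get_item(key)
        return item.get() if item.is_hit() else default

    def set(self, key: str, value: Any) -> bool:
        return self._pool.save(self._pool.get_item(key).set(value))

    def has(self, key: str) -> bool:
        return self._pool.has_item(key)

    def delete(self, key: str) -> bool:
        return self._pool.delete_item(key)
```

The adapter shows why PSR-16 is closer to the current caching framework: its callers never see item objects or `commit()`, while the PSR-6 pool underneath can still batch writes through `save_deferred()`.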

PHP-Cache / Communicators etc
I think I did not fully understand one part: using the proposed library, we’d have “CacheManager calls getCache, fetching the CacheFrontend -> converts to CacheItems => calls PSR-6 CachePool (instead of CacheBackend) => calls Communicator (written by the library) or Doctrine Bridge”. Is that right?

OK, now you have my thoughts. I hope more people will chime in and we can find the best solution out there.


(Claus Due) #3

Thanks for the reply Benni. Just a few comments and clearing up of misconceptions:

We should make the most common use case the best supported one in terms of performance. Is the most common use case an NFS-shared typo3temp? I’d say a loud and resounding no, it is not. I also argue this in the article I linked to: we can’t (or shouldn’t) design our out-of-the-box experience around what the most “enterprise level” setups require. And, all things being equal, we should of course provide easy ways to switch (which, among other things, includes eliminating our different types of cache frontends). You touch on this several times, but none of your examples is, IMHO, the common use case.

Let the profiling and reasoning of most common use case decide this one, I say.

This is a misconception that is very important to clear up. Fallback and default configuration do not imply any of those things:

  • It does not mean that a single cache is shared - it means that configuration can be shared but each cache (as returned from CacheManager->getCache) is still fully unique and has its own name, like now.
  • It literally only means that the currently hardcoded fallbacks, which exist in CacheManager as a protected property, become configurable - see https://review.typo3.org/#/c/53154/ (the patch is ready to be merged imho).

If you take a look at that patch, you will also see how many caches currently use the exact same configuration by default (DB caches, which is also the current hardcoded fallback backend). It is this duplicated configuration we get rid of; you, as site administrator, still have all the freedom you had before to select individual backends. As a bonus, you can also change which backends get used when someone does not explicitly define one.
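The “cache_benni” question and the answer above both come down to precedence when resolving one cache’s configuration. The Python sketch below shows one possible order (configurable default, then extension-provided configuration, then site-owner override). The order, the function name `resolve_cache_config` and the backend names are assumptions for illustration, not how TYPO3 or the linked patch necessarily resolve it; the `frontend`/`backend`/`options` keys loosely mirror the shape of TYPO3’s `cacheConfigurations` entries.

```python
from typing import Any, Dict

Config = Dict[str, Any]


def resolve_cache_config(name: str,
                         default: Config,
                         extension_configs: Dict[str, Config],
                         site_overrides: Dict[str, Config]) -> Config:
    """Resolves the effective configuration for one named cache.

    Assumed precedence, lowest to highest: configurable default,
    configuration shipped by an extension, site-owner override.
    Top-level keys of a higher layer replace those of a lower one.
    """
    resolved = dict(default)  # copy, so the shared default is never mutated
    for layer in (extension_configs, site_overrides):
        resolved.update(layer.get(name, {}))
    return resolved
```

Only the fallback values are shared; every named cache still resolves its own configuration and would get its own cache instance.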

This is where we risk adding additional requirements and inventing new features. My vision for this (not stated in the article) is a true cache setup section in the install tool, one that can also run rudimentary tests to determine the best engine to use on the site and lets you change/configure each cache. I would be happy if we could save that for after all this is done, because it isn’t a prerequisite, but I do have it in mind.

IMHO this would break far too much for no real benefit. It’s easy to introduce a unified cache frontend that is fully compatible (a drop-in replacement by configuration!) with every existing cache frontend. The one hurdle I face there is described in https://review.typo3.org/#/c/53116/ (we need to program to the interface, not the implementation). I’ve of course tested this locally, and it is indeed drop-in compatible, including for caches that generate PHP code (which is not something the “php-cache” PSR-6 library does natively, but something I added as a custom PSR-6 pool for the CMS).
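The “program to the interface, not the implementation” point can be shown with a small Python model: code that depends only on a frontend interface accepts any conforming frontend, so a unified frontend can replace the existing ones through configuration. `FrontendInterface` here is a tiny Python stand-in with a subset of methods, not a transcription of TYPO3’s PHP interface; `VariableStyleFrontend`, `UnifiedFrontend` and `cached_rootline` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class FrontendInterface(ABC):
    """Python stand-in for a cache frontend contract (a small subset)."""

    @abstractmethod
    def get(self, identifier: str) -> Any: ...

    @abstractmethod
    def set(self, identifier: str, data: Any) -> None: ...

    @abstractmethod
    def has(self, identifier: str) -> bool: ...


class VariableStyleFrontend(FrontendInterface):
    """Hypothetical stand-in for one of today's concrete frontends."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    def get(self, identifier: str) -> Any:
        return self._entries.get(identifier)

    def set(self, identifier: str, data: Any) -> None:
        self._entries[identifier] = data

    def has(self, identifier: str) -> bool:
        return identifier in self._entries


class UnifiedFrontend(FrontendInterface):
    """Hypothetical unified frontend delegating to a pluggable store
    (standing in for a PSR-6 pool) under a per-cache key prefix."""

    def __init__(self, cache_name: str, store: Dict[str, Any]) -> None:
        self._prefix = cache_name + ":"
        self._store = store

    def get(self, identifier: str) -> Any:
        return self._store.get(self._prefix + identifier)

    def set(self, identifier: str, data: Any) -> None:
        self._store[self._prefix + identifier] = data

    def has(self, identifier: str) -> bool:
        return (self._prefix + identifier) in self._store


def cached_rootline(cache: FrontendInterface, page_id: int,
                    build: Callable[[int], List[int]]) -> List[int]:
    # Typed against the interface only: whichever frontend the
    # configuration selects can be passed in without changing this code.
    key = f"rootline_{page_id}"
    if not cache.has(key):
        cache.set(key, build(page_id))
    return cache.get(key)
```

The per-cache key prefix in `UnifiedFrontend` also reflects the earlier point that caches may share configuration, and even a store, while each named cache keeps its own entries.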

Re: locking, I realise that can be a controversial topic, and one that would invite a lot of academic discussion about the nature of locks. It isn’t essential to the solution and will be made as a separate patch that can be considered on its own. Basically, it consists of an interface you add to communicate that simple locking can be used for cache entries. Now, I’m not saying we must store locks along with cache records; I’m just saying it could be easier to handle distributed locks by not detaching locking from the cache. Anyway, that’s not essential.

It (php-cache) does support tagging, but tags are used for invalidation, not for retrieving entries. It supports every bit of tagging we currently do, with the one exception that you’re not able to read a list of identifiers based on a tag (you may look into php-cache to see how they handle that; TL;DR: they maintain a separate list of identifiers named after the tag, so the technical ability to read a list of identifiers by tag is there, it just is not public API).
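Tag-based invalidation without public tag lookup, as described above, can be modelled with a tag-to-identifiers index kept next to the entries. This Python sketch illustrates that general technique, not php-cache’s actual storage layout. `TaggedCache` is hypothetical, and a reverse index (`_tags_of`) is an added assumption so that re-saving an entry with different tags does not leave it attached to stale ones.

```python
from typing import Any, Dict, Iterable, Optional, Set


class TaggedCache:
    """Tag-based invalidation via a tag -> identifiers index."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}
        self._tag_index: Dict[str, Set[str]] = {}  # tag -> identifiers
        self._tags_of: Dict[str, Set[str]] = {}    # identifier -> tags

    def _detach(self, identifier: str) -> None:
        # Remove the identifier from every tag list it belongs to.
        for tag in self._tags_of.pop(identifier, set()):
            ids = self._tag_index.get(tag)
            if ids is not None:
                ids.discard(identifier)
                if not ids:
                    del self._tag_index[tag]

    def set(self, identifier: str, value: Any, tags: Iterable[str] = ()) -> None:
        self._detach(identifier)
        self._entries[identifier] = value
        tag_set = set(tags)
        self._tags_of[identifier] = tag_set
        for tag in tag_set:
            self._tag_index.setdefault(tag, set()).add(identifier)

    def get(self, identifier: str) -> Optional[Any]:
        return self._entries.get(identifier)

    def invalidate_tags(self, tags: Iterable[str]) -> None:
        for tag in tags:
            # Copy first: _detach() mutates the index while we iterate.
            for identifier in list(self._tag_index.get(tag, ())):
                self._entries.pop(identifier, None)
                self._detach(identifier)
```

The index is only read internally by `invalidate_tags()`, which matches the situation described above: the data needed to list identifiers by tag exists, but it is not exposed as public API.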

And that is how I did it, because that’s the way you need to do it with PSR-6. And the chosen library has adapters for both Doctrine and PSR-16 (SimpleCache), so we’re good on that. :wink:

Re: communicators, the best way to describe this is to ask you to look into the php-cache implementations and see the constructor argument required by each type of cache pool. If you then think of this constructor argument as “a communicating object which needs to be initialised and passed when the cache gets created”, you should see the idea behind decoupling these from the caching framework and making them simple “components” that we configure in the framework and retrieve through a very simple API. For example: configure a memcached communicator, and the API can return an initialised Memcached object based on your configuration. In not so many words, it simply moves the configuration of that connection out of the caching framework and into a separately configured component (which you can then also use in other code that doesn’t do caching but uses memcached, redis or whatever for other purposes).

So yeah, kind of but not completely :slight_smile:

CacheManager->getCache still returns a frontend that internally uses a PSR-6 cache pool, which may need to be constructed with a constructor argument that is then configured as a “communicator” (a thing that TYPO3 can communicate with and configure with user-available options in the site config). The communicator itself can then be a Memcache, Memcached, Redis, Predis, Flysystem or any other single API object needed by the specific cache pool. Of course, you, the admin, still have to be aware of this (but I do plan on adding sensible defaults that use the default ports and hostnames when not configured).
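The communicator idea can be sketched as a registry that turns configuration into initialised client objects, which are then passed to a cache pool’s constructor (or to any other code). The Python below is hypothetical throughout: `CommunicatorRegistry`, `ClientStub` and `PoolStub` are stand-ins, and no real Redis or Memcached client is created. The default ports 6379 (Redis) and 11211 (Memcached) are those servers’ standard ports; the `127.0.0.1` default host is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ClientStub:
    """Stand-in for a real client object (Redis, Memcached, ...)."""
    kind: str
    host: str
    port: int


# Standard server ports, applied when the site config omits them.
DEFAULT_PORTS = {"redis": 6379, "memcached": 11211}


class CommunicatorRegistry:
    """Hypothetical registry: communicators are configured once, outside
    the caching framework, and handed out as initialised objects."""

    def __init__(self, factory: Callable[[str, str, int], Any] = ClientStub) -> None:
        self._factory = factory
        self._config: Dict[str, Dict[str, Any]] = {}
        self._instances: Dict[str, Any] = {}

    def configure(self, name: str, kind: str, **options: Any) -> None:
        self._config[name] = {"kind": kind, **options}
        self._instances.pop(name, None)  # re-create with the new options

    def get(self, name: str) -> Any:
        if name not in self._instances:
            cfg = self._config[name]
            kind = cfg["kind"]
            host = cfg.get("host", "127.0.0.1")
            port = cfg.get("port", DEFAULT_PORTS[kind])
            self._instances[name] = self._factory(kind, host, port)
        return self._instances[name]


class PoolStub:
    """Stand-in for a cache pool whose constructor needs a client object."""

    def __init__(self, client: Any) -> None:
        self.client = client
```

Because clients are created once per configured name and then reused, a cache pool and unrelated non-caching code asking for the same communicator receive the same initialised object.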

Hopefully that explains the vision in the darker areas!


(Kasper Ligaard) #4

@namelesscoder: I think we should go with Symfony’s Cache component, because TYPO3 already uses other Symfony components. The Cache component claims to support both PSR-6 and PSR-16 caching and has many cache adapters, including a ChainCache.


(Benni Mack) #5

TBH I did not check out your patches, the library or the tests (as mentioned). I’m also aware we shouldn’t design with NFS first in mind, so no worries here.

I also need to dig into whether PSR-6 or PSR-16 makes more sense. This is what I have in mind: the “single cache frontend” / “cache” (from CacheManager->getCache) would get an additional interface which is PSR-16 + tags, and the cache backends would go with a PSR-6-based approach, something along those lines (still investigating). Of course, we would keep B/C by adding a new interface, then changing the interfaces and dropping the old methods/interfaces in v11 or so.


(Claus Due) #6

The Symfony and php-cache components are not so different, but there is a key difference in the API, and I personally vastly prefer the API of the php-cache library. For one, the SF library requires a lot of additional constructor arguments when creating cache pools (to govern things that in php-cache are controlled by the way the cache pool is used, not by the cache pool itself).

It is a matter of taste in the end but I can tell you that it would be significantly more difficult to integrate all the different SF adapters due to the rather unpredictable constructor methods. As such, it would also be much more difficult for people to write different cache pools because those too would (probably) choose to use the same complex constructors.

Lastly, it may not fit with the plan to make the Redis/Memcached etc. clients possible to configure separately from the caching framework.

I personally would not choose the SF solution here (although in general I agree with you about preferring SF components).


(Claus Due) #7

The chosen lib speaks PSR-16 via PSR-6, so by choosing it I’ve covered both standards without writing two implementations. But by all means, please do check whether you’d consider it more sensible to make our default cache backend a PSR-16 one. I don’t think it is, though: the PSR-6 ones are quite simple to configure and support the same features in the library I chose. No offence meant, but you really do need to read those libraries and patches to get the idea I’m trying to communicate! :wink:


(Markus Klein) #8

I can fully sign off on this!

In general those are completely different topics, two tools. But of course you may need both to achieve your application’s goals.
In our case we need locking to regulate access to our shared (between parallel requests on the same server) resources. A cache is just one of those shared resources.

I fully agree that the 90% use case is the single-instance.
I would sort the caching backends in this order in terms of performance: RAM, FS SSD, DB, FS HDD.
So for the 90% use case DB caching seems to be the most feasible one.

A short word on the TYPO3 page caches (and maybe others too) for multi-server setups: Keep caches as local as possible!
This prevents a lot of headache with locking and cache consistency.
(so sharing typo3temp is actually not the greatest idea on earth)


(Claus Due) #9

Full ACK; sometimes it’s cheaper to let each slave simply generate temp files as it needs them. That comes with all the usual caveats about matching DB and FS contents if you share the DB, but I’m sure we’re all aware of that.

We will have to let profiling results guide us if we’re going to discuss the performance of one over the other. I’m pretty sure you will find that FS (even on a standard HDD) performs as well as or better than an SQL-based cache (which, btw, also has to run from an HDD of sorts). Yes, the DBMS does optimisations, but the filesystem should definitely do some as well unless it is a completely stone-age one, and I’d expect it to do them significantly faster than the DBMS would. Side info: our caches use MEDIUMBLOB, which, if I’m not mistaken, gets stored as… a file. So in addition to the DBMS overhead, there is still a file to spool, memory-map, buffer and so on.

Anywho, I’ll contribute some profiling results at some point. That should settle this :slight_smile:
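As a starting point for such profiling, here is a minimal Python harness that times reads from a file-per-entry store against reads from a table with a BLOB column. It is a hypothetical sketch, not TYPO3 code: SQLite is embedded and has no network hop, so it only stands in for a MySQL/MariaDB cache table and cannot reproduce the separate DB server setup discussed below. Because of the OS page cache, best-of-N timing measures warm-cache reads, and any numbers it prints are only meaningful for the machine it runs on.

```python
import os
import sqlite3
import tempfile
import time
from typing import Callable, Dict, List


def time_reads(read: Callable[[str], bytes], keys: List[str],
               rounds: int = 3) -> float:
    """Returns the best wall-clock time in seconds over several rounds."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        for key in keys:
            read(key)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(entries: Dict[str, bytes]) -> Dict[str, float]:
    with tempfile.TemporaryDirectory() as tmp:
        # File backend: one file per cache entry.
        for key, blob in entries.items():
            with open(os.path.join(tmp, key), "wb") as fh:
                fh.write(blob)

        def read_file(key: str) -> bytes:
            with open(os.path.join(tmp, key), "rb") as fh:
                return fh.read()

        # SQL backend: one row per entry, content stored in a BLOB column.
        db = sqlite3.connect(os.path.join(tmp, "cache.sqlite"))
        try:
            db.execute("CREATE TABLE cache_entries (id TEXT PRIMARY KEY, content BLOB)")
            db.executemany("INSERT INTO cache_entries VALUES (?, ?)", entries.items())
            db.commit()

            def read_db(key: str) -> bytes:
                row = db.execute("SELECT content FROM cache_entries WHERE id = ?",
                                 (key,)).fetchone()
                return row[0]

            keys = list(entries)
            return {"file": time_reads(read_file, keys),
                    "sqlite": time_reads(read_db, keys)}
        finally:
            db.close()  # close before the temp directory is removed
```

A run would look like `benchmark({f"entry_{i}": os.urandom(64 * 1024) for i in range(1000)})`; repeating it on several hosting setups is what would make the comparison meaningful.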


(Markus Klein) #10

Sure. I did not state my assumption: I presume a DB server on an average hosting is better equipped than the actual webserver.

Profiling this could get really tough, we would need to do that on the majority of popular hosting services.


(Kasper Ligaard) #11

@namelesscoder: I still think Drupal uses Symfony’s caching, and I see that as a plus too. Bringing the two major enterprise-focused PHP-based CMSes closer together around Symfony is worthwhile. Thus, I see a few extra constructor arguments in Symfony’s cache implementation as a fair tradeoff.

Following Symfony here would make release planning a bit simpler, since only their release schedules would have to fit. Having many different projects increases the likelihood that releases become incompatible - with all the headaches that introduces.

I am also a bit wary of a project with just a few contributors, esp. when it seems that Symfony is the only framework they have integrated with.


(Benni Mack) #12

Good point. Although I tried to keep “which library to use” out of my points about the approach: the library itself is just 1.5 years old and has not seen a stable release yet (currently sitting at 0.4.0). OTOH, Tobias and Magnus are quite prominent PHP devs and write good docs, but that would not justify choosing it over a stable lib like doctrine/cache or symfony/cache.

To be complete:

  • Tagging possibility
  • Performance
  • PSR compatibility
  • Track record / stability / future maintenance of the lib
  • License

should be taken into account here; otherwise, just replacing our caching framework with an external library isn’t worthwhile. On all of the above points, our caching framework has proven to work so far.


(system) #13

This topic was automatically closed after 28 days. New replies are no longer allowed.