Adding a "smart database cache" for xautoload.
xautoload, similar to the Symfony ClassLoader component, has different cache decorators that remember class files between requests: ApcClassLoader, WinCacheClassLoader, XCacheClassLoader and, very recently, ApcuClassLoader.
All of them use a key/value store that allows immediate reading and writing of single values.
E.g. on a request with an empty APC cache and 100 classes to be autoloaded, xautoload will call apc_store() 100 times, and then the cache is hot.
The next request that needs the same 100 classes will not call apc_store() at all.
The 100 calls to apc_store() are not a problem, because apc_store() is cheap: each call only has to process one value (or rather, one class and its file path).
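The pattern can be sketched like this (a Python sketch for illustration only; the real decorators are PHP classes, and the dict below merely stands in for APC's key/value store, with made-up names):

```python
# Illustrative sketch of a key/value cache decorator, NOT the actual
# xautoload code. A dict plays the role of apc_fetch()/apc_store().

class KeyValueCacheDecorator:
    def __init__(self, finder, cache):
        self.finder = finder  # callable: class name -> file path (the slow lookup)
        self.cache = cache    # dict standing in for the APC key/value store
        self.stores = 0       # number of cache writes, for illustration

    def load_class(self, class_name):
        file = self.cache.get(class_name)      # ~ apc_fetch()
        if file is None:
            file = self.finder(class_name)     # the expensive lookup
            self.cache[class_name] = file      # ~ apc_store(), one small value
            self.stores += 1
        return file

cache = {}
loader = KeyValueCacheDecorator(lambda name: '/path/to/' + name + '.php', cache)

for i in range(100):
    loader.load_class('Class%d' % i)
assert loader.stores == 100   # cold cache: one store per class

loader.stores = 0
for i in range(100):
    loader.load_class('Class%d' % i)
assert loader.stores == 0     # hot cache: no stores at all
```

Each store is independent and cheap, which is why 100 of them per cold request are harmless here.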
Since APC or similar solutions are not always available, it would be nice to have a cache alternative that only uses the database.
This has one problem, however:
Single read/write operations, or queries over multiple rows, easily become more expensive than the class lookup we want to avoid in the first place.
Ideally:
- We want to store all classes in one big serialized array, instead of having a separate database row per class.
- We want to read from the cache not more than once per request.
- We want to update the cache not more than once per request.
- We want the cache to be "hot" after the first request (for the same page).
The 3rd and 4th requirements force us to postpone the cache update until the last class in the request has been loaded.
But there is no way to know which is the last class.
There is register_shutdown_function(), but even in a shutdown handler we cannot be sure that no further classes will be loaded.
This is why until now, xautoload had no database cache.
O(log n) to the rescue
Since 7.x-5.0-beta2, xautoload actually does have a DbCacheClassLoader. So, how does it work?
The trick was to soften up the requirements, but just a bit:
- We want to store all classes in one big serialized array, instead of having a separate database row per class.
- We want to read from the cache not more than O(log n) times per request.
- We want to update the cache not more than O(log n) times per request.
- We want the cache to be "hot" after the first O(log n) requests (for the same page).
(where n is the number of classes)
In Computer Science we really like O(log n). You can almost treat it like a constant. E.g. some time ago, computers were built with the assumption that every byte of memory can be addressed with 32 bits. That is the approximation that log n is always <= 32, which is true as long as n <= 2^32 = 4,294,967,296.
Or, if the price of n cars were log n times the price of one car, you could buy all cars in the world for about 30 times the price of a single car.
With the modified requirements, an implementation becomes realistic.
Let's have a look at AbstractQueuedCachedClassLoader, the base class where the O(log n) magic happens.
The idea is to queue up cache updates, with a growing queue size. A cache update happens at the 1st, 3rd, 7th, 15th, 31st, 63rd, or more generally, (2^k - 1)th class that needs an update.
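Within one request, this write schedule can be modeled in a few lines (a Python sketch with made-up names, not the actual PHP base class):

```python
import math

class QueuedCacheWriter:
    """Count cache misses; write at the 1st, 3rd, 7th, ... (2^k - 1)th miss."""

    def __init__(self):
        self.misses = 0      # classes that needed a cache update so far
        self.threshold = 1   # next write point: 1, 3, 7, 15, ...
        self.writes = 0      # cache write operations performed

    def register_miss(self):
        self.misses += 1
        if self.misses == self.threshold:
            self.writes += 1                         # flush the whole queue in one write
            self.threshold = 2 * self.threshold + 1  # 1 -> 3 -> 7 -> 15 -> ...

n = 1000
writer = QueuedCacheWriter()
for _ in range(n):
    writer.register_miss()

assert writer.writes == 9                 # writes happened at miss 1, 3, 7, ..., 511
assert writer.writes <= math.log2(n) + 1  # i.e. O(log n) writes per request
```

Doubling the distance between writes is what keeps the per-request write count logarithmic, no matter how many classes a cold request loads.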
We always reload the cache immediately before writing, so that we do not discard updates from other processes.
The result:
- The number of cache write operations is no more than O(log n) per request.
- The number of cache read operations is no more than O(log n) per request.
- The number of classes still waiting in the queue is always smaller than the number of classes already written to the cache during the request.
- Therefore, every request halves (or better) the number of leftover classes.
- Which means, the cache will be "hot" after O(log n) requests.
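The multi-request behavior can be simulated as well (again a rough Python sketch with invented names; queued classes that were never flushed are simply lost when the request ends, and show up as misses in the next request):

```python
import math

def run_request(needed, cache):
    """One request: flush the queue at the 1st, 3rd, 7th, ... miss."""
    misses, threshold, queue = 0, 1, []
    for cls in needed:
        if cls not in cache:
            misses += 1
            queue.append(cls)
            if misses == threshold:
                cache.update(queue)   # one big serialized write
                queue = []
                threshold = 2 * threshold + 1
    # whatever is left in `queue` is discarded: no shutdown hook needed

n = 1000
cache, requests = set(), 0
while len(cache) < n:
    run_request(range(n), cache)
    requests += 1

assert len(cache) == n                # the cache eventually becomes hot
assert requests <= math.log2(n) + 2   # after O(log n) requests
```

For n = 1000, the simulation needs 8 requests: the leftover shrinks from 489 to 234, 107, 44, 13, 6, 3, and finally 0.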
Across requests, we have at most O(log n) requests that write to the cache, each performing at most O(log n) write operations, so the overall number of write operations has an upper limit of O(log n) * O(log n) = O((log n)^2).
The cost for each write operation is O(n), which is the size of the serialized array.
So the overall cost across requests is O(n) * O((log n)^2) = O(n (log n)^2).
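A quick back-of-envelope check of these totals for n = 1000 (a Python sketch, assuming the number of leftover classes at least halves each request):

```python
import math

n = 1000
total_writes, total_cost, leftover = 0, 0, n
while leftover > 0:
    # flushes in one request happen at miss 1, 3, 7, ...: floor(log2(leftover + 1)) of them
    flushes = math.floor(math.log2(leftover + 1))
    total_writes += flushes
    total_cost += flushes * n        # each flush serializes at most n entries
    leftover -= 2 ** flushes - 1     # classes written during this request

assert total_writes <= math.log2(n) ** 2   # O((log n)^2) writes in total
assert total_cost <= n * math.log2(n) ** 2 # O(n (log n)^2) overall cost
```

For n = 1000 this comes out to 42 write operations in total, far below the (log n)^2 ~ 99 upper bound.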
This is totally nice and ok.
Ta-daa!