Interfaces? Composition? Cake person, or cookie person?
I was going to post this as a response to #2208429-145 on drupal.org, but then figured this is waaaay too long. And too full of myself, possibly.
I then decided to instead post a shorter comment, and put the rest here.
Since I am too lazy to write "real" blog posts, maybe my blog is still good enough as a "dumping ground" for stuff like this.
What is this about?
I was following the issue, Extension System, Part III: ExtensionList, ModuleExtensionList and ProfileExtensionList for a long time. Or rather, I was getting email notifications. But did not really read much of it.
Until a few days ago, where I had a look at the patch #129.
As with most of the code I see anywhere, except my own (of course), I found the classes are too big. And they lacked interfaces.
Maybe this is just how the rest of the world operates. But I am not happy with it. So I made a patch (#144).
I am not saying that I am right, and everyone else is wrong. We all have evolving opinions. What seems right today might seem like poor judgement tomorrow.
Except that I am right, and everyone else is wrong.
It turns out we have different ideas when and how to use interfaces, and what is "too big".
Can you please elaborate why you think its helpful to add interface for every small bit, even if stuff will never be replaced in its implementation? Its harder to follow through multiple layers than one, as long each individual method is still simple. I mean no question for something which is meant to be flexible I totally get that kind of architecture, but this is more even like a
tldr
Besides "real-world" swappability, components should depend on interfaces not implementations because:
- The implementation of the dependency can be swapped out for a mock object in unit tests. (Yes you can also mock implementations, but it is cleaner / more transparent with interfaces)
- Refactoring can replace one implementation with another.
- To understand the depending component, you don't need to understand every dependency's implementation, only their interfaces.
- Partial interfaces reduce the requirements for a mock or a replacement implementation.
Longer version
I will attempt to approach this from multiple angles. The "book" angle, the "personal experience" angle and the and the "practical" angle.
Anyone who is not familiar with it, should read about SOLID on Wikipedia or elsewhere. Especially the D about depending on interfaces ("abstractions"), not implementations ("concretions"). And the I, saying that multiple smaller interfaces are preferable to one bigger interface. The S, about why we should split things up. And "Composition over inheritance", why splitting things up is preferable to inheritance hierarchies.
I assume most people reading this are already familiar with these ideas, but have their own ideas how and where they should be applied in practice.
There are also some interesting questions on programmers.stackexchange.com, such as Do I need to use an interface when only one class will ever implement it?. Spoiler: There is more than one valid answer / opinion. The discussion is not over :)
For me personally, the main question is:
- Looking at a piece of code, how much other code do I need to look at to completely understand what it is does?
- How much other code could possibly suffer, if I refactor this piece of code?
- How many possible states / configurations do I need to consider, that would cause a different behavior and alter the effects of this piece of code?
- What is the smallest piece that I could test in isolation, if I wanted to? (yes I really should unit-test what I can. But even if I don't, knowing that something is easily testable in isolation is already an indication of quality)
- Could I unit-test this in my head ?
In a lot of Drupal core and contrib, the answer might be one of:
- All of this Drupal installation, because of some global state dependencies and side effects, the order in which stuff is initialized on bootstrap, etc.
- Any possible Drupal installation now or in the future, with any possible combination of modules and module versions.
- The entire function, which just happens to be 200 lines and has 20 local variables.
- The entire class (hierarchy), which just happens to have 20 methods and 11 properties, and a bunch of subclasses with access to all the properties.
- This class and a number of other classes which are tightly coupled to this one.
- This class, with every combination of "property $a initialized, but property $b not yet initialized."
But to be sure, one needs to scroll a lot, and look at all the methods. Because within the class hierarchy, everything is visible to everything else. Also thanks to the "protected" specifier, which I prefer to avoid.
In all these cases, the amount of code one needs to look at to understand snippet X is more than fits on one screen, and more than what fits into one "page" in the brain. My brain, at least.
This leads to bugs being easier to occur, and harder to spot.
On the other hand: Code that follows the SOLID principles (esp. S, I, D) allows looking at snippets / small components in isolation. And if we do it right, these components will be small enough to fit into one page of the brain.
Of course you have to allow yourself to focus at this one component and stop worrying about the rest for a moment. If you continue trying to understand all the components at once, there will be no benefit.
We have to think of small components not as a part of a whole, but as a standalone thing.
In a traditional design, we might have a class "Münchhausen's mansion" with a method dealing with the main door. The method is fully aware that this is the main entrance to the mansion of Baron Münchhausen.
With a "SOLID" design, the door would be a separate class / object. This gives us the possibility to install this door in any place we want. But we don't care. We only ever want to install this door in Baron Münchhausen's mansion. The real benefit is that we can think of the door without thinking of the mansion. And we can think about the mansion without considering the internals of the door. This is already a benefit which justifies making it a separate thing.
Of course, other benefits would be that we can test the door separately. Or in a metaphorical sense, we could uninstall the door, and take it to the workshop for maintenance. Or we could replace it with a different door. All of this without "hacking" the mansion.
Back to #2208429 ExtensionList, ModuleExtensionList and ProfileExtensionList
Back to the original problem. I found that ExtensionList from patch #2208429-129 is too big.
It has 11 protected properties. Or more, if we look at the sub-classes.
It has 20 public or protected methods. Or more, if we look at the sub-classes.
All of these are completely accessible to the entire class hierarchy, thanks to the "protected" keyword.
This means we really need to look at the entire class hierarchy at once, to see what is going on.
Doing this, we find out that actually there is some system in the madness.
Methods do not depend on arbitrary other methods or properties. Instead there is some kind of dependency graph, which is somewhat tree-like. I think it is not strictly a tree, but it is getting close. Information flows from ExtensionDiscovery towards the public methods of ExtensionList, along a clearly defined graph. In between there is some processing, caching and buffering.
The subclasses, so far only ModuleExtensionList, only override the parts that deal with ExtensionDiscovery, and that process the Extension objects.
Unfortunately, this graph of dependencies or information flow is not immediately apparent in the architecture. You need to look at each method implementation, to identify it.
Doing this is possible, but it exceeds one brain page.
By splitting things up, as in #2208429-144, we build clearly visible walls around and between the different steps of the dependency graph, and end up with much smaller components.
The new components are pleasantly small, and they do not care that they are part of something bigger.
E.g. ExtensionListCache is small, and does only one thing.
To understand what it does, one only needs to look at the code itself, and two or three quite simple interfaces.
It does not care where the extensions in $this->decorated->listExtensions() are coming from, and where they go. It also does not care which cache id it is being configured with. This is all Somebody else's problem.
It is also quite similar to other cache decorators.
In fact the ExtensionListCache, the ExtensionListBuffer, and the ExtensionListInterface itself could easily be implemented with generics, if only PHP had those.
But even without generics: Classes and interfaces like this are super fast to write, once you get used to it.
Some other classes introduced in the patch #144 are a bit bigger than the ones above. But they are still smaller than the ExtensionList proposed in ~#129.
Refactoring and modifications
Simple classes and interfaces like ExtensionListBuffer, ExtensionListCache and ExtensionListInterface are also unlikely to change in the future. Functionality is extended or modified by constructing instances with different constructor arguments, by wrapping them into other decorators, or by providing alternative implementations. There is usually no need to change the code itself.
Well, the one thing we might have to worry about is if we change the Extension class, or introduce an ExtensionInterface. But will it really be as bad?
I agree with @dawehner that the last patch has introduced too many classes and interfaces that makes it even more difficult for a developer to understand the extension system. Additionally, the introduction of so many new interfaces bring along the risk of not being able to refactor in future (if people begin to use those interfaces - which may happen even when they are @internal) when the architecture has not been fully completed.
This is a valid concern, and I do not claim that I can completely dismiss it.
But in general, "depend on an interface" is not as bad a kind of dependency.
If we really want to change an interface that is already in use, we can instead create a second interface, and create an adapter in between.
But: Maybe we can really design interfaces in a way that we don't need to change them.
A class like the originally proposed ExtensionList really does contain a lot of code that we might want to change later.
But an interface like ExtensionFilenameListInterface really does not do a lot. There is not much that we might want to change. The use case "give me a list of filenames" or "give me a list of extension objects" is probably going to survive.
And what if people don't have an interface to depend on? Then they will depend on an implementation (a class), via parameter type hint, constructor call, or, even worse, inheritance.
Caveat / Disclaimer
This is mostly an attempt to defend the general habit of introducing separation and interfaces early, even for "internal" architecture.
Of course there can be strategical or other reasons to postpone this, or not do it at all.
One problem that just came to mind was the "reset overkill".
Whenever ->reset() is called on a component, this component needs to recursively call ->reset() on other components it depends on.
Since more than one component (indirectly) depend on the extension list cache, this cache would be reset more than once.
This is not really a problem, because a reset itself is cheap. Only the rebuild is expensive. But it is still shows the kind of problem that can happen.
Or for "strategical" reaons, see
I suggest we still stick to the narrow scope of this issue, which is the first step of decoupling the extension system from the system.module and also making it easier to understand. Further refactoring, based on actual use cases can be done in follow ups.
(almaudoh in #2208429-148)
He may be right.
Cake or cookie?
What was this about? Well, this refers to comment #2208429-149, where I talk about food. I like food.
The idea is that it can all be a matter of habit and personal preference.