I am planning out some work to introduce Dependency Injection into what is currently a large monolithic library in an attempt to make the library easier to unit-test, easier to understand, and possibly more flexible as a bonus.
I have decided to use Ninject, and I really like Nate’s motto of ‘do one thing, do it well’ (paraphrased), which seems to fit particularly well in the context of DI.
What I am wondering now is whether I should split what is currently a single large assembly into multiple smaller assemblies with disjoint feature sets. Some of these smaller assemblies will have inter-dependencies, but far from all of them, because the architecture of the code is pretty loosely coupled already.
Note that these feature sets are not trivial or small in themselves either… they encompass things like client/server communications, serialisation, custom collection types, file-IO abstractions, common routine libraries, threading libraries, standard logging, etc.
I see that a previous question, What is better, many small assemblies, or one big assembly?, kind-of addresses this issue, but with what seems to be even finer granularity than this, which makes me wonder whether the answers there still apply in this case.
Also, in the various questions that skirt close to this topic a common answer is that having ‘too many’ assemblies has caused unspecified ‘pain’ and ‘problems’. I would really like to know concretely what the possible down-sides of this approach could be.
I agree that adding 8 assemblies when before only 1 was needed is ‘a bit of a pain’, but having to include a big monolithic library for every application is also not exactly ideal… plus adding the 8 assemblies is something you do only once, so I have very little sympathy for that argument (even though I would probably complain along with everyone else at first).
Addendum:
So far I have seen no convincing reasons against smaller assemblies, so I think I will proceed for now as if this is a non-issue. If anyone can think of good solid reasons with verifiable facts to back them up, I would still be very interested to hear about them. (I’ll add a bounty as soon as I can to increase visibility.)
EDIT: Moved the performance analysis and results into a separate answer (see below).
I will give you a real-world example where the use of many (very) small assemblies has produced .Net DLL Hell.
At work we have a large homegrown framework that is long in the tooth (.Net 1.1). Aside from the usual framework-type plumbing code (including logging, workflow, queuing, etc.), there were also various encapsulated database access entities, typed datasets and some other business logic code. I wasn’t around for the initial development and subsequent maintenance of this framework, but did inherit its use. As I mentioned, this entire framework resulted in numerous small DLLs. And when I say numerous, we’re talking upwards of 100, not the manageable 8 or so you’ve mentioned. Further complicating matters, the assemblies were all strongly signed, versioned, and expected to appear in the GAC.
So, fast-forward a few years and a number of maintenance cycles later, and what’s happened is that the interdependencies between the DLLs and the applications they support have wreaked havoc. On every production machine is a huge assembly redirect section in the machine.config file that ensures that the “correct” assembly gets loaded by Fusion no matter what assembly is requested. This grew out of the difficulty of rebuilding every dependent framework and application assembly that took a dependency on one that was modified or upgraded. Great pains (usually) were taken to ensure that no breaking changes were made to assemblies when they were modified. The assemblies were rebuilt and a new or updated entry was made in the machine.config.
Here’s where I will pause to listen to the sound of a huge collective groan and gasp!
This particular scenario is the poster-child for what not to do. Indeed, in this situation you end up with something completely unmaintainable. I recall it took me 2 days to get my machine set up for development against this framework when I first started working with it: resolving differences between my GAC and a runtime environment’s GAC, machine.config assembly redirects, and version conflicts at compile time due to incorrect references. More likely still were version conflicts caused by directly referencing component A and component B, where component B also referenced component A, but a different version than my application’s direct reference. You get the idea.
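To make that last conflict concrete, here is a minimal sketch in Python (not .NET tooling) that walks a reference table and reports every assembly requested in more than one version. The assembly names, version numbers and the `find_version_conflicts` helper are all made up for illustration; real data would have to come from the compiled assemblies’ metadata.

```python
from collections import defaultdict

# Hypothetical reference table: each assembly maps to the (name, version)
# pairs it was compiled against. Names and versions are illustrative only.
references = {
    "App": [("ComponentA", "1.0.0.0"), ("ComponentB", "1.0.0.0")],
    "ComponentB": [("ComponentA", "2.0.0.0")],
    "ComponentA": [],
}


def find_version_conflicts(root, references):
    """Return {assembly: {version: [requesters]}} for every assembly that is
    requested in more than one version anywhere below `root`."""
    requested = defaultdict(lambda: defaultdict(list))
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for name, version in references.get(current, []):
            requested[name][version].append(current)
            stack.append(name)
    # Keep only assemblies that were asked for in two or more versions.
    return {
        name: {version: sorted(who) for version, who in versions.items()}
        for name, versions in requested.items()
        if len(versions) > 1
    }


conflicts = find_version_conflicts("App", references)
print(conflicts)
```

Here the application asks for ComponentA 1.0.0.0 directly while ComponentB asks for 2.0.0.0, which is exactly the kind of mismatch that ends up being papered over with a redirect.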
The real problem with this specific scenario is that the assembly contents were far too granular, and this is ultimately what caused the tangled web of interdependencies. My guess is that the initial architects thought this would create a system of highly maintainable code, where only the very small components that changed would have to be rebuilt. In fact, the opposite was true. Further, as some of the other answers posted here already say, when you get to this number of assemblies, loading a ton of them does incur a performance hit. That is definitely the case during resolution, and I would guess, though I have no empirical evidence, that runtime might suffer in some edge cases, particularly where reflection comes into play. I could be wrong on that point.
You’d think I’d be scorned, but I believe there are logical physical separations for assemblies (and when I say “assemblies” here, I am assuming one assembly per DLL). What it all boils down to are the interdependencies. If I have an assembly A that depends on assembly B, I always ask myself whether I’ll ever need to reference assembly B without assembly A, or whether there is some other benefit to that separation. Looking at how assemblies are referenced is usually a good indicator as well. Say you were to divide your large library into assemblies A, B, C, D and E. If you referenced assembly A 90% of the time and, because of that, always had to reference assemblies B and C because A was dependent on them, then it’s likely a better idea to combine assemblies A, B and C, unless there’s a really compelling argument for keeping them separate. Enterprise Library is a classic example of this, where you’ve nearly always got to reference 3 assemblies in order to use a single facet of the library. In the case of Enterprise Library, however, the ability to build on top of core functionality and code reuse are the reasons for its architecture.
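That question (“will I ever reference B without A?”) can be checked roughly against how the library is actually consumed. The following Python sketch assumes you can list each assembly’s dependencies and each application’s direct references; the tables and the `merge_candidates` helper are hypothetical, and the rule used is just one reading of the heuristic above.

```python
# Hypothetical dependency table: assembly -> assemblies it depends on.
depends_on = {
    "A": {"B", "C"},
    "B": set(),
    "C": set(),
    "D": set(),
    "E": {"D"},
}

# Hypothetical consumers: application -> assemblies it references directly.
apps = {
    "App1": {"A"},
    "App2": {"A", "D"},
    "App3": {"E"},
    "App4": {"A"},
}


def transitive_closure(direct, depends_on):
    """Everything that ends up loaded when `direct` is referenced."""
    result = set()
    stack = list(direct)
    while stack:
        name = stack.pop()
        if name not in result:
            result.add(name)
            stack.extend(depends_on.get(name, ()))
    return result


def merge_candidates(depends_on, apps):
    """Return (parent, child) pairs where no application ever loads the
    child without also loading the parent."""
    loaded = [transitive_closure(refs, depends_on) for refs in apps.values()]
    candidates = []
    for parent, children in sorted(depends_on.items()):
        for child in sorted(children):
            used_alone = any(child in s and parent not in s for s in loaded)
            if not used_alone:
                candidates.append((parent, child))
    return candidates


candidates = merge_candidates(depends_on, apps)
print(candidates)
```

In this made-up data, B and C are never loaded without A, so they are flagged as candidates for folding into A, while D stays separate from E because App2 uses D on its own. A compelling reason to keep things apart, as with Enterprise Library, would still override the result.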
Looking at architecture is another good guideline. If you have a nice, cleanly stacked architecture, where your assembly dependencies are in the form of a stack (say, “vertical”) as opposed to a “web”, which starts to form when you have dependencies in every direction, then separation of assemblies on functional boundaries makes sense. Otherwise, look to roll things into one or look to re-architect.
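One rough way to tell a “stack” from a “web” is to assign each assembly a layer and then look for dependencies that jump over layers or loop back. The Python sketch below is only one possible interpretation of that guideline; the assembly names, the `assign_layers` and `layer_skipping_edges` helpers, and the rule that a strict stack only depends on the layer directly beneath it are all assumptions of mine.

```python
def assign_layers(depends_on):
    """Layer 0 has no dependencies; otherwise layer = 1 + deepest dependency.
    Raises ValueError if the dependencies form a cycle."""
    layers = {}
    in_progress = set()

    def visit(name):
        if name in layers:
            return layers[name]
        if name in in_progress:
            raise ValueError(f"dependency cycle through {name}")
        in_progress.add(name)
        deps = depends_on.get(name, ())
        layer = 1 + max((visit(dep) for dep in deps), default=-1)
        in_progress.discard(name)
        layers[name] = layer
        return layer

    for name in depends_on:
        visit(name)
    return layers


def layer_skipping_edges(depends_on, layers):
    """Dependencies that reach past the layer directly beneath the parent."""
    return sorted(
        (parent, child)
        for parent, children in depends_on.items()
        for child in children
        if layers[parent] - layers[child] > 1
    )


# Hypothetical "stack": each assembly depends only on the one beneath it.
stacked = {"UI": {"Services"}, "Services": {"Data"}, "Data": set()}

# Hypothetical "web": dependencies reach across several layers.
webbed = {
    "UI": {"Services", "Data", "Logging"},
    "Services": {"Data", "Logging"},
    "Data": {"Logging"},
    "Logging": set(),
}

stacked_skips = layer_skipping_edges(stacked, assign_layers(stacked))
webbed_skips = layer_skipping_edges(webbed, assign_layers(webbed))
print(stacked_skips)
print(webbed_skips)
```

A handful of layer-skipping references (a logging assembly used everywhere, for instance) is usually harmless; it is when most edges skip layers, or a cycle is reported, that rolling things together or re-architecting starts to look attractive.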
Either way, good luck!