Duplication, Code Re-Use and Effort.

30 August 2005

For a long time I believed what most people in the industry seem to believe - that code re-use is a good thing.

I'm not sure that's the right message any more. Of course I still believe that code re-use is a good thing to aim for, but...

Ages ago, Bruce Eckel wrote a piece about "The Ideal Programmer", which I only recently read. In it he says:

... despite many years of practicing the craft of programming, the most fundamental concept in computing – what Dave Thomas and Andy Hunt call the "DRY" principle ("Don't Repeat Yourself" – which includes but means more than "don't write the same code twice in more than one place." It means: "there should be one authoritative repository for each concept in a program.") – is practically ignored (and perhaps not even understood) by a large percentage of programmers. Of course, there are lots of other principles which are also ignored or unknown, but Scott's observation was that if we can't even get people to understand and follow the DRY principle, what hope is there for anything more sophisticated?

This is hardly a surprise, and I'm sure it is due in no small part to DRY's closeness to the "duplication of effort" mantras of management.

Code Re-Use (rather than DRY) has been latched onto (in a pointy-haired way) as a way of saving time - a way of reducing the effort needed by using code that's already been written. That's why you get such great inventions as Code Snippets in Visual Studio.

A colleague of mine fell into this trap a while ago and received a public roasting from an eminent architect¹ for describing "Copy & Paste" as "the first step" in code re-use.

And that's the problem, in my mind. For many people, Code Re-Use is too close in meaning to "remove duplication". The focus on re-use has led many of us to forget, or worse never to learn, about the issues of duplication.

I'm not necessarily talking about big pieces of code, either. The perils of duplication lie as much in a single line of code as in several hundred, if not more so.

I've been working with some legacy C apps recently and the level of duplication in the code is very high: some pieces of knowledge² are duplicated hundreds of times. In most cases each copy is just one innocent-looking line of code, but together those lines prevent the memory model of the application, or the format of a log file, from being changed without a lot of work.
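To make the log-file case concrete, here is a minimal sketch in C of what it looks like when the knowledge of a log format lives in one place instead of in every line that writes to the log. The format, the separator and the function name `format_log_record` are all hypothetical, made up for illustration; they are not taken from the apps I was working on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical log format, for illustration only. Written inline, the
 * format is repeated at every call site, e.g.
 *
 *     fprintf(log, "%ld|%s|%s\n", now, user, msg);
 *
 * so changing it means hunting down every copy. Here the format (and
 * the separator it depends on) lives in exactly one place. */
#define LOG_FIELD_SEP '|'

/* Formats one log record into buf. Returns the number of characters
 * written (excluding the terminating NUL), or -1 if buf is too small
 * or a field contains the separator character. */
int format_log_record(char *buf, size_t size, long timestamp,
                      const char *user, const char *message)
{
    assert(buf != NULL && user != NULL && message != NULL);
    if (strchr(user, LOG_FIELD_SEP) != NULL ||
        strchr(message, LOG_FIELD_SEP) != NULL)
        return -1;
    int n = snprintf(buf, size, "%ld%c%s%c%s\n", timestamp,
                     LOG_FIELD_SEP, user, LOG_FIELD_SEP, message);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Changing the separator or the field order is now a one-line change, however many places in the code write to the log.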

The other side-effect that I've seen before and will no doubt see again is that, because a lot of us find it difficult to separate structure in the code base from structure in the deployment model, shared code often becomes a shared instance in production. Over time this causes conceptually isolated systems that happen to share some code, such as the code for listening on a socket, to become inappropriately entangled. In the most recent app I've been looking at, this means that both applications live on the same socket, always.

Removing duplication in the first example would have led to more code: a one-for-one swap of each offending knowledge-aware line for a "dumb" call to a knowledge-aware function. This would have been more work. Removing Duplication does not reduce the effort here.
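As a rough illustration of that swap, here is a C sketch based on the kind of knowledge in footnote ². The layout constants and the name `user_context_ptr` are hypothetical, invented for the example; the real apps used their own layout. Note that the result is more code than the single inline line it replaces, which is exactly the point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory layout, for illustration only. */
#define USER_CONTEXT_REGION_OFFSET 0x1000u /* start of the per-user area */
#define USER_CONTEXT_SIZE          0x200u  /* bytes per user context */

/* Before: every call site carried the layout knowledge itself, e.g.
 *
 *     ctx = (char *)global_base + 0x1000 + user_id * 0x200;
 *
 * After: each call site makes a "dumb" call, and this function is the
 * one authoritative place that knows how the offset is calculated. */
void *user_context_ptr(void *global_base, size_t user_id)
{
    assert(global_base != NULL);
    return (char *)global_base + USER_CONTEXT_REGION_OFFSET
                               + user_id * USER_CONTEXT_SIZE;
}
```

Each of the hundreds of inline copies becomes `ctx = user_context_ptr(global_base, user_id);`, and a change to the memory model then touches only this function.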

In the second example, removing the duplication (re-using the socket code while still allowing the two applications to be deployed in isolation) would have taken a lot more effort: restructuring code, headers, libraries, file structures and source repositories to harvest the already-written socket and message-handling code into something more generic and re-usable. Removing Duplication does not reduce the effort here.
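One way harvested code can stay shared without forcing a shared instance is to keep no static or global state in the library, so that each application owns its own instance. The C sketch below assumes exactly that; `struct listener`, `listener_init`, `listener_handle` and the application names are hypothetical, and the actual `socket`/`bind`/`listen` calls are deliberately left out to keep the focus on where the state lives.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical shared library code. Everything an instance needs lives
 * in this struct, owned by the calling application, rather than in a
 * static variable that every user of the library would end up sharing. */
struct listener {
    char name[32];         /* which application owns this instance */
    int port;              /* each application chooses its own port */
    unsigned long handled; /* messages handled by this instance only */
};

void listener_init(struct listener *l, const char *name, int port)
{
    assert(l != NULL && name != NULL);
    assert(port > 0 && port < 65536);
    snprintf(l->name, sizeof l->name, "%s", name);
    l->port = port;
    l->handled = 0;
    /* The real code would open and bind a socket on l->port here. */
}

/* Stand-in for the harvested message-handling code: it only touches the
 * instance it is given. Returns the message length, or -1 if empty. */
int listener_handle(struct listener *l, const char *msg)
{
    assert(l != NULL && msg != NULL);
    size_t len = strlen(msg);
    if (len == 0)
        return -1;
    l->handled++;
    return (int)len;
}
```

Two applications linking this code can each call `listener_init` with their own port and run in separate processes; sharing the code no longer implies sharing the socket.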

So, while I strive for re-use, I wonder if, as an industry, we should be more focussed on duplication.

¹ Well, not eminent actually. Just more senior and that counted for a lot where we were at the time.

² Such as how to calculate the memory offset for the pointer to the current user's context relative to the global base address. I love sentences like that ;-)