Discover the significance of the Don't Repeat Yourself principle.
DRY has never been about code. It’s about knowledge. It’s about cohesion.
It’s not the first time, and will not be the last, that I have commented on this principle that surrounds software development. In another article I talked about Don't Repeat Yourself and we went through some examples in code. The goal of this article is to show that developing software requires attention and a much deeper analysis, especially when we have to create new features that have been requested. We will address the importance of DRY and answer these questions:
What factors can lead to duplicate knowledge in an application?
What is the risk of duplicating this knowledge?
How can we avoid duplicating knowledge in software?
Is repeating code the same thing as repeating knowledge?
We’ll consider all of these topics throughout this article.
Some reasons that can result in duplication of knowledge in software!
Missing documentation 🔖
When you really venture into the world of software development, you discover that the process itself depends on detailed documentation. Without documented requirements, software development teams can easily stray from their goals, and in many cases they end up developing functionality that is outside the requirements or identical to something that already exists! This can affect the agreed delivery deadlines. OK, but how can the lack of technical and functional documentation bring about duplication of knowledge in software? When you have neither to assist the development process, it is very hard to know whether a piece of functionality has been requested before and whether the requirement now is just to adjust an existing feature. If we cannot track by some means what has already been requested and compare it with what has already been developed, the chances of the programmer duplicating this knowledge in the software are considerably high. The basics are to have solid documentation of the requirements gathered from the customer and of what has been added or changed in each development cycle. Relying on human memory can be a serious mistake! This reminds me of a project I worked on, where each new feature was like a guessing game. My coworkers and I could never understand all the requirements of a feature from start to finish. In addition, I was new to the company and depended on other developers to explain some general rules of the application to me. This was time-consuming, and for various reasons many of them did not fully understand everything the application did. I often had to read existing code on my own to understand more deeply what the system was supposed to do.
Another point I would like to emphasize is that when I refer to technical documentation, I am not talking about documenting functions line by line and describing what each method does by adding comments to the code. Tests and the code itself should be the documentation for that. But it is important to document the contracts between the front-end and back-end, in other words, documentation that states what each side expects to receive, both in the request and in the response. A tool that helps us a lot today is Swagger, which makes these contracts easier to visualize, so the teams have documentation to check whether the contracts match and whether there are duplicate endpoints in their applications.
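As a rough illustration of what such a contract pins down, here is a minimal TypeScript sketch. It is not Swagger itself, just the same idea written as shared types. The names UploadRequest, UploadResponse, isUploadRequest and rejectUpload are hypothetical and do not come from any real project.

```typescript
// Hypothetical contract for a file-upload endpoint (names are assumptions).
// What the front-end sends:
interface UploadRequest {
  fileName: string;
  contentBase64: string;
}

// What the back-end answers:
interface UploadResponse {
  accepted: boolean;
  message: string;
}

// Runtime guard so either side can check that a payload matches the contract.
function isUploadRequest(value: unknown): value is UploadRequest {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as { fileName?: unknown; contentBase64?: unknown };
  return (
    typeof candidate.fileName === "string" &&
    typeof candidate.contentBase64 === "string"
  );
}

// Builds the response shape the contract promises for a rejected upload.
function rejectUpload(message: string): UploadResponse {
  return { accepted: false, message: message };
}
```

With the shape written down once, a second endpoint that expects the same payload under a different name is much easier to spot.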
Short deadlines ⏳
All software projects work with deadlines; some are realistic, some are not. Either way, time pressure can lead to hasty decisions. Suppose we need to validate something in the system and simply add the validation without checking whether the rule already exists, because we don't want to spend time asking around and reading code. We may be adding duplicate knowledge to the software. If this validation already happens in another class and was just written in a different way, we are duplicating knowledge, and this will not be good for the quality of the software! On this, see the following comment:
"DRY is about duplication of knowledge, of intent. It's about expressing the same thing in two different places, possibly in two totally different ways." - The Pragmatic Programmer: 20th Anniversary Edition, 2nd Edition.
If this becomes routine and the team keeps duplicating knowledge in the code base because deadlines are tight, the software will become bulky and it will soon be difficult to add new functionality. Imagine hundreds or thousands of lines that need to be changed to make the code less bulky. Furthermore, the changes can introduce bugs if they are not done correctly, the system will be much more expensive to change, and this will clearly affect the client's deadlines! What is the point here? Don't give up planning and studying the functionality that must be implemented! Regardless of whether the deadlines are short or not, we have to plan and understand what will be accomplished in each development cycle. When we don't plan, duplication of knowledge will probably creep in later, and this is not good.
Lack of communication among teams 💬
If a project has many teams and they don't communicate through some tool or even a smoke signal 😅, we will always have duplication of knowledge, even if there is well-structured documentation and realistic deadlines for developing the customer's solution.
Many may think that teams working on the same product but in different contexts (journeys) will hardly have problems with duplicate knowledge. Unfortunately, it is not quite like that. The tasks may be in different contexts, but at some point they will very likely share the same knowledge about some specific rule of the system. If there is no evaluation or planning of these functionalities, we run the risk, for example, of teams A, B, and even C implementing the same knowledge in different development cycles! And this can happen on both the front-end and the back-end. If team A needs to implement a component on the front-end, a modal or whatever, and team B will need the same thing in the future but they don't communicate, then duplication will happen. So we can end up with two classes that express the same kind of knowledge, with validations and actions that take the same parameters and show identical messages, but in different contexts. In this case the same rule is spread over several components. This is duplicating knowledge. We are not just duplicating code. This is why communication within the software development cycle is so essential!
Inattention ⚠︎
This is perhaps the most troubling reason why duplication of knowledge happens within a system. It can occur when developers become isolated in their own role and are concerned only with delivering the task, without regard for the overall integrity and quality of the software. One way to solve this problem is to establish a strong team dynamic. This is why it is important to hold regular team meetings where everyone discusses the overall view of code quality, ensuring that developers understand their role as part of a whole and how their work connects with the work of others.
We have gone through the crucial factors that can lead to duplication. There are others, but at the moment I believe these are the main ones, and they are the ones I have experienced in the corporate environment.
Now let's answer another question: What is the risk of duplicating knowledge in components, modules, or classes inside a software?
What are the risks of duplicating knowledge?
Difficulties in performing maintenance
If we have a function, method, class or any software artifact, at some point it will need maintenance and revision. The criteria and requirements change constantly. David Thomas and Andrew Hunt say exactly this:
"Programmers are constantly in maintenance mode. Our understanding changes day by day. New requirements come in and existing requirements evolve as we move forward in the project. Maybe the environment changes. Whatever the reason, maintenance is not a discrete activity, but a routine part of the entire development process." - The Pragmatic Programmer: 20th Anniversary Edition, 2nd Edition.
So think of software with duplicate knowledge inserted in different classes. When a criterion or requirement changes we need to perform maintenance on all these classes that contain the same knowledge spread across the code base. We can even cite an example to visualize how difficult maintenance can become when we do not apply DRY:
A requirement that has already been requested and developed by the team is that every time the user uploads a file, we need to check that it is a valid base64 file and that its type is PDF. There are many classes that perform this same verification with totally different code written by different developers. The stakeholders also require that, before the user uploads the file, we check whether the PDF is password protected and, if it is, prevent the upload and show a message on screen to the end user. Right, so this is the knowledge that currently exists within this software. At some point, the system needs an effective validation of these files so that the user can start or continue their journey within the application. If there are several classes that perform this validation, with code written in different ways, wouldn't you agree that this is duplicating knowledge? If we have these validations, with the same objective, spread over several classes, imagine the maintenance that will be required to add a new requirement. We have different classes, with code written in different ways, but always fulfilling the same goal. That is, the validation does not exist in just one place, but is spread over many components. This is what the DRY principle teaches us to avoid. In fact, this is what the authors of The Pragmatic Programmer address. See this excerpt from the book:
"When a single facet of the code needs to change, do you find yourself making that change in several different places and in several different formats? Do you need to change code and documentation, or a database schema and a structure that contains it, or...? If so, your code is not DRY." - The Pragmatic Programmer: 20th Anniversary Edition, 2nd Edition.
Bugs are replicated 🪲
If we have the habit of simply copying and pasting classes and methods and just changing the names, this can be much riskier than we imagine! If that piece of code, or class, has not been properly tested, we may be propagating bugs within the software. What might the effects be? If these bugs are not detected as soon as possible, the end user will probably have difficulty using the system. Now think of the effort the team will need to identify and fix a problem that is propagating through various components of the software. This can affect delivery deadlines! It can also delay other developers' tasks and overload everyone on the team. We can go back to the illustration from the previous topic.
New changes are required for file upload validation within the system. For some reason, the stakeholders request that the validation accept PNG files. So in addition to validating whether the base64 file is a PDF and whether or not it’s password protected, the functionality must also accept files in the PNG format. A developer who has recently joined the team is responsible for introducing this new requirement into the code. So he introduces the new requirement into the classes but does not write unit tests for the success and error cases.
When we don't have tests, there are no guarantees that the function will perform exactly its role within the system. Now the programmer may or may not have introduced a new bug into the system, and this can only be verified one way: the end user will somehow send feedback to the development team. After a while, you realize that you can no longer upload files on some routes that require this validation! The developer wonders why this is happening. The answer is simple: we have validation spread throughout the application, in different classes, with code written in different ways and by different developers. Just introducing a new type or adapting the logic of each component is not enough.
Now it starts all over again: debugging, fixing without testing... it’s a vicious cycle that tends to lead to financial losses every time! If the knowledge (the rule) were centralized, written in a clear way and covered by tests, this situation could probably be avoided!
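To picture how this can go wrong, here is a minimal TypeScript sketch with two hypothetical routes. The function names are invented for illustration, and the password check is left out to keep it short. Both functions were meant to express the same rule, but only one of them received the PNG change.

```typescript
// Hypothetical route A, already updated: accepts PDF and PNG.
// "JVBERi" and "iVBORw0KGg" are how the PDF and PNG file signatures
// begin once they are base64-encoded.
function isValidContractUpload(file: string): boolean {
  const base64Pattern = /^[A-Za-z0-9+/]+={0,2}$/;
  const isPdf = file.indexOf("JVBERi") === 0;
  const isPng = file.indexOf("iVBORw0KGg") === 0;
  return base64Pattern.test(file) && (isPdf || isPng);
}

// Hypothetical route B, written by someone else in a different style
// and missed during the change: it still accepts PDF only.
function canAttachInvoice(payload: string): boolean {
  try {
    const decoded = atob(payload); // throws when the input is not base64
    return decoded.slice(0, 5) === "%PDF-";
  } catch {
    return false;
  }
}

// A tiny PNG header, base64-encoded, sent to both routes.
const pngUpload = "iVBORw0KGgo=";
console.log(isValidContractUpload(pngUpload)); // accepted on route A
console.log(canAttachInvoice(pngUpload)); // rejected on route B
```

Nothing in the code links the two functions, so a search for one style of check will not find the other. They may also disagree on edge cases that nobody planned for.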
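A centralized version of the rule could look something like the sketch below. This is only one possible shape, under a few assumptions: validateUpload, UploadCheck and ALLOWED_SIGNATURES are hypothetical names, and detecting whether a PDF is password protected is assumed to happen elsewhere and arrive here as a flag.

```typescript
// Hypothetical single home for the upload rule. Every route would call
// this function, so a new requirement is added and tested in one place.
interface UploadCheck {
  ok: boolean;
  reason?: string;
}

// Base64 prefixes of the accepted file signatures ("%PDF" and the PNG header).
const ALLOWED_SIGNATURES: { type: string; prefix: string }[] = [
  { type: "pdf", prefix: "JVBERi" },
  { type: "png", prefix: "iVBORw0KGg" },
];

function validateUpload(base64File: string, isPasswordProtected: boolean): UploadCheck {
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(base64File)) {
    return { ok: false, reason: "File is not valid base64." };
  }

  // Find which accepted type, if any, the file starts with.
  let detectedType: string | null = null;
  for (const signature of ALLOWED_SIGNATURES) {
    if (base64File.indexOf(signature.prefix) === 0) {
      detectedType = signature.type;
    }
  }
  if (detectedType === null) {
    return { ok: false, reason: "Only PDF and PNG files are accepted." };
  }

  // Assumption: the caller already knows whether the PDF is protected.
  if (detectedType === "pdf" && isPasswordProtected) {
    return { ok: false, reason: "Password-protected PDFs cannot be uploaded." };
  }
  return { ok: true };
}
```

Accepting PNG here was a one-line change to ALLOWED_SIGNATURES, and the success and error cases can be covered by a handful of unit tests against a single function.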
Productivity falls
If at each new development cycle features are delayed, pulled from production because of bugs, or fail to meet the customer's requirements, you can be sure that productivity is falling. In these moments we have to understand what is behind the low productivity, otherwise it will snowball and impact delivery deadlines.
Where does DRY fit into this? In everything! Productivity is closely linked to code quality and to planning how functionality is developed, whether you accept it or not! Just understand that the more duplication of knowledge there is, the harder it is to understand, review and change the current code base. What does this lead to? Low productivity, and not being able to produce any significant result within the timeframe set by the client.
Now that we've analyzed the risks, let's answer another question.
How to avoid duplicating knowledge in software
Applying DRY in our day-to-day life is a matter of analyzing and observing the progress of the software. But we can dig a little deeper into the answer. We can avoid duplicating knowledge by better understanding the requirements of the functionality and the components that already exist in the system. In addition, we can avoid it by:
Taking code reviews seriously and encouraging everyone on the team to do so.
Pair programming with developers who have been on the project longer.
Having documentation of requirements regardless of the size and deadline of the project.
Communicating between teams, which is fundamental to avoiding duplication of knowledge within the software.
Planning how to implement the functionalities of each development cycle.
Now let's talk about something very important that is central to DRY: understanding the difference between duplicating code and duplicating knowledge. Let's see this with some examples.
Code vs. Knowledge
We can quote from the book that introduced the Don't Repeat Yourself principle:
"Not all duplication of code is duplication of knowledge...." - The Pragmatic Programmer: 20th Anniversary Edition, 2nd Edition.
There is a clear difference, and it’s important to understand it because if we use the DRY principle in the wrong way, we introduce unnecessary complexity into our code. Let's take the first example:
public class Description {
    private String description;
    // ...
    public boolean isLongEnough() {
        // Split on single spaces and count the resulting words.
        String[] words = description.split(" ");
        int numberOfWords = words.length;
        return numberOfWords > 10;
    }
    // ...
}
Here the isLongEnough method checks whether the description held by the Description class has more than 10 words. But look at the next class:
public class ApiResponse {
    private String description;
    // ...
    public boolean containsEnoughElements() {
        // Split on single spaces and count the resulting elements.
        String[] elements = description.split(" ");
        int numberOfElements = elements.length;
        return numberOfElements > 10;
    }
    // ...
}
Some may think that there is a violation of the DRY principle in these examples, but the truth is that the principle is not violated at any point. The code is duplicated, but the two methods represent very different knowledge within the software: one holds the validation rule for a user description, while the other holds the validation rule for an API response. See another example:
var OS="Unknown";
if (navigator.userAgent.indexOf("Win")!=-1) OS="Windows";
if (navigator.userAgent.indexOf("Mac")!=-1) OS="MacOS";
if (navigator.userAgent.indexOf("X11")!=-1) OS="UNIX";
if (navigator.userAgent.indexOf("Linux")!=-1) OS="Linux";
console.log(OS);
console.log(navigator.userAgent);
The code above detects the user's operating system from the browser's user agent string. Now compare it to this👇🏼:
var OSName = "unknown";
var navApp = navigator.userAgent.toLowerCase();
switch (true) {
    case (navApp.indexOf("win") != -1):
        OSName = "windows";
        break;
    case (navApp.indexOf("mac") != -1):
        OSName = "apple";
        break;
    case (navApp.indexOf("linux") != -1):
        OSName = "linux";
        break;
    case (navApp.indexOf("x11") != -1):
        OSName = "unix";
        break;
}
console.log(OSName, navApp);
Ignore the way the code is written; these are just examples. The two snippets fulfill the same purpose but have different structures and naming. If they lived in different classes or components of the same software, this would be a violation of the DRY principle. If we really needed to know which operating system user X is on, we could centralize that in a function or class:
// Returns the first known OS name found in the user agent, or undefined if none matches.
const getOperationSystem = () => {
    const systemOperation = ['Windows', 'Linux', 'Mac']; // add your OS values
    return systemOperation.find(system => navigator.userAgent.indexOf(system) >= 0);
};
console.log(getOperationSystem());
Too much of anything can be harmful
DRY is extremely useful, but we must be careful when trying to reuse classes and methods, especially when we don’t yet understand the business rules of the software. There is no point in trying to reuse all the code or in trying to predict whether a method can be reused. The point I would like to make is this:
"You should not apply the DRY principle if your business logic has no duplication yet. Again, analyze the context, but as a general rule, trying to apply DRY to something that is only used in one place can lead to premature generalization."
The sentence above is quite interesting. Premature generalization can lead to bugs or unnecessary complexity in the code! Let's take a closer look at a realistic example. Look at the classes below:
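/** Shipping from the store to the customer */
class Shipment
{
    public deliveryTime = 3; // days
    public calculateDeliveryDay(): Date
    {
        // Start from today and move forward by the delivery time.
        const day = new Date();
        day.setDate(day.getDate() + this.deliveryTime);
        return day;
    }
}
/** Order return from a customer */
class OrderReturn
{
    public returnLimit = 3; // days
    public calculateLastReturnDay(): Date
    {
        // Start from today and move forward by the return limit.
        const day = new Date();
        day.setDate(day.getDate() + this.returnLimit);
        return day;
    }
}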
Would you say that we are violating DRY by repeating code? It may be hard to accept, but no, because knowledge is not repeated here! Why not? From an e-commerce point of view, the delivery time of a shipment to a customer, which is represented by the Shipment class and the calculateDeliveryDay() method, has nothing to do with the last day on which the customer can return their ordered products, which is handled by the OrderReturn class. What happens if the developer chooses to merge the methods into one? If the company decides that the end customer now has twenty days to return their products, you will have to split the method again. If you do not, the delivery of the shipment will also take twenty days! Imagine the impact of this in the real world. Competitors would be happy and customers probably not! It’s clear that we have different and important business rules that have nothing to do with each other. The example sums up exactly what DRY is about: it’s about knowledge, it was never about code!
Conclusion
"Don't Repeat Yourself" was never about code. It's about knowledge. It is about cohesion. If two pieces of code represent exactly the same knowledge, they will always change together. Having to change both is too risky: you might forget one of them. Also, principles like DRY itself are not rules that you should follow without thinking and analyzing carefully. They are tools to go in a good direction. It is our job to adapt them depending on each context. Make sure to be balanced in weighing pros and cons, always looking out for the overall quality of the software, both in code and architecture.
I hope everything we've covered has been helpful, thanks for reading till here!
References:
The Pragmatic Programmer: Your Journey to Mastery, 20th Anniversary Edition