Duplicate code

Duplicate code is a computer programming term for a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a number of reasons.[1] A minimum requirement is usually applied to the quantity of code that must appear in a sequence for it to be considered duplicate rather than coincidentally similar. Sequences of duplicate code are sometimes known as code clones or just clones, the automated process of finding duplications in source code is called clone detection.

Some of the ways in which two code sequences can be duplicates of each other are to be character-for-character identical, character-for-character identical with white space characters and comments being ignored, token-for-token identical, token-for-token identical with occasional variation or functionally identical.

How duplicates are created

Some of the reasons why duplicate code may be created include Copy and paste programming, which in academic settings may be done as part of plagiarism, or scrounging, in which a section of code is copied "because it works". In most cases this operation involves slight modifications in the cloned code such as renaming variables or inserting/deleting code. The language nearly always provides facilities to allow one copy of the code to serve multiple purposes, but a copy is created due to the programmer not truly knowing the language, not having the time to do it properly, or not caring about the increased active software rot.

It may also contain functionality that is very similar to that in another part of a program is required and a developer independently writes code that is very similar to what exists elsewhere. Studies suggest, that such independently rewritten code is typically not syntactically similar.[2][3]

Automatically generated code, where having duplicate code may be desired to increase speed or ease of development, is another reason for duplication. Note that the actual generator will not contain duplicates in its source code, only the output it produces.

Costs and benefits

When code with a software vulnerability is copied, the vulnerability may continue to exist in the copied code if the developer is not aware of such copies.[4] Refactoring duplicate code can improve many software metrics, such as lines of code, cyclomatic complexity, and coupling. This may lead to shorter compilation times, lower cognitive load, less human error, and fewer forgotten or overlooked pieces of code. However, not all code duplication can be refactored.[5] Clones may be the most effective solution if the programming language provides inadequate or overly complex abstractions, particularly if supported with user interface techniques such as simultaneous editing. Furthermore, the risks of breaking code when refactoring may outweigh any maintenance benefits.[6] Duplicated code does not seem to be significantly more error-prone than unduplicated code.[7] Duplications can also be reduced using open source technologies for sharing code components instead of duplicating them between SCM repositories.

Detecting duplicate code

A number of different algorithms have been proposed to detect duplicate code. For example:

Example of functionally duplicate code

Consider the following code snippet for calculating the average of an array of integers

extern int array_a[];
extern int array_b[];
 
int sum_a = 0;

for (int i = 0; i < 4; i++)
   sum_a += array_a[i];

int average_a = sum_a / 4;
 
int sum_b = 0;

for (int i = 0; i < 4; i++)
   sum_b += array_b[i];

int average_b = sum_b / 4;

The two loops can be rewritten as the single function:

int calc_average_of_four(int* array) {
   int sum = 0;
   for (int i = 0; i < 4; i++)
       sum += array[i];

   return sum / 4;
}

Using the above function will give source code that has no loop duplication:

extern int array1[];
extern int array2[];

int average1 = calc_average_of_four(array1);
int average2 = calc_average_of_four(array2);

Note that in this trivial case, the compiler may choose to inline both calls to the function, such that the resulting machine code is identical for both the duplicated and non-duplicated examples above. If the function is not inlined, then the additional overhead of the function calls will probably take longer to run (on the order of 10 processor instructions for most high-performance languages). Theoretically, this additional time to run could matter.

Example of duplicate code fix via code replaced by the method

See also

References

  1. Spinellis, Diomidis. "The Bad Code Spotter's Guide". InformIT.com. Retrieved 2008-06-06.
  2. Code similarities beyond copy & paste by Elmar Juergens, Florian Deissenboeck, Benjamin Hummel.
  3. Stefan Wagner, Asim Abdulkhaleq, Ivan Bogicevic, Jan-Peter Ostberg, Jasmin Ramadani. How are functionally similar code clones syntactically different? An empirical study and a benchmark PeerJ Computer Science 2:e49. doi:10.7717/peerj-cs.49
  4. Li, Hongzhe; Kwon, Hyuckmin; Kwon, Jonghoon; Lee, Heejo (25 April 2016). "CLORIFI: software vulnerability discovery using code clone verification". Concurrency and Computation: Practice and Experience. 28 (6): 1900–1917. doi:10.1002/cpe.3532.
  5. Arcelli Fontana, Francesca; Zanoni, Marco; Ranchetti, Andrea; Ranchetti, Davide (2013). "Software Clone Detection and Refactoring". ISRN Software Engineering. 2013: 1–8. doi:10.1155/2013/129437.
  6. Kapser, C.; Godfrey, M.W., ""Cloning Considered Harmful" Considered Harmful," 13th Working Conference on Reverse Engineering (WCRE), pp. 19-28, Oct. 2006
  7. Wagner, Stefan; Abdulkhaleq, Asim; Kaya, Kamer; Paar, Alexander (2016). "On the relationship of inconsistent software clones and faults: an empirical study". Proc. 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016).
  8. Brenda S. Baker. A Program for Identifying Duplicated Code. Computing Science and Statistics, 24:49–57, 1992.
  9. Ira D. Baxter, et al. Clone Detection Using Abstract Syntax Trees
  10. Visual Detection of Duplicated Code by Matthias Rieger, Stephane Ducasse.
  11. Yuan, Y. and Guo, Y. CMCD: Count Matrix Based Code Clone Detection, in 2011 18th Asia-Pacific Software Engineering Conference. IEEE, Dec. 2011, pp. 250–257.
  12. Chen, X., Wang, A. Y., & Tempero, E. D. (2014). A Replication and Reproduction of Code Clone Detection Studies. In ACSC (pp. 105-114).
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.