Loop splitting

Loop splitting is a compiler optimization technique. It attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range.

Loop peeling

Loop peeling is a special case of loop splitting which splits any problematic first (or last) few iterations from the loop and performs them outside of the loop body.

Suppose a loop was written like this:

 int p = 10;
 for (int i=0; i<10; ++i)
 {
   y[i] = x[i] + x[p];
   p = i;
 }

Notice that p = 10 only for the first iteration, and for all other iterations, p = i - 1. A compiler can take advantage of this by unwinding (or "peeling") the first iteration from the loop.

After peeling the first iteration, the code would look like this:

 y[0] = x[0] + x[10];
 for (int i=1; i<10; ++i)
 {
   y[i] = x[i] + x[i-1];
 }

This equivalent form eliminates the need for the variable p inside the loop body.

Loop peeling was introduced in gcc in version 3.4. More generalised loop splitting was added in GCC 7.^[1]

Brief history of the term

Apparently the term was for the first time used by Cannings, Thompson and Skolnick ^[2] in their 1976 paper on computational models for (human) inheritance. There the term was used to denote a method for collapsing phenotypic information onto parents. From there the term was used again in their papers, including their seminal paper on probability functions on complex pedigrees.^[3]

In compiler technology, the term first turned up in late 1980s papers on VLIW and superscalar compilation, including ^[4] and.^[5]

References

↑ https://gcc.gnu.org/gcc-7/changes.html
↑ Cannings, C.; Thompson, EA; Skolnick, HH (1976). "The recursive derivation of likelihoods on complex pedigrees". Advances in Applied Probability. 8 (4): 622–625. doi:10.2307/1425918.
↑ Cannings, C.; Thompson, EA; Skolnick, HH (1978). "Probability functions on complex pedigrees". Advances in Applied Probability. 10 (1): 26–61. doi:10.2307/1426718.
↑ Callahan, D; Kennedy, K (1988). "Compiling Programs for Distributed-memory Multiprocessors". The Journal of Supercomputing. 2 (2): 151–169. doi:10.1007/BF00128175.
↑ Mahlke, SA; Lin, DC; Chen, WY; Hank, RE; Bringman, RA (1992). Effective compiler support for predicated execution using the hyperblock. 25th Annual International Symposium on Microarchitecture. pp. 45–54.

Compiler optimizations
Basic block	Peephole optimization
Loop optimization	Induction variable Strength reduction Loop fusion Loop inversion Loop interchange Loop-invariant code motion Loop nest optimization Loop unrolling Loop splitting Loop unswitching Software pipelining Automatic parallelization
Data-flow analysis	Common subexpression elimination Constant folding Induction variable recognition and elimination Dead store elimination Use-define chain Live variable analysis Available expression
SSA-based	Global value numbering Sparse conditional constant propagation
Code generation	Register allocation Instruction selection Instruction scheduling Rematerialization
Functional	Tail call elimination Deforestation
Global	Interprocedural optimization
Other	Bounds-checking elimination Dead code elimination Inline expansion Jump threading
Static analysis	Alias analysis Pointer analysis Shape analysis Escape analysis Array access analysis Dependence analysis Control flow analysis Data-flow analysis

Loop splitting

Loop peeling

Brief history of the term

References

Further reading