Update 2012.02.24: changed all LiveCompare 2.6 references to LiveCompare 2.5.2 to reflect the actual release containing this feature.
I'm a fan of code analysis to help reduce defects. Tools that implement various systems have become common place in the C/C++/Java (pick your language) and I've long wanted to bring these capabilities to the ABAP developers through LiveCompare's workflow interface. With LiveCompare 2.5.2 we will introduce new workflows for automatically and objectively analyzing ABAP code complexity. The new action that makes this possible is, not surprisingly, called Analyze ABAP Complexity. Peter takes up the story…
Knowing where to concentrate testing effort can help reduce the total cost of ownership of SAP systems. This article describes how new functionality to be introduced in LiveCompare 2.5.2 provides an automatic, objective assessment of the complexity of ABAP code. The Analyze ABAP Complexity action computes the Cyclomatic Complexity metric:
The software metric now called 'cyclomatic complexity' or 'conditional complexity' was first expounded in a paper by Thomas J. McCabe Sr. in 19761. McCabe takes a graph-theoretic approach using the program control flow graph and shows how it can be used to manage and control program complexity.
He observed that we spend most of our dollars maintaining systems (today as then) and sought a technique to identify software modules that will be difficult to test or maintain.
McCabe shows that complexity depends only on the decision structure of the program and not upon its physical size. He observed that a program with 25 consecutive 'IF THEN' (FORTRAN) constructs, 50 lines of code, would have about 33.5 million distinct control paths, most of which would probably never be tested.
McCabe's approach is to measure and control the number of paths through a program. Each sequential block of code in the program is represented by a node in the control flow graph and each branch is represented by an arc. There are unique entry and exit nodes. For example, Figure 2 shows a program control flow graph for a program with one decision and one loop.
Because any program with a backward branch such as a loop could potentially have an infinite number of paths the approach uses basis paths which, when taken in combination, will generate every possible path.
If the exit point is regarded as being connected to the entry point, Figure 3, then the graph is said to be strongly connected, meaning that any node can be reached from any other node. In this case the cyclomatic number V(G) is given by the simple formula
V(G) = E – N + P
WhereE = number of edgesN = number of nodesP = number of connected components, here 1, the figure as a whole
This cyclomatic number is equal to the number of linearly independent circuits in the graph. Linearly independent means not composed of combinations of other paths. This is equivalent to the minimum number of test case required to cover all branches of the program.
In figure 3 above there are 6 nodes and 8 edges giving a cyclomatic number of 8 – 6 + 1 = 3 .
Alternatively, for Figure 2,
V(G) = E – N + 2P
Cyclomatic complexity = 7 – 6 + 2 = 3.
The Swiss mathematician and physicist Leonard Euler is credited with laying the foundations for graph theory in his classic paper on the Seven Bridges of Königsberg2,3 written in 1735.
Euler's formula for planar graphs (those in which the edges do not cross) states that n – e + r = 2 where n = number of vertices, e = number of edges and r = number of regions. Re-arranging this gives r = e – n + 2 which is equivalent to the formula for Figure 2 above.
That means we can find the cyclomatic complexity of the graph simply by counting regions, see Figure 4, regions A, B, C, complexity 3. For a Figure 3 style strongly connected graph simply ignore the region outside the figure.
For a graph with unique entry and exit nodes each predicate (decision) node adds one more edge so the cyclomatic complexity is equal to the number of predicates plus one. We can measure the complexity simply by counting decision nodes from keywords in the source code.
Suppose we have an IF statement with multiple conditions, e.g.
IF VAR1 = 'A' OR VAR1 = 'B' OR VAR1 = 'C' .
Should we count the individual logical conditions or just the whole statement? This depends whether the language supports short-circuit evaluation semantics that can lead to conditional execution of side-effects. For example in C++ we might write
if (condition && function())
If the condition is false function() will never be called. As the function might have side effects it should be treated as a separate test case and add one to the cyclomatic complexity.
Although ABAP supports short-circuit evaluation semantics it is not possible to cause side effects in a logical expression so for ABAP code the logical operators should not add to the cyclomatic complexity.
Figure 5 shows a control flow graph for a program containing a case statement with 3 cases and a default case. In Figure 6 the default case has been removed, control drops through if none of the specified cases are met. In both cases the complexity (by counting regions) is 4, 1 for the program as a whole and 3 for the case statement. The contribution of the case statement is simply the number of specified cases. This is one less than the number of branches out of the case statement node.
As ABAP uses WHEN OTHERS as the default case, strictly speaking, this WHEN OTHERS keyword should be disregarded in counting keywords.
In languages such as C++ it is possible to associate multiple case statement labels with a single SWITCH statement.
case 0: case 1:
case 2: case 3:
This should contribute only 2 to the complexity so simply counting case keywords would give an incorrect result. This problem does not arise with ABAP code because the equivalent is to use OR in a single WHEN clause.
WHEN 'A' OR 'B'.
VAR1 = '2'.
Figure 7 depicts a subprogram with multiple exit points. From a graph theoretic view point this would have a complexity of 4 – 5 + 2 = 1. But as the control flow contains a branch the cyclomatic complexity should be 2 so the additional exit point must be regarded as connected to the original exit point as in Figure 8.
However, if the complexity is measured by counting keywords, then the branch will already have been correctly counted and the additional exit point can be disregarded.
If a class has multiple methods we would certainly want to have at least one test case for each method. From a graph theoretic stand point each method is a separate connected component and adds one to P in the formula M = E – N + P. The same holds true for FORMs in a function group.
Based on the above considerations the default behaviour of LiveCompare's Analyze ABAP Complexity action is as follows.
IF and ELSEIF are counted as decisions but not the AND and OR and other logical operands within them.
Within a CASE statement all WHEN tokens are counted.
LOOP, DO and WHILE are all counted and also CHECK within a loop as this is a conditional short circuit of the loop.
FORM, METHOD and FUNCTION are counted as these are separate sections of code with distinct entry and exit points which add to P in the formula for cyclomatic complexity.
Within SQL statements a WHERE clause is a decision point and is therefore counted in SELECT, MODIFY, UPDATE and DELETE statements, however, these statements are not counted if there is no WHERE clause. Similarly a KEY clause in a READ statement is counted.
IntelliCorp's LiveCompare Analyze ABAP Complexity action offers support for a wide range of object types including: PROG, REPS, INCL, FUGR, FUNC, CLAS, INTF, AQSG, ELEM, IOBJ, ISIP, ISTS, TRFN and UPDR.
The approach used in the IntelliCorp software offers considerable flexibility. The list of keywords to be counted can be modified as required. For example, if your preference is to count CASE statements rather than WHEN tokens within them and to count AND and OR, just change the list of tokens.
Figure 9 - Default count words
Figure 10 - Modified count words
If you would prefer different banding of complexity from the default, just provide a new table of complexity bands.
Figure 11 – Default complexity categories
Figure 12 – Modified complexity categories
LiveCompare 2.5.2 will ship with new and updated workflows that incorporate the Analyze ABAP Complexity action. The ABAP analysis workflows in the Upgrade package will include the complexity as a way to help better judge the effort required. The intelligent impact analysis workflows will include the complexity metric to help better focus testing effort (complexity will be the third factor, alongside usage and degree of impact, in determining the risk of change). New workflows will be available that analyze the complexity of custom code.
McCabe's own recommendation is that the cyclomatic complexity of a single function should not exceed 10.
Many ABAP programmers will have read the three part guide by SAP Vice President Andreas Blumenthal and SAP Knowledge Architect Horst Keller of SAP AG entitled "An insider's guide to writing robust, understandable, maintainable, state-of-the-art ABAP programs"4. They too recommend that the cyclomatic number of a method should not be greater than 10.
According to some sources5, 6, 7 the Software Engineering Institute (SEI) used to specify the following cyclomatic complexity categories. Unfortunately the page specifying them has been retired by the SEI.
1 to 10
Simple program, low risk
11 to 20
More complex, moderate risk
21 to 50
Complex, high risk
Un-testable, very high risk
The Analyze ABAP Complexity action in LiveCompare 2.5.2 provides an automatic, objective assessment of the complexity of ABAP code. We know from many years of research and experience that complex code is hard to maintain and hard to verify. Armed with the information LiveCompare produces, SAP teams can work to simplify their custom code (perhaps eliminating it all together using the Find Similar Objects functionality introduced in LiveCompare 2.5) and better assess the verification effort when things change. LiveCompare's established intelligent impact analysis templates will be modified to include this new metric in time for the 2.5.2 release.
Part 1 http://www.sapwiki.cl/wiki/images/a/a1/Abap_top_part_1.pdf
Part 2 http://www.sapwiki.cl/wiki/images/9/90/Abap_top_part_2.pdf
Part 3 http://www.sapwiki.cl/wiki/images/4/4d/Abap_top_part_3.pdf
If figure 2 shows a program control flow graph for a program with one decision and one loop, why does it have a "P" value of 2?. I would think figure 2 would have a P value of 1 and figure 3 to have a P value of 2.
its interesting that figures 2 & 3 have the same cyclomatic value. Figure 3 has one additional Edge which I'd think would increase its value.
the only explanation I can think of its that in figure 2, nodes 1 and 6 are considered connectable components, whereas in figure 3, only node 1 is considered a "P".
Tim, Figures 2 & 3 are alternative representations of the same program control graph. With the benefit of hind sight perhaps I should have used a dotted line for the edge connecting the exit back to the entry, This edge makes the graph strongly connected so the cyclomatic number is equal to the number of lineraly independent circuits in the graph. The formulea follow from the underlying mathematics as given by McCabe in his paper (reference 1 above) and before him by C. Berge (see McCabe's reference 1). You might also like to see this Wikipedia articel en.wikipedia.org/.../Cyclomatic_complexity
Tim, Perhaps I could have explained that better. The extra edge connecting the exit back to the entry is added to make the graph strongly connected, every node can be reached from every other node. Once we do that we can get the cyclomatic number of the graph V(G) = E - N + P from the underlying graph theory. McCabee then uses this cyclomatic number of the graph as the cyclomatic complexity of the progarm. But the extra edge does not really exist, programs do not ususally start again as soon as they finish. So maybe we want to find the complexity number without the extra edge. As it is the same program the complexity must be the same in both cases so we have to adjust the formula to take account of the 'missing' edge. P refers to the graph as a whole, a single connected component. As there is one 'missing' edge for each such component we need to add one for each P, hence the 2P. Hope that helps.
Thank you Peter, I appreciate the explanation.