|
In computer science and in computer programming, statements in pseudocode
or in a program are normally obeyed one after the other in the
order in which they are written (sequential flow of control). Most programming languages have control flow
statements which allow variations in this sequential order:
- statements may only be obeyed under certain conditions (choice),
- statements may be obeyed repeatedly (loops),
- a group of remote statements may be obeyed (subroutines).
The use of subroutines does not normally cause any control flow problems,
but see the discussions below on early return, error recovery, and labels as parameters.
At the machine/assembly language level, it is usually the case
that the only instructions available for handling choice and/or loops are goto and
conditional goto (often known as variations of jump and/or branch). Compilers for high-level programming
languages must translate all control-flow statements into these primitives.
Primitives
Labels
In a few programming languages (e.g. Fortran, BASIC), a label is just a whole number which appears at
the beginning of a statement, e.g.
1234 X = 3
In many programming languages, a label is an identifier, which is attached
to a statement by using a colon (:), e.g.
Success:print("target has been found")
Historical note: Algol 60 allowed both whole numbers and identifiers as labels
(both attached by colons to statements), but few if any implementations allowed whole numbers.
Goto
The most common form for the unconditional transfer of control is just
goto label
Conditional transfer of control varies from language to language, e.g.
IF test THEN label
IF (test) GOTO label
if test then goto label;
if (test) goto label;
For a fuller discussion on the drawbacks of goto, see Goto. In brief,
undisciplined use of goto leads to spaghetti code
which tends to be unmaintainable; see Edsger Dijkstra's comments in
Go To Statement
Considered Harmful . However, Donald Knuth has shown in Structured Programming with goto Statements
that disciplined use of
goto may be necessary to emulate missing control-flow structures.
A number of authors have pointed out that using goto is often acceptable, provided that control is
transferred to some later statement (forward jump) and that control is not transferred into the middle of some other structured
statement. Some of the control-flow statements available in high-level programming languages are effectively disguised
gotos which comply with these conditions, e.g. break, continue,
return as found in C/C++.
Subroutines
The terminology for subroutines varies; they may alternatively be known as
routines, procedures, or sometimes methods. If they can be used in an expression and return a single result, they may also be
known as functions.
In the 1950's, computer memories were very small by current standards so subroutines were used primarily to reduce program
size; a piece of code was written once and then used many times from various other places in the program. Nowadays, subroutines
are more fequently used to help make a program more structured, e.g. by isolating some particular algorithm or hiding some
particular data access method. If many programmers are working on a single program, subroutines can be used to help split up the
work.
Subroutines can be made much more useful by providing them with parameters, e.g. many programming langauges have a built-in
square root subroutine whose parameter is the number you wish to find the square root of.
Some programming languages allow recursion, i.e. subroutines can call
themselves directly or indirectly. Certain algorithms such as Quicksort and
various tree-traversals are very much easier to express in recursive form than in non-recursive form.
The use of subroutines does slow a program down slightly, due to the overhead of passing parameters, calling the subroutine,
entering the subroutine (may involve saving information on a stack), and returning. The actual overhead depends on both the
hardware instructions available and on any software conventions which are used; excluding parameters, the overhead may range from
2 to 14 instructions, or worse. Some compilers may effectively insert the code of a subroutine inline at the point of call to remove this overhead.
In some programming languages, the only way of returning from a subroutine is by reaching the physical end of the subroutine.
Other languages have a return statement. This is equivalent to a forward jump to the physical end of the
subroutine and so does not complicate the control flow situation. There may be several such statements within a subroutine if
required.
In most cases, a call to a subroutine is only a temporary diversion to the sequential flow of control, and so causes no
problems for control flow analysis. A few languages allow labels to be passed as parameters, in which case understanding the
control flow becomes very much more complicated, since you may then need to understand the subroutine to figure out what might
happen.
Minimal structured control flow
In May 1966, Böhm and Jacopini published an article in Communications of the ACM which showed that any program with
gotos could be transformed into a goto-free form involving only choice (IF THEN ELSE) and loops (WHILE condition
DO xxx), possibly with duplicated code and/or the addition of Boolean variables (true/false flags). Can someone check for a
possible 1964 publication of this in Italian? Later authors have shown that choice can be replaced by loops (and yet more
Boolean variables).
The fact that such minimalism is possible does not necessarily mean that it is desirable; after all, computers theoretically
only need one machine instruction (subtract one number from another and branch if the result is negative), but practical
computers have dozens or even hundreds of machine instructions.
What Böhm and Jacopini's article showed was that all programs could be goto-free. Other research showed that control
structures with one entry and one exit were much easier to understand than any other form, primarily because they could be used
anywhere as a statement without disrupting the control flow.
Control structures in practice
Most programming languages with control structures have an initial keyword which indicates the type of control structure
involved (Smalltalk is an exception). Languages then divide as to whether or not control structures have a final keyword.
No final keyword: Algol 60, Pascal, C, C++, Java, PL/1.
Such languages need some way of grouping statements together, e.g. begin end for Algol 60 and
Pascal, curly brackets { } for C, C++, Java.
Final keyword: Algol 68, Modula-2, Fortran (77 onwards). The forms of the final keyword vary:
Algol 68: initial keyword backwards e.g. if fi, case
esac,
Modula-2: same final keyword end for everything (now thought not to be good idea),
Fortran 77: final keyword is end + initial keyword, IF ENDIF, DO ENDDO
Languages which have a final keyword tend to have less debate regarding layout and indentation. Languages whose final keyword
is of the form: end + initial keyword tend to easier to learn.
Choice
Choice using arbitrary tests
These are usually known as if statements. Note that if the language has an endif, then it
usually has elseif as well, in order to avoid a large number of endifs for multiple tests.
if test then statementTrue else statementFalse;
if (test) statementTrue else statementFalse;
if (test1) statementTrue1
else if (test2) statement2True
else if (test3) statement3True
else statementAllFalse;
IF (test1) THEN
xxx1True
ELSEIF (test2) THEN
xxx2True
ELSEIF (test3) THEN
xxx3True
ELSE
xxxAllFalse
ENDIF
Choice based on specific constant values
These are usually known as case or switch statements. The effect is to compare a given value
with specified constants and take action according to the first constant to match. If the constants form a compact range then
this can be implemented very efficiently as if it were a choce based on whole numbers.
case someChar of switch (someChar) {
'a': actionOnA; case 'a': actionOnA;
'x': actionOnX; break;
'y','z':actionOnYandZ; case 'x': actionOnX;
end; break;
case 'y':
case 'z': actionOnYandZ;
break;
default: actionOnNoMatch;
}
Choice based on whole numbers 1..N
Relatively few programming languages have these constructions but it can be implemented very efficiently using a computed
goto.
GOTO (label1,label2,label3), I
outOfRangeAction
case I in action1, action2, action3 out outOfRangeAction esac
Loops
A loop is a sequence of statements which is specified once but which may be carried out several times in succession. The code
"inside" the loop (the body of the loop, shown below as xxx) is obeyed a specified number of times, or once for
each of a collection of items, or until some condition is met.
A few languages do not have any constructions for looping (e.g. Lisp Scheme)
and use tail recursion instead.
Count-controlled loops
Most programming languages have constructions for repeating a loop a certain number of times. Note that if N is less than 1 in
these examples then the body is skipped completely. In most cases counting can go downwards instead of upwards and step sizes
other than 1 can be used.
FOR I = 1 TO N for I := 1 to N do begin
xxx xxx
NEXT I end;
DO I = 1,N for ( I=1; I<=N; ++I ) {
xxx xxx
END DO }
See also For loop.
Condition-controlled loops
Again, most programming languages have constructions for repeating a loop until some condition changes. Note that some
variations place the test at the start of the loop, while others have the test at the end of the loop. In the former case the
body may be skipped completely, while in the latter case the body is always obeyed at least once.
DO WHILE (test) repeat
xxx xxx
END DO until test;
while (test) { do
xxx xxx
} while (test);
See also While loop.
Collection-controlled loops
A few programming languages (e.g. Perl, Smalltalk, C#) have special constructs which allow you to implicitly
loop through all elements of an array, or all members of a set or collection.
someCollection do: [ :eachElement | xxx ].
foreach someArray { xxx }
foreach (string s in myStringCollection) { xxx }
Early exit from loops
When using a count-controlled loop to search through a table, you may wish to stop searching as soon as you have found the
required item. Some programming languages provide a statement such as break or exit, whose
effect is to terminate the current loop immediately and transfer control to the statement immediately following that loop. Things
can get a bit messy if you are searching a multi-dimensional table using nested loops (see Missing Control Structures below).
Potential problems with loops
Count-controlled loops should always use whole numbers or equivalent, since a loop such as
for X := 0.1 step 0.1 to 1.0 do
might be repeated 9 or 10 times, depending on rounding errors and/or the hardware and/or the compiler version.
Condition-controlled loops rely on the test condition being changed in some way within the body of the loop; if not, you get
an infinite loop.
Error recovery
Most programming languages have some way in which a program can detect that end-of-file has been reached when reading data,
but very few programming languages have any systematic way of handling the situation when something goes wrong or something
unusual happens.
PL/1 has some 22 standard conditions (e.g.
ZERODIVIDE SUBSCRIPTRANGE ENDFILE) which can be RAISEd and which can be intercepted by: ON condition action; Programmers
can also define and use their own named conditions. In many cases a GOTO is needed to decide where flow of control should resume.
Unfortunately, some implementations had a substantial overhead in both space and time (especially SUBSCRIPTRANGE), so many
programmers tried to avoid using conditions.
Exceptions
C++, Java, and C# have a special construct for exception
handling:
try {
xxx1 // Somewhere in here
xxx2 // use: throw someValue;
xxx3
} catch (someClass & someId) { // catch value of someClass
actionForSomeClass
} catch (someType & anotherId) { // catch value of someType
actionForSomeType
} catch (...) { // catch anything not already caught
actionForAnythingElse
}
Any number and variety of catch clauses can be used above. In Java and C#, a finally clause
can be added to the try construct. No matter how control leaves the try the code inside the
finally clause is guaranteed to execute. This is useful when writing code that must relinquish an expensive
resource (such as an opened file or a database connection) when finished processing:
FileStream stm = null; // C# example
try {
stm = new FileStream("logfile.txt", FileMode.Create);
return ProcessStuff(stm); // may throw an exception
} finally {
if (stm != null)
stm.Close();
}
Since this pattern is fairly common, C# has a special syntax that is slightly more readable:
using (FileStream stm = new FileStream("logfile.txt", FileMode.Create)) {
return ProcessStuff(stm); // may throw an exception
}
Upon leaving the using-block, the compiler guarantees that the stm object is released.
The C++, Java, and C# languages define standard exceptions and the circumstances under which they are thrown. Users can throw
exceptions of their own (in fact C++ allows users to throw and catch almost any type!) If there is no catch
matching a particular throw, then control percolates back through subroutine calls and/or nested blocks until a
matching catch is found or until the end of the main program is reached, at which point the program is forcibly
stopped with a suitable error message.
Can anyone add something helpful about these or other programming languages?
Constructs to be avoided
Some "features" in programming languages tend to result in (very) unstructured code and are best avoided. A few of these are
listed below.
Very few programming languages have all the control structures mentioned in this article, so you can reasonably use
goto to emulate the missing structures as required; see Donald
Knuth's 1974 article . You should not otherwise use
goto, due to the risk of creating spaghetti code.
For reasons of backwards compatibility, Fortran still has some arcane unstructured
features which should be avoided, e.g. ASSIGNed GOTO, Arithmetic IF (3-way branch), Logical IF (2-way branch), labels as
parameters.
Self-modifying code, i.e. code which alters itself when
executing, tends to result in very obscure code. Most assembly languages allow this, as does the ALTER verb in COBOL.
In a spoof Datamation article (December 1973), R. Lawrence Clark suggested that
the GOTO statement could be replaced by the COMEFROM statement, and provides some entertaining examples. This was actually
implemented in the INTERCAL programming
language, a language designed to make programs as obscure as possible.
Missing control structures
In his 1974 article , Donald Knuth identified two situations which were not covered by the control structures listed above, and gave
examples of control structures which could handle these situations. Despite their utility, these constructions have not yet found
their way into main-stream programming languages.
Loop with test in the middle
This was proposed by Dahl in 1972.
loop loop
xxx1 read(char);
while test; while not atEndOfFile;
xxx2 write(char);
repeat; repeat;
If xxx1 is omitted we get a loop with the test at the top. If xxx2 is omitted we get a loop with the test at
the bottom. If while is omitted we get an infinite loop. Hence this single construction can replace several
constructions in most programming languages. A possible variant is to allow more than one while test; within the
loop, but the use of exitwhen (see next section) appears to cover this case better.
As the example on the right shows (copying a file one character at a time), there are simple situations where this is exactly
the right construction to use in order to avoid duplicated code and/or repeated tests.
Multiple early exit/exit from nested loops
This was proposed by Zahn in 1974. A modified version is presented here.
exitwhen EventA or EventB or EventC;
xxx
exits
EventA: actionA
EventB: actionB
EventC: actionC
endexit;
exitwhen is used to specify the events which may occur within xxx, their occurrence is indicated by
using the name of the event as a statement. When some event does occur, the relevant action is carried out, and then control
passes just after endexit. This construction provides a very clear separation between determining that some
situation applies, and the action to be taken for that situation.
exitwhen is conceptually similar to the try/catch construct in C++, but is
likely to be much more efficient since there is no percolation across subroutine calls and no transfer of arbitrary values. Also,
the compiler can check that all specified events do actually occur and have associated actions.
The following simple example involves searching a two-dimensional table for a particular item.
exitwhen found or missing;
for I := 1 to N do
for J := 1 to M do
if table[I,J] = target then found;
missing;
exits
found: print("item is in table");
missing: print("item is not in table");
endexit;
Anecdotal evidence
The following statistics apply to a 6000-line compiler written in a private language containing the above constructions.
There are 10 condition-controlled loops, of which 6 have the test at the top, 1 has the test at the bottom, and 3 have the
test in the middle.
There are 18 exitwhen statements, 5 with 2 events, 11 with 3 events, and 2 with 4 events. When these were
first used in the compiler, replacing various flags and tests, the number of source lines increased by 0.1%, the size of the
object code decreased by 3%, and the compiler (when compiling itself) was 4% faster. Prior to the introduction of
exitwhen, 4 of the condition-controlled loops had more than one while test; and 5 of the
count-controlled loops also had a while test;
See also
References
- Dahl & Dijkstra & Hoare, "Structured Programming" Academic Press, 1972.
- Böhm, Jacopini. Flow diagrams, "Turing Machines and Languages with only Two Formation Rules" Communications of the ACM,
9(5):366-371, May 1966.
- Hoare, C. A. R. "Partition: Algorithm 63," "Quicksort: Algorithm 64," and "Find: Algorithm 65." Comm. ACM 4, 321-322,
1961.
- Zahn, C. T. "A control statement for natural top-down structured programming" presented at Symposium on Programming
Languages, Paris, 1974.
External links
|