This article refers to the mechanical, electrical, and software systems meaning of pipeline. For pipelines used
to transport fluids like water or petroleum, see pipeline
transport.
The term pipeline has meaning in electrical and mechanical systems, as well as in
software. In general, the term represents the concept of splitting a job into subprocesses in which the output
of one subprocess feeds into the next (much like water flows from one pipe segment
to the next).
Mechanical analogy
A mechanical example of a pipeline is a washer/dryer system for clothing. Instead of one unit that
both washes and dries, two units together form a pipeline: the output of the washer enters the dryer. If washing
takes 1 hour and drying takes 1 hour, the pipeline finishes a full load of laundry every hour once both units are busy,
compared with every 2 hours for a single (non-pipelined) unit that washes and then dries. Each item of clothing still
takes two hours to complete its wash/dry cycle, of course.
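The arithmetic behind the analogy can be sketched in a few lines of shell. The function names below are hypothetical, and the sketch assumes each stage takes exactly 1 hour and that at least one load is processed:

```shell
# Hypothetical helpers: total hours needed to finish N loads (N >= 1)
# when washing and drying each take 1 hour.

# Two-unit pipeline: the first load takes 2 hours, each later load adds 1 hour.
pipelined_hours() { echo $(( $1 + 1 )); }

# Single combined unit: every load occupies the unit for the full 2 hours.
sequential_hours() { echo $(( 2 * $1 )); }

for n in 1 2 5 10; do
  echo "$n loads: pipelined $(pipelined_hours "$n") h, non-pipelined $(sequential_hours "$n") h"
done
```

The time for a single load is the same in both cases (2 hours); only the rate at which finished loads appear differs.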
Pipelined processors
In electronics, pipelines are used in microprocessors
to let complex logic sequences execute at higher speeds. Pipelines are closely tied to
the engineering concepts of throughput and latency. See Instruction pipeline and Classic RISC pipeline for a fuller discussion.
Software pipelines
In computer software, a pipeline is
a command-line feature prevalent in UNIX and UNIX-like operating systems. Douglas McIlroy, one of the UNIX pioneers at Bell Labs, noticed
that much of the time users were feeding the output of one program into another as its input. The UNIX pioneers established a
means of chaining running programs together as concurrent processes so that the output of the first program becomes the input to the second. This became the
well-known pipes and filters design pattern. A pipeline may be
extended to any number of commands, with the output of each serving as the input to the next.
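As a minimal sketch of such chaining, the following counts how often each word occurs in its input. The function name word_counts is hypothetical; the commands it chains are standard utilities:

```shell
# word_counts: hypothetical helper that chains four standard filters.
word_counts() {
  tr -s ' ' '\n' |   # put each word on its own line
  sort |             # bring identical words together
  uniq -c |          # prefix each distinct word with its count
  sort -rn           # list the most frequent words first
}

printf 'the cat saw the dog\n' | word_counts
```

Each command runs as its own process, and the shell connects the standard output of one to the standard input of the next.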
Unix pipes
Filter programs are commonly used in a UNIX pipeline, and they
usually obey a few conventions: they treat their input as line-structured records, read data from the standard input, and write
to the standard output.
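A filter that follows these conventions can be written directly in the shell. The sketch below is illustrative only; number_lines is a hypothetical name, and the standard nl utility does a similar job:

```shell
# number_lines: hypothetical filter that reads line-structured records from
# standard input and writes each one to standard output, prefixed by its
# line number and a tab.
number_lines() {
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
    printf '%d\t%s\n' "$n" "$line"
  done
}

# Because it reads stdin and writes stdout, it can sit anywhere in a pipeline.
printf 'alpha\nbeta\n' | number_lines | grep beta
```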
Below is an example of a pipeline that implements a kind of spell
checker for this page.
curl http://www.wikipedia.org/wiki/Pipeline |
sed 's/[^a-zA-Z ]//g' |
tr 'A-Z ' 'a-z\n' |
grep '[a-z]' |
sort -u |
comm -23 - /usr/dict/words
Here is an explanation of the pipeline:
- First the curl program obtains the contents of this web page.
- The contents of this page are piped through sed, which removes all characters which are not spaces
or letters.
- tr then changes all of the uppercase letters into their corresponding lowercase counterparts and
converts the spaces in the lines of text to newlines, so that each 'word' is now on a separate line.
- grep keeps only the lines that contain at least one letter, which removes the empty lines.
- sort sorts the list of 'words' into alphabetical order, and removes duplicates.
- Finally, comm finds which of the words in the list are not in the given dictionary file (in this
case, /usr/dict/words).
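To try the same steps without network access or a system dictionary, the filters can be wrapped in a function and run against a small sample. This is a sketch under assumptions: spell_check is a hypothetical name, and the temporary word list stands in for /usr/dict/words (it must be sorted, lowercase, one word per line):

```shell
# spell_check DICT: reads text on standard input and prints the words
# that do not appear in the sorted word list DICT.
spell_check() {
  sed 's/[^a-zA-Z ]//g' |   # drop everything except letters and spaces
  tr 'A-Z ' 'a-z\n' |       # lowercase, and put one word on each line
  grep '[a-z]' |            # keep only lines that contain a letter
  sort -u |                 # sort and remove duplicates
  comm -23 - "$1"           # keep words found only in the input
}

dict=$(mktemp)              # hypothetical stand-in for the dictionary file
printf '%s\n' a is pipeline this > "$dict"
printf 'This is a pipline.\nThis is a pipeline!\n' | spell_check "$dict"
rm -f "$dict"
```

With this sample the only word reported is the misspelling pipline, since every other word is in the list.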
John Hartmann, a Danish engineer with IBM, extended the basic pipes and filters paradigm in a number of useful ways. His
product, known as CMS Pipelines, is available on a number of IBM platforms.
Some of the salient characteristics that distinguish Hartmann pipelines from ordinary Unix pipes are:
- Filters may have multiple inputs and multiple outputs. For example, a selection filter can send the found records
down one output pipe and the not found records down another.
- A linear notation for representing pipeline networks.
- An interface that allows REXX programs to act as filters.
- A pacing strategy in the pipeline supervisor that allows, for example, a stream to be split by a selection
filter, the records on the output legs to be processed by other filters, and the legs then to be merged by a join
filter with the record order preserved in the result stream.
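The idea of a selection filter with two outputs can be approximated in an ordinary shell by writing to two files. The sketch below is not CMS Pipelines notation; split_select is a hypothetical helper built on awk, and it does not reproduce the order-preserving merge described above:

```shell
# split_select PATTERN FOUND NOTFOUND: hypothetical two-output selection.
# Records on standard input that match PATTERN go to the file FOUND;
# all other records go to the file NOTFOUND.
split_select() {
  : > "$2"; : > "$3"        # create both outputs even if one stays empty
  awk -v pat="$1" -v found="$2" -v notfound="$3" '
    $0 ~ pat { print > found; next }
             { print > notfound }
  '
}

tmp=$(mktemp -d)
printf 'apple\nbanana\navocado\n' | split_select '^a' "$tmp/found" "$tmp/notfound"
cat "$tmp/found" "$tmp/notfound"
rm -rf "$tmp"
```

A plain Unix pipe carries a single stream, which is why the second output has to be routed through a file (or a similar side channel) here.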
The utility of the many filters supplied with the program is exemplified by the LOOKUP filter:
LOOKUP matches records in its primary input stream with records in its secondary input stream and writes matched and unmatched
records to different output streams. The records are matched on the basis of a key field (the contents of a specified range of
columns in the records).
LOOKUP reads records from its primary and secondary input streams and writes records to its primary, secondary, and tertiary
output streams, if each is connected. The secondary input stream must be defined and connected.
The records in the secondary input stream are the master records. LOOKUP first reads the master records into a
buffer, where records with duplicate key fields are discarded; the first occurrence of a key is retained. The records in the
buffer are referred to as the reference.
The records in the primary input stream are the detail records. LOOKUP compares detail records to records in the
reference. LOOKUP writes records to three output streams, if each is connected:
- The primary output stream contains matching records. You can specify what is written to the primary output stream (both
detail and master records, only detail records, or only master records) and the sequence in which the master and detail
records are written.
- The secondary output stream contains detail records that do not have a matching master record.
- The tertiary output stream contains master records in ascending order by their
key fields. The primary and secondary output streams are severed at the end of file on the primary input stream before records
are written to the tertiary output stream.
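The matching behaviour described above can be imitated with awk as a rough sketch. This is not the LOOKUP filter itself: lookup_sketch is a hypothetical name, the key is assumed to be the first whitespace-separated field rather than a column range, only detail records are written to the primary output, and the tertiary output follows the description above as worded:

```shell
# lookup_sketch MASTER PRIMARY SECONDARY TERTIARY
# Detail records arrive on standard input; the other arguments are file names.
lookup_sketch() {
  : > "$2"; : > "$3"        # create the outputs even if they stay empty
  awk -v master="$1" -v primary="$2" -v secondary="$3" '
    BEGIN {
      # Build the reference: the first record seen for each key is retained.
      while ((getline line < master) > 0) {
        split(line, f, " ")
        if (!(f[1] in ref)) ref[f[1]] = line
      }
      close(master)
    }
    ($1 in ref) { print > primary; next }   # detail record with a matching master
                { print > secondary }       # detail record without a match
  '
  # After the details are exhausted: master records, duplicates dropped,
  # in ascending order by key.
  awk '!seen[$1]++' "$1" | sort -k1,1 > "$4"
}

tmp=$(mktemp -d)
printf 'b beta\na alpha\nb duplicate\n' > "$tmp/master"
printf 'a 1\nc 2\nb 3\n' |
  lookup_sketch "$tmp/master" "$tmp/primary" "$tmp/secondary" "$tmp/tertiary"
cat "$tmp/primary" "$tmp/secondary" "$tmp/tertiary"
rm -rf "$tmp"
```

Reading the whole master file before any detail record mirrors the buffering step in the description, and it is why the master stream must be available up front.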
This arrangement allows one to use other filters to prepare the dictionary, or master records, for input to LOOKUP
from whatever source is required. The many input/output filters, or drivers, allow a Hartmann pipeline to interact directly with a
variety of data sources, from files to the system itself to such things as TCP/IP ports. The repertoire of filters and drivers is
rich enough that one could, for example, write a server that consisted solely of a Hartmann pipeline.