|
In mathematics, logic and computer science, a formal language is a set of finite-length words (i.e. character
strings) drawn from some finite alphabet. Note that we can talk about
formal language in many contexts (scientific, legal and so on), meaning a mode of expression more careful and accurate
than everyday speech. Use of a particular formal language in the sense intended here is an 'ultimate' version of that usage:
formal enough to be used in written form for automatic computation, is a possible criterion.
A typical alphabet would be {a, b}, and a typical string over that alphabet would be
- ababba.
A typical language over that alphabet, containing that string, would be the set of all strings which contain the same number
of symbols a and b.
The empty word (that is, length-zero string) is allowed and is often denoted by e, ε or λ.
While the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member
strings (because the length of words in it may be unbounded).
Some examples of formal languages:
- the set of all words over {a, b};
- the set { an : n is a prime
number };
- the set of syntactically correct programs in a given programming language; or
- the set of inputs upon which a certain Turing machine halts.
A formal language can be specified in a great variety of ways, such as:
Several operations can be used to produce new languages from given ones. Suppose L1 and
L2 are languages over some common alphabet.
- The concatenation L1L2 consists of all strings of the form vw where
v is a string from L1 and w is a string from L2.
- The intersection of L1 and L2 consists of all strings which are contained in
L1 and also in L2.
- The union of L1 and L2 consists of all strings which are contained in
L1 or in L2.
- The complement of the language L1 consists of all strings over the alphabet which are not
contained in L1.
- The right quotient L1/L2 of L1 by L2
consists of all strings v for which there exists a string w in L2 such that vw is
in L1.
- The Kleene star L1* consists of all strings
which can be written in the form w1w2...wn with strings
wi in L1 and n ≥ 0. Note that this includes the empty string
ε because n = 0 is allowed.
- The reverse L1R contains the reversed versions of all the strings in
L1.
- The shuffle of L1 and L2 consists of all strings which can be written in the
form
v1w1v2w2...vnw
n where n ≥ 1 and v1,...,vn are strings such that
the concatenation v1...vn is in L1 and
w1,...,wn are strings such that
w1...wn is in L2.
A typical question asked about a formal language is how difficult it is to decide whether a given word belongs to the
language. This is the domain of computability theory and complexity theory.
|