A programming language or computer language is a standardized communication technique for
expressing instructions to a computer. It is a set of syntactic and semantic rules used to define computer programs. A language enables a programmer to specify precisely
what data a computer will act upon, how these data will be stored or transmitted, and what actions to take under various circumstances.
Introduction
A primary purpose of programming languages is to enable programmers to express their intent for a computation more easily than
they could with a lower-level language
or machine code. For this reason, programming languages are generally
designed to use a higher-level syntax, which can be easily communicated and understood by human programmers. Programming
languages are important tools for helping software engineers write better programs faster.
Understanding programming languages is crucial for those engaged in computer science because today, all types of computation are done with computer languages.
During the last few decades, a large number of computer languages have been introduced, have replaced each other, and have
been modified or combined. Although there have been several attempts to make a universal computer language that serves all purposes,
all of them have failed. A wide range of computer languages is needed for several reasons: the purposes of
programming vary from commercial software development to scientific and hobby use; the gap in skill between novices
and experts is huge, and some languages are too difficult for beginners to come to grips with; computer programmers have different
preferences; and finally, the acceptable runtime cost may be very different for programs running on a microcontroller and programs
running on a supercomputer.
There are many special-purpose languages, for use in special situations: PHP is a
scripting language that is especially suited for Web development; Perl is suitable for text
manipulation; the C language has been widely used for
development of operating systems and compilers (so-called system programming).
Programming languages make computer programs less dependent on particular machines or environments. This is because
programming languages are converted into specific machine code for a particular machine rather than being executed directly by
the machine. One ambitious goal of FORTRAN, one of the first programming languages,
was this machine-independence.
There are two mechanisms used to translate a program written in a programming language into the specific machine code of the
computer being used.
If the translation mechanism used is one that translates the program text as a whole and then runs the internal format, this
mechanism is spoken of as compilation. The compiler is therefore a program
which takes the human-readable program text (called source code) as data
input and supplies object code as output. The resulting object code may be machine code which will be executed directly
by the computer's CPU, or it may be code matching the specification of a virtual machine.
If the program code is translated at runtime, with each translated step being executed immediately, the translation mechanism
is spoken of as interpretation, and the program that performs it is an interpreter. Interpreted programs usually run more slowly than compiled programs, but have more
flexibility because they are able to interact with the execution environment. See interpreted language for detail. Although the definitions are not identical, interpreted languages typically fall
into the category of scripting
languages.
Most languages can be either compiled or interpreted, but most are better suited for one than the other. In some programming
systems, programs are compiled in multiple stages, into a variety of intermediate representations. Typically, later stages of
compilation are closer to machine code than earlier stages. One common variant of this implementation strategy, first used by
BCPL in the late 1960s, was to compile programs to
an intermediate representation called "O-code" for a virtual machine, which was then compiled for the actual machine. This successful strategy was later
used by Pascal with P-code and Smalltalk with byte code, although in many cases the intermediate code was interpreted rather than being compiled.
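The strategy of compiling to an intermediate code for a virtual machine, which is then interpreted, can be illustrated in Python. The following is a minimal sketch, assuming the standard CPython implementation, which compiles source text to bytecode before interpreting it; it does not describe BCPL, Pascal, or Smalltalk, and the variable names are chosen for illustration only.

```python
import dis

# Program text (source code) held as ordinary data.
source = "total = sum(n * n for n in range(4))"

# Stage 1: translate the whole program text into a code object
# containing bytecode for Python's virtual machine.
code_object = compile(source, "<example>", "exec")

# Stage 2: the virtual machine interprets the bytecode.
namespace = {}
exec(code_object, namespace)
print(namespace["total"])  # 0 + 1 + 4 + 9 = 14

# Show the intermediate representation; the exact listing
# differs between Python versions.
dis.dis(code_object)
```

The listing printed by `dis.dis` is the intermediate representation; it is closer to machine code than the source text, but is still independent of the actual processor.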
Features of a programming language
Each programming language can be thought of as a set of formal specifications concerning syntax, vocabulary, and meaning.
These specifications usually include:
- Data and Data Structures
- Instruction and Control Flow
- Reference Mechanisms and Re-use
- Design Philosophy
Most languages that are widely used, or have been used for a considerable period of time, have standardization bodies that
meet regularly to create and publish formal definitions of the language, and discuss extending or supplementing the already
extant definitions.
Data types and data structures
Internally, all data in a modern digital computer are stored simply as zeros or ones (binary). The data typically represent information in the real world such as names, bank
accounts and measurements and so the low-level binary data are organised by programming languages into these high-level
concepts.
The particular system by which data are organized in a program is the type
system of the programming language; the design and study of type systems is known as type theory. Languages can be classified as statically
typed or dynamically typed.
Statically-typed languages can be further subdivided into languages with manifest types, where each variable and function
declaration has its type explicitly declared, and type-inferred languages. It is possible to perform type inference on
programs written in a dynamically-typed language, but it is entirely possible to write programs in these languages that make type
inference infeasible. Sometimes type-inferred and dynamically-typed languages are called latently typed.
With statically-typed languages, there usually are pre-defined types for individual pieces of data (such as numbers within a
certain range, strings of letters, etc.), and programmatically named values (variables) can have only one fixed type, and allow
only certain operations: numbers cannot change into names and vice versa. Examples of these languages are: C, C++ and
Java.
Dynamically-typed languages treat all data locations interchangeably, so inappropriate operations (like adding names, or
sorting numbers alphabetically) will not cause errors until run-time. Examples of these languages are: Objective-C, Lisp, JavaScript, Tcl and Prolog.
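Python, which is also dynamically typed, can be used to sketch this behaviour. In the example below the function name `add_tax` and its argument are hypothetical; the point is only that the inappropriate operation is accepted when the program is loaded and fails when it is actually executed.

```python
def add_tax(price):
    # Nothing in the definition fixes the type of `price`.
    return price + 5

print(add_tax(10))  # 15

try:
    # Adding a number to a name is only detected at run-time.
    add_tax("ten")
except TypeError as error:
    print("run-time type error:", error)
```

A statically-typed language would typically reject the second call before the program runs.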
Type-inferred languages superficially treat all data as not having a type, but actually do sophisticated analysis of the way
the program uses the data to determine which elementary operations are performed on the data, and therefore deduce what type the
variables have at compile-time. Type-inferred languages can be more flexible to use, while creating more efficient programs;
however, this capability is difficult to include in a programming language implementation, so it is relatively rare. Examples of
these languages are ML and Haskell.
Sometimes statically-typed languages are called type-safe or strongly typed, and dynamically-typed languages
are called untyped or weakly typed; confusingly, these same terms are also used to refer to the distinction
between languages in which it is impossible to use a value as a value of another type (and so possibly corrupt data from an unrelated
part of the program or cause the program to crash) and languages in which it is possible to do this. In this second sense, examples of strongly typed
languages are Eiffel, Oberon, Lisp, Scheme, and OCaml; examples of weakly typed languages are Forth, C, assembly language, C++, D, and most implementations of Pascal.
Most languages also provide ways to assemble complex data structures
from built-in types and to associate names with these new combined types (using arrays, lists, stacks, files).
Object-oriented languages allow the programmer to define
data types called "objects" which have their own intrinsic functions and variables (called methods and attributes respectively).
A program containing objects allows the objects to operate as independent but interacting sub-programs: this interaction can be
designed at coding time to model or simulate real-life interacting objects. This is a very useful and intuitive capability.
Languages such as Python and Ruby have developed as object-oriented (OO) languages.
They are comparatively easy to learn and to use, and are gaining popularity in professional programming circles, as well as being
accessible to non-professionals. These more intuitive languages have increased the public availability and power of customised
computer applications.
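As a small illustration of objects with attributes and methods, the following Python sketch models two interacting bank accounts. The class `Account` and all of its names are hypothetical and serve only as an example.

```python
class Account:
    """A hypothetical bank account modelled as an object."""

    def __init__(self, owner, balance=0):
        self.owner = owner      # attribute
        self.balance = balance  # attribute

    def deposit(self, amount):  # method
        self.balance += amount

    def transfer_to(self, other, amount):  # method involving two objects
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        other.deposit(amount)


alice = Account("Alice", 100)
bob = Account("Bob")
alice.transfer_to(bob, 30)
print(alice.balance, bob.balance)  # 70 30
```

Each account keeps its own state, and the transfer is expressed as one object asking another to perform an operation.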
Aside from when and how the correspondence between expressions and types is determined, there is also the crucial question of
what types the language defines at all, and what types it allows as the values of expressions (expressed values) and as
named values (denoted values). Low-level languages like C typically allow programs to name memory locations, regions of
memory, and compile-time constants, while allowing expressions to return values that fit into machine registers; ANSI C extended
this by allowing expressions to return struct values as well (see record). Functional
languages often allow variables to name run-time computed values directly instead of naming memory locations where values may
be stored. Languages that use garbage collection are free to allow arbitrarily complex data structures as both
expressed and denoted values.
Finally, in some languages, procedures are allowed only as denoted values (they cannot be returned by expressions or bound to
new names); in others, they can be passed as parameters to routines, but cannot otherwise be bound to new names; in others, they
are as freely usable as any expressed value, but new ones cannot be created at run-time; and in still others, they are
first-class values that can be created at run-time.
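The last case, procedures as first-class values created at run-time, can be sketched in Python. The names `make_scaler` and `apply_twice` are illustrative assumptions, not part of any library.

```python
def make_scaler(factor):
    # A new procedure is created each time make_scaler runs.
    def scale(value):
        return value * factor
    return scale  # returned like any other expressed value


def apply_twice(procedure, value):
    # Procedures can be passed as parameters to routines.
    return procedure(procedure(value))


double = make_scaler(2)  # bound to a new name
triple = make_scaler(3)

print(apply_twice(double, 5))        # 20
print(list(map(triple, [1, 2, 3])))  # [3, 6, 9]
```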
Instruction and control flow
Once data have been specified, the machine must be instructed how to perform operations on the data. Elementary statements may
be specified using keywords or may be indicated using some well-defined grammatical structure. Each language takes units of these
well-behaved statements and combines them using some ordering system. Depending on the language, differing methods of grouping
these elementary statements exist. This allows one to write programs that are able to cover a variety of input, instead of being
limited to a small number of cases. Furthermore, beyond the data manipulation instructions, other typical instructions in a
language are those used for control flow (branches, definitions by cases,
loops, backtracking, functional composition).
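A short Python sketch shows how elementary statements are grouped and ordered with two of these control-flow constructs, a definition by cases and a loop. The function names are hypothetical.

```python
def classify(number):
    # Definition by cases (branches).
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    return "positive"


def count_by_class(values):
    counts = {}
    # Loop: the same statements handle any number of inputs.
    for value in values:
        label = classify(value)
        counts[label] = counts.get(label, 0) + 1
    return counts


print(count_by_class([-2, 0, 3, 7]))
# {'negative': 1, 'zero': 1, 'positive': 2}
```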
Reference mechanisms and re-use
The core of the idea of reference is that there must be a method of indirectly designating storage space. The most
common method is through named variables. Depending on the language, further indirection may include references that are pointers
to other storage space stored in such variables or groups of variables. Similar to this method of naming storage is the method of
naming groups of instructions. Most programming languages use macro calls, procedure calls
or function calls as the statements that use these names. Using symbolic names in this way allows a program to achieve
significant flexibility, as well as a high measure of reusability. Indirect references to available programs or predefined data
divisions allow many application-oriented languages to integrate typical operations as if the programming language included them
as higher level instructions.
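Both kinds of naming can be sketched in Python, where a variable name refers to stored data and a function name refers to a group of instructions. The names used here are illustrative only.

```python
readings = [3, 1, 2]  # a name designating storage
alias = readings      # a second reference to the same storage
alias.append(4)
print(readings)       # [3, 1, 2, 4]


def largest(values):  # a name designating a group of instructions
    return max(values)


pick = largest         # the instructions can be reached through another name
print(pick(readings))  # 4
```

Because `alias` and `readings` designate the same storage, a change made through one name is visible through the other.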
Design philosophies
To serve the purposes described above, each language has been developed around a particular design philosophy. Some aspect or
another is particularly stressed by the way the language uses data structures, or by the way its notation encourages
certain ways of solving problems or expressing their structure.
Since programming languages are artificial languages, they require a high degree of discipline to accurately specify which
operations are desired. Programming languages are not error tolerant; however, the burden of recognising and using the special
vocabulary is reduced by help messages generated by the programming language implementation. There are a few languages which
offer a high degree of freedom in allowing self-modification in which a program re-writes parts of itself to handle new cases.
Typically, only machine language and members of the Lisp family (Common Lisp, Scheme) provide this capability. Some languages such
as MUMPS and Perl allow modification of data
structures that contain program fragments, and provide methods to transfer program control to those data structures; languages
that support dynamic linking and loading such as C,
C++, and the Java programming language can emulate self-modification by either embedding a small compiler or
calling a full compiler and linking in the resulting object code. Interpreting code by recompiling it in real time is called
dynamic recompilation; emulators and other virtual machines exploit this
technique for greater performance.
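The idea of holding program fragments in data structures and then transferring control to them can be sketched in Python, which is not one of the languages named above but offers a comparable facility through its built-in `exec`. The table `handlers` and the function `add_handler` are hypothetical.

```python
handlers = {}  # data structure that will hold program fragments


def add_handler(name, source_text):
    # Translate program text at run-time and store the result.
    namespace = {}
    exec(source_text, namespace)
    handlers[name] = namespace[name]


# The program extends itself with a routine supplied as data.
add_handler("square", "def square(x):\n    return x * x\n")

# Control is transferred to the newly created fragment.
print(handlers["square"](7))  # 49
```

Executing text obtained at run-time in this way should be limited to trusted input, since the fragment runs with the full privileges of the program.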
There are a variety of ways to classify programming languages. The distinctions are not clear-cut; a particular language
standard may fall into more than one class. For example, a language may have both compiled and interpreted
implementations.
In addition, most compiled languages contain some run-time interpreted features. The most notable example is the familiar I/O
format string, which is written in a specialized little language and which is used to describe how to convert program data to or
from an external representation. This string is typically interpreted at run time by a specialized format-language interpreter
program included in the run-time support libraries. Many programmers have found the flexibility of this arrangement to be very
valuable.
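Python's `%` operator, modelled on the format strings of C's `printf`, gives a small illustration. Because the format string is ordinary data, it can be chosen while the program runs; the helper `render` is a hypothetical name.

```python
def render(format_string, value):
    # The format string is interpreted at run-time.
    return format_string % value


print(repr(render("%5.2f", 3.14159)))  # ' 3.14'
print(repr(render("%05d", 42)))        # '00042'
print(repr(render("%x", 255)))         # 'ff'
```

The same value can therefore be given different external representations without changing the code that produces it.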
History of programming languages
The development of programming languages, unsurprisingly, closely follows the development of the physical and electronic
processes used in today's computers.
Charles Babbage is often credited with designing the first
computer-like machines, which had several programs written for them (in the equivalent of assembly language) by Ada Lovelace.
Alan Turing introduced the theoretical construct of the Turing machine, which behaves in principle in all relevant ways like a modern
computer, as directed by the low-level program given to it as input.
In the 1940s the first recognisably modern, electrically powered computers were
created, requiring programmers to operate machines by hand. Some military calculation needs were a driving force in early
computer development, such as encryption, decryption, trajectory calculation and massive number crunching needed in the
development of atomic bombs. At that time, computers were extremely large, slow and expensive; advances in electronic technology
in the post-war years led to the construction of more practical electronic computers. In that period only Konrad Zuse imagined the use of a programming language like those of today for solving problems (developed eventually as
Plankalkül).
Subsequent breakthroughs in electronic technology (transistors, integrated circuits, and chips) drove the development of
increasingly reliable and more usable computers. This was paralleled by the development of a variety of standardised computer
languages to run on them. The improved availability and ease of use of computers led to a much wider circle of people who can
work with computers. The subsequent explosive development has resulted in the Internet, the ubiquity of personal computers, and
increased use of computer programming, through more accessible languages such as Python and Visual Basic.
Classes of programming languages
Languages
The following languages are major languages used by several thousand to several million programmers worldwide:
Formal semantics
The rigorous definition of the meaning of programming languages is the subject of formal semantics.