|
In computer science, a datatype (often simply
called type) is a statically or dynamically assigned constraint on computer programs. Since the benefits of datatype have proven to be quite important, the majority of
programming languages have type systems today.
Basis
The basic idea of typing is to give mere bits semantic meaning. Types are usually associated either with values in memory or with objects such as variables. Because any value is simply a set of bits for computers, there is no distinction in hardware even among memory addresses, instruction code, characters, integers and floating-point numbers.
Types tell you how you should treat those mere bits.
Major functions that type systems provide are:
- Safety - types make it impossible to code some operations which cannot be valid in a certain context. This
mechanism effectively catches the majority of common mistakes made by programmers. For example, an expression
"Hello,
Wikipedia" / 3 is invalid because a string literal cannot be
divided by an integer in the usual sense. As discussed below, strong typing offers
more safety, but it does not necessarily guarantee complete safety (see type-safety for more information).
- Optimization - static type checking might provide useful information to a compiler. For example, if a type
says a value is aligned at a mutiple of 4, the memory access can be optimized.
- Documentation - using types in languages also improves documentation of code. For example, the declaration of a variable as being of a specific type documents how
the variable is used. In fact, many languages allow programmers to define semantic types derived from builtin types; either composed of elements of one or more builtin types, or simply
as aliases for names of builtin types.
- Abstraction - types allow programmers to think about programs in higher level, not bothering with low-level
implementation. For example, programmers can think values as literal strings instead of a mere array of bytes.
- Modularity - types allow programmers to express the interface between two subsystems. This localizes the
definitions required for interoperability of the subsystems and prevents inconsistencies when those subsystems communicate.
Typically each value is associated with one particular type. But one type may have more than one subtypes. Other entities, such as objects, modules, communication channels, dependencies, or even types
themselves can be associated with a type. For example, a datatype is a type of a value, A class is a type of an object and a kind is a type of a
type.
A type system, specified in each programming language, stipulates the ways in which a typed program are allowed to
behave, and make behaviors outside rules illegal. An effect
system is a dual of a type system and specifies the ways in which a program is required to behave.
More formally, the study of type systems is known as type theory, with the
aid of lambda calculus.
Type checking
The process of verifying types is called type checking. It may occur either at compile-time (statically typed) or run-time (dynamically typed).
One of the primary tasks of semantic analysis in a compiler is this static type checking. If type rules are enforced strongly, the system is
called strongly typed.
Static and dynamic typing
In dynamic scope, type checking is often done at run-time because variables can be differently typed according to execution path. Static type systems for dynamic scope usually
need to explicitly represent the concept of an execution path, and allow types to depend on it. This seems to require either a
trivial or a cumbersome type system to work well.
C, Java, ML, and
Haskell are statically typed, whereas
Lisp, Perl, Visual Basic, Ruby, and Python, are dynamically typed. Dynamic typing is often associated with so-called "scripting languages" and other rapid application development environments. You
will see dynamic types more often used in interpreted
languages, whereas static types are used in compiled
languages. See typed and untyped
languages for the complete list of typed and untyped languages.
Duck typing is a humorous way of describing the (dynamic) typing typical
of many scripting languages, where the language guesses at the type of a value. Initially coined by Dave Thomas in the Ruby community, its premise is that "(referring to a
value) if it walks like a duck, and quacks like a duck, then it is a duck".
To see how type checking works, consider the following pseudocode
example:
var x; // (1)
x = 5; // (2)
x = "hi"; // (3)
In this example, (1) declares the name x; (2) associates the integer value 5 to
the name x; and (3) associates the string value "hi" to the name x. In some statically
typed systems, where the type is associated with the variable name, the above code fragment would be illegal, because (2) and (3)
bind x to values of inconsistent type. Other statically typed systems, like that of ML, allow such operations as the type is associated with the value, and not the variable.
By contrast, a purely dynamically typed system would permit the above program to execute, because the name x would not be
required to have a consistent type. The implementation of a dynamically typed language will catch errors related to the misuse of
values - "type errors" - at the time the erroneous statement or expression is
computed. In other words, dynamic typing catches errors during program execution. A typical implementation of dynamic
typing will keep all program values "tagged" with a type, and checking the type tag before any value is used in an operation.
Many OOP languages keep certain information
about type at run-time to make possible dynamic binding. In C++, such information is called RTTI. In this
code fragment, (1) binds the value 5 to x; (2) binds the value "hi" to y; and (3) attempts to add x to y. In a dynamically typed
language implementation, the value bound to x might be a pair (integer, 5), and the value bound to y might be a pair (string, "hi"). When the program attempts to execute line (3), the language implementation would check
the type tags integer and string, discover that the operation + (addition) is not defined over these
two types, and signal an error.
Some statically typed languages have a "back door" in the language that enables programmers to write code that does not
statically type check. For example, C and Java have "casts".
The presence of static typing in a programming language does not necessarily imply the absence of dynamic typing mechanisms.
For example, Java is statically typed, but certain operations require the support of runtime type tests, which are a form of
dynamic typing. See programming language for more
discussion of the interactions between static and dynamic typing.
Static and dynamic type checking in practice
The choice between static and dynamic typing requires some trade-offs. Many
programmers strongly favor one over the other; some to the point of considering languages following the disfavored system to be
unusable or crippled.
Static typing finds type errors reliably and at compile time. This should increase the reliability of delivered program.
However, programmers disagree over how common type errors are, and thus what proportion of those bugs which are written would be
caught by static typing. Static typing advocates believe programs are more reliable when they have been type-checked, while
dynamic typing advocates point to distributed code that has proven reliable and to small bug databases. The value of static
typing, then, presumably increases as the strength of the type system is increased. Advocates of languages such as ML and Haskell
have suggested that almost all bugs can be considered type errors, if the types used in a program are sufficiently well declared
by the programmer or inferred by the compiler.
Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that
are in use, it can produce machine code that just does the right thing. Further, compilers in statically typed languages can find
shortcuts more easily. Dynamically-typed languages such as Common Lisp use
optional type declarations for optimization for this very reason. Static typing makes this pervasive. See optimization.
Statically-typed languages which lack type inference – such as Java – require that programmers declare the types
they intend a method or function to use. This can serve as additional documentation for the program, which the compiler will not
permit the programmer to ignore or drift out of synchronization. However, a language can be statically typed without requiring
declarations, so this is not a consequence of static typing.
Static typing allows construction of libraries which are less likely to be accidentally misused by their users. This can be
used as an additional mechanism for communicating the intentions of the library developer.
A static type system constrains the use of powerful language constructs more than it constrains less powerful ones. This makes
powerful constructs harder to use, and thus places the burden of choosing the "right tool for the problem" on the shoulders of
the programmer, who might otherwise be inclined to use the most powerful tool available. Choosing overly powerful tools may cause
additional performance, reliability or correctness problems, because there are theoretical limits on the properties that can be expected from powerful language
constructs. For example, indiscriminate use of recursion or global variables may cause well-documented adverse effects.
Dynamic typing allows constructs that would be illegal in some static type systems. For example, eval functions that
execute arbitrary data as code are possible (however, the typing within that evaluated code might be static). Furthermore,
dynamic typing accommodates transitional code and prototyping, such as allowing a string to be used in place of a data
structure.
Dynamic typing allows debuggers to be more functional; in particular, the debugger can modify the code arbitrarily and let the
program continue to run. Programmers in dynamic languages sometimes "program in the debugger" and thus have a shorter
edit-compile-test-debug cycle. However, the need to use debuggers is sometimes considered as a sign of design or development
process problems.
Dynamic typing allows compilers to run more quickly, since there is less checking to perform and less code to revisit when
something changes. This, too, may shrink the edit-compile-test-debug cycle.
Strong and weak typing
A strongly typed language does not allow an operation to succeed on arguments which are of the wrong type. An example
of the absence of strong typing is a C cast gone wrong; if you cast a value in C, not only is the compiler required to allow the
code, but the runtime is expected to allow it as well. This allows C code to be compact and fast, but it can make debugging more
difficult.
Sometimes the term safe language is used more generally for languages that do not allow nonsense to occur. For
example, a safe language will also check array bounds.
Weak typing means that types are implicitly converted (or cast) when they are used. If we were to revisit the
previous example:
var x = 5; // (1)
var y = "hi"; // (2)
x + y; // (3)
If the code above was written a weakly-typed language, such as Visual Basic, the code would run properly, yielding the result "5hi". The number 5 is
converted to a string "5" to make sense of the operation. There are problems with such conversions in weakly typed languages,
though. For example, would the result of the following code be 9 or "54"?
var x = 5;
var y = "4";
x + y
Many say that weak typing gets programmers into bad habits because it doesn't teach them to use explicit type conversion.
Polymorphism and types
The type system allows operations to be done relying on contexts by type. For example, in an arithmetic expression, a +
b, if a and b are typed as integer, an underlying operation
can be integer addition. If the type is real, floating-point addition is probably done. In generics the type of values determines which code will be executed. See also: type polymorphism
Explict or implicit declaration and inference
Many static type systems, such as C's and Java's, require type declarations: the programmer must explicitly associate
each variable in a function with a particular type. Others, such as Haskell's, perform type inference: the compiler
draws conclusions about the types of variables based on the operations which the function performs upon them. For instance, in a
function f(x,y), if at some point in the function the variables x and y are added together, the
compiler can infer that they must be numbers -- since addition is only defined over numbers. Therefore, that any call to
f elsewhere in the program that gives a string or a list (e.g.) as an argument would be erroneous.
Numerical and string constants and expressions in code can and often do imply type in a particular context. For example, an
expression 3.14 might imply that its type is floating-point while [1, 2, 3] might imply a list of integers; typically an array. See also: type inference.
Categories of types
Types can be classified with following categories:
Compatibility, equivalence and substitutability
The question of compatibility and equivalence is a complicated and controversial topic and it is related to the problem of
substitutionality: that is, given type A and type B, are they equal types? compatible? can the value with type
B be used in the place where the value of A?
If type A is compatible with type B, A is a subtype of B while not always vice versa. The
definition is known as Liskov substitution
principle.
See also
|