DESIGN AND ANALYSIS OF ALGORITHM
GROUP THREE(3) ASSIGNMENT
THE KOLMOGOROV COMPLEXITY ALGORITHM
In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov–Chaitin complexity, algorithmic entropy, or program-size complexity) of an object, such as a piece of text, is a measure of the computability resources needed to ...view middle of the document...
The Kolmogorov complexity can be defined for any mathematical object, but for simplicity the scope of this article is restricted to strings. We must first specify a description language for strings. Such a description language can be based on any computer programming language, such as Lisp, Pascal, or Java virtual machine bytecode. If P is a program which outputs a string x, then P is a description of x. The length of the description is just the length of P as a character string, multiplied by the number of bits in a character (e.g. 7 for ASCII).
We could, alternatively, choose an encoding for Turing machines, where an encoding is a function which associates to each Turing Machine M a bitstring <M>. If M is a Turing Machine which, on input w, outputs string x, then the concatenated string <M> w is a description of x. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article, an informal approach is discussed.
Any string s has at least one description, namely the program:
If a description of s, d(s), is of minimal length (i.e. it uses the fewest bits), it is called a minimal description of s. Thus, the length of d(s) (i.e. the number of bits in the description) is the Kolmogorov complexity of s, written K(s). Symbolically,
K(s) = |d(s)|.
The length of the shortest description will depend on the choice of description language; but the effect of changing languages is bounded (a result called the invariance theorem).
There are some description languages which are optimal, in the following sense: given any description of an object in a description language, I can use that description in my optimal description language with a constant overhead. The constant depends only on the languages involved, not on the description of the object, or the object being described.
Here is an example of an optimal description language. A description will have two parts:
* The first part describes another description language.
* The second part is a description of the object in that language.
In more technical terms, the first part of a description is a computer program, with the second part being the input to that computer program which produces the object as output.
The invariance theorem follows: Given any description language L, the optimal description language is at least as efficient as L, with some constant overhead.
Proof: Any description D in L can be converted into a description in the optimal language by first describing L as a computer program P (part 1), and then using the original description D as input to that program (part 2). The total length of this new description D’ is (approximately):
|D’| = |P| + |D|
The length of P is a constant that doesn't depend on D. So, there is at most a...