The limits of reason
In 1956 Scientific American published an article by Ernest Nagel and James R.
Newman entitled “Gödel’s Proof.”
Two years later the writers published a book
with the same title—a wonderful work that is still in print.
I was a child, not even a teenager,
and I was obsessed by this little book. I remember the thrill of discovering
it in the New York Public Library.
I used to carry it around with me and try to explain
it to other children.
It fascinated me because Kurt Gödel used mathematics to show that mathematics itself
has limitations.
Gödel refuted the position of David Hilbert, who about a century
ago declared that there was a theory of everything for math, a finite set of principles from
which one could mindlessly deduce all mathematical truths by tediously
following the rules of symbolic mathematical logic.
But Gödel demonstrated that mathematics contains true statements that cannot be proved
that way. His result is based on two self-referential paradoxes:
“This statement is false” and “This statement is unprovable.”
(For more on Gödel’s incompleteness theorem, see
box.)
My attempt to understand Gödel’s proof took over my life, and now half a century
later I have published a little book of my own. In some respects, it is my own
version of Nagel and Newman’s book, but it does not focus on Gödel’s proof.
The only things the two books have in common are their small size and their goal of critiquing
mathematical methods.
Unlike Gödel’s approach, mine is based on measuring information
and showing that some mathematical facts cannot be compressed into
a theory because they are too complicated.
This new approach suggests that what Gödel discovered was just the
tip of the iceberg:
an infinite number of true mathematical theorems exist that cannot be proved from any
finite system of axioms.
Complexity and Scientific Laws
My story begins in 1686 with Gottfried W. Leibniz’s philosophical essay
Discours de métaphysique (Discourse on
Metaphysics), in which he discusses how one can distinguish between facts that can be described
by some
law and those that are lawless, irregular facts.
Leibniz’s very simple and profound idea appears
in section VI of the Discours,
in which he essentially states that
a theory has to be simpler than the data it explains, otherwise it does not explain anything.
The concept of a law becomes
vacuous if arbitrarily high mathematical complexity is permitted, because then one can always
construct a law no matter how random and patternless the data really are.
Conversely, if the only law that describes some data is an extremely complicated one,
then the data are actually lawless.
Today the notions of complexity and simplicity are put in precise quantitative terms by a
modern branch of mathematics called algorithmic information theory.
Ordinary information theory quantifies information by asking how many bits are needed to
encode the information. For example, it takes one bit to encode a single yes/no answer.
Algorithmic information, in contrast, is defined by asking what size computer program is
necessary to generate the data. The minimum number of bits—what size string of zeros and
ones—needed to store the program is called the algorithmic information content of the data.
Thus, the infinite sequence of numbers 1, 2, 3, … has very little algorithmic information;
a very short computer program can generate all those numbers. It does not matter how long
the program must take to do the computation or how much memory it must use—just the length
of the program in bits counts. (I gloss over the question of what programming language is
used to write the program—for a rigorous definition, the language would have to be
specified precisely. Different programming languages would result in somewhat different
values of algorithmic information content.)
To take another example, the number π, 3.14159…, also has only a little algorithmic
information content, because a relatively short algorithm can be programmed into a computer
to compute digit after digit. In contrast, a random number with a mere million digits, say
1.341285…64, has a much larger amount of algorithmic information. Because the number lacks
a defining pattern, the shortest program for outputting it will be about as long as the
number itself:
-
Begin
Print “1.341285…64″
End
(All the digits represented by the ellipsis are included in the program.) No smaller program
can calculate that sequence of digits. In other words, such digit streams are incompressible,
they have no redundancy; the best that one can do is transmit them directly. They are called
irreducible or algorithmically random.
How do such ideas relate to scientific laws and facts?
The basic insight
is a software view of science: a scientific theory is like a computer program that predicts
our observations, the experimental data.
Two fundamental principles inform this viewpoint. First, as William of Occam noted,
given two theories that explain the data, the simpler theory is to be preferred
(Occam’s razor). That is, the
smallest program that calculates the observations is the best theory.
Second is Leibniz’s insight, cast in modern terms—if
a theory is the same size in bits as the data it explains, then it is worthless, because
even the most random of data has a theory of that size.
A useful theory is a compression of the data; comprehension is compression.
You compress things into computer programs, into concise algorithmic descriptions.
The simpler the theory, the better you understand something.
Sufficient Reason
Despite living 250 years before the invention of the computer program,
Leibniz came very close to the modern idea of algorithmic information. He had all the key
elements. He just never connected them.
He knew that everything can be represented with binary information, he built one of the
first calculating machines, he appreciated the power of computation,
and he discussed
complexity and randomness.
If Leibniz had put all this together, he might have questioned one of
the key pillars of his philosophy,
namely, the principle of
sufficient reason—that everything happens for a reason.
Furthermore, if something
is true, it must be true for a reason.
That may be hard to believe sometimes, in the confusion and chaos of daily life,
in the contingent ebb and flow of human history.
But even if we cannot always see a reason (perhaps because the chain of
reasoning is long and subtle), Leibniz asserted, God can see the reason. It is there!
In that, he agreed with the ancient Greeks, who originated the idea.
Mathematicians certainly believe in reason and in Leibniz’s principle of sufficient
reason, because they always try to prove everything. No matter how much
evidence there is for a theorem, such as millions of demonstrated examples,
mathematicians demand a proof of the general case.
Nothing less will satisfy them.
And here is where the concept of algorithmic information can make its surprising contribution
to the philosophical discussion of the origins and limits of knowledge.
It reveals that certain mathematical facts are true for no reason,
a discovery that flies in the face of the principle of sufficient reason.
Indeed, as I will show later, it turns out that an infinite number of mathematical facts
are irreducible, which means no theory explains why they are true.
These facts are not just computationally
irreducible, they are logically irreducible. The only way to “prove” such
facts is to assume them directly as new axioms,
without using reasoning at all.
The concept of an
“axiom” is closely related to the idea of logical irreducibility. Axioms are mathematical facts
that we take as self-evident and do not try to prove from simpler principles.
All formal mathematical theories start with axioms and then deduce the consequences
of these axioms, which are called its theorems. That is how Euclid did things in Alexandria
two millennia ago, and his treatise on geometry is the classical model for mathematical
exposition.
In ancient Greece,
if you wanted to convince your fellow citizens to vote with you on some issue,
you had to reason with them—which I guess is
how the Greeks came up with the idea that in mathematics you have to prove things rather than just
discover them experimentally.
In contrast, previous cultures in Mesopotamia and Egypt apparently relied on experiment.
Using reason has certainly been an extremely fruitful approach, leading to modern mathematics and
mathematical physics and all that goes with them, including the technology for
building that highly logical and mathematical machine, the computer.
So am I saying that this approach that science and mathematics has been following for more than
two millennia crashes and burns?
Yes, in a sense I am. My counterexample illustrating the limited power of logic and reason,
my source of an infinite stream of unprovable
mathematical facts, is the number that I call
Ω.
The Number Omega
The first step on the road to Ω
came in a famous paper published
precisely 250 years after Leibniz’s essay.
In a 1936 issue of the Proceedings of the London Mathematical
Society, Alan M. Turing began the computer age by presenting a mathematical model
of a simple, general-purpose, programmable digital computer.
He then asked, Can we determine
whether or not a computer
program will ever halt?
This is Turing’s famous halting problem.
Of course, by running a program you can eventually discover that it halts, if it halts.
The problem, and it is an extremely fundamental one, is to decide when to give up
on a program that does not halt.
A great many special cases can be solved,
but Turing showed that a general solution is impossible.
No algorithm, no mathematical theory, can ever tell us which programs will halt
and which will not.
(For a modern proof of Turing’s thesis, see
box.)
By the way, when I say “program,” in modern terms I mean the concatenation of the computer
program and the data to be read in by the program.
The next step on the path to the number Ω
is to consider the ensemble of all possible programs.
Does a program chosen at random ever halt?
The probability of having that happen is my Ω number.
First, I must specify how to pick a program at random.
A program is simply a series of bits, so flip
a coin to determine the value of each bit.
How many bits long should the program be?
Keep flipping the coin so long as the computer is asking for another bit of input.
Ω is just the probability that the machine will eventually
come to a halt when supplied with a stream of random bits in this fashion.
(The precise numerical value of Ω depends on the choice of computer programming
language, but Ω’s surprising properties are not affected by this choice.
And once you have chosen a language, Ω has a definite value, just like π or
the number 3.)
Being a probability, Ω has to be greater than 0 and less than 1,
because some programs halt and some do not.
Imagine writing Ω out in binary.
You would get something
like 0.1110100…. These bits after the decimal point
form an irreducible stream. They are our irreducible mathematical facts
(each fact being whether the bit is a 0 or a 1).
Ω can be defined as an infinite sum,
and each N-bit program that halts contributes precisely 1/2N to the sum
[see box].
In other words, each N-bit program that halts adds a 1
to the Nth bit in the binary expansion
of Ω. Add up all the bits for all programs that halt, and you would get the precise
value of Ω.
This description may make it sound like you can calculate Ω accurately, just
as if it were √2 or the number π.
Not so—Ω is perfectly well defined and it is a specific number, but it is impossible
to compute in its entirety.
We can be sure that Ω cannot be computed because
knowing Ω would let us solve Turing’s halting problem,
but we know that this problem is unsolvable.
More specifically, knowing the first N bits of Ω would
enable you to decide whether or not each program up to N bits in size ever halts
[see box].
From this it follows that you need at least an N-bit program to calculate
N bits of Ω.
Note that I am not saying that it is impossible to compute some digits of Ω.
For example, if we knew that computer programs 0, 10 and 110 all halt, then we would know
that the first digits of Ω were 0.111. The point is that the first N digits of
Ω cannot be computed using a program significantly shorter than N bits long.
Most important, Ω supplies us with an infinite number of these irreducible bits.
Given any finite program,
no matter how many billions of bits long, we have an infinite number of bits that the
program cannot compute. Given any finite set of axioms, we have an infinite number of
truths that are unprovable in that system.
Because Ω is irreducible, we can immediately conclude that a theory of everything for
all of mathematics cannot exist.
An infinite number of bits of Ω constitute mathematical facts
(whether each bit is a 0 or a 1) that
cannot be derived from any principles simpler than the string of bits itself.
Mathematics therefore has infinite complexity, whereas any individual
theory of everything would have only finite complexity and could not capture
all the richness of the full world of mathematical truth.
This conclusion
does not mean that proofs are no good,
and I am certainly not against reason.
Just because some things are irreducible does not mean we should give up using reasoning.
Irreducible principles—axioms—have
always been a part of mathematics.
Ω just shows that a lot more of them are out there than people suspected.
So perhaps mathematicians should not try to prove everything.
Sometimes they should just add new axioms.
That is what you have got to do if you are faced with irreducible facts.
The problem is realizing that they are irreducible!
In a way, saying something is irreducible is giving up, saying that it cannot ever be
proved. Mathematicians would rather die than do that,
in sharp contrast with
their physicist colleagues, who are happy to be pragmatic and to use plausible reasoning
instead of rigorous proof.
Physicists are willing to add new principles, new scientific laws,
to understand new domains of experience.
This raises what I think is an extremely interesting question:
Is mathematics like physics?
Mathematics and Physics
The traditional view is that mathematics and physics are quite different. Physics describes
the universe and depends on experiment and observation. The particular laws that govern
our universe—whether Newton’s laws of motion or the Standard Model of particle physics—must
be determined empirically and then asserted like axioms that cannot be logically proved, merely
verified.
Mathematics, in contrast, is somehow independent of the universe. Results and theorems, such
as the properties of the integers and real numbers, do not depend in any way on the particular
nature of reality in which we find ourselves. Mathematical truths would be
true in any universe.
Yet both fields are similar.
In physics, and indeed in science generally, scientists compress their
experimental observations into scientific laws.
They then show how their observations can be deduced from these laws.
In mathematics, too, something like this happens—mathematicians
compress their computational experiments into mathematical axioms,
and they then show how to deduce theorems from these axioms.
If Hilbert had been right, mathematics would be a closed system, without room for new ideas.
There would be a static, closed theory of everything for all of mathematics,
and this would be like a dictatorship. In fact, for mathematics to progress you actually need
new ideas and plenty of room
for creativity. It does not suffice to grind away, mechanically deducing all the
possible consequences of a fixed number of basic principles.
I much prefer an open system. I do not like rigid, authoritarian ways of thinking.
Another person who thought mathematics is like physics
was Imre Lakatos, who left Hungary in 1956
and later worked on philosophy of science in England.
There Lakatos came up with a great word,
“quasi-empirical,”
which means that even though there are no true experiments that can be carried out in
mathematics, something similar does take place.
For example, the Goldbach conjecture states that any even number greater than 2 can be
expressed as the sum of two prime numbers. This conjecture was arrived at experimentally,
by noting empirically that it was true for every even number that anyone cared to examine.
The conjecture has not yet been proved, but it has been verified up to 1014.
I think that mathematics is quasi-empirical.
In other words, I feel that mathematics is different from physics (which is truly empirical)
but perhaps not as different as most people think.
I have lived in the worlds of both mathematics and
physics, and I never thought there was such a big difference between
these two fields. It is a matter of degree, of emphasis, not an absolute
difference. After all, mathematics and physics coevolved.
Mathematicians should not isolate themselves.
They should not cut themselves off from rich sources of new ideas.
New Mathematical Axioms
The idea of choosing to add more axioms is not an alien one to mathematics.
A well-known example is the parallel postulate in Euclidean geometry:
given a line and a point not on the line, there is exactly one line that can be drawn
through the point that never intersects the original line. For centuries geometers
wondered whether that result could be proved using the rest of Euclid’s axioms.
It could not. Finally, mathematicians realized that they could substitute different
axioms in place of the Euclidean version, thereby producing the non-Euclidean geometries
of curved spaces, such as the surface of a sphere or of a saddle.
Other examples are the law of the excluded middle in logic and the axiom of choice
in set theory. Most mathematicians are happy to make use of those axioms in their proofs,
although others do not, exploring instead so-called intuitionist logic or constructivist
mathematics. Mathematics is not a single monolithic structure of absolute truth!
Another very interesting axiom may be the “P ≠ NP” conjecture.
P and NP are names for classes of problems. An NP problem is one
for which a proposed solution can be verified quickly. For example, for the problem
“find the factors of 8,633,” one can quickly verify the proposed solution “97 and 89″
by multiplying those two numbers. (There is a technical definition of “quickly,” but
those details are not important here.) A P problem is one that can be solved
quickly even without being given the solution. The question is—and no one knows the
answer—can every NP problem be solved quickly? (Is there a quick way to find
the factors of 8,633?) That is, is the class P the same as the class NP?
This problem is one of the Clay Millennium Prize Problems for which a reward of $1 million
is on offer.
Computer scientists widely believe that P ≠ NP, but no proof is known.
One could say that a lot of quasi-empirical evidence points to P not being equal to
NP. Should P ≠ NP be adopted as an axiom, then? In effect, this is
what the computer science community has done. Closely related to this issue is the security
of certain cryptographic systems used throughout the world. The systems are believed to be
invulnerable to being cracked, but no one can prove it.
Experimental Mathematics
Another area of similarity between mathematics and physics is experimental mathematics:
the discovery of new mathematical results by looking at many
examples using a computer. Whereas this approach is not as persuasive as a short
proof, it can be more convincing than a long and extremely complicated proof, and for
some purposes it is quite sufficient.
In the past, this approach was defended with great vigor by George Pólya
and Lakatos, believers in heuristic reasoning and in the quasi-empirical
nature of mathematics. This methodology is also
practiced and justified in Stephen Wolfram’s A New Kind of Science (2002).
Extensive computer calculations can be extremely persuasive, but do they render proof
unnecessary? Yes and no. In fact, they provide a different kind of evidence.
In important situations, I would argue that both kinds of evidence are required,
as proofs may be flawed, and conversely computer searches may have the bad luck to stop
just before encountering a counterexample that disproves the conjectured result.
All these issues are intriguing but far from resolved. It is now 2006, 50 years
after this magazine published its article on Gödel’s proof, and we still do not
know how serious incompleteness is. We do not know if incompleteness is telling us
that mathematics should be done somewhat differently. Maybe 50 years from now we
will know the answer.
Overview/Irreducible Complexity
-
Kurt Gödel demonstrated that mathematics is necessarily incomplete,
containing true statements that cannot be formally proved. A remarkable
number known as Ω reveals even greater incompleteness by providing
an infinite number of theorems that cannot be proved by any finite system
of axioms. A “theory of everything” for mathematics is therefore impossible. -
Ω is perfectly well defined [see box] and has a definite value,
yet it cannot be computed by any finite computer program. -
Ω’s properties suggest that mathematicians should be more willing to
postulate new axioms, similar to the way that physicists must evaluate
experimental results and assert basic laws that cannot be proved logically. -
The results related to Ω are grounded in the concept of algorithmic
information. Gottfried W. Leibniz anticipated many of the features of
algorithmic information theory more than 300 years ago.
The author
Gregory Chaitin
is a researcher at the IBM Thomas J. Watson Research Center.
He is also honorary professor at the University of Buenos Aires and visiting
professor at the University of Auckland.
He is co-founder, with Andrei N. Kolmogorov, of the field
of algorithmic information theory.
His nine books include the nontechnical works Conversations with a
Mathematician (2002) and Meta Math! (2005).
When he is not thinking about the foundations of mathematics, he enjoys
hiking and snowshoeing in the mountains.
What Is Gödel’s Proof?
Kurt Gödel’s incompleteness theorem demonstrates that mathematics contains
true statements that cannot be proved. His proof achieves this by constructing
paradoxical mathematical statements. To see how the proof works, begin
by considering the liar’s paradox: “This statement is false.”
This statement is true if and only if it is false,
and therefore it is neither true nor false.
Now let’s consider “This statement is unprovable.”
If it is provable, then we are proving a falsehood, which is extremely unpleasant
and is generally assumed to be impossible.
The only alternative left is that this statement is unprovable.
Therefore, it is in fact both true and unprovable.
Our system of reasoning is incomplete,
because some truths are unprovable.
Gödel’s proof assigns to each possible mathematical statement a so-called Gödel
number.
These numbers provide a way to talk about properties of the statements by
talking about the numerical properties of very large integers.
Gödel uses his numbers to construct self-referential statements analogous to the
plain English paradox “This statement is unprovable.”
Strictly speaking, his proof does not show that mathematics is incomplete.
More precisely, it shows that
individual formal axiomatic mathematical theories
fail to prove the true numerical statement “This statement is unprovable.”
These theories therefore cannot be “theories of everything” for mathematics.
The key question left unanswered by Gödel: Is this an isolated phenomenon, or are there
many important mathematical truths that are unprovable?—G.C.
Why Is Turing’s Halting Problem Unsolvable?
A key step in showing that incompleteness is natural and pervasive was taken by Alan M. Turing
in 1936, when he demonstrated that there can be no general procedure to decide if a self-contained
computer program will eventually halt.
To demonstrate this result,
let us assume the opposite of what we want to prove is true. Namely, assume that there is
a general procedure H that can decide whether any given computer program will halt.
From this assumption we shall derive a contradiction.
This is what is called a reductio ad absurdum proof.
So assuming the existence of H, we can construct the following program P
that uses H as a subroutine.
The program P knows its own size in bits (N)—there is certainly room in
P for it to contain the number
N—and then using H, which P contains, P takes a look at all programs
up to 100 times N bits in size to see which halt and which do not.
Then P runs all the ones that
halt to determine the output that they produce.
This will be precisely the set of all digital objects with complexity up to 100 times N.
Finally, our program P outputs the smallest positive integer not in this set,
and then P itself halts.
So P halts, P’s size is N bits, and
P’s output is an integer
that cannot be produced by a program whose size is less than or equal to 100 times N bits.
But P has just produced this integer as its output,
and it is much too small to be able to do this, because P’s size is only N bits,
which is much less than 100 times N. Contradiction!
Therefore, a general procedure H for deciding whether or not programs ever halt cannot exist,
for if it did then we could actually construct this paradoxical program P using H.
Finally, Turing points out that if there were a theory of everything
that always enables you to prove that an individual program halts or to prove that it never does,
whichever is the case,
then by systematically running through all possible proofs you could eventually decide
whether individual programs ever halt.
In other words, we could use this theory to construct H,
which we have just shown cannot exist.
Therefore there is no theory of everything for the halting problem.
Similar reasoning shows that no program that is substantially shorter than N bits
long can solve the Turing halting problem for all programs up to N bits long.—G.C.
How Omega Is Defined
To see how the value of the number Ω is defined,
look at a simplified example. Suppose that the computer we are dealing with has only
three programs that halt, and they are the bit strings 110, 11100 and 11110.
These programs are, respectively, 3, 5 and 5 bits in size.
If we are choosing programs at random by flipping a coin for each bit,
the probability of getting each of them by chance is precisely 1/23,
1/25 and 1/25, because each particular bit has probability 1/2.
So the value of Ω (the halting probability) for this particular computer is
given by the equation:
Ω = 1/23 + 1/25 + 1/25
= .001 + .00001 + .00001
= .00110
This binary number is the probability of getting one of the three halting programs by chance.
Thus, it is the probability that our computer will halt.
Note that because program 110 halts we do not consider any programs that start with 110 and
are larger than three bits—for example, we do not consider 1100 or 1101. That is,
we do not add terms of .0001 to the sum for each of those programs. We regard all the longer
programs, 1100 and so on, as being included in the halting of 110. Another way of saying
this is that the programs are self-delimiting; when they halt, they stop asking for more
bits.—G.C.
Why Is Omega Incompressible?
I wish to demonstrate that Ω is incompressible—that one cannot use a program
substantially shorter than N bits long to compute the first N bits of Ω.
The demonstration will involve a careful combination of facts about Ω and the Turing
halting problem that it is so intimately related to. Specifically, I will use the fact that
the halting problem for programs up to length N bits cannot be solved by a program
that is itself shorter than N bits (see box).
My strategy for demonstrating that Ω is incompressible is to show that having the
first N bits of Ω would tell me how to solve the Turing halting problem
for programs up to length N bits. It follows from that conclusion that no program
shorter than N bits can compute the first N bits of Ω.
(If such a program existed, I could use it to compute the first N bits of Ω
and then use those bits to solve Turing’s problem up to N bits—a task that is
impossible for such a short program.)
Now let us see how knowing N bits of Ω would enable me to solve the halting
problem—to determine which programs halt—for all programs up to N bits in size.
Do this by performing a computation in stages.
Use the integer K to label which stage we are at:
K = 1, 2, 3, ….
At stage K, run every program up to K bits in size for K seconds.
Then compute a halting probability, which we will call ΩK,
based on all the programs that halt by stage K.
ΩK will be less than Ω because it is based on only a
subset of all the programs that halt eventually, whereas Ω is based on all such
programs.
As K increases, the value of
ΩK will get closer and closer to the actual value of Ω.
As it gets closer to Ω’s actual value, more and more of
ΩK’s first bits will be correct—that is, the same as the
corresponding bits of Ω.
And as soon as the first N bits are correct, you know that you have encountered every program
up to N bits in size that will ever halt.
(If there were another such N-bit program, at some later-stage K
that program would halt, which would increase the value of
ΩK to be greater than Ω,
which is impossible.)
So we can use the first N bits of Ω to solve the halting problem for all
programs up to N bits in size. Now suppose we could compute the first N
bits of Ω with a program substantially shorter than N bits long. We could then
combine that program with the one for carrying out the
ΩK algorithm, to produce a program shorter than N bits
that solves the Turing halting problem up to programs of length N bits.
But, as stated up front, we know that no such program exists. Consequently, the first
N bits of Ω must require a program that is almost N bits long to compute
them. That is good enough to call Ω incompressible or irreducible. (A compression
from N bits to almost N bits is not significant for large N.)—G.C.
More to explore
For a chapter on Leibniz, see Men of Mathematics. E. T. Bell. Reissue. Touchstone, 1986.
For more on a quasi-empirical view of math, see New Directions in the Philosophy of Mathematics. Edited by Thomas Tymoczko. Princeton University Press, 1998.
Gödel’s Proof. Revised edition.
E. Nagel, J. R. Newman and D. R. Hofstadter.
New York University Press, 2002.
Mathematics by Experiment: Plausible Reasoning in the 21st Century.
J. Borwein and D. Bailey. A. K. Peters, 2004.
For Gödel as a philosopher and the Gödel-Leibniz connection, see
Incompleteness: The Proof and Paradox of Kurt Gödel.
Rebecca Goldstein. W. W. Norton, 2005.
Meta Math!: The Quest for Omega. Gregory Chaitin. Pantheon Books, 2005.
Short biographies of mathematicians can be found at
www-history.mcs.st-andrews.ac.uk/BiogIndex.html
Gregory Chaitin’s home page is
www.umcs.maine.edu/~chaitin/