WWW: http://www.cs.bham.ac.uk/~mmk/
Since the days of the Ancient Greeks, paradoxes have been a considerable source of exhilaration. They can be very funny, but they can also be thoroughly confusing and lead to wrong conclusions. In its most concise form, a logical paradox can be formulated by the sentence "This sentence is false." Let us assume it is true; then it must be false, since it says that it is false. Hence we have a contradiction, since it is true and false at the same time. So our initial assumption that it is true cannot have been correct, and it must be false. This leads to a contradiction as well: if it is false, then what it says cannot be true, hence the sentence cannot be false either.
The problems with paradoxical sentences have been known for a very long time, but they were initially thought to be irrelevant to the development of set theory and logic around the turn of the 20th century. Then, however, paradoxes in set theory and logic seemed to endanger progress in the field, and for a while it was not clear whether "the paradise of sets" built by Cantor (as Hilbert called it) and "the paradise of logic" built by Frege were free of terrible lions that could attack and kill at any time. Frege had formalised predicate logic and achieved a new level of clarity and rigour. Cantor built naive set theory, which was first published around the same time. The first modern paradox was found in 1897 by Burali-Forti (concerning the collection of all ordinal numbers), and related paradoxes were soon formulated that involve the cardinality of the set of all sets. Russell's notion of the set of all sets that do not contain themselves presented a major problem for set theory and also showed that Frege's original system, which allowed such self-reference, was paradoxical.
In 1908 Zermelo [Zermelo08] presented the first axiomatisation of set theory, and in the same year Russell [Russell08] presented the theory of types. In this way it was possible to fence the lions out: in set theory by the foundation axiom, which excludes sets that contain themselves (or, worse, the set of all sets that do not contain themselves), and in logic by the theory of types, which prevents a predicate from speaking about itself. This made it possible to build a safe area that is free of paradoxes. To be more precise, there might still be some wild beasts out there in the bush. But for a hundred years there has been no further attack, so we are fairly confident that there are none inside the fence. But of course, you never know; that may just be a false sense of security.
As Roy Dyckhoff rightly pointed out to me after the publication, it is not the Foundation Axiom that fences the lions out of ZF (otherwise there would be no Aczel theory of anti-foundation, consistent relative to ZF); what does it is the careful formulation of comprehension/separation, which forbids {x | P(x)} while allowing {x ∈ A | P(x)}, provided A is a set. Likewise, the system I advocate here is anti-foundational in Aczel's sense. My apologies for the imprecision.
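The role of separation can be made concrete with the standard derivation of Russell's paradox, sketched below in notation chosen here for illustration (it is not taken from [Zermelo08]). Under unrestricted comprehension the Russell set yields an outright contradiction; under separation the same construction, relativised to an arbitrary set A, merely proves that the resulting set R_A is not an element of A. In particular, no set A can contain all sets.

```latex
\begin{align*}
  &\text{Comprehension: } R = \{x \mid x \notin x\}
    &&\Rightarrow\; R \in R \leftrightarrow R \notin R \quad\text{(contradiction)}\\
  &\text{Separation: } R_A = \{x \in A \mid x \notin x\}
    &&\Rightarrow\; R_A \in R_A \leftrightarrow (R_A \in A \wedge R_A \notin R_A)\\
  &\text{Assume } R_A \in A:
    &&\Rightarrow\; R_A \in R_A \leftrightarrow R_A \notin R_A \quad\text{(contradiction)}\\
  &\text{Hence:}
    &&\phantom{\Rightarrow\;} R_A \notin A
\end{align*}
```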
While axiomatic set theory and type theory may be considered adequate for mathematical reasoning, there are many examples in natural language understanding, and in knowledge representation more generally, where we need a more powerful language in which self-referentiality cannot be excluded a priori on purely syntactic grounds. If we excluded it, we would end up with a system that is difficult to use, or far removed from everyday language (which does allow for paradoxes), or both. In fact, quite a number of complicated formalisms have been developed and are currently in the mainstream of AI research, although a simpler system might do the job better. In 1985 Perlis [Perlis85] argued that we "can have everything in first-order logic." He investigated first-order logic plus strings and redefined Tarski's definition of truth. Here I advocate a three-valued logic instead.
First-order logic plus strings is a logic-based knowledge representation formalism that can be compared to modal logic. As Davis [Davis90, p.77] points out, the difference between the two approaches is much the same as the difference between direct quotation (John knows "The evening star is the morning star.", represented in a syntactic theory as Knows(John, "EveningStar = MorningStar")) and indirect quotation (John knows that the evening star is the morning star, represented in modal logic as [John] EveningStar = MorningStar). In both approaches the formulae are not extensional; that is, the truth value of a compound formula cannot be computed as a function of the truth values of its subformulae. In modal logic, the truth values in all possible worlds reachable from the initial world have to be known; in a syntactic theory, expressions like "EveningStar = MorningStar" stand for strings of symbols, not for the objects they denote.
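To make the failure of extensionality in the syntactic approach concrete, here is a toy sketch in Python. It assumes a single agent whose knowledge base is a set of quoted sentences, plus a denotation map for the constant symbols; all names (DENOTATION, JOHN_KNOWS, true_in_world, knows) are hypothetical, and the sketch only handles identities of the form "a = b". Two sentences that are both true in the world, and differ only by substituting co-referring names, can nevertheless differ under Knows.

```python
# Toy syntactic (direct-quotation) treatment of knowledge.
# All names here are hypothetical illustrations, not an existing API.

# What the constant symbols denote in the intended model.
DENOTATION = {"EveningStar": "Venus", "MorningStar": "Venus"}

# John's knowledge is a set of *strings*, as in Knows(John, "...").
JOHN_KNOWS = {"EveningStar = EveningStar"}


def true_in_world(sentence: str) -> bool:
    """Extensional truth of an identity 'a = b': compare the denotations."""
    left, right = (part.strip() for part in sentence.split("="))
    return DENOTATION[left] == DENOTATION[right]


def knows(agent_kb: set, sentence: str) -> bool:
    """Knows(agent, "s") is decided on the string itself, not on what it denotes."""
    return sentence in agent_kb


if __name__ == "__main__":
    s1 = "EveningStar = EveningStar"
    s2 = "EveningStar = MorningStar"
    # Both sentences are true in the world ...
    print(true_in_world(s1), true_in_world(s2))          # True True
    # ... yet substituting a co-referring name changes the quoted string,
    # so the knowledge predicate gives different answers.
    print(knows(JOHN_KNOWS, s1), knows(JOHN_KNOWS, s2))  # True False
```

The design choice of a plain set membership test is deliberate: it is the simplest predicate that looks only at the string, which is exactly what makes it non-extensional.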
The approach becomes problematic, however, once we adopt Tarski's definition of truth, True("A") = A. Since the language is self-referential, we can express the sentence "This sentence is false.", formally as L = ~ True("L"). Together with Tarski's definition of truth we get L = ~ L, which is contradictory in a two-valued setting. We want to be able to express such sentences not because they are particularly useful in their own right, but because it is difficult to draw a line between useful and useless self-referentiality.
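The last step can be checked mechanically. The following minimal Python sketch searches the two classical truth values for a solution of the reduced liar equation L = ~ L and finds none; the function name liar_solutions is hypothetical.

```python
# Two-valued sketch: with Tarski's schema True("A") = A, the liar
# L = ~True("L") reduces to L = ~L. Search all classical values for a solution.

CLASSICAL_VALUES = (True, False)


def liar_solutions(values=CLASSICAL_VALUES):
    """Return the truth values v satisfying v == (not v)."""
    return [v for v in values if v == (not v)]


if __name__ == "__main__":
    # Neither True nor False is consistent, so the list is empty.
    print(liar_solutions())  # []
```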
There are different ways out: one is to abandon classical logic and live with contradictions (adopt a paraconsistent logic); another is to forbid self-referentiality (Russell's approach); a third is to use a three-valued logic in which the third truth value stands for paradoxicality. Merely adding a third truth value does not solve the problem, however, since "higher-order" paradoxes that speak about paradoxicality itself remain possible. If we disallow such statements, we can still speak about the truth of sentences; for instance, we can say "This sentence is false.", but we must not say something like "This sentence is paradoxical or false." In this way it seems possible to use efficient reasoners and come close to the way such sentences are treated in everyday language. Dealing with the full phenomenon, however, seems (as usual) to be more difficult. For more details see [Kerber03].
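A small Python sketch illustrates both halves of this argument. It assumes strong Kleene truth tables with the third value P read as "paradoxical" (the text does not fix the truth tables, so this is an illustrative choice), reads "false" as ~ True("L") so that it reduces to ~ L as above, and treats a hypothetical predicate Par ("is paradoxical") as two-valued. Searching for consistent values of L, the liar L = ~ L has exactly one, namely P, while "This sentence is paradoxical or false", i.e. L = Par(L) or ~ L, has none.

```python
# Three-valued sketch. ASSUMPTION: strong Kleene connectives, with the third
# value P read as "paradoxical"; the truth tables are an illustrative choice.
from enum import Enum


class TV(Enum):
    F = 0  # false
    P = 1  # paradoxical
    T = 2  # true


def neg(a: TV) -> TV:
    """Kleene negation: swaps T and F, leaves P fixed."""
    return {TV.T: TV.F, TV.F: TV.T, TV.P: TV.P}[a]


def disj(a: TV, b: TV) -> TV:
    """Strong Kleene disjunction: the maximum in the order F < P < T."""
    return max(a, b, key=lambda v: v.value)


def is_paradoxical(a: TV) -> TV:
    """Hypothetical predicate Par: speaks about paradoxicality, two-valued."""
    return TV.T if a is TV.P else TV.F


def fixed_points(sentence):
    """All values v with sentence(v) == v, i.e. consistent values for L."""
    return [v for v in TV if sentence(v) is v]


def liar(v: TV) -> TV:
    """'This sentence is false.'  L = ~L"""
    return neg(v)


def revenge(v: TV) -> TV:
    """'This sentence is paradoxical or false.'  L = Par(L) or ~L"""
    return disj(is_paradoxical(v), neg(v))


if __name__ == "__main__":
    print(fixed_points(liar))     # [<TV.P: 1>]
    print(fixed_points(revenge))  # []
```

Disallowing Par inside sentences, as suggested above, removes exactly the second case while keeping the first.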