\documentstyle{Not quite LaTeX, but definitely literate Haskell} \title{Real number computation in Haskell with real numbers represented as infinite sequences of digits} \author{Martin Escardo} \affiliation{University of Birmingham, UK} \date{1998-2011, version for Fun in the Afternoon, 14 March 2011, slightly updated 24 March 2011} \begin{document} This file is a literate Haskell program. It is written in sort-of LaTeX-style ASCII, so that it is human-readable without running latex. A future version will be simultaneously a latex file and a literate Haskell program using lhs2tex. \section{Running examples} There are several examples to choose from (search the file for the keyword "example"): > {-# LANGUAGE NPlusKPatterns #-} > import System.IO > main = do > hSetBuffering stdout NoBuffering > putStr ((show (take 500 example4)) ++ "\n") \section{Decimal notation is problematic} Brouwer (1920's) observed that decimal notation is not suitable for computation with infinitely many digits. Consider 3 x 0.33333..., where we don't know yet what the next digit is. What should be the first digit of the result? If we eventually see a digit >3, the output must be 1.0... If we eventually see a digit <3, the output must be 0.9... Hence while we read a digit =3, we can't figure out the first digit of the result. We need a crystal ball to compute the first digit. Continuity is violated. Continuity says that finitely many digits of the result depend only on finitely many digits of the argument. To regain continuity, alternative representations of real numbers are used (see e.g. Plume's (1997) report). \section{Our representation of the unit interval} We work with the signed unit interval I=[-1,1], using binary notation with signed digits. Alphabet of digits -1,0,1: > type Digit = Int Representation of the space I=[-1,1]: > type I = [Digit] A sequence ds = [d_0, d_1, ..., d_n, ...] represents the number x = d_0 / 2 + d_1 / 4 + d_2 / 8 + ... + d_n / 2^{n+1} + ... Soon we will actually use a wider variety of digits, but not in the type I. \section{Some constants} > one, zero, minusOne, half, minusHalf :: I > minusOne = repeat (-1) > minusHalf = -1 : repeat 0 > zero = repeat 0 > half = 1 : zero > one = repeat 1 \section{Auxiliary representations of real numbers} Although the main representation we consider is the above, it is convenient to use auxiliary intermediate representations to simplify the algorithms. We still use base 2, but allow more digits, and then convert back to the three digits -1,0,1. Type of digits -2,-1,0,1,2: > type Digit2 = Int Type of digits -4,-3,-2,-1,0,1,2,3,4: > type Digit4 = Int More generally, type of digits |d|<=n. > type Digitn = Int Type of interval spaces [-2,2], [-4,4], and [-n,n]: > type I2 = [Digit2] > type I4 = [Digit4] > type In = [Digitn] Dependent types would be handy! Maybe the next version will be in Agda. The following converts back to our standard representation. It amounts to division by n as a function [-n,n] -> [-1,1] for n>=2. Dependent types again would be handy here. We use n = 2 and n = 4. The first case of the following definition can be omitted. It is an optimization that makes a significant difference, by reducing the lookahead complexity. > divideBy :: Int -> In -> I > divideBy n (0:x) = 0 : divideBy n x -- Added 5 Feb 2015. Makes everything way faster. > divideBy n (a:x) | abs a == n > = (if a < 0 then -1 else 1) : divideBy n x > divideBy n (a:b:x) = > let d = 2 * a + b > in if d < -n then -1 : divideBy n (d+2*n:x) > else if d > n then 1 : divideBy n (d-2*n:x) > else 0 : divideBy n (d:x) > divideBy2 :: I2 -> I -- Added 5 Feb 2015 again to make things faster. > divideBy2 ( 0:x) = 0 : divideBy2 x > divideBy2 ( 2:x) = 1 : divideBy2 x > divideBy2 (-2:x) = -1 : divideBy2 x > divideBy2 (a:b:x) = > let d = 2*a+b > in if d < -2 then -1 : divideBy2 (d+4:x) > else if d > 2 then 1 : divideBy2 (d-4:x) > else 0 : divideBy2 (d:x) \section{Some very basic operations on I=[-1,1]} Addition as a function [-1,1] x [-1,1] -> [-2,2]. This is our first example where we use an auxiliary representation: > add2 :: I -> I -> I2 > add2 = zipWith (+) Midpoint (x+y)/2 as a function [-1,1] x [-1,1] -> [-1,1]: > mid :: I -> I -> I > mid x y = divideBy2 (add2 x y) Proceeding in this way, we have avoided a myriad of cases in the definition of addition and mid points that one often sees in the literature, while still being time and space efficient. (There are in fact 3^4 cases, because we need to look at two digits of each argument to produce one output digit.) Complement x |-> -x as a function [-1,1] -> [-1,1]: > compl :: I -> I > compl = map (\d -> -d) Division by a positive integer: > divByInt :: I -> Int -> I > divByInt x n = f x 0 > where f (a:x) s = let t = a + 2 * s > in if t >= n then 1 : f x (t - n) > else if t <= -n then -1 : f x (t + n) > else 0 : f x t \section{Infinitary operations} Given a sequence x0,x1,x2,...,xn,... of points of [-1,1], compute M_i x_i = x0 / 2 + x1 / 4 + x2 / 8 + ... + xn / 2^(n+1) + ... = sum_n xn / 2^(n-1) = mid(x0, mid(x1, mid(x2, ...))) = bigMid x This infinitary operation is proposed by Escardo and Simpson (LICS'2001, "A universal characterization of the closed Euclidean interval"). The definition (*) bigMid (x:xs) = mid x (bigMid xs) doesn't work, because mid needs two digits of input from each argument to produce one digit of output, and hence (*) cannot produce any digit. We use Scriven's algorithm (in his 2008 MSc thesis (MFPS'2009)). It uses the auxiliary representation I4 to compute bigMid' :: [I] -> [I4], and then converts back to I by division by 4: > bigMid :: [I] -> I > bigMid = (divideBy 4).bigMid' > where bigMid'((a:b:x):(c:y):zs) = 2*a + b + c : bigMid'((mid x y):zs) Although bigMid cannot be defined using (*), it does satisfy (*). \section{Truncated operations} Some truncated operations are also useful. The truncation retraction truncate: R -> [-1,1] is mathematically defined as: truncate(x) = max(-1,min(x,1)) / | -1 if x <= -1, = < x if -1 <= x <= 1, | 1 if 1 <= x. \ Truncated x+1 as a function [-1,1] -> [-1,1], that is, x |-> min(x+1,1): > addOne :: I -> I > addOne ( 1 : x) = one > addOne ( 0 : x) = 1 : addOne x > addOne (-1 : x) = 1 : x Truncated x-1 as a function [-1,1] -> [-1,1], that is, x |-> max(-1,x-1) > subOne :: I -> I > subOne ( 1 : x) = -1 : x > subOne ( 0 : x) = -1 : subOne x > subOne (-1 : x) = minusOne Truncated x |-> 1-x as a function [-1,1] -> [-1,1]. > oneMinus :: I -> I > oneMinus = addOne.compl Truncated multiplication by 2 as a function [-1,1] -> [-1,1], that is, x |-> max(-1,min(2x,1)). > mulBy2 :: I -> I > mulBy2 ( 1 : x) = addOne x > mulBy2 ( 0 : x) = x > mulBy2 (-1 : x) = subOne x Truncated multiplication by 4: > mulBy4 :: I -> I > mulBy4 = mulBy2.mulBy2 Truncated multiplication of a number by non-negative integer: > tMulByInt :: I -> Int -> I > tMulByInt x 0 = zero > tMulByInt x 1 = x > tMulByInt x n = if even n > then mulBy2(tMulByInt x (n div 2)) > else add x (mulBy2(tMulByInt x (n div 2))) Truncated addition as a function [-1,1] x [-1,1] -> [-1,1], that is, (x,y) |-> max(-1,min(x+y,1)) > add :: I -> I -> I > add x y = mulBy2(mid x y) \section{Summary so far} We have one, zero, minusOne, half, minusHalf :: I divideBy :: Int -> In -> I add2 :: I -> I -> I2 mid :: I -> I -> I bigMid :: [I] -> I compl :: I -> I divByInt :: I -> Int -> I with truncated operations addOne :: I -> I subOne :: I -> I oneMinus :: I -> I mulBy2 :: I -> I mulBy4 :: I -> I tMulByInt :: I -> Int -> I \section{Computation of the constant pi} Using Bailey, Borwein & Plouffe 1997, we get: pi/8 = bigMid_k 8^{-k} (1/(8k+1) - 1/2(8k+4) - 1/4(8k+5) - 1/4(8k+6)) (http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula). > piDividedBy32 :: I > piDividedBy32 = > bigMid > [f k (mid (mid (g1 k) (g2 k))(mid (g3 k) (g4 k))) | k <- [0..]] > where f k x = if k == 0 then x else 0:0:0: f (k-1) x > g1 k = divByInt (repeat 1) (8*k+1) > g2 k = divByInt (-1 : zero) (8*k+4) > g3 k = divByInt ( 0 : -1 : zero) (8*k+5) > g4 k = divByInt ( 0 : -1 : zero) (8*k+6) > example1 = piDividedBy32 This is an unusual example of a situation of when things go rather well regarding the trade-off elegance-versus-efficiency. Of course, there are much better ways of computing pi. But this is not the worst. \section{Conversion from and to Double} We now want to print the results in human-readable form. The easiest kind of conversion is to/from Double. This can be done without round-off errors, but of course the conversion to Double will involve a truncation error. > toDouble :: I -> Double > toDouble = f 55 > where f 0 x = 0.0 > f k (-1 : x) = (-1.0 + f (k-1) x)/2.0 > f k ( 0 : x) = ( f (k-1) x)/2.0 > f k ( 1 : x) = ( 1.0 + f (k-1) x)/2.0 > fromDouble :: Double -> I > fromDouble = f 55 > where f 0 x = zero > f k x = if x < 0.0 then -1 : f (k-1) (2.0 * x + 1) > else 1 : f (k-1) (2.0 * x - 1) > example2 = (toDouble piDividedBy32) * 32 > example3 = example2 - pi The last example gives 0.0, which confirms that Haskell correctly computes pi in Double. Or that our algorithms so far are not completely wrong. \section{Multiplication by an integer} We multiply a number in [-1,1] by positive integer, producing integer and fractional parts. In particular we will be able to multiply by 10, and hence convert to decimal (with a caveat - see below). > mulByInt :: I -> Int -> (Int,I) > mulByInt x n = f n > where f 1 = (0, x) > f n = let (a,u) = f (n div 2) > d:y = u > b = 2*a+d > in if even n > then (b,y) > else let e:t = (mid x y) > in (b+e,t) \section{Conversion to decimal} Cannot be done without crystal balls, for continuity reasons. However, there is a conversion to decimal that diverges if the input is of the form m/10^n with m and n integers (10-adic numbers). This is the best that can be done without violating continuity, and hence computability, requirements. We consider conversion using negative decimal digits first. The alphabet of positive and negative (and zero) decimal digits is: > type Decimal = [Int] The conversion allowing negative decimal digits is defined by: > signedDecimal :: I -> Decimal > signedDecimal x = let (d,y) = mulByInt x 10 > in d : signedDecimal y We now get rid of negative decimal digits. Only works for positive, non-10-adic numbers, as discussed above. We use a finite state machine with stages "f" and "g", using a weak positiveness test as an auxiliary function: > normalize :: Decimal -> Decimal > normalize x = f x > where f(d:x) = if wpositive x > then d:f x > else (d-1): g x > g(0:x) = 9: g(x) > g(d:x) = if wpositive x > then (10+d) : f x > else (10+d-1) : g x > wpositive (d:x) = > if d > 0 > then True > else if d < 0 > then False > else wpositive x We now convert a positive, non 10-adic, number to decimal notation using only non-negative digits: > decimal :: I -> Decimal > decimal = normalize.signedDecimal > decimalString :: I -> String > decimalString = concat.(map show).decimal The decimal expansion of pi can be computed as follows: > example4 = let (m,x) = mulByInt piDividedBy32 32 > in show m ++ "." ++ decimalString x \section{Full multiplication algorithms} At this point here start in an elegant way when the only requirement is correctness, but quickly get ugly when we aim for efficiency. We will consider several algorithms, starting from the elegant ones. You have to choose the one you want: > mul :: I -> I -> I > mul = mul_version2 All of them require multiplication of a signed digit by a number in [-1,1]: > digitMul :: Digit -> I -> I > digitMul (-1) x = compl x > digitMul 0 x = zero > digitMul 1 x = x The easiest algorithm uses the bigMid function, and we could have presented it much earlier. But we wanted to show that it is possible to compute pi without knowing how to perform full multiplication. > mul_version0 :: I -> I -> I > mul_version0 x y = bigMid (zipWith digitMul x (repeat y)) I don't have the time (at the time of writing) to fully explain why this is not as efficient as it can be, but I intend to do so in a future version. But it is pleasant that we can get an easy, self-explanatory algorithm (which works in the same way as we do computations by hand in the case of finitely many digits, thanks to the bigMid operation). At this point I want to save time and refer you to the beautiful work by David Plume at Edinburgh, supervised by Alex Simpson and myself in 1997: http://www.dcs.ed.ac.uk/home/mhe/plume/index.html He was an undergrad student at that time! Here is Plume's multiplication algorithm (at that time version0 was not known as far as I am aware): > mul_version1 :: I -> I -> I > mul_version1 (a0 : a1 : x) (b0 : b1 : y) = mid p q > where p = mid p' p'' > p' = (a0*b1): mid (digitMul b1 x) (digitMul a1 y) > p''= mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : (a1*b0) : (a1*b1) : mul_version1 x y Soon after that (still last millenium), I came up with the following way-faster algorithm, which arises by (1) adding particular cases (first two equations) and (2) considering special cases of the above algorithm and making mathematical simplifications: > mul_version2 :: I -> I -> I > mul_version2 (0:x) y = 0 : mul_version2 x y > mul_version2 x (0:y) = 0 : mul_version2 x y > mul_version2 (a0 : 0 : x) (b0 : 0 : y) = mid p q > where p = 0 : mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : 0 : 0 : mul_version2 x y > mul_version2 (a0 : 0 : x) (b0 : 1 : y) = mid p q > where p = mid p' p'' > p' = 0 : 0 : x > p''= mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : 0 : 0 : mul_version2 x y > mul_version2 (a0 : 0 : x) (b0 : b1 : y) = mid p q > where p = mid p' p'' > p' = (a0*b1): 0 : digitMul b1 x > p''= mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : 0 : 0 : mul_version2 x y > mul_version2 (a0 : a1 : x) (b0 : 0 : y) = mid p q > where p = mid p' p'' > p' = 0 : 0 : digitMul a1 y > p''= mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : (a1*b0) : 0 : mul_version2 x y > mul_version2 (a0 : a1 : x) (b0 : b1 : y) = mid p q > where p = mid p' p'' > p' = (a0*b1): mid (digitMul b1 x) (digitMul a1 y) > p''= mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : (a1*b0) : (a1*b1) : mul_version2 x y This is not beautiful at this point, but it is the price you have to pay to be fast. I don't think the last word has been said about efficient multiplication in the infinite case (in practice, or in theory). The previous algorithm can be further simplified when we want to compute (mul x x), that is, squares, and this gives a further speed up. > sqr :: I -> I > sqr (0:x) = 0 : 0 : sqr x > sqr (a0 : 0 : x) = mid p q > where p = 0 : digitMul a0 x > q = (a0*a0) : 0 : 0 : sqr x > sqr (a0 : a1 : x) = mid p q > where p = mid p' p'' > p' = (a0*a1): digitMul a1 x > p''= digitMul a0 x > q = (a0*a0) : (a1*a0) : (a1*a1) : sqr x \section{Wrong results very fast, and right result not so fast} A computation which is well-known to go wrong if care is not taken is the orbit of the logistic map f(x) = ax(1-x). The orbit is the sequence x0, f(x0), f(f(x0)), and so on. A particularly bad case takes place when a=4. In this case we have a nice map f:[0,1]->[0,1] with a very nasty orbit. The point is that the map looks innocent, but its orbit plays tricks. This map was introduced in the 1800's as a model of population evolution (let's say fish in a lake, where 1 is the maximum capacity of the lake, and 0 is extinction). Of course, this is probably not a very good model. The point is that, whether or not the model is good, it gives us computational (and indeed mathematical) trouble. Before writing Haskell code, let's what happens in C with float and doubles: void main() { float const a = 4.0; float const x0 = 0.671875; float xs = x0; float const n = 60; double xd = x0; for (int i = 0; i <= n; i++) { xs = a * xs * (1.0 - xs); xd = a * xd * (1.0 - xd); } printf("xs = %f xd = %f \n", xs, xd); } We get: xs = 0.934518 xd = 0.928604 Maybe we can be confident that the correct result rounded to 2 digits would be $0.93$? Let's see. Make the program print intermediate results: void main() { float const a = 4.0; float const x0 = 0.671875; float xs = x0; float const n = 60; double xd = x0; for (int i = 0; i <= n; i++) { xs = a * xs * (1.0 - xs); xd = a * xd * (1.0 - xd); printf("%d \t %f \t %f \n", i, xs, xd); } } We get: i xs xd ----------------------------- 0 0.671875 0.671875 5 0.384327 0.384327 10 0.313034 0.313037 15 0.022702 0.022736 20 0.983813 0.982892 25 0.652837 0.757549 <------ 30 0.934927 0.481445 35 0.848153 0.313159 40 0.057696 0.024009 45 0.991613 0.930892 50 0.042174 0.625693 55 0.108415 0.637033 60 0.934518 0.928604 We now use a different, equivalent formula: f(x) = 4x(1-x) = 1-(2x-1)^2 n exact double 1 double 2 ---------------------------------------- 0 0.671875 0.671875 0.671875 5 0.384327 0.384327 0.384327 10 0.313037 0.313035 0.313033 15 0.022735 0.022720 0.022700 20 0.982892 0.983326 0.983864 25 0.757549 0.709991 0.646726 <-- 30 0.481445 0.367818 0.997196 35 0.313159 0.824940 0.984603 40 0.024009 0.899100 0.553930 45 0.930881 0.632135 0.975145 50 0.625028 0.823770 0.880008 55 0.615752 0.926760 0.898787 60 0.315445 0.371371 0.648129 Let's briefly discuss how the exact entry was \emph{not} computed. Notice that if a and x are rational, then so is f(x) = ax(1-x). So we could in principle use rational arithmetic with arbitrary precision for the numerators and denominators (as in Haskell). But the computation of x_n runs out of memory for n > 30 or so (2Gb of memory). Numerators and denominators quickly grow to large, relatively prime integers. We move back to infinite precision arithmetic in Haskell. We consider a slow and a fast' way of computing the logistic map (you can also play with the several ways of defining multiplication, and I have): > logistic, logistic' :: I -> I > logistic x = mulBy4 (mul x (oneMinus x)) The following is more efficient (faster) when it is iterated: > logistic' x = oneMinus (sqr(g x)) -- 1-(2x-1)^2 > where g ( 1 : x) = x -- g(x)= max(-1,2x-1) > g ( 0 : x) = subOne x > g (-1 : x) = minusOne Here is an experiment, where I work with the fast one: Firstly, x0 has an exact binary representation: > x0 = 0.671875 It is > d0 :: I > d0 = [1,0,1,0,1,1] ++ zero > logistics = map toDouble (iterate logistic' d0) > dlogistic' :: Double -> Double > dlogistic' x = 1.0-(2.0 * x - 1.0)^2 > logisticsDouble = iterate dlogistic' x0 > logisticsError = zipWith (-) logistics logisticsDouble > example5 = logisticsError \section{Power series} Consider an analytic function f(x) = sum_n a_n x^n. Now define g(x) = 1/2 f(x/2). We calculate g(x) = 1/2 sum_n a_n x^n (1/2)^n = sum_n a_n x^n (1/2)^{n-1} = M_n a_n x_n. Hence we can compute g using bigMid. To compute the elementary functions, we can apply the usual Taylor series from calculus. \section{Exponential function 1/2 e^(x/2)} > mexp :: I -> I > mexp x = bigMid(series one 1) > where series y n = y : series (divByInt (mul x y) n) (n+1) \section{Trigonometric function 1/2 sin(x/2)} > msin :: I -> I > msin x = bigMid(series x 2) > where x2 = compl(sqr x) > series y n = zero : y : > series(divByInt(mul x2 y)(n*(n+1)))(n+2) \section{Trigonometric function 1/2 cos(x/2)} > mcos :: I -> I > mcos x = bigMid(series one 1) > where x2 = compl(sqr x) > series y n = y : zero : > series(divByInt(mul x2 y)(n*(n+1)))(n+2) \section{Function 1/2 arctan(x/2)} > marctan :: I -> I > marctan x = bigMid(series x 1) > where x2 = compl(sqr x) > series y n = zero : divByInt y n : > series (mul x2 y) (n+2) \section{Number pi again} Use K. Takano 1982: pi/4 = 12 arctan(1/49) + 32 arctan1/57) - 5 arctan(1/239) + 12 arctan(1/110443) > piDividedBy4 :: I > piDividedBy4 = let inverse n = divByInt one n > arctan = mulBy2.marctan.mulBy2 > y1 = tMulByInt (arctan(inverse 49)) 12 > y2 = tMulByInt (arctan(inverse 57)) 32 > y3 = compl(tMulByInt (arctan(inverse 239)) 5) > y4 = tMulByInt (arctan(inverse 110443)) 12 > in add (add y1 y2) (add y3 y4) > example13 = let (m,x) = mulByInt piDividedBy4 4 > in show m ++ "." ++ decimalString x \section{Trigonometric function 1/2 arcsin(x/2)/2} > marcsin :: I -> I > > marcsin x = bigMid(series x 1) > where x2 = sqr x > series y n = zero : divByInt y n : > series (tMulByInt (divByInt (mul x2 y) > (n+1)) n) (n+2) \section{Logarithmic function ln(1+x/2)/x} When x = 0 we get the limit, namely 1/2. > mlni :: I -> I > > mlni x = bigMid(series one 1) > where x2 = compl x > series y n = divByInt y n : series (mul x2 y) (n+1) \section{Truncated logarithmic function ln(1+x/2)} > mln :: I -> I > mln x = mul x (mlni x) \section{Division} The inverse function 1 / (2 - x) using power series: > inv :: I -> I > inv x = bigMid(series one) > where series y = y : series (mul x y) \section{Affine maps} The expression (affine a b), defined below, gives the unique affine map f(x)=px+q with f(-1)=a, f( 1)=b. That is, with p = (b-a)/2, q = (b+a)/2. Notice that f(0) = (a+b)/2 and more generally f((x+y)/2) =(f(x) + f(y)) /2, and even more generally f(M_i x_i) = M_i f(x_i). That is, f is a (big) midpoint homomorphism. See Escardo and Simpson (2001). > affine :: I -> I -> I -> I > affine a b x = bigMid(map h x) > where h (-1) = a > h 0 = mid a b > h 1 = b Notice some particular cases. Multiplication is given by > mul_version3 :: I -> I -> I > mul_version3 y = affine (compl y) y Complement is given by > complAffine :: I -> I > complAffine = affine one minusOne The identity function is given by > idAffine :: I -> I > idAffine = affine minusOne one \section{Quantification over the unit interval} This is essentially Berger's 1990 algorithm. We exploit that every number in I=[-1,1] can be represented using digits -1 and 1 only. Hence we only check such sequences of digits. > findI :: (I -> Bool) -> I > findI p = if p left then left else right > where left = -1 : findI(\x -> p(-1:x)) > right = 1 : findI(\x -> p( 1:x)) > forEveryI, forSomeI :: (I -> Bool) -> Bool > forSomeI p = p(findI p) > forEveryI p = not(forSomeI(not.p)) Several applications are given in the following sections. But we continue a bit in this section, because one application towards the end of this file (program verification), requires checking all sequences, not just the ones singled out above. > findI' :: (I -> Bool) -> I > findI' p | p(left) = left > | p(centre) = centre > | otherwise = right > where left = -1 : findI'(\x -> p(-1:x)) > centre = 0 : findI'(\x -> p( 0:x)) > right = 1 : findI'(\x -> p( 1:x)) > forEveryI', forSomeI' :: (I -> Bool) -> Bool > forSomeI' p = p(findI' p) > forEveryI' p = not(forSomeI'(not.p)) \section{Modulus of continuity} Now the modulus of continuity. How many correct digits of the input are needed to correctly compute the output with a given precision? This algorithm is due to Berger (1990). > modulus :: (I -> I) -> (Int -> Int) > modulus f 0 = 0 > modulus f n = if forEveryI(\x -> head(f x) == head (f zero)) > then modulus (tail.f) (n-1) > else 1 + max (modulus (f.((-1):)) n) (modulus (f.(1:)) n) This is very close to the lookahead complexity of the function (also known as its intensional modulus of continuity (see e.g. Simpson (1998)). How many digits of x are needed to get two correct digits of x^2? > example11 = modulus sqr 2 The answer is 5. \section{Supremum} We use Simpson's (1998) algorithm. > supremum :: (I -> I) -> I > supremum f = > let h = head(f zero) > in if forEveryI(\x -> head(f x) == h) > then h : supremum(tail.f) > else imax (supremum(f.((-1):))) (supremum(f.(1:))) Similarly, > infimum :: (I -> I) -> I > infimum f = > let h = head(f zero) > in if forEveryI(\x -> head(f x) == h) > then h : infimum(tail.f) > else imin (infimum(f.((-1):))) (infimum(f.(1:))) But we need to define imax and imin. This has to be done so that their lookahead complexities are 1. Otherwise the algorithms diverge. > example12 = supremum (\x -> mid (0:x) (sqr x)) \section{Maximum, minimum and absolute value} We can't define max(x,y) = if x < y then y else x, because the less-than relation is undecidable (violation of continuity, need for crystal balls). > imin :: I -> I -> I > imin ( a : x) ( b : y) | a == b = a : imin x y > imin (-1 : x) ( 1 : y) = -1 : x > imin ( 1 : x) (-1 : y) = -1 : y > imin (-1 : x) ( 0 : y) = -1 : imin x (oneMinus y) > imin ( 0 : x) (-1 : y) = -1 : imin (oneMinus x) y > imin ( 1 : x) ( 0 : y) = 0 : imin (addOne x) y > imin ( 0 : x) ( 1 : y) = 0 : imin x (addOne y) The maximum function by "cheating" > imax :: I -> I -> I > imax x y = compl (imin (compl x) (compl y)) The absolute-value function > iabs :: I -> I > iabs x = imax (compl x) x \section{Integration} We use Alex Simpson's (1998) algorithm, but changing his intermediate representation. We represent a number x in [-1,1] by a sequence a in [I] such that x = bigMid a. What we need is any representation where average can be computed with lookahead 1 (and that can be converted back to our chosen representation). > average :: [I] -> [I] -> [I] > average = zipWith mid Compute integral of f from -1 to 1 divided by 2, first using the above representation for the output: > halfIntegral' :: (I -> I) -> [I] > > halfIntegral' f = > let h = head(f zero) > in if forEveryI(\x -> head(f x) == h) > then (repeat h) : halfIntegral'(tail.f) > else average (halfIntegral'(f.((-1):))) (halfIntegral'(f.(1:))) Now we go back to our chosen representation: > halfIntegral :: (I -> I) -> I > halfIntegral f = bigMid(halfIntegral' f) Let's integrate the absolute value function: > example10 = halfIntegral iabs \section{Zero normalization} The number zero has countably many distinct representations. The following procedure "normalizes towards zero": The sequence y = znorm x has the following properties: 1. y denotes the same number as x, and y = norm y. 2. If x denotes zero, then y = repeat 0 3. If x doesn't denote zero, then we can read off the the sign of the denoted number by looking at the the first non-zero digit of y. We exploit the rewrite rules (-1)1 |-> 0(-1) 1(-1) |-> 0( 1), which preserve denotation: > znorm :: I -> I > znorm (0:x) = 0:znorm x > znorm (-1: 1:x) = 0:znorm (-1:x) > znorm ( 1: -1:x) = 0:znorm ( 1:x) > znorm x = x See also my 1998 paper "Effective and sequential definition by cases on the reals via infinite signed-digit numerals". The following gives True if x < 0, False if x > 0, and bottom if x = 0. > negative :: I -> Bool > negative x = f(znorm x) > where f( 0 : x) = f x > f(-1 : x) = True > f( 1 : x) = False This is not used anywhere else: > smaller :: I -> I -> Bool > smaller x y = negative(mid x (compl y)) \section{Roots} The following works only if f is strictly increasing, f(-1) < f(1), and the unique root of f is not dyadic (of the form m/2^n with m,n integer). In other case it usually diverges. > bisection :: (I -> I) -> I > bisection f = > if negative(f zero) > then 1 : bisection(f.( 1:)) > else -1 : bisection(f.(-1:)) Divergence can be avoided using trisection (an old algorithm in constructive mathematics with a tendency to be rediscovered again and again). Assuming either x < 0 or 0 < y, we tell which one holds, where True is for x and False for y. Will diverge if x=y=0. This is a particular case of the trichotomy law x < a or a < y, where a=0. > ztrichot :: I -> I -> Bool > ztrichot x y = f (znorm x) (znorm y) > where f (0 : x) (0 : y) = f x y > f (-1 : x) y = True > f x y = False Assume that f is strictly increasing and that f(-1) < f(1). > trisection :: (I -> I) -> I > trisection f = > let l = f minusHalf > c = f zero > r = f half > in if (ztrichot c r) > then 1 : trisection(f.(1:)) -- f(0) < 0, so root in [0,1] > else if (ztrichot l c) -- 0 < f(1/2), so root in [-1, 1/2] > then 0: trisection(f.( 0:)) -- f(-1/2) < 0,so root in [-1/2,1/2] > else -1: trisection(f.(-1:)) -- 0 < f(0), so root in [-1,0] A bijection I->I is therefore invertible: > inverse :: (I -> I) -> (I -> I) > inverse f y = trisection(\x -> mid (f x) (compl y)) The function x^2 is not a bijection, but it is a bijection on [0,1], and the above can be applied. But don't use it for negative numbers, unless you want to get unproductive loops. > squareRoot :: I -> I > squareRoot = inverse sqr > example6 = toDouble(squareRoot half) Perhaps counter-intuitively, the closer we get the root, the algorithm takes longer to produce the next digit. A related example is this: > example7 = trisection f > where tiny n = if n == 0 then one else 0 : tiny(n-1) > epsilon = tiny (10^3) > f = affine (compl epsilon) epsilon If we want a root of f in a given interval [a,b], we find the unique affine map g such that g(-1)=a and g(1)=b. We then solve h(x)=0, where h(x)=f(g(x)), and the root of f is g(x). > trisectionInterval :: I -> I -> (I -> I) -> I > trisectionInterval a b f = g(trisection (f.g)) > where g = affine a b The particular case where a=0 and b=1 is often useful, and in this case g(x) = 1 : x. > trisection01 :: (I -> I) -> I > trisection01 f = 1 : trisection(f.(1:)) In fact, this is what is used in the above recursive definition of trisection! \section{Definition by cases} As mentioned above, (in)equality of real numbers is not decible (continuity fails, crystal balls would be needed). However this is not as bad as it seems at a first sight. The following is a partial function defined on pairs (x,y) such that if x=0 then y=0, that is, undefined on (0,y) with y /= 0. The classical mathematical definition is zppif x y = if x < 0 then 0 else y. A constructive definition is > zppif :: I -> I -> I > zppif x y = c (znorm x) (znorm y) > where c (0:x) (0:y) = 0 : c x y > c (0:x) y = c x y > c (-1:x) _ = zero > c ( 1:x) y = y If x=0 and y/=0 is close to zero, the answer is a partial number close to zero. Although this function is partial, it can be used to define total functions, e.g. f(x) = zppif x x = if x < 0 then 0 else x. Application: we can define (ppif x < y then u else v) = u + if (x-y) < 0 then 0 else v-u. Except that + and - are not available. Hence we need a little bit more work. > ppif :: I -> I -> I -> I -> I > ppif x y u v = > mulBy4(mid (0:u) (zppif (mid x (compl y)) (mid (compl u) v))) We have that ppif x y u v = u if x < y, ppif x y u v = v if x > y, ppif x y w w = w. Actually the last equation doesn't need the last arguments to be equal: it is enough they represent the same number, and the result will be (a third) representation of that number. Sorry for being sloppy. I called this the "pseudo-parallel conditional" in a 1998 publication. A particular case of interest is y = 0, for which we get a more efficient definition: > ppifz :: I -> I -> I -> I > ppifz x u v = mulBy4(mid (0:u) (zppif (0:x) (mid (compl u) v))) These partial functions can be used to define total functions. Examples: absolute value, maximum. > pabs :: I -> I > pabs x = ppifz x (compl x) x > pmax :: I -> I -> I > pmax x y = ppif x y y x \section{Finding bugs automatically} Time to do some automatic program verification. Some years ago I manually introduced a bug in the multiplication algorithm, and now truly I don't remember what it was. But we are going to be able to see that there is indeed a bug, and compute the offending input. > buggyMul (0:x) y = 0 : buggyMul x y > buggyMul x (0:y) = 0 : buggyMul x y > buggyMul (a0 : 0 : x) (b0 : 0 : y) = mid p q > where p = 0 : mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : 0 : 0 : buggyMul x y > buggyMul (a0 : 0 : x) (b0 : 1 : y) = mid p q > where p = mid p' p'' > p' = 0 : 0 : x > p''= mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : 0 : 0 : buggyMul x y > buggyMul (a0 : 0 : x) (b0 : b1 : y) = mid p q > where p = mid p' p'' > p' = (a0*b1): 0 : digitMul b1 x > p''= mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : 0 : 0 : buggyMul x y > buggyMul (a0 : a1 : x) (b0 : 0 : y) = mid p q > where p = mid p' p'' > p' = 0 : 0 : digitMul a1 y > p''= mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : (a1*b0) : 0 : buggyMul x y > buggyMul (a0 : a1 : x) (b0 : b1 : y) = mid p q > where p = mid p' p'' > p' = (a0*b1): mid (digitMul b1 x) (digitMul a1 y) > p''= mid (digitMul b0 x) (digitMul a0 y) > q = (a0*b0) : (a1*b0) : (a1*a1) : buggyMul x y How do we know it is wrong, and what is the mistake? We first check when two numbers x and y are close up to precision n. > close :: Int -> I -> I -> Bool > close n x y = closez n (mid x (compl y)) This uses closeness to zero: > closez :: Int -> I -> Bool > closez n x = all (==0) (take n (znorm x)) To catch all bugs, we should use forEveryI'. But the non-primed version is faster, and any bug caught by it will be a bug: > example8 = forEveryI(\x -> forEveryI(\y -> close 4 (mul x y) (buggyMul x y))) This evaluates to False (and to True for closeness smaller than 4). We now find a counter-example: > example9 = (take 5 x, take 5 y) > where x = findI(\x -> forSomeI(\y -> not(close 4 (mul x y) (buggyMul x y)))) > y = findI(\y -> not(close 4 (mul x y) (buggyMul x y))) This gives ([-1,-1,-1,-1,-1],[-1,1,-1,-1,-1]) as an example for which buggyMul has a bug (assuming mul is correct). These two examples run in under a second each, compiling this file as \$ ghc -O2 --make fun2011.lhs This is a kind of model-free model checking. \section{General real numbers} General real numbers can be represented by a mantissa in I and an integer exponent: > type R = (I,Integer) Using the above development for I, it is now fairly easy to implement operations on R. But laborious too, and hence I stop here. \section{Apologies} Many people around the world have produced (better and faster) implementations of real numbers, using a variety of real number representations (and programming languages). They include, in random order, Norbert Muller, Branimir Lambov, Andrej Bauer, Russel O'Connor, Valery Menissier-Morain, Pietro di Gianantonio, Boehm & Cartwright, Peter Potts, David Lester, Vuilimin, and lots of people whose names don't come immediately to my mind as I write this sentence. \end{document}