A particular sentence in Luke’s post the other day made me chuckle – in a good way, of course. Re-reading it today produced the same effect, so I’ve decided to pursue it. The sentence is this:
“I think Hindu-Arabic Numerals make it too easy to write down numbers we don’t or can’t comprehend…”
Curse them. They’re so… efficient. Logarithmically so: let n be a positive integer we are going to transcribe. The number of digits required is D = 1 + floor( log10 n ). This is one way to represent the difficulty of writing a number out: let’s call it the representation cost function. That floor function just makes it even easier, so we can say that it’s at worst an O(log n) process. Any good numericist would rejoice at such compactness, and if the aim is to successfully notate abstract quantities that are beyond our experience, e.g. a billion, then it’s just the ticket. But just for today we are going to pretend that we have the opposite problem. We might believe that the ease with which overwhelmingly large numbers can be notated has led to complacency: AIDS has killed over 25 000 000 people since 1981. Doesn’t look like so many when you write it down, does it?
To make it more difficult then, let the process be O(n), i.e. linear. The simple D = n provides a characteristic case. This digit system has one numeral: 1. Two would be written 11, three 111, and so on. In one sense, this is just perfect: to write down the number formerly known as 100, we need one hundred digits. The notator can’t be allowed to wimp out and say they can ‘visualise’ this number; no, they have to write down as many separate objects as they are representing. No abstraction here, that’s for sure. Compared to the miserly Hindu-Arabic system, though, is it excessive?
Let’s take a second look at what just happened. Instead of using the decimal (base 10) system, we went straight to the unary (base 1) system. What about those numbers in between? Most of us who have used computers will have a passing acquaintance with the binary (base 2) system, which has two numerals: 0 and 1. The representation cost function is now D(n) = 1 + floor ( log2 n ). Perhaps you have spotted the rule already. A base N system will have N digits and have a rcf equal to DN(n) = 1 + floor( logN n ). All such system are logarithmic, unless N = 1, where the formula breaks down – but we have dealt with that case above. Using a system like this (excluding N = 1) is qualitatively the same as the Hindu-Arabic system, just scaled by a factor of log10 N inside the floor function. Let’s call them scaled Hindu-Arabic systems, to emphasise how different they really aren’t.
So we’ve only met one system so far that’s definitely not Hindu-Arabic: the case N = 1. You might think we could try to make things even worse, e.g. O(n2) or, better yet, O(en). But in fact it isn’t possible to devise a digit system that scales in either of these fashions. Why? Because just to achieve O(n) (in)efficiency, we are already down to one numeral. Anything less efficient requires less numerals than this, i.e. none. The reader may like to try to devise a representation system cut from this cloth, but it can’t be numeral-based in the traditional way (hint: it needs to be probabilistic).
We should turn to a different problem then: representing things other than counting numbers. The rationals should be the next stop, as recurring and terminating decimals are encountered in everyday life. These can be represented O(n) in the same way as before, with two tricks. First, a decimal point is needed to separate a part from the wholes. Then, reverse the decimal part and write out that number. So, two and a half would be 11.11111 (2.5). Two point zero one would be 11.1111111111 (2.01, where the part after the decimal point has been reversed). Do you see? For more precision, substantially more effort is required, though I can’t write out a representation cost function quite so neatly this time.
And that’s it – a fully-fledged number system that is less efficient than the Hindu-Arabic system and yet not completely useless. What’s that you say? Irrational numbers? Here is where I get off the boat – irrational numbers cannot be represented using only numerals. To visualise them, geometry is required: for π I see a circle with its diameter; for the square root of an integer n, the diagonal of the unit n-cube. Negative numbers get short shrift too – being the absence of positively many things. And if you can really comprehend that, then it’s not me who should be writing this post.
[Update: While the above is novel to me, it is not even close to such in the sense of collective knowledge. I have just now discovered: i) the wikipedia article on the Unary number system; ii) a more general discussion of non-standard positional numeral systems; iii) the existence of an article by Knuth on a numeral system with complex base 2i, written when he was 17. I apologise to Donald for not generalising the above discussion completely, in the full knowledge that he will never know of this transgression.]
Working in the other direction, you may find arrow notation interesting; it allows for compact formulation of truly staggering numbers (as often encountered in number theory or combinatorics). I suspect it’d be very difficult to fashion anything resembling a base system out of them, though.
I am not quite clear what you require of your representation system. If you just want some arbitrary way of encoding natural numbers, then surely more ineffecient number systems exist. Consider representing each natural number in the decimal system, but each digit repeated n times. So, 25 becomes twenty-five twos followed by twenty-five fives. Then length of the encoding now is \bigOh(n log n). Which is worse than the unary system, and exponentially worse than all base k systems for k>1. Also, this seems like a valid encoding, because whatever you can do in your unary system, you can do in this system also, albeit with a logarithmic slowdown. This shameless padding of digits seems farcical, but is valid, and proves quite useful in computer science, where it leads to some inclusion relations between complexity classes. Obviously, one can imagine more pathological paddings.