Damjan was an NMSS student in 2000 and 2001, a tutor from 2003 to 2005 and a lecturer in 2013. He works as a statistician who helps geneticists overwhelmed by their data.
Look at a list of country population sizes, or the frequency of words in today’s newspaper. If you take just the first digit of each number, what do you think you’ll see? You may expect all the digits from 1 to 9 to appear roughly equally often, but, remarkably, this is not what happens. In 1881, Simon Newcomb noticed that in books of logarithm tables, the pages at the front (which were used for calculating with ‘small’ numbers) wore out much more quickly than those at the back. He hypothesised that in any set of data, numbers will tend to begin with 1 more often than any other digit, followed by 2, 3, and so on, with 9 being the least common.
In 1938, Frank Benford rediscovered this fact and collected a large and diverse set of data to verify it. In every case, the distribution of first digits was essentially the same (see figure): a leading 1 occurs about 30% of the time, while a leading 9 is seen less than 5% of the time. Benford’s efforts ultimately earned him naming rights, and the pattern is now known as Benford’s Law.
Although Benford demonstrated its existence, it was far from clear why it occurs with such uncanny regularity. In 1961, Roger Pinkham had a crucial insight. Suppose you measured the distances travelled by NMSS students to Canberra and looked at the first digit of each. What unit did you use? Should it make a difference? If there were a natural distribution of first digits, you wouldn’t expect it to depend on an arbitrary choice like a unit of measurement. Pinkham showed that there is only one distribution for which this is true: Benford’s Law.
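To make the percentages and Pinkham’s unit argument concrete, here is a minimal Python sketch. It relies on the standard formula for Benford’s Law, which the article does not spell out: the chance that a number starts with digit d is log10(1 + 1/d). The distances are synthetic (spread evenly on a logarithmic scale from 1 km to 10,000 km), not real NMSS data, and the function names and the kilometre-to-mile factor of 0.621371 are choices made for this illustration.

```python
import math
import random


def benford_probability(digit):
    """Chance that a number starts with `digit` under Benford's Law."""
    return math.log10(1 + 1 / digit)


def first_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def digit_shares(values):
    """Fraction of values that start with each digit 1..9."""
    counts = [0] * 10
    for v in values:
        counts[first_digit(v)] += 1
    return [counts[d] / len(values) for d in range(1, 10)]


expected = [benford_probability(d) for d in range(1, 10)]

rng = random.Random(1961)  # fixed seed so the run is repeatable

# Assumption: synthetic distances, evenly spread on a log scale (1 to 10,000 km).
distances_km = [10 ** rng.uniform(0, 4) for _ in range(50_000)]
# Same distances in a different unit.
distances_miles = [km * 0.621371 for km in distances_km]

km_shares = digit_shares(distances_km)
mile_shares = digit_shares(distances_miles)

print("digit  Benford  km      miles")
for d in range(1, 10):
    print(f"{d}      {expected[d - 1]:.3f}    {km_shares[d - 1]:.3f}   {mile_shares[d - 1]:.3f}")
```

If the sketch behaves as intended, the kilometre and mile columns should both sit close to the Benford column, which is the unit independence Pinkham pointed to.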
Imagine opening a large number of bank accounts with different amounts of money, chosen so that the first digits of the balances are evenly distributed. Now leave the money in the bank to earn interest until every balance has exactly doubled. Every account whose balance started with a digit of 5 or higher now starts with 1, so more than half of the accounts have balances starting with 1. Over time, with the balances growing at their own rates, the distribution of first digits gradually settles towards Benford’s Law.
Is there a secret to Benford’s Law? Scale invariance is its crucial property, but there is an even easier way to understand it. If you take the base-10 logarithm of numbers that follow Benford’s Law and keep only the fractional part, you’ll find that it is spread evenly between 0 and 1. A number starts with 1 whenever that fractional part lies between 0 and about 0.301, a wider slice than any other digit gets. The uniform distribution was there after all; we just had to look in the right place!
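The bank-account thought experiment and the logarithm check can both be tried in a few lines of Python. This is a sketch under stated assumptions, not a model of real banking: the starting balances (between 100 and 1000, with evenly distributed first digits), the per-account interest rates (drawn between 1% and 10% a year) and the 200-year horizon are all invented for illustration. The rates are deliberately different for each account; if every account grew at exactly the same rate, the digit pattern would keep cycling rather than settle.

```python
import math
import random


def first_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def digit_shares(values):
    """Fraction of values that start with each digit 1..9."""
    counts = [0] * 10
    for v in values:
        counts[first_digit(v)] += 1
    return [counts[d] / len(values) for d in range(1, 10)]


rng = random.Random(2013)  # fixed seed so the run is repeatable

# Assumption: hypothetical balances whose first digit is equally likely to be 1..9.
balances = [rng.randint(1, 9) * 100 + rng.uniform(0, 100) for _ in range(20_000)]

# Step 1: every balance doubles. Digits 5 to 9 all turn into a leading 1.
after_doubling = digit_shares([2 * b for b in balances])
print(f"share starting with 1 after doubling: {after_doubling[0]:.3f}")

# Step 2: each account earns its own (assumed) annual interest rate.
rates = [rng.uniform(0.01, 0.10) for _ in balances]
for years in (0, 10, 50, 200):
    grown = [b * (1 + r) ** years for b, r in zip(balances, rates)]
    shares = digit_shares(grown)
    print(years, " ".join(f"{s:.3f}" for s in shares))
final_shares = shares  # first-digit shares after 200 years

# Step 3: fractional part of log10 of the final balances, in ten equal bins.
bin_counts = [0] * 10
for value in grown:
    fraction = math.log10(value) % 1
    bin_counts[int(fraction * 10)] += 1
mantissa_bins = [c / len(grown) for c in bin_counts]
print("log10 fractional-part bins:", " ".join(f"{b:.3f}" for b in mantissa_bins))
```

The year-0 row should be flat, later rows should lean more and more towards the low digits, and the ten bins on the last line should each hold roughly a tenth of the accounts.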
There are many situations where Benford’s magic does not apply: numbers that are systematically assigned (like phone numbers), that have a constrained range (like race times for a 200 m sprint), that can be negative (like temperatures) or that are designed to follow a specific distribution (like lottery numbers). For nearly everything else, the data are like moths to Benford’s flame!