In any country, if you rank all the cities by population, the populations follow the same pattern. The law is so strong that you could use it as a geeky party trick: give me the population of the largest city in a country and I will tell you the population of the 50th largest. It is a remarkable feature of cities that their populations not only follow a power law, but that the distribution is almost identical across the planet.
Power laws can be seen all over nature and society: George Kingsley Zipf famously showed that word frequency has a power law distribution. Remarkably, the same kind of distribution turns up everywhere: business sizes, income distribution and website rankings are among innumerable examples. A famous example of a power law is the Pareto principle (the 80:20 rule), named after the Italian engineer turned economist Vilfredo Pareto. Power laws appear to describe many attributes of cities too, such as GDP, length of roads and number of railway stations – although the relationship is usually different from that of populations.
Interestingly, power laws are self-similar: the distribution looks similar however much you zoom in or out – this is called scale invariance. To put it simply: take the area of a square. Everyone knows the area of a square is the length of its side to the power of two. No matter how big or small the square is, its area will always be the length of its side squared, and it will always have the same shape. The function relating the side of a square to its area is probably the simplest power law in existence.
In the case of Zipf’s law for cities, we see that in most economies the size of a city is inversely proportional to its rank: i.e. if the first city has a population of 10m we would expect the second city to have a population of 5m, the third 3.3m, etc. It is quite remarkable that this law seems to hold everywhere: the function is usually close to P(r) = c/r, where P is population, r is rank and c is the population of the largest city.
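To make the party trick concrete, here is a minimal sketch of the idealised rule P(r) = c/r. The function name and the 10m starting population are my own illustrative choices, and real city systems only follow the rule approximately.

```python
def zipf_population(largest: float, rank: int) -> float:
    """Expected population of the city at `rank` under the ideal rule P(r) = c / r."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    return largest / rank


# Hypothetical country whose largest city has 10 million people.
largest = 10_000_000
for rank in (1, 2, 3, 50):
    print(rank, round(zipf_population(largest, rank)))
```

Under this idealised rule the 50th city is simply one fiftieth the size of the first.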
Suffolk settlements obey Zipf’s Law
Before thinking further about the implications of the law, I thought it would be interesting to test Zipf’s law for settlements in Suffolk and in the process learn about its application and challenges. Suffolk is my home county, so it’s a region I know pretty well. This is useful because my in-depth knowledge of Suffolk will aid interpretation of the rule (in the absence of more sophisticated analysis – I’ll come back to this). It also has a small number of towns, so the data will be easy to illustrate.
The first step is to gather the populations of towns in Suffolk. I have gone for the 2011 census – a reliable source of population data. Already though we encounter a problem: do the administrative boundaries of Suffolk towns reflect the natural boundaries of Suffolk towns? They do not exactly, and this might be a problem as we are interested in the natural phenomenon of settlement size.
Plotting the populations of the Suffolk towns against their rank does indeed produce a power law distribution with a good fit (R squared of 0.96). I’ve added a trend line, with a power law equation, to demonstrate this.
For ease of interpretation I will move the chart to log-log axes. This just turns the same data into a straight line, which is easier to interpret (for what it’s worth I used natural log, but it doesn’t make any difference).

So already, without much effort, we have a pretty good power law fit. In this case I took the natural log of both the size and the rank, but you could just as easily set the chart axes to a log scale. It doesn’t change the data, just the presentation.
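For anyone who wants to reproduce this kind of fit without a spreadsheet, here is a minimal sketch of an ordinary least squares line through log(population) against log(rank). It is my own illustration rather than the exact trend line calculation behind the charts, and the example populations are made up to follow P(r) = 100000/r exactly; they are not the Suffolk census figures.

```python
import math


def fit_power_law(populations):
    """Fit log(P) = intercept + slope * log(rank) by ordinary least squares.

    Populations are ranked largest first. Returns (slope, intercept, r_squared).
    """
    sizes = sorted(populations, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(p) for p in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up populations following P(r) = 100000 / r exactly (NOT Suffolk data).
example = [100_000 / r for r in range(1, 21)]
slope, intercept, r_squared = fit_power_law(example)
print(round(slope, 3), round(math.exp(intercept)), round(r_squared, 3))
```

For a perfect Zipf system the slope comes out at minus one and exp(intercept) recovers the size of the largest city. Real data will be noisier, and the base of the logarithm does not affect the slope.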
One interesting feature of the Suffolk system is that the coefficient on the log-log line is somewhat greater than one in magnitude. In most systems of cities the coefficient should be one, i.e. P(r) = c r^-1 = c/r, which is equivalent to log(P) = log(c) - log(r). The steeper slope in Suffolk appears to be driven by the large sizes of Ipswich, Bury, Lowestoft etc. compared with the remainder of the settlements. The truth is that Zipf’s law doesn’t hold for very small locations – I’ll return to this.
Defining settlements is challenging
The Suffolk fit is good, but can be improved by challenging the way the settlements have had their populations defined. There are two key problems:
1) Boundaries do not match urban limits, potentially under-representing populations. At the same time some settlements are in close proximity and difficult to separate e.g. Kesgrave and Ipswich.
2) The boundary of Suffolk is partly arbitrary but is partly defined by the River Stour along the south and the River Waveney along the north. The reason I say only partly arbitrary is that these rivers may have had a historical influence on the development of the settlement hierarchy in Suffolk. In this sense the administrative boundary might yield better results than, say, a 50km circular catchment around Ipswich.
So for (1), how do we determine where a city ends? One method is to use population density cutoffs. In essence, imagine walking outwards from the centre of a town. Once you reach a parcel of land that is below a given population density, you stop counting that land as part of the city. An example is the City Clustering Algorithm presented by Rozenfeld et al. Another approach might be to use satellite data, or travel-to-work data.
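The density cutoff idea can be sketched as a simple flood fill over a grid of cells. This is only an illustration of the "walk outwards until the density drops" description above, not the City Clustering Algorithm as published by Rozenfeld et al.; the grid, the densities and the cutoff of 300 people per square kilometre are all made up.

```python
from collections import deque


def grow_city(density, start, cutoff):
    """Return the set of grid cells connected to `start` with density >= cutoff."""
    rows, cols = len(density), len(density[0])
    if density[start[0]][start[1]] < cutoff:
        return set()
    city = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        # Look at the four neighbouring cells (up, down, left, right).
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in city and density[nr][nc] >= cutoff:
                city.add((nr, nc))
                queue.append((nr, nc))
    return city


# Hypothetical densities (people per km^2) on a grid of 1 km^2 cells.
grid = [
    [50, 120, 80, 40, 30],
    [90, 900, 700, 60, 800],
    [70, 850, 400, 55, 45],
    [20, 60, 50, 35, 25],
]
cells = grow_city(grid, start=(1, 1), cutoff=300)
# Cells are 1 km^2, so summing densities gives the population directly.
population = sum(grid[r][c] for r, c in cells)
print(sorted(cells), population)
```

Note that the dense cell at (1, 4) is left out because a low-density gap separates it from the starting town. With a lower cutoff the two would merge, which is exactly the Kesgrave and Ipswich problem.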
In lieu of such brainpower, I will rely on my many years travelling by bus around Suffolk and simply use judgement, and see if my judgements improve the relationship.
As for (2), this is a really interesting point about Zipf’s law – it only works when you capture the entire regional economy starting from the apex. So, for example, Zipf’s law won’t work for the UK if you don’t include London, and Suffolk doesn’t work without Ipswich. Similarly, if you include places closer to Norwich, then Norwich should be included. Once Norwich is included we no longer have an apex city, as both Ipswich and Norwich are regional centres of similar order. You would then have to include the entire UK system up to London. This need for a complete, coherent economic system has been explored in some detail by Cristelli, Batty and Pietronero in an article published in Scientific Reports, a Nature journal. I also like how Jiang et al. use the Koch curve as an analogy. This simple fractal is only self-similar when you define the repeating pattern correctly.

On a different scale – Chinese cities
Having convinced myself that I can show the law for Suffolk I’ve now had a stab at China. This represents an entirely different scale as the population of China stood at 1,381,000,000, last time I counted, compared to 730,100 in Suffolk.
This time the data comes from Wikipedia. We can see that again the power law is followed. Plotting logs makes the trend easier to view and reveals a tail at the low population end, which is characteristic of city distributions. We can discount this tail to improve the fit, but it would be interesting to understand what causes it. My first thought was that at the bottom end of the scale a lot of data points go missing because they are not counted, in which case the tail would be an artefact of the data. The truth is that the law tends not to hold for lower populations. In the Suffolk data set I assume this is a problem, but for these Chinese cities the smallest one in the list is Lijiang, with 155k residents.
So it might be useful to exclude the bottom ranked settlements.
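Here is a rough sketch of what excluding the bottom ranked settlements does to the fitted slope. The data is made up (an exact Zipf series whose ten smallest places are deliberately undercounted) and the 30,000 cutoff is an arbitrary assumption, so it illustrates the mechanism rather than the Chinese or Suffolk results.

```python
import math


def loglog_slope(populations, min_population=0):
    """Least squares slope of log(P) against log(rank), ignoring places below the cutoff.

    Ranks are assigned on the full list first, so trimming the tail does not
    renumber the cities that remain.
    """
    ranked = sorted(populations, reverse=True)
    points = [
        (math.log(rank), math.log(p))
        for rank, p in enumerate(ranked, start=1)
        if p >= min_population
    ]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Made-up data: P(r) = 1000000 / r, with the ten smallest places halved
# to mimic a tail at the low population end.
cities = [1_000_000 / r for r in range(1, 41)]
cities = cities[:30] + [p * 0.5 for p in cities[30:]]

print(round(loglog_slope(cities), 2))                         # steeper than -1 because of the tail
print(round(loglog_slope(cities, min_population=30_000), 2))  # tail excluded
```

Where to put the cutoff is a judgement call. Choosing it after looking at the chart will flatter the fit, so it is worth reporting the slope both with and without the tail.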
There are further interesting features in this data. The largest cities in the data set (Shanghai, Beijing, Guangzhou, Chongqing and Chengdu) appear to be smaller than would be expected from the rest of the data. Using metropolitan area estimates seems to improve this, but relies upon some vast urban agglomerations. The Guangzhou metropolitan area, extending around the Pearl River Delta, would be the largest city in the world by this measure. Its population of 44 million is similar to that of Spain or Argentina.
In this chart the slope is slightly lower than would be expected. I do not know enough about the way China counts its populations to speculate on why this might be.
Zipf’s law seems to hold most closely in the USA. It has been suggested by Krugman and others that primate cities such as London, Paris and Bangkok might be slightly too large for their systems due to historic concentrations of political power.
Other city attributes scale with city size too, although most scale differently from population. For example, we know that GDP and pollution increase at a greater rate than population for bigger cities. The effects of transport congestion and pollution as cities get larger are of particular concern. I will endeavour to return to this in a later post.
REFERENCES
- Rozenfeld, H. et al. (2011) The Area and Population of Cities: New Insights from a Different Perspective on Cities. American Economic Review 101: 2205–2225. http://pages.stern.nyu.edu/~xgabaix/papers/zipfCCA-new.pdf
- Cristelli, M. et al. (2012) There is More than a Power Law in Zipf. Scientific Reports 2: 812. http://www.nature.com/articles/srep00812
- Jiang, B. et al. (2013) Zipf’s Law for All the Natural Cities around the World. https://arxiv.org/ftp/arxiv/papers/1402/1402.2965.pdf
- Batty, M. (2013) The New Science of Cities. MIT Press.
The relatively flattened regression line might be explained by the strict household registration system in China, especially in the more developed metropolises. Despite the pollution, congestion and other negative externalities that Chinese cities have to bear, one of the books I came across recently pointed out that the level of urban agglomeration in Chinese cities is actually lower than that of their European and American counterparts as a result of this system.
Thank you Xin, I didn’t know about that but it sounds like a plausible explanation. This rather heavy-going paper seems to agree with the book you’ve read: http://www.brown.edu/Departments/Economics/Faculty/henderson/papers/China402.pdf It’s quite an old paper now though (2004).