Infrastructure access and population growth: evidence from nineteenth century England and Wales
Dan Bogart∗, Xuesheng You†, Eduard Alvarez‡, Max Satchell, and Leigh Shaw-Taylor¶
Draft, September 26, 2018‖
The English and Welsh economy underwent a remarkable urbanization in the nineteenth century. We examine how infrastructure access contributed to long-run population change and interacted with local characteristics. Unlike prior works, we examine the effects of being close to several infrastructures, including railway stations, turnpike roads, inland waterways, and ports. We also develop a least cost path instrument to address the endogenous placement of railway stations. Our estimates show that being close to railway stations and other infrastructures significantly increased population growth from 1841 to 1891. We also show that the effects of infrastructures were larger for more densely populated localities in 1841 and for those with a lower share of male agricultural employment. The results are consistent with economic geography and trade models that emphasize increases in agglomeration with lower transport costs. They also highlight the long-term effects of infrastructure. To illustrate, we show that localities close to infrastructures in the 1840s and 1850s have higher population density in 2011.
Keywords: Economic growth, railways, transport, spatial reorganization
JEL Codes: N4, O18, R11
∗Corresponding author. Associate Professor, Department of Economics, UC Irvine, [email protected]
†Research Associate, Faculty of History, University of Cambridge, [email protected]
‡Senior Lecturer, Economics and Business, Universitat Oberta de Catalunya, [email protected]
Associate, Dept. of Geography, University of Cambridge, [email protected]
¶Senior Lecturer, Faculty of History, University of Cambridge, [email protected]
‖Data for this paper were created thanks to grants from the Leverhulme Trust (RPG-2013-093), Transport and Urbanization c.1670-1911; the NSF (SES-1260699), Modelling the Transport Revolution and the Industrial Revolution in England; the ESRC (ES 000-23-0131), Male Occupational Change and Economic Growth in England 1750 to 1851; and the ESRC (RES-000-23-1579), the Occupational Structure of Nineteenth Century Britain: Grant. We thank Walker Hanlon, Gary Richardson, Petra Moser, Kara Dimitruk, Arthi Vellore, Alan Rosevear, and Elisabet Viladecans Marsal for comments on earlier drafts, and seminar participants at UC Irvine, UC San Diego, NYU, Florida State, Trinity College Dublin, Queens Belfast, the University of Los Andes, and EHA Meetings.
Over the last 50 years there has been a dramatic rise in urbanization. According to the
United Nations, the share of the world's population living in urban areas increased from 30%
in 1950 to 54% in 2014. Much of this urban growth has occurred in developing countries
and beyond the metropolitan areas, which act as the main labor market but not necessarily
the main place of residence (Demographia World Urban Areas 2018).
Currently developed economies underwent a similar process of urbanization in their past.
For example, in England and Wales the urban percentage (people in towns of 5000 or more)
rose from 30% in 1801 to 57% in 1871 (Shaw-Taylor and Wrigley 2014). London accounted
for some urban growth with its percentage of the national population increasing from 11%
in 1801 to 15% in 1871. Much of the remaining urban growth occurred near the northern
industrial towns like Manchester, Leeds, Nottingham, and Birmingham. These places were
already densely populated in 1801, but they became even denser by the century's end.
History offers a unique perspective to probe the drivers of urbanization. This paper examines the role of transport infrastructure in creating local differences in population growth during the nineteenth century and up to the present. The building of railways is perhaps the most famous transport improvement. Railways rapidly spread through the English and Welsh (EW) economy in the 1830s, 40s, and 50s. They lowered inland transport costs and expanded trade and travel between regions. The following fact illustrates the impact of railways. In 1850, 98.4% of coal imported into London came by sea and 1.6% by rail. By 1870 the majority of coal into London (55.7%) came by rail.1 While railways were revolutionary, they were not the first major infrastructure improvement in EW. Between 1750 and 1830 there were substantial investments in roads, ports, and canals. They developed urban areas before railways and provided transport alternatives during the railway era.
We address the following questions: (1) How did the placement of infrastructures within 30 minutes' walking distance of a locality change its population growth relative to other localities which shared similar geographic and economic characteristics but were farther away from infrastructures? (2) How did a locality's initial characteristics change the effects of nearby infrastructures? In order to answer these questions, we introduce a new data set. First, we create new continuous units which allow for disaggregated analyses of population in every census year from 1801 to 1891 and of male occupational structure in 1851 and 1881. We observe population and occupational data for 9489 units. They are slightly larger than parishes and townships, but significantly smaller than the 625 registration districts which are often studied in the literature. Second, we incorporate new GIS data on railway lines and stations, turnpike roads, inland waterways, and ports. In particular, we are able to measure the distance between units and each infrastructure type across decades. Third, we add new data on geographic characteristics, like the location of exposed coalfields, and add existing data on elevation, soil types, rainfall, and temperature.
1These figures are reported in Hawke (1970, p. 168).
The new data set is used to estimate a long-differences specification, where population growth from 1841 to 1891 is regressed on indicators for being within 2 km of infrastructure (rail, roads, waterways, and ports) plus geographic and structural controls. Importantly, our preferred model also includes fixed effects for the registration district. That implies we are controlling for unobservable factors common to all units in a district (about 250 square km). While our baseline model includes a rich set of controls, we are still concerned that proximity to railways is correlated with unobservables contributing to population growth.2 We address endogeneity in part by employing propensity score matching techniques. More directly, we construct a Least Cost Path (LCP) connecting large towns in 1801 and incorporating the added costs of building railways over sloping terrain. Distance to the LCP is a good instrument for distance to stations because it identifies units that were close to stations mainly because they were near favorable routes for connecting large towns.3
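A least cost path of this kind can be sketched with Dijkstra's algorithm on a raster. The sketch below is illustrative only: the elevation grid, the 8-neighbour moves, the `slope_penalty` weight, and the cost rule (horizontal distance plus a penalty on elevation change) are all assumptions standing in for the authors' terrain-cost surface, which this section does not specify. The second helper returns the straight-line distance from a unit centre to the nearest path cell, analogous to the distance-to-LCP instrument.

```python
import heapq
import math

# 8-connected moves: horizontal, vertical, and diagonal neighbours
MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def least_cost_path(elev, start, goal, cell_km=1.0, slope_penalty=10.0):
    """Dijkstra search over a grid of elevations (km); returns (cost, path).

    Assumed step cost: horizontal distance + slope_penalty * |elevation change|,
    a hypothetical stand-in for the extra cost of building over sloping terrain.
    The goal is assumed reachable (true for any rectangular grid).
    """
    rows, cols = len(elev), len(elev[0])
    best = {start: 0.0}   # cheapest known cost to each cell
    prev = {}             # predecessor on the cheapest route
    heap = [(0.0, start)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if cost > best[(r, c)]:
            continue  # stale entry: a cheaper route was already found
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step = cell_km * math.hypot(dr, dc) + slope_penalty * abs(elev[nr][nc] - elev[r][c])
            if cost + step < best.get((nr, nc), math.inf):
                best[(nr, nc)] = cost + step
                prev[(nr, nc)] = (r, c)
                heapq.heappush(heap, (cost + step, (nr, nc)))
    # Walk predecessors back from the goal to recover the route
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return best[goal], path[::-1]


def distance_to_path(point, path, cell_km=1.0):
    """Straight-line distance from a unit centre (row, col) to the nearest path cell."""
    return min(cell_km * math.hypot(point[0] - r, point[1] - c) for r, c in path)


if __name__ == "__main__":
    # Hypothetical terrain: a ridge across the middle row between two "towns"
    elev = [[0.0] * 5, [0.0, 0.5, 0.5, 0.5, 0.0], [0.0] * 5]
    cost, path = least_cost_path(elev, (1, 0), (1, 4))
    print(round(cost, 3), path)
    print(distance_to_path((1, 2), path))
```

In this toy grid the cheapest route detours around the ridge rather than crossing it; raising `slope_penalty` makes such detours more attractive relative to direct routes over steep ground.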
2Turnpike roads, inland waterways, and port infrastructures were mostly built in the decades before 1841, and so we are less concerned about their correlation with unobservables causing growth from 1841 to 1891.
3Our methodology draws on the so-called inconsequential place approach and other studies which use least cost paths as instruments for infrastructure. See Chandra and Thompson (2000), Michaels (2008), Faber (2014), and Lipscombe et al. (2013).
We document several new findings. First, our baseline model shows that being within 2 km of a railway station in 1851 increased population growth from 1841 to 1891 by 16 log points. The matching and IV estimates suggest a larger effect, and so this estimate appears to be a lower bound. Combining the growth estimate with the rate of convergence implies that units within 2 km of stations had 115 log points higher population levels in 1891 compared to units that were farther away.
Second, we show that access to pre-rail infrastructures also contributed to population growth. Specifically, being within 2 km of inland waterways and turnpikes increased population growth between 1841 and 1891 by 6 and 4 log points respectively. That translates into population levels that were 40 and 28 log points higher in 1891.
Third, we show that the effects of railways and other infrastructures were larger for more densely populated units in 1841 and for units that had a lower share of male agricultural employment. These results are consistent with New Economic Geography (NEG) models that emphasize increases in agglomeration with lower transport costs. They are also consistent with the literature on nineteenth century globalization, which emphasizes EW's shift out of agriculture and into manufacturing as trade costs fell due to railways and steamships (O'Rourke and Williamson 2002, Pascali 2016).
Our results contribute to the literature studying railways and growth in England and Wales.4 Previous studies have documented a correlation between proximity to railways and local population growth, but none has addressed endogeneity and confounding factors in a systematic manner.5 We introduce new data and estimation techniques, including an instrumental variable for the location of railway lines. We also add to a broader literature on English industrialization, which discusses a wide range of factors such as endowments, access to markets, human capital, and institutions.6 By building an extensive new data set, we are able to show that infrastructures had significant effects even after controlling for endowments.
4For an introduction see Hawke (1974), Simmons (1986), and Aldcroft and Freeman (1991).
5See Gregory and Marti Henneberg (2010), Casson (2013), and Alvarez et al. (2013) for existing studies examining spatial impacts of railways in England and Wales.
Our paper is also novel in incorporating pre-rail infrastructures. To our knowledge, no study on railways and nineteenth century growth analyzes pre-rail infrastructures in such a detailed way.7 Why are pre-rail infrastructures important? We show that the effects of railways are overstated when pre-rail infrastructures are absent from the analysis. Including pre-rail infrastructures as controls is also crucial to identifying the effects of railways.
This paper also complements existing studies which analyze the network effects of transport infrastructure through market access.8 While network effects are omitted in this study, it should be stressed that local effects were quite relevant in the nineteenth century, as towns lobbied extensively for and against having railway stations and canals nearby. Local effects are also relevant in the twenty-first century, as the placement of new infrastructure projects alters local population geography.
Finally, our study contributes to the literature seeking to understand urbanization in contemporary contexts.9 We provide evidence that infrastructure access is one of the key drivers of long-run outcomes. As one indication, we conclude with a `persistence' regression, where population density in 2011 is regressed on proximity to nineteenth century infrastructures. We find that being within 2 km of mid-nineteenth century infrastructures is associated with significantly higher population levels in 2011, even after including a rich set of controls. Investments in infrastructure influence population geography long into the future.
6See Crafts and Mulatu (2006), Fernihough and Hjortshøj O'Rourke (2014), Crafts and Wolf (2014), Klein and Crafts (2012), Becker, Hornung, and Woessmann (2011), and Heblich and Trew (2017).
7See Donaldson (2014) for India, Gutberlet (2014) for Germany, Berger and Enflo (2015) for Sweden, Herranz-Loncán (2006) for Spain, Tang (2014, 2017) for Japan, Hornung (2015) for Prussia, and Atack and Margo (2010, 2011) for the US.
8See Donaldson and Hornbeck (2016) and Jaworski and Kitchens (2017) for two examples.
9See Redding and Turner (2014) for an overview. Some papers of related interest include Duranton and Turner (2012), Faber (2014), Jedwab et al. (2015), Storeygard (2016), and Baum-Snow et al. (2017).
Figure 1: Evolution and size of infrastructure networks in England and Wales 1700-1890
Sources: see data section.
2 Background on infrastructure
England and Wales (EW) had a well-developed transport network long before its railway network grew. Figure 1 shows the length of the turnpike road, inland waterway, and railway networks from 1700 to 1890. In this background section, we discuss each of these networks.
Turnpikes are gates that prevent road users from passing without paying a toll. In the EW context, turnpike trusts were granted rights to finance road improvements by levying tolls. Their powers came from an act of parliament and they generally managed the main roads. As Figure 1 shows, the turnpike network grew mainly between 1750 and 1830. At their peak in the mid-nineteenth century there were 38,000 km of turnpike road managed by 1,000 different trusts. The tolls forced road users to pay for the improvement and upkeep of roads. But as it turned out, the benefits to road users from improved roads substantially outweighed the burden of the tolls. Travel times and freight charges declined by over 40 per cent between 1750 and 1800 (Bogart, 2005a, b).
One important feature of turnpike roads was their local promotion and financing. Landowners and merchants lobbied parliament to authorize turnpike bills. They also provided the financing and received no subsidies from the central government. Many trusts had to choose their routes carefully in order to avoid financial losses. Turnpike trusts faced severe competition from railways starting in the 1840s. Most had disbanded by the 1870s and 1880s, reducing the number of turnpike roads to near zero. Responsibility for maintaining their roads passed to rural sanitation districts.
The inland waterway network developed more gradually between 1700 and 1830. Around 1700 EW had a large system of navigable rivers including the Thames, Severn, and Trent (Willan, 1964). River navigations, or improved rivers which bypassed difficult sections, were added in the early 1700s. Canals, or artificial waterways, were built starting in the 1760s. Like turnpike roads, they were promoted by local landowners and merchants. In one of the most famous examples, the Duke of Bridgewater aimed to connect his coal mines at Worsley with Manchester. The early canals like the Bridgewater were a financial success, and many joint stock canal companies were proposed in parliament in the 1790s. Many of these completed their canals in the early 1800s.
In terms of location, there were several long-distance canals linking important centers. One example is the Leeds and Liverpool canal, which connected the woolen textile towns around Leeds with the cotton textile towns around Manchester and the Atlantic port of Liverpool. Another example is the Grand Junction canal, which greatly shortened the waterway distance between London and Manchester and avoided narrow sections. It is worth emphasizing that canals brought low-cost transport to inland regions. They were especially important in the movement of bulky, low-value goods like coal. Consider the following fact: shortly after the completion of the Bridgewater canal in 1761, the price of coal in Manchester fell by half.
There was significant investment in ports between 1760 and 1830. Much effort went into building harbors and wet docks in London and Liverpool. But ports were developed in other areas too. It is estimated that there were 391 acres of wet dock space and that 50 harbors were being maintained in 1830. By contrast, England had no wet docks and only a handful of harbors in 1660 (Pope and Swann 1960). Port infrastructures complemented improvements in shipping technology. One indication is the greater speeds achieved by sailing vessels in the early 1800s (Solar 2013). Later came the steamships that would revolutionize trade and travel across the oceans (Armstrong 2009, Pascali 2016).
The first steam-powered rail service open to the public came in 1825 in the northern coal mining region between Stockton and Darlington. In 1830, the Liverpool and Manchester railway was opened to facilitate passenger traffic. Several other railways connecting large towns were promoted in the mid 1830s. The organizational form was the same as for canals: railways were built and operated by joint stock companies. The rail network expanded dramatically following the `Railway Mania' of the mid-1840s. The significance of the Mania can be seen in Figure 1 through the growth of track mileage. By 1851 regional rail networks had formed around the large towns, in addition to more trunk lines being opened.
Railways marked a major improvement over the two inland modes of transport. For example, rail freight rates in 1870 were one-tenth of road freight rates in 1800 in real terms. Railways were 15 times faster than barges on inland waterways (Bogart 2014). Above all, railways had their greatest impact on passenger travel. They were able to displace long-distance stage coach services. Shortly after the Mania, between 1845 and 1850, the number of passenger journeys by rail increased by 117%, and again by 65% between 1850 and 1855.
It is important to note that railway companies were profit-seeking. Like their predecessors who built canals and roads, their projects were privately financed. As a result, railway companies selected their routes considering the financial benefits and costs. Their records suggest a distinction was made between the “original line,” which often aimed to connect large trading towns, and the “branch lines,” which linked smaller towns to the original lines. Railway companies preferred original lines, but were sometimes pressured to build branches through the parliamentary approval process. One promoter advised the following: “stick to the original line; keep down the capital, and let competing schemes do their worst” (quoted in Simmons 1986, p. 271). Railway companies sometimes engaged in strategic building to maintain their regional dominance.10 As a result, certain branch railway lines were built even though they were unlikely to be individually profitable. The details of railway building will be important later as we consider endogeneity concerns.
Our population data come from British censuses, available every decade starting in 1801.
They have been digitized at the parish level up to 1911 by Tony Wrigley at the Cambridge
Group for the History of Population and Social Structure (CamPop). The population data
are complemented by similarly detailed occupational data. The census published parish-level
occupational counts starting in the early nineteenth century. The counts for 1851 and 1881
are available through the Integrated Census Microdata (ICeM) project (see Schürer and Higgs 2014). We focus on male occupations because there is more agreement in the literature on their classification.11 Male occupations are classified into employment categories using the primary, secondary, and tertiary (PST) coding system.12
The administrative units recording population and occupations from 1801 to 1891 are not always the same across time. The sources report data sometimes in parishes, sometimes in townships within parishes, and sometimes in parishes that were later subdivided. Wrigley, Satchell, and collaborators at CamPop have created continuous parish units between 1801 and 1891 and linked them with census population data (see Satchell et al. 2016 for details). Using similar spatial matching techniques, we create a consistent set of boundaries for 9489 units that map population from 1801 to 1891 and male occupations in 1851 and 1881.13 We call these `units' for short. Units are 15 square km on average, and they belong to a larger jurisdiction called registration districts. Districts are about 250 square km on average, and we have 616 registration districts in our data.
10For the literature on the railway mania see Casson (2009), Odlyzko (2010), and Campbell and Turner (2012, 2015).
11See You (2014) for a new analysis of female employment in the 1851-1911 census.
12Primary normally includes agriculture and mining, but we separate these two. Secondary refers to the transformation of the raw materials produced by the primary sector into other commodities, whether in a craft or manufacturing setting. Tertiary encompasses all services including transport, shop-keeping, domestic service, and professional activities. The PST system is described in detail in Shaw-Taylor et al. (2014) and Wrigley (2015).
This paper also uses new GIS data on infrastructures. These include turnpike roads
in 1800 and 1830, waterways in 1800 and 1830, ports in 1826 and 1842, and railway lines
and stations in every census year starting in 1831.14 All these networks are created using
historical sources, not modern maps. For the analysis, a straight line is drawn from the center of each unit to the nearest feature of each infrastructure type. The unit center corresponds to the market square if the unit had a town, or to the centroid if the unit had no town.15
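The straight-line distance step can be sketched as follows, assuming unit centres and infrastructure features are already projected into planar coordinates measured in km (for instance, a national grid); all coordinates below are hypothetical. Stations and ports are treated as points, while roads and waterways are treated as polylines, so the sketch measures point-to-segment distances for them. The hypothetical `within` helper builds the strict "< 2 km" indicators used in the regressions.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (all (x, y) in km)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: a single point
    # Project p onto the segment and clamp the projection parameter to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_distance(centre, stations=(), polylines=()):
    """Distance from a unit centre to the nearest point or polyline feature (inf if none)."""
    dists = [math.hypot(centre[0] - x, centre[1] - y) for x, y in stations]
    for line in polylines:
        dists.extend(point_segment_distance(centre, a, b) for a, b in zip(line, line[1:]))
    return min(dists) if dists else math.inf


def within(distance_km, threshold_km=2.0):
    """Indicator I(distance < threshold): 1 if strictly within the threshold, else 0."""
    return int(distance_km < threshold_km)


if __name__ == "__main__":
    centre = (5.0, 0.0)                      # hypothetical unit centre (km)
    stations = [(3.0, 4.0), (12.0, 9.0)]     # hypothetical station points
    waterway = [(0.0, 1.0), (10.0, 1.0)]     # hypothetical canal polyline
    d_station = nearest_distance(centre, stations=stations)
    d_water = nearest_distance(centre, polylines=[waterway])
    print(d_station, within(d_station), d_water, within(d_water))
```

Here the unit would get a waterway indicator of 1 (about 1 km away) but a station indicator of 0 (about 4.5 km away).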
Another key feature of our study is the incorporation of geographic data. For each unit we create variables for being on exposed coalfields, being on the coast, ruggedness, average rainfall, average temperature, an index for wheat suitability, and the share of land in different soil types. We call these first-nature variables, following the literature. Coastal units are identified using shapefiles for parish boundaries in England and Wales. The ruggedness measures include average elevation in the parish, the average elevation slope in the parish, and the standard deviation in elevation slope. Appendix 3 provides a description of the ruggedness measures, specially constructed for this paper. The share of parish land in 10 soil types is from Avery (1980) and Clayden and Hollis (1985) and digitized by the National Soils Map of England and Wales.16 Rainfall, temperature, and wheat suitability come from FAO.17 Of special significance, Satchell and Shaw-Taylor (2013) have generated a GIS shapefile of exposed coalfields from British Geological Survey data. It identifies geographic areas in England and Wales where coal-bearing strata are not concealed by rocks laid down after the Carboniferous Period. In economic terms, they represent the coalfields that were known in the nineteenth century and could be exploited by contemporary technology.18
13Ms Gill Newton, of the Cambridge Group, developed the Python code for Transitive Closure as part of the research project `The occupational structure of Britain, 1379-1911' based at the Cambridge Group. Xuesheng You implemented this code for this particular paper.
14For a description of the turnpike GIS see Rosevear et al. (2017). For a description of the inland waterways data see Satchell (2017). For a description of ports see Alvarez et al. (2017). For railways see del Río, Martí-Henneberg, and Valentín (2008) for an initial description of the railways shapefile data. Additional upgrades were produced by the Cambridge Group for the History of Population and Social Structure (CamPop); see http://www.campop.geog.cam.ac.uk/research/projects/transport/data/railwaystationsandnetwork.html.
15We identify whether a market existed at some point between 1600 and 1850. This applies to 746 of the 9489 units. The centroid is taken as the unit center if there was no market. It should be noted that little error is introduced by using the market or the centroid since units are so small. For a description of towns see http://www.campop.geog.cam.ac.uk/research/projects/transport/data/towns.html.
We have another set of unit-level variables which we call second nature factors. These
include distance to the nearest major city in 1801. Major cities include the top ten cities in
terms of 1801 population. Also included are 1851 male employment shares. We use 5 main
occupational categories: (1) tertiary, (2) agriculture, (3) secondary, (4) mining/forestry, and
Table 1 reports summary statistics for the main variables. The population variables are expressed as natural log differences between 1841 and 1891 and between 1841 and 2011. These approximate population growth rates while dampening the influence of outliers. As a check, we can show that our data replicate accepted national trends. For example, the share of the population living in `urban units' (places with at least 400 persons per square km) increased from 42% in 1841 to 68% by 1891. It is thought that population generally declined outside the major towns. The median population growth from 1841 to 1891 across all our units was -0.09%. Finally, there is a view that population became more concentrated. The share of the total population living in the top 1% of units increased from 5.4% in 1841 to 11.3% in 1881.
16The 10 soil categories are (1) Raw gley, (2) Lithomorphic, (3) Pelosols, (4) Brown, (5) Podzolic, (6) Surface-water gley, (7) Ground-water gley, (8) Man made, (9) Peat soils, and (10) Other. See http://www.landis.org.uk/downloads/classification.cfm#Clayden_and_Hollis. Brown soil is the most common and serves as the comparison group in the regression analysis.
17See the Global Agro-Ecological Zones data at http://www.fao.org/nr/gaez/about-data-portal/agricultural-suitability-and-potential-yields/en/.
18The GIS does not capture a handful of tiny post-Carboniferous coal deposits, such as that at Cleveland (Yorkshire), which was worked in the 19th century. See http://www.campop.geog.cam.ac.uk/research/projects/transport/data/coal.html for more details.
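The log-difference growth measure and the two concentration figures quoted in the Table 1 discussion (the urban share and the top 1% share) reduce to a few lines of arithmetic. This is a sketch under assumptions: the per-unit populations and areas are hypothetical inputs, and ranking units by population for the top 1% (rather than by density) and rounding the cutoff down to at least one unit are choices made here, not taken from the paper.

```python
import math


def log_growth(pop_start, pop_end):
    """Natural-log difference in population, e.g. ln(pop_1891) - ln(pop_1841)."""
    return math.log(pop_end) - math.log(pop_start)


def urban_share(pops, areas_km2, threshold=400.0):
    """Share of total population in units with at least `threshold` persons per km^2."""
    urban = sum(p for p, a in zip(pops, areas_km2) if p / a >= threshold)
    return urban / sum(pops)


def top_share(pops, top_fraction=0.01):
    """Share of total population living in the most populous `top_fraction` of units."""
    k = max(1, math.floor(len(pops) * top_fraction))  # assumed cutoff rule
    return sum(sorted(pops, reverse=True)[:k]) / sum(pops)


if __name__ == "__main__":
    # Logs compress extreme changes: a tenfold rise is ln(10), about 2.30 log units,
    # whereas as a percentage change it would be +900%.
    print(log_growth(100, 1000))
    pops = [1000, 100, 900]          # hypothetical unit populations
    areas = [1.0, 1.0, 10.0]         # hypothetical unit areas in km^2
    print(urban_share(pops, areas))  # only the first unit reaches 400 per km^2
    print(top_share(pops))           # with 3 units, the "top 1%" is one unit
```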
The summary statistics indicate that railway access differed across space in 1851. The mean distance to an 1851 station is 10.4 km, and 13.6% of units had a station within 2 km. It was more common to be within 2 km of inland waterways and turnpike roads. In fact, 66% of units were within 2 km of a turnpike by 1830. As expected, only 2.7% of units were within 2 km of a port.
The summary statistics for second nature controls indicate that the average unit had a male agricultural employment share equal to 0.55. This figure is much higher than the aggregate agricultural share reported by Shaw-Taylor and Wrigley (2014). They report that
19% of adult males in England and Wales worked primarily in agriculture in 1871. The
reason is that secondary employment was far more concentrated than agricultural employ-
ment. In our units the top 1% accounted for 57% of male secondary employment in 1851.
Not surprisingly, our average unit was not representative of the EW economy in terms of
Is there any visual evidence that proximity to railways affected population growth? Figure 2 provides an affirmative answer. It shows railway lines and stations in 1851 along with each unit's population growth from 1851 to 1881 (darker is higher growth). It is clear that many areas which grew rapidly from 1851 to 1881 were close to railway stations. This is most clear near Manchester, Birmingham, and London. However, the East Anglia region in the upper right shows that having railway stations nearby did not guarantee high population growth.
There are two other stylized facts worth mentioning. First, on average, units that had lower population density in the early nineteenth century grew more. However, there is one density range where this was not true: medium-to-high density units in the early nineteenth century grew more than middle-density or very high density units. The left panel in Figure 3 illustrates. It plots the log difference in 1891 and 1841 population density on the y-axis and the log of 1841 population density on the x-axis. It also plots the locally weighted curve fitting these data. There is a `hump' in the curve for medium-to-high 1841 density units, roughly in the 75th to 90th percentiles.
Table 1: Summary statistics
Variable | Obs. | Mean | Std. Dev. | Min | Max
Population growth and occupational change variables
Ln diff. population 1841 to 1891 | 9489 | 0.0100 | 0.5136 | -3.0796 | 4.8742
Ln diff. population 1841 to 2011 | 9488 | 0.5554 | 1.1906 | -5.0739 | 5.9113
Distance to rail station in 1851 (km) | 9489 | 10.456 | 11.065 | 0.0215 | 73.129
Distance to GM LCP (km, IV) | 9489 | 11.861 | 16.548 | 0.0001 | 116.38
Indicator distance to rail station in 1851
Figure 2: Railways and parish population growth from 1851 to 1881
Sources: see text.
The second fact to note is that units with a higher share of male agricultural employment grew less. The right-hand graph in Figure 3 illustrates. It plots the share of male 1851 agricultural employment on the x-axis and the log difference in population on the y-axis. The locally weighted curve has a negative slope between agricultural shares of 0.2 and 0.9. Later we will see that the effects of railway stations are related to these two facts.
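The smoother behind the locally weighted curves in Figure 3 is not specified here. One common choice is LOWESS; the sketch below implements only its core step, a tricube-weighted local linear fit evaluated at each observation, and omits LOWESS's robustness reweighting. The function name and the `frac` bandwidth (the share of observations entering each local fit) are assumptions. A `hump' like the one described for the left panel would appear as a local rise in the fitted values.

```python
def local_linear_smooth(xs, ys, frac=0.3):
    """Tricube-weighted local linear regression evaluated at each x (LOWESS core step).

    frac: assumed share of observations used in each local fit (bandwidth).
    Returns fitted values in the same order as xs.
    """
    n = len(xs)
    k = max(2, int(round(frac * n)))  # neighbours per local fit
    fitted = []
    for x0 in xs:
        # The k-th smallest distance to x0 sets the local bandwidth h
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        # Tricube weights: close points count most, points beyond h get zero
        w = []
        for x in xs:
            u = abs(x - x0) / h
            w.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
        # Weighted least squares line through the local points; x0 itself has weight 1
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted


if __name__ == "__main__":
    # Hypothetical data: growth falls with agricultural share, with some noise
    shares = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    growth = [0.30, 0.22, 0.25, 0.10, 0.05, 0.02, -0.05, -0.08, -0.15]
    print([round(v, 3) for v in local_linear_smooth(shares, growth, frac=0.5)])
```

A local linear fit (rather than a local mean) is used because it reproduces straight-line relationships exactly and behaves better at the ends of the x range.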
4 Baseline results
In this section, we examine how population growth was affected by access to infrastructure. We begin by analyzing the following long-differences specification:

$$y_{i,1891} - y_{i,1841} = \beta\, I(\text{Station} < 2\,\text{km})_{i,1851} + \beta_2\, I(\text{Prerail} < 2\,\text{km})_{i,1830} + \gamma x_i + d_i + \varepsilon_{ij} \qquad (1)$$

Figure 3: Plots of population growth against initial population density and agricultural employment shares
Sources: see text.
where $y_{i,1891} - y_{i,1841}$ is the natural log difference between 1891 and 1841 population. The initial year of 1841 is chosen because there were few railway stations open in the census year 1831. 1891 was chosen because it is the last historical date for which we have data. The main explanatory variable is the indicator $I(\text{Station} < 2\,\text{km})_{i,1851}$, equal to one if unit $i$ is within 2 km of a railway station in 1851. The year 1851 is chosen because the rail network underwent a major expansion in the 1840s due to the Railway Mania. The 2 km threshold is chosen because it takes approximately 30 minutes to walk 2 km. We think 30 minutes represents an average commute time for individuals who worked near the station or for firms carting their goods to the station for quick delivery. Below we also consider distances greater than 2 km and variables for station density to see if the main conclusions change. The summary variable $I(\text{Prerail} < 2\,\text{km})_{i,1830}$ includes $I(\text{turnpike} < 2\,\text{km})_{i,1830}$, $I(\text{waterway} < 2\,\text{km})_{i,1830}$, and $I(\text{port} < 2\,\text{km})_{i,1842}$. These are 3 indicators identifying whether a unit is within 2 km of turnpike roads, canals, and ports c.1840. Together the coefficients measure whether walking distance to stations had a larger effect than walking distance to other transport infrastructures.
There are two sets of control variables included in $x_i$. The first nature controls are listed in the summary statistics and capture geographic endowments. Coal is perhaps the most crucial, as it is thought to be linked with industrialization (Wrigley 2010). The second nature controls are also listed in the summary statistics and include the log of population density in 1841 and the log distance to the nearest major city in 1801. The coefficient on the log of 1841 population density tests whether less densely populated units in 1841 grew faster than denser units over the next 50 years. The second nature controls also include the share of 1851 male occupations in 4 main categories. The omitted group is the share in secondary.
Our list of explanatory variables is large, but nevertheless some factors cannot be measured. Therefore it is useful to include district fixed effects $d_i$, which control for any unobservables that are specific to the district surrounding the unit. In particular, we are netting out differences in market access that are shared by all units in a district. For example, if a district is close to the main rail line to Manchester or London, then all units in the district will have greater market access to a degree. Thus we are identifying the local effect of being near a station.
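Including district fixed effects $d_i$ is numerically equivalent to demeaning the outcome and every regressor within its registration district and then running OLS on the demeaned data (the Frisch-Waugh-Lovell theorem). The pure-Python sketch below applies that within transformation and solves the normal equations. The data are synthetic, the function names are hypothetical, and the sketch returns point estimates only, not the standard errors reported in Table 2.

```python
from collections import defaultdict


def demean_within(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fe_ols(y, xcols, groups):
    """Coefficients on xcols in y = X beta + d_group + e, via within-group demeaning."""
    yd = demean_within(y, groups)
    xd = [demean_within(col, groups) for col in xcols]
    k = len(xd)
    # Normal equations (X'X) beta = X'y on the demeaned data
    xtx = [[sum(a * b for a, b in zip(xd[i], xd[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(xd[i], yd)) for i in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Synthetic example: growth is built from known coefficients plus district effects,
    # so the within estimator should recover 0.159 and -0.138 up to floating-point error.
    district = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
    station = [1, 0, 0, 1, 1, 0, 0, 1, 0]
    log_density = [1.0, 2.0, 3.0, 1.5, 2.5, 0.5, 2.0, 1.0, 3.0]
    effect = {"A": 0.5, "B": -1.0, "C": 2.0}
    growth = [0.159 * s - 0.138 * x + effect[d] for s, x, d in zip(station, log_density, district)]
    print(fe_ols(growth, [station, log_density], district))
```

Because the district effects are constant within each district, demeaning removes them entirely, which is why only within-district variation in station access identifies the coefficient.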
The main coefficient estimates for equation 1 are shown in Table 2. Robust standard errors are reported in the first four columns because these specifications omit district FEs. Column (1) is our most parsimonious specification. It only includes the indicator for units within 2 km of stations in 1851. The estimate shows that units within walking distance of stations in 1851 had 24.5 log points higher population growth from 1841 to 1891. Column (2) is the same as (1) but also includes the indicators for distance to pre-rail infrastructures. Being within 2 km of inland waterways, turnpike roads, and ports had positive and significant effects on population growth equal to 9.1, 6.4, and 13.0 log points respectively. The railway effect falls to 18.6 log points. It is worth emphasizing what is learned from specification (2). The effect of railway station access is overstated without including measures for access to pre-rail
Table 2: Access to railways, infrastructures, and local population growth: baseline estimates
Dep. var.: unit pop. growth 1841 to 1891. Columns (1) to (5) report coefficients with t-statistics in parentheses.
Variable: Indicator dist. to rail station in 1851
coefficients on stations, turnpike roads, and inland waterways decline somewhat. Now being within walking distance of railway stations is estimated to have increased growth by 15.9 log points, while walking distance to inland waterways and turnpike roads is estimated to have increased growth by 5.6 and 3.8 log points respectively. These coefficients can be converted into a level effect by 1891. The coefficient on the log of 1841 population density is -0.138 in model (5), and its absolute value can be interpreted as the rate of convergence. Dividing the station coefficient 0.159 by 0.138 gives a value of 1.15. Thus population levels for units near stations are estimated to have been 115 log points higher in 1891. By the same calculation, population levels for units near waterways and turnpikes were 40 and 30 log points higher.
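The conversion above is simple arithmetic. A minimal Python sketch of it follows, using only the model (5) coefficients quoted in the text. The function name `implied_level_effect` is our own (hypothetical), and the rule it applies (treatment coefficient divided by the absolute value of the convergence coefficient) is the back-of-the-envelope calculation described above, not a full dynamic model.

```python
def implied_level_effect(growth_coef: float, convergence_coef: float) -> float:
    """Implied long-run level effect of a treatment in a growth regression.

    Back-of-the-envelope rule from the text: divide the treatment coefficient
    by the absolute value of the coefficient on log 1841 population density.
    """
    if convergence_coef == 0:
        raise ValueError("convergence coefficient must be non-zero")
    return growth_coef / abs(convergence_coef)


# Coefficients from model (5) as quoted in the text
CONVERGENCE_COEF = -0.138
TREATMENT_COEFS = {"station": 0.159, "waterway": 0.056, "turnpike": 0.038}

for name, coef in TREATMENT_COEFS.items():
    effect = implied_level_effect(coef, CONVERGENCE_COEF)
    print(f"{name}: {effect:.2f}, i.e. about {100 * effect:.0f} log points")
```

Run as written, the division gives about 115, 41 and 28 log points for stations, waterways and turnpikes; the text reports the last two as rounded figures of 40 and 30.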
Overall the estimates suggest that being within walking distance of railways had larger effects than walking distance to other infrastructures. The magnitudes suggest the effect of proximity to railways was more than twice as large as proximity to waterways and more than three times as large as proximity to turnpike roads. While railways were important, it is nonetheless significant that pre-rail infrastructures mattered for population growth from 1841 to 1891, since they had been built up decades or even a century before railways.
The conclusions are similar using different indicators for walking distance to railway stations. In the appendix we report models that use indicators for units within 2 km of 1841 or 1861 stations instead of 1851 stations. For context, 4.6% of units had an 1841 railway station within 2 km, 13.6% had an 1851 station within 2 km, and 19.7% had an 1861 station within 2 km. The coefficients on railway station access are similar. It does not seem to matter a great deal whether we use 1841, 1851, or 1861 stations to define our railway treatment group.
Another specification considers whether a unit has any railway line within its boundaries. The same model with all controls and FEs shows that such units had 16.4 log points more growth. One might find it surprising that indicators for being within 2 km of a station and having any railway line have similar estimated effects. We think the high station density and relative uniformity across lines in England and Wales meant that units close to railway lines were generally close to stations. We also check whether station density within a unit matters. We define one indicator for units with exactly one station and another for units with more than one station. The results show that units with more than one station had 27 log points higher growth than units without any stations. This result makes sense since greater station density offered more local and long-distance connections.
Our baseline model considers walking distance to infrastructures and treats all other distances the same. Next we consider a model using three distance bins: 0 to 2 km, 2 to 4 km, and 4 to 6 km. These represent approximately 30, 60, and 90 minutes of walking to rail stations, turnpikes, waterways, etc. Some individuals would have been willing to commute 60 or 90 minutes, so population growth could also be higher in the 2 to 4 km and 4 to 6 km bins. Also, firms with lower-value goods might be willing to locate 60 or 90 minutes away from the station because delivering goods quickly was not essential.
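The binned specification replaces the single walking-distance indicator with three bin indicators, leaving units beyond 6 km as the omitted group. The sketch below shows one way to construct them. The function names are hypothetical, the half-open bin boundaries are our assumption (the text only says "within 2 km"), and the 4 km/h walking speed is simply what the text's "2 km is about 30 minutes" implies.

```python
from typing import Dict

# Distance bins in km as [lower, upper) with a label; 6 km or more is the omitted group
DISTANCE_BINS = [(0.0, 2.0, "0-2km"), (2.0, 4.0, "2-4km"), (4.0, 6.0, "4-6km")]


def distance_bin_indicators(dist_km: float) -> Dict[str, int]:
    """Return 0/1 indicators for the three distance bins.

    Lower edges are inclusive and upper edges exclusive (an assumption), so a
    unit exactly 2 km away falls in the 2-4 km bin. Units 6 km or farther get
    all zeros and form the omitted comparison group.
    """
    if dist_km < 0:
        raise ValueError("distance must be non-negative")
    return {label: int(lo <= dist_km < hi) for lo, hi, label in DISTANCE_BINS}


def walking_minutes(dist_km: float, speed_kmh: float = 4.0) -> float:
    """Walking time at 4 km/h, the speed implied by '2 km is about 30 minutes'."""
    return 60.0 * dist_km / speed_kmh


print(distance_bin_indicators(3.1))  # a unit 3.1 km from the nearest station
print(distance_bin_indicators(7.5))  # omitted group: all indicators are zero
print(walking_minutes(6.0))
```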
The results for the main variables are reported in table 3. The omitted group in these regressions is units more than 6 km from infrastructures. The model also includes first and second nature controls and district FEs. We find that being 2-4 and 4-6 km from a railway station increased population growth, but less than being 0-2 km from stations. In other words, the growth effect diminished as distance to stations increased up to 6 km. The same pattern is found for waterways, although the effects on population growth are smaller than for railways in all distance bins. Being 2-4 km from a port has the same effect as being 0-2 km. Interestingly, being 2-4 km from a turnpike significantly decreased population growth relative to areas beyond 6 km. Note that only 7% of units were more than 6 km from an 1830 turnpike, so this comparison could be biased by the small size of the comparison group.
Table 3: Population growth at increasing distances from railways and infrastructures
Dep. var.: unit pop. growth 1841 to 1891. Columns (1) to (3) report coefficients with t-statistics in parentheses.
Variable: rail station in 1851
discussed, but it is most important for our analysis as growth is our dependent variable. We formally test whether units close to railway stations had higher population growth decades before getting railway stations. A simple difference-in-means test shows that population growth from 1801 to 1831 was 6.8 log points higher for parish units within 2 km of railway stations in 1851 (p-value 0.000). The same 'pre-trend' result holds in a regression including district FEs and first and second nature controls. The implication is that our baseline estimates of the effect of station distance are potentially biased.
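The pre-trend check compares mean 1801-1831 log growth for units within 2 km of an 1851 station against all other units. The text does not say which variance formula was used, so the sketch below assumes a Welch (unequal-variance) standard error with a normal approximation for the p-value, which is reasonable for samples of several thousand units. The function name and the toy numbers are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def difference_in_means_test(treated: Sequence[float],
                             control: Sequence[float]) -> Tuple[float, float, float]:
    """Difference in means with a Welch (unequal-variance) standard error.

    Returns (difference, standard error, two-sided p-value). The p-value uses a
    normal approximation to the t distribution, which is adequate when both
    groups have thousands of units but not for tiny samples.
    """
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, se, p_value


# Toy 1801-1831 log growth rates, for illustration only (not the paper's data)
near_1851_station = [0.30, 0.40, 0.50]
far_from_station = [0.10, 0.20, 0.30]
print(difference_in_means_test(near_1851_station, far_from_station))
```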
We address this issue using two approaches. The first applies propensity score matching. The treatment variable is being within 2 km of an 1851 railway station, and the outcome variable is population growth from 1841 to 1891. We use a parsimonious set of covariates: (1) population density in 1841, (2) the share of male agricultural employment in 1851, (3) having exposed coal, and (4) population growth from 1801 to 1831. We match each unit within 2 km of an 1851 station to exactly one nearest neighbor, using propensity scores from a logit model. This set of covariates yields a balanced matched sample. Table 4 shows that the standardized differences in covariate means are close to zero in the matched sample but not in the raw data. The bottom rows of table 4 show the average difference in means for population growth from 1841 to 1891. In the raw data, units within 2 km of stations have 24.4 log points higher population growth. In the matched sample they have 20.6 log points higher growth. Thus a parsimonious matching exercise implies slightly larger effects than the regression models using a large number of control variables.
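The matching step and the balance diagnostics in table 4 can be sketched as follows. This is a minimal illustration, not the authors' code: it takes already-estimated propensity scores as given (the logit fit is omitted), matches each treated unit to the single control with the closest score (with replacement), and computes balance using a common definition of the standardized difference, (mean_t - mean_c) / sqrt((var_t + var_c) / 2). The exact formula behind table 4 is not stated in the text, and all function names here are hypothetical.

```python
import math
import statistics
from bisect import bisect_left
from typing import List, Sequence


def standardized_difference(treated: Sequence[float], control: Sequence[float]) -> float:
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2), a common balance metric."""
    pooled_sd = math.sqrt((statistics.variance(treated) + statistics.variance(control)) / 2)
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def variance_ratio(treated: Sequence[float], control: Sequence[float]) -> float:
    """var_t / var_c; values near 1 indicate a similar spread in both groups."""
    return statistics.variance(treated) / statistics.variance(control)


def nearest_neighbor_match(p_treated: Sequence[float], p_control: Sequence[float]) -> List[int]:
    """For each treated unit, the index of the control with the closest propensity score.

    One-to-one matching with replacement; exact ties go to the lower score.
    """
    order = sorted(range(len(p_control)), key=lambda j: p_control[j])
    sorted_scores = [p_control[j] for j in order]
    matches = []
    for p in p_treated:
        k = bisect_left(sorted_scores, p)
        # The nearest score is either just below or just above the insertion point
        candidates = [c for c in (k - 1, k) if 0 <= c < len(sorted_scores)]
        best = min(candidates, key=lambda c: abs(sorted_scores[c] - p))
        matches.append(order[best])
    return matches


print(nearest_neighbor_match([0.25, 0.75], [0.9, 0.1, 0.7, 0.3]))
```

After matching, the balance statistics would be recomputed on the treated units and their matched controls, which is what the "matched" rows of table 4 report.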
Our second and primary approach to the endogeneity of stations uses an instrumental variable derived from the 'inconsequential places' approach.19 The key assumption is that some inconsequential units became close to railway stations simply because they were on the original line designed to connect larger towns at a low capital cost. In other words, they were not selected because of their potential for future growth. The first step in creating the instrument
19 See Chandra and Thompson (2000), Michaels (2008), Faber (2014), and Lipscomb et al. (2013).
Table 4: Matching estimator for effect of walking distance to railway stations
Treatment: units within 2 km of 1851 stations (1 vs. 0)
Covariate   Standardized difference (raw)   Variance ratio (raw)
Ln pop. per sq. km 1841   1.0714   8.1842
Has exposed coal   0.2919   2.123
Share of 1851 male emp. in agric.   -1.137   1.7532
Ln difference pop. 1831 and 1801   0.2301   2.5178
Covariate   Standardized difference (matched)   Variance ratio (matched)
Ln pop. per sq. km 1841   0.0069   1.010
Has exposed coal   -0.0201   0.9384
Share of 1851 male emp. in agric.   0.0045   0.9222
Ln difference pop. 1831 and 1801   -0.0159   1.1530
Outcome: Av. ln difference, pop. 1891 and 1841   Difference in means, raw data (standard error)   Difference in means, matched data (robust standard error)
0.01004   0.244   0.206
N   9,489   9,485
Notes: * p
minimize the construction costs considering distance and elevation slope. The baseline model uses construction cost data for railways built in the 1830s and early 1840s. We measure the length of the lines and the total elevation change between the towns at the two ends of each line. The construction cost is then regressed on the distance and the elevation change to identify the parameters (the details are in appendix 2). Based on this analysis, we find that, relative to a baseline construction cost per km when the slope is zero, every 1% increase in slope raises the construction cost per km by three times the baseline: $\text{cost per km} = 1 + 3 \times \text{slope}\%$, in units of the zero-slope cost. Next, we use this formula to identify the least cost path connecting our town pairs. The end result is a network of candidate railway lines linking towns.
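The least cost path step can be illustrated with a shortest-path search over an elevation grid whose step costs follow the estimated rule cost per km = 1 + 3 × slope%. The sketch below is a minimal illustration under stated assumptions: it uses Dijkstra's algorithm on an 8-connected grid, measures slope as rise over horizontal run in percent, and treats uphill and downhill symmetrically. The text does not specify the GIS implementation, so the grid representation and the function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]


def edge_cost(dist_km: float, rise_m: float) -> float:
    """Cost of one step, in units of the zero-slope cost per km.

    Applies cost_per_km = 1 + 3 * slope%, where slope% = 100 * |rise| / run.
    """
    slope_pct = 100.0 * abs(rise_m) / (dist_km * 1000.0)
    return dist_km * (1.0 + 3.0 * slope_pct)


def least_cost_path(elev: Sequence[Sequence[float]], cell_km: float,
                    start: Cell, goal: Cell) -> Tuple[float, List[Cell]]:
    """Dijkstra's algorithm over an 8-connected grid of elevations in metres."""
    rows, cols = len(elev), len(elev[0])
    best: Dict[Cell, float] = {start: 0.0}
    prev: Dict[Cell, Cell] = {}
    heap = [(0.0, start)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == goal:
            break  # the first time the goal is popped, its cost is minimal
        if cost > best.get(cell, math.inf):
            continue  # stale heap entry
        r, c = cell
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                step_km = cell_km * math.hypot(dr, dc)
                new_cost = cost + edge_cost(step_km, elev[nr][nc] - elev[r][c])
                if new_cost < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = new_cost
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    if goal not in best:
        raise ValueError("goal is unreachable")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return best[goal], path[::-1]


# A 50 m hill on the direct route makes the detour through the middle row cheaper
hill = [[0.0, 50.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(least_cost_path(hill, 1.0, (0, 0), (0, 2)))
```

In the example, crossing the hill costs 16 units per 1 km step (a 5% slope), so the path bends around it at a total cost of 2√2 ≈ 2.83.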
The LCP network is shown in the right-hand panel of figure 4, and the left-hand panel shows the real railway network in 1851. The overlap between the LCP and the 1851 rail network is fairly high. Locations close to the LCP are also generally close to railway stations because stations were so numerous along the lines.
The instrument must predict proximity to railway stations and it must satisfy the exclusion restriction. First, we show that being within 2 km of the LCP predicts the likelihood of a unit being within 2 km of a railway station. We estimate a regression similar to equation (1) which includes first and second nature controls and district FEs. We find a strong positive relationship between the indicator for being within 2 km of the LCP and the probability of being within 2 km of an 1851 railway station. The coefficient 0.082 with a t-stat equal to
Figure 4: The rail network in 1851 and the least cost path (LCP) network
Sources: see text.
Second, we test whether distance to the LCP is correlated with population growth between 1801 and 1831 once the effects of pre-rail infrastructure distance, geography, and district FEs are accounted for. A significant correlation would raise questions about the exclusion restriction. The results from several specifications are shown in table 5. Note that we exclude 364 units within 2 km of the town nodes used to construct the LCP. In column (1), being within 2 km of the LCP is positively and significantly associated with higher population growth from 1801 to 1831. Columns (2) and (3) show that the same result holds after including district FEs and first nature controls. However, in columns (4) and (5), which include pre-rail infrastructures and second nature controls, being within 2 km of the LCP is no longer significantly associated with higher population growth from 1801 to 1831. Why? Distance to the LCP is correlated with distance to turnpikes and inland waterways, which themselves contributed to population growth from 1801 to 1831. Omitting them therefore produces a spurious correlation between LCP distance and population growth from 1801 to 1831. Thus the exclusion restriction for the indicator of being within 2 km of the LCP is only defensible in a model that includes distance to pre-rail infrastructures as controls.
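With one endogenous treatment (being within 2 km of a station) and one instrument (being within 2 km of the LCP), two-stage least squares reduces to the just-identified IV formula β = (Z'X)⁻¹Z'y. A minimal NumPy sketch follows; the controls enter both X and Z, which is where the pre-rail infrastructure indicators would go for the exclusion restriction to be defensible. The data are simulated, not the paper's: an unobserved growth potential `u` raises both station access and later growth, so OLS overstates the effect while IV recovers the true value of 0.3. The paper's specification also includes district FEs and the full control set, which this sketch omits, and the function names are hypothetical.

```python
import numpy as np


def _design(first_col: np.ndarray, controls: np.ndarray) -> np.ndarray:
    """Stack a constant, one column of interest, and the control variables."""
    n = first_col.shape[0]
    return np.column_stack([np.ones(n), first_col, controls])


def iv_just_identified(y, d, z, controls):
    """2SLS coefficient on d when d is instrumented by a single instrument z.

    In the just-identified case 2SLS equals beta = (Z'X)^{-1} Z'y with
    X = [1, d, controls] and Z = [1, z, controls].
    """
    X = _design(d, controls)
    Z = _design(z, controls)
    return np.linalg.solve(Z.T @ X, Z.T @ y)[1]


def ols_coef(y, d, controls):
    """OLS coefficient on d, for comparison."""
    X = _design(d, controls)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]


# Simulated data (not the paper's): u raises both station access and later growth
rng = np.random.default_rng(0)
n = 200_000
z = rng.binomial(1, 0.5, n).astype(float)      # e.g. within 2 km of the LCP
u = rng.normal(size=n)                          # unobserved growth potential
x = rng.normal(size=(n, 1))                     # stand-in for observed controls
d = (0.8 * z + 0.5 * u + rng.normal(size=n) > 0.5).astype(float)  # near a station
y = 1.0 + 0.3 * d + 0.2 * x[:, 0] + u + rng.normal(size=n)        # true effect 0.3

print(f"OLS: {ols_coef(y, d, x):.3f}  IV: {iv_just_identified(y, d, z, x):.3f}")
```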
Table 5: Pre-trend tests for the validity of the distance to LCP instrument
Dep. var.: unit pop. growth 1801 to 1831. Columns (1) to (5) report coefficients with t-statistics in parentheses.
Variable: Distance to LCP for railways
Table 6: Railway stations and population growth: IV estimates
Dep. var.: unit pop. growth 1841 to 1891. Columns report IV and OLS coefficients with t-statistics in parentheses.
Variable: Indicator distance to 1851 railway station
makes sense according to the new economic geography (NEG) literature.21 It argues that agglomeration can increase as transport costs fall. The key factor is increasing returns to scale: as an area gets larger, it becomes more productive and hence more attractive for consumers and firms. The economy will not completely degenerate to a single location, however, because congestion and land constraints provide a check on the size of the largest cities. Applying this framework to England and Wales, being close to railway stations should have increased population growth more in initially dense units, although not in the densest units.
To test this hypothesis, we estimate the effect of being within 2 km of railway stations depending on a unit's 1841 population density, using the following model:
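$$y_{i,1891} - y_{i,1841} = \beta_0 I(\text{Station} < 2\text{km})_i + \beta_1 I(\text{Station} < 2\text{km})_i \ln pop41_i + \beta_2 I(\text{Station} < 2\text{km})_i (\ln pop41_i)^2 + \gamma x_i + d_j + \varepsilon_{ij} \qquad (2)$$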
where the natural log of 1841 population density and its square are interacted with the indicator for being within 2 km of an 1851 station. The quadratic formulation is flexible and allows for non-linear effects. Note that 1841 population density and its square are included as controls in $x_i$, along with district FEs and first and second nature controls.
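Because the station indicator enters equation (2) interacted with a quadratic in log 1841 density, the implied growth effect of being within 2 km of a station varies with density. Writing $z_i = \ln pop41_i$ and holding $x_i$ and $d_j$ fixed, the algebra below (our derivation from equation (2), not a separately reported result) gives that effect and, when $\beta_2 < 0$, the density at which it peaks:

$$
\begin{aligned}
\Delta_i &= \beta_0 + \beta_1 z_i + \beta_2 z_i^2, \\
\frac{\partial \Delta_i}{\partial z_i} &= \beta_1 + 2\beta_2 z_i = 0 \;\Rightarrow\; z^* = -\frac{\beta_1}{2\beta_2}.
\end{aligned}
$$

A hump-shaped effect, larger for medium to high densities but smaller for the very densest units, requires $\beta_2 < 0$ with $z^*$ inside the observed range of log 1841 density.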
The estimates reveal an important result: being close to railway stations had a significantly larger growth effect for units with medium to high population density in 1841. To illustrate, we plot predicted population growth from 1841 to 1891 for units between the 5th and 95th percentiles of 1841 population density. One prediction is for units less than 2 km from 1851 stations and the other is for units more than 2 km from stations (see figure 5). Railways have their largest effect for population densities above the mean, approximately between the 75th and 90th percentiles. Thus our results are consistent with NEG forces, which suggest that by lowering transport costs railways lead to more agglomeration in dense locations.
21 See Fujita et al. (2001) and Desmet and Rossi-Hansberg (2014) for details.
Figure 5: Heterogeneity I: initial population density
Sources: see text.
There is additional evidence to support the NEG mechanism. Units near other infrastructures, like inland waterways and turnpikes, also had lower transport costs. Thus if the same mechanism is at work, we should also expect that being near inland waterways or turnpikes increased growth more in units with medium to high population density in 1841. We test this prediction for waterways using the same methodology as equation (2), interacting the log of 1841 population density and its square with the indicator for being within 2 km of an inland waterway. The results show a similar pattern and are summarized in the right-hand panel of figure 5.
Next we examine how heterogeneity was related to occupational structure, drawing on the globalization literature. As railways spread through EW, the global economy was becoming more open. England exported more manufactured goods and imported more agricultural goods. There were many reasons for these changes in trade. One was comparative advantage: England had the most productive manufacturing sector in the world by 1850. Transport improvements were another. Steamships and railways provided better connections between inland areas and the international economy (O'Rourke and Williamson 2002). The connection between railways and grain imports is supported by traffic data. Hawke (1970, p. 128) estimates that imported wheat represented at least half of all wheat hauled by English railways in 1865. All of this suggests that being near railways created more growth if the unit was more specialized in the secondary sector and less growth if it was more specialized in agriculture. We test this hypothesis with the following model:
$$y_{i,1891} - y_{i,1841} = \beta_0 I(\text{Station} < 2\text{km})_i + \sum_{k=1}^{4} \beta_k I(\text{Station} < 2\text{km})_i \, occshare_{ki} + \gamma x_i + d_j + \varepsilon_{ij} \qquad (3)$$
where $occshare_{ki}$ is the share of 1851 male occupations in category $k$. Note that occupational shares and pre-rail infrastructure are included as controls in $x_i$.
The estimates show that being close to railway stations had a significantly smaller effect on population growth for units with a higher share of agricultural employment. The coefficient estimates are reported in the appendix; here we illustrate the effects using an example. We consider the occupational structure of the average unit (where average is defined by the mean occupational shares across units) along with hypothetical units more specialized in the secondary sector, agriculture, and the tertiary sector. The occupational shares for each type of unit are shown in table 7. Next we predict population growth from 1841 to 1891 for each hypothetical unit type with and without the treatment of being close to 1851 railway stations. The calculations are reported in table 7. The hypothetical average unit grows 13 percentage points more when close to stations, while the agricultural unit grows 7.4 percentage points more. The secondary unit grows the most from being close to stations, specifically 19 percentage points more.
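Under equation (3), the treatment effect for a unit with given occupational shares is β₀ plus the sum of βₖ times each non-omitted share, which is what the "Difference" row of table 7 reports for each hypothetical unit type. A minimal sketch of that calculation is below, using the table 7 shares. The estimated coefficients are in the appendix and are not reproduced here, so the coefficient values in the example are placeholders chosen only to exercise the function, not the paper's estimates. The function and dictionary names are hypothetical.

```python
from typing import Dict

# Occupational shares of the hypothetical unit types in table 7
UNIT_TYPES: Dict[str, Dict[str, float]] = {
    "average": {"agriculture": 0.55, "secondary": 0.20, "tertiary": 0.15,
                "mining_forestry": 0.025, "unspecified": 0.075},
    "agricultural": {"agriculture": 0.75, "secondary": 0.00, "tertiary": 0.15,
                     "mining_forestry": 0.025, "unspecified": 0.075},
    "secondary": {"agriculture": 0.35, "secondary": 0.40, "tertiary": 0.15,
                  "mining_forestry": 0.025, "unspecified": 0.075},
    "tertiary": {"agriculture": 0.35, "secondary": 0.20, "tertiary": 0.35,
                 "mining_forestry": 0.025, "unspecified": 0.075},
}

OMITTED = "secondary"  # omitted category in equation (3)


def station_effect(beta0: float, betas: Dict[str, float], shares: Dict[str, float]) -> float:
    """Growth effect of being within 2 km of a station implied by equation (3).

    Computes beta0 + sum_k beta_k * occshare_k over the four non-omitted
    categories; the secondary share is the omitted group and has no beta.
    """
    if OMITTED in betas:
        raise ValueError("the omitted category must not have its own coefficient")
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("occupational shares should sum to one")
    return beta0 + sum(beta * shares[k] for k, beta in betas.items())


# Placeholder coefficients for illustration only; NOT the paper's estimates
placeholder_beta0 = 0.5
placeholder_betas = {"agriculture": -0.5, "tertiary": -0.2,
                     "mining_forestry": 0.0, "unspecified": 0.0}

for unit_type, shares in UNIT_TYPES.items():
    print(unit_type, round(station_effect(placeholder_beta0, placeholder_betas, shares), 4))
```

Because the effect is linear in the shares, the gap between two unit types depends only on how their shares differ, which is why the table 7 comparisons move in proportion to the agricultural and tertiary shares.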
These findings suggest that railways enhanced globalization forces in EW. Units more specialized in the secondary sector had the greatest comparative advantage in world markets. Railways helped these advantaged units to grow more. On the other hand, units more specialized in agriculture had the least comparative advantage. Railways slowed their growth and encouraged out-migration.
Table 7: Heterogeneity II: occupational structure in 1851
Occupational shares in unit types
Occupation categories   'average'   'agricultural'   'secondary'   'tertiary'
agriculture   0.55   0.75   0.35   0.35
secondary   0.2   0.0   0.4   0.2
tertiary   0.15   0.15   0.15   0.35
mining/forestry   0.025   0.025   0.025   0.025
unspecified   0.075   0.075   0.075   0.075
Predicted pop. growth 2 km from stations   -0.0111   -0.2128   0.1905   0.2141
(standard error)   (0.0043)   (0.0196)   (0.0197)   (0.0192)
Difference in predicted pop. growth   0.1324   0.0747   0.1903   0.1556
As a final exercise, we examine the present-day effects of infrastructures formed in the past. Here we merge our historical continuous units with 2011 data on 34,753 Lower Super Output Areas (LSOAs). The intersect function in ArcMap is applied to the boundary lines of LSOAs and the boundary lines of units. As a result, we can study the change in population from 1841 to 1891 to 2011 for the same spatial units. We estimate the following 'very' long-run model:
$$y_{i,2011} - y_{i,1841} = \beta I(\text{Station} < 2\text{km})_{i,1851} + \beta_2 I(\text{Prerail} < 2\text{km})_{i,1830} + \gamma x_i + d_i + \varepsilon_{ij} \qquad (4)$$
where the dependent variable $y_{i,2011} - y_{i,1841}$ measures population growth from 1841 to 2011.
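The text does not say how 2011 LSOA populations are assigned to historical units when an LSOA straddles a unit boundary. One common choice after an intersect operation is area weighting, which assumes population is spread evenly within each LSOA. The sketch below shows that approach purely as an assumption, not as the authors' procedure; the input layout (one record per intersected piece) and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def apportion_population(pieces: Iterable[Tuple[str, str, float]],
                         lsoa_area: Dict[str, float],
                         lsoa_pop: Dict[str, float]) -> Dict[str, float]:
    """Allocate each LSOA's 2011 population to historical units by area share.

    `pieces` holds (lsoa_id, unit_id, area) for every polygon produced by
    intersecting LSOA boundaries with unit boundaries. Assumes population is
    uniformly distributed within each LSOA.
    """
    unit_pop: Dict[str, float] = defaultdict(float)
    for lsoa_id, unit_id, area in pieces:
        unit_pop[unit_id] += lsoa_pop[lsoa_id] * area / lsoa_area[lsoa_id]
    return dict(unit_pop)


# Hypothetical example: LSOA L1 is split evenly between units A and B
pieces = [("L1", "A", 2.0), ("L1", "B", 2.0), ("L2", "B", 1.0)]
print(apportion_population(pieces, {"L1": 4.0, "L2": 1.0}, {"L1": 1000.0, "L2": 500.0}))
```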
The OLS results are reported in column (1) of table 8. We find that being close to railway stations in the mid-nineteenth century increases population density in 2011. More strikingly, the same is true for being close to turnpike roads and inland waterways around 1830. The long-term effect of turnpike roads is especially large: the coefficient on turnpike roads is nearly as large as that on railways. Column (2) shows the IV estimate using the indicator for being within 2 km of the LCP as the instrument. As before, the IV estimate for access to stations is much larger. While there are many questions as to how persistence worked, we take this as strong evidence for a long-term effect of infrastructures in the EW economy.
Table 8: Infrastructure access and population growth over the very long run
Dep. var.: unit pop. growth 1841 to 2011. Columns report OLS and IV coefficients with t-statistics in parentheses.
Variable: Indicator distance to 1851 railway station
the validity of least cost path instruments may be undermined if other infrastructures are ignored. Our estimates still show that railways had the largest effect on population growth, but turnpike roads and canals also mattered a great deal.
Second, the effects of being close to railway stations depended on initial population density and occupational structure. Specifically, we find that the effects of railways and other infrastructures were larger for more densely populated units in 1841 and for units with a lower share of male agricultural employment. These results are consistent with New Economic Geography (NEG) models that emphasize increases in agglomeration with lower transport costs. They are also consistent with the literature on nineteenth century globalization, which emphasizes EW's shift out of agriculture and into manufacturing as trade costs fell.
Third, we argue that the local effects of infrastructures can persist for centuries. We show that the population distribution in 2011 England and Wales is still influenced by infrastructures in the mid-nineteenth century. This implies that the policy decisions made today to promote or manage urbanization will have effects for decades to come, perhaps
1. Alvarez, Eduard, Xavi Franch, and Jordi Martí-Henneberg. "Evolution of the territorial coverage of the railway network and its influence on population growth: The case of England and Wales, 1871-1931." Historical Methods: A Journal of Quantitative and Interdisciplinary History 46.3 (2013): 175-191.
2. Alvarez, E., Dunn, O., Bogart, D., Satchell, M., Shaw-Taylor, L. 'Ports of England and Wales, 1680-1911', 2017.
3. Armstrong, John. The Vital Spark: The British Coastal Trade, 1700-1930. International Maritime Economic History Association, 2009.
4. Atack, Jeremy, Fred Bateman, Michael Haines, and Robert A. Margo. "Did railroads induce or follow economic growth?" Social Science History 34, no. 2 (2010): 171-197.
5. Atack, Jeremy, and Robert A. Margo. "The Impact of Access to Rail Transportation on Agricultural Improvement: The American Midwest as a Test Case, 1850-1860." Journal of Transport and Land Use 4.2 (2011).
6. Avery, Brian William. Soil classification for England and Wales: higher categories. No. 631.44 A87. 1980.
7. Baines, D. Migration in a mature economy: emigration and internal migration in England and Wales 1861-1900. Cambridge University Press, 2002.
8. Baum-Snow, N., Brandt, L., Henderson, J. V., Turner, M. A., & Zhang, Q. (2017). Roads, railroads, and decentralization of Chinese cities. Review of Economics and Statistics, 99(3), 435-448.
9. Becker, Sascha O., Erik Hornung, and Ludger Woessmann. "Education and catch-up in the industrial revolution." American Economic Journal: Macroeconomics (2011): 92-126.
10. Berger, Thor, and Kerstin Enflo. "Locomotives of local growth: The short- and long-term impact of railroads in Sweden." Journal of Urban Economics (2015).
11. Bogart, Dan. "The Transport Revolution in Industrializing Britain," in Floud, Roderick, Jane Humphries, and Paul Johnson, eds. The Cambridge Economic History of Modern Britain: Volume 1, Industrialisation, 1700-1870. Cambridge University Press, 2014.
12. Campbell, Gareth, and John D. Turner. "Dispelling the Myth of the Naive Investor during the British Railway Mania, 1845-1846." Business History Review 86.01 (2012): 3-41.
13. Campbell, Gareth, and John D. Turner. "Managerial failure in mid-Victorian Britain?: Corporate expansion during a promotion boom." Business History 57.8 (2015): 1248-1276.
14. Casson, Mark. The world's first railway system: enterprise, competition, and regulation on the railway network in Victorian Britain. Oxford University Press, 2009.
15. Casson, Mark. "The determinants of local population growth: A study of Oxfordshire in the nineteenth century." Explorations in Economic History 50.1 (2013): 28-45.
16. Chandra, Amitabh, and Eric Thompson. "Does public infrastructure affect economic activity?: Evidence from the rural interstate highway system." Regional Science and Urban Economics 30.4 (2000): 457-490.
17. Clayden, Benjamin, and John Marcus Hollis. Criteria for differentiating soil series. No. Tech Monograph 17. 1985.
18. Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest and Clifford Stein: Introduction to Algorithms, Cambridge, MA, MIT Press (3rd ed., 2009) pp. 695-6.
19. Crafts, Nicholas, and Abay Mulatu. "How did the location of industry respond to falling transport costs in Britain before World War I?" The Journal of Economic History 66.03 (2006): 575-607.
20. Crafts, Nicholas, and Nikolaus Wolf. "The location of the UK cotton textiles industry in 1838: A quantitative analysis." The Journal of Economic History 74.04 (2014): 1103-1139.
21. Del Río, Eloy, Jordi Martí-Henneberg, and Antònia Valentín. "La Evolución de la red ferroviaria en el Reino Unido (1825-2000)." Treballs de La Societat Catalana de Geografia 65 (2008): 654-663.
22. "Demographia World Urban Areas, 14th Annual Edition" (PDF). April 2018.
23. Desmet, Klaus, and Esteban Rossi-Hansberg. "Spatial development." The American Economic Review 104.4 (2014): 1211-1243.
24. Donaldson, Dave. Railroads of the Raj: Estimating the impact of transportation infrastructure. No. w16487. National Bureau of Economic Research, 2010.
25. Donaldson, Dave, and Richard Hornbeck. "Railroads and American economic growth: A 'market access' approach." The Quarterly Journal of Economics 131.2 (2016): 799-858.
26. Duranton, Gilles, and Matthew A. Turner. "Urban growth and transportation." The Review of Economic Studies 79.4 (2012): 1407-1440.
27. Faber, Benjamin. "Trade integration, market size, and industrialization: evidence from China's National Trunk Highway System." Review of Economic Studies 81.3 (2014): 1046-1070.
28. Fernihough, Alan, and Kevin Hjortshøj O'Rourke. Coal and the European industrial revolution. No. w19802. National Bureau of Economic Research, 2014.
29. Fishlow, Albert. American Railroads and the Transformation of the Ante-bellum Economy. Vol. 127. Cambridge, MA: Harvard University Press, 1965.
30. Fogel, R. "Railways and American Economic Growth." Baltimore: Johns Hopkins Press (1964).
31. Freeman, Michael J., and Derek H. Aldcroft, eds. Transport in Victorian Britain. Manchester University Press, 1991.
32. Fujita, Masahisa, Paul R. Krugman, and Anthony Venables. The spatial economy: Cities, regions, and international trade. MIT Press, 2001.
33. Gregory, Ian N., and Jordi Martí Henneberg. "The railways, urbanization, and local demography in England and Wales, 1825-1911." Social Science History 34.2 (2010): 199-228.
34. Gutberlet, Theresa. "Cheap Coal versus Market Access: The Role of Natural Resources and Demand in Germany's Industrialization." (2014).
35. Hawke, Gary Richard. Railways and economic growth in England and Wales, 1840-1870. Clarendon Press, 1970.
36. Heblich, Stephan, and Alex Trew. "Banking and Industrialization." (2017).
37. Herranz-Loncán, Alfonso. "Railroad impact in backward economies: Spain, 1850-1913." The Journal of Economic History 66.04 (2006): 853-881.
38. Hornung, Erik. "Railroads and growth in Prussia." Journal of the European Economic Association 13.4 (2015): 699-736.
39. Jedwab, Remi, Edward Kerby, and Alexander Moradi. "History, path dependence and development: Evidence from colonial railroads, settlers and cities in Kenya." The Economic Journal (2015).
40. Jarvis A., H.I. Reuter, A. Nelson, E. Guevara (2008). Hole-filled seamless SRTM data V4, International Centre for Tropical Agriculture (CIAT), available from http://srtm.csi.cgiar.org.
41. Jaworski, Taylor, and Carl T. Kitchens. "National Policy for Regional Development: Historical Evidence from Appalachian Highways." (2017).
42. Klein, Alexander, and Nicholas Crafts. "Making sense of the manufacturing belt: determinants of US industrial location, 1880-1920." Journal of Economic Geography 12.4 (2012): 775-807.
43. Law, Christopher M. "The growth of urban population in England and Wales, 1801-1911." Transactions of the Institute of British Geographers (1967): 125-143.
44. Leunig, Timothy. "Time is money: a re-assessment of the passenger social savings from Victorian British railways." The Journal of Economic History 66.3 (2006): 635-673.
45. Lipscomb, Molly, Mushfiq A. Mobarak, and Tania Barham. "Development effects of electrification: Evidence from the topographic placement of hydropower plants in Brazil." American Economic Journal: Applied Economics 5.2 (2013): 200-231.
46. Long, Jason. "Rural-urban migration and socioeconomic mobility in Victorian Britain." Journal of Economic History (2005): 1-35.
47. Michaels, Guy. "The effect of trade on the demand for skill: Evidence from the interstate highway system." The Review of Economics and Statistics 90.4 (2008): 683-701.
48. Odlyzko, Andrew. "Collective hallucinations and inefficient markets: The British Railway Mania of the 1840s." University of Minnesota (2010).
49. O'Rourke, Kevin H. "The European grain invasion, 1870-1913." The Journal of Economic History 57.4 (1997): 775-801.
50. O'Rourke, Kevin H., and Jeffrey G. Williamson. "When did globalisation begin?" European Review of Economic History 6.1 (2002): 23-50.
51. Pascali, Luigi. "The wind of change: Maritime technology, trade and economic development." American Economic Review (2016).
52. Pascual Domènech, P. (1999). Los caminos de la era industrial: la construcción y financiación de la red ferroviaria catalana, 1843-1898 (Vol. 1). Edicions Universitat Barcelona.
53. Pope, Alexander, and D. Swann. "The pace and progress of port investment in England 1660-1830." Bulletin of Economic Research 12.1 (1960): 32-44.
54. Poveda, G. (2003). El antiguo ferrocarril de Caldas. Dyna, 70 (139), pp. 1-10.
55. Purcar, Cristina. "Designing the space of transportation: railway planning theory in nineteenth and early twentieth century treatises." Planning Perspectives 22.3 (2007): 325-352.
56. Ravenstein, Ernest George. "The laws of migration." Journal of the Statistical Society of London 48.2 (1885): 167-235.
57. Redding, Stephen J., and Matthew A. Turner. Transportation costs and the spatial organization of economic activity. No. w20235. National Bureau of Economic Research, 2014.
58. Redford, Arthur. Labour migration in England, 1800-1850. Manchester University Press, 1976.
59. Riley, S. J., S. D. Gloria, and R. Elliot (1999). A Terrain Ruggedness Index that Quantifies Topographic Heterogeneity. Intermountain Journal of Sciences, 5(2-4), 23-27.
60. Robson, Brian T. Urban growth: an approach. Vol. 9. Routledge, 2006.
61. Rosevear, A., Satchell, A.E.M., Bogart, D., Shaw-Taylor, L. 'Turnpike roads of England and Wales,' 2017.
62. Satchell, A.E.M. 'Navigable waterways and the economy of England and Wales 1600-1835,' 2017.
63. Satchell, A.E.M., Kitson, P.M.K., Newton, G.H., Shaw-Taylor, L., Wrigley, E.A. 1851 England and Wales census parishes, townships and places. Working paper (2016).
64. Satchell, A.E.M. and Shaw-Taylor, L. Exposed coalfields of England and Wales (2013).
65. Schurer, K., Higgs, E. (2014). Integrated Census Microdata (I-CeM), 1851-1911. [data collection]. UK Data Service. SN: 7481, http://doi.org/10.5255/UKDA-SN-7481-1.
66. Shaw-Taylor, L. and Wrigley, E. A. "Occupational Structure and Population Change," in Floud, Roderick, Jane Humphries, and Paul Johnson, eds. The Cambridge Economic History of Modern Britain: Volume 1, Industrialisation, 1700-1870. Cambridge University Press, 2014.
67. Simmons, Jack. The railway in town and country, 1830-1914. (1986).
68. Storeygard, Adam. "Farther on down the road: transport costs, trade and urban growth in sub-Saharan Africa." The Review of Economic Studies 83.3 (2016): 1263-1295.
69. Tang, John P. "Railroad expansion and industrialization: evidence from Meiji Japan." The Journal of Economic History 74.03 (2014): 863-886.
70. Tang, John P. "The Engine and the Reaper: Industrialization and mortality in late nineteenth century Japan." Journal of Health Economics 56 (2017): 145-162.
71. United Nations, Department of Economic and Social Affairs, Population Division (2014). World Urbanization Prospects: The 2014 Revision, Highlights (ST/ESA/SER.A/352).
72. Wellington, A.M. The Economic Theory of the Location of Railways: An Analysis of the Conditions Controlling the Laying Out of Railways to Effect the Most Judicious Expenditure of Capital. Ed. J. Wiley & Sons, 1877.
73. Willan, Thomas Stuart. River navigation in England, 1600-1750. Psychology Press, 1964.
74. Wrigley, Edward Anthony. Energy and the English industrial revolution. Cambridge University Press, 2010.
75. Wrigley, E. A. "The PST system of classifying occupations," Working paper 2015.
76. You, Xuesheng. Women's employment in England and Wales, 1851-1911, University of Cambridge, unpublished PhD dissertation, 2014.
A.1 The least cost path instrument
In this section, we describe how we construct the instrument for distance to railway stations. The first step is to select the nodes of the hypothetical network, and then to choose which nodes will become origins and destinations connected by a least cost path (LCP). The candidate nodes are all towns with more than 5,000 inhabitants in 1801, which were the major population centers. Each pair of such towns is a potential origin and destination for a railway line. A gravity model selects the origins and destinations to be connected, based on an approximation of the value of trade between the potential origin and destination. We assume the value of connecting an origin and destination pair is given by

$$GM_{ij} = \frac{Pop_i \, Pop_j}{Dist_{ij}},$$

where $GM_{ij}$ is the gravitational potential between towns i and j, $Pop_i$ is the 1801 population of town i, and $Dist_{ij}$ is the straight-line distance between i and j. We choose the town pair i and j as an origin and destination in our LCP if $GM_{ij} > 10{,}000$.
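The pair-selection rule can be sketched in Python as below. This is a minimal illustration, not the authors' code. The function name `select_od_pairs`, the `(population, x, y)` input layout and the planar projected coordinates are all assumptions. The text also does not state which distance unit the 10,000 threshold presumes, so the threshold is only meaningful once that unit is fixed.

```python
from itertools import combinations
from math import hypot

def select_od_pairs(towns, min_pop=5000, threshold=10_000):
    """Return (i, j, GM_ij) for town pairs whose gravitational potential
    Pop_i * Pop_j / Dist_ij exceeds the threshold.

    towns: dict mapping a (hypothetical) town name to (pop_1801, x, y),
    with x, y in a projected grid whose unit is an assumption here.
    """
    # Only towns above the 1801 population cutoff are candidate nodes.
    eligible = {name: v for name, v in towns.items() if v[0] > min_pop}
    pairs = []
    # Unordered pairs: each potential connection is considered once.
    for (i, (pop_i, xi, yi)), (j, (pop_j, xj, yj)) in combinations(
            sorted(eligible.items()), 2):
        dist = hypot(xi - xj, yi - yj)  # straight-line distance
        gm = pop_i * pop_j / dist
        if gm > threshold:
            pairs.append((i, j, gm))
    return pairs
```

Because distance enters the denominator only once, large towns can be linked over long distances, while small towns are linked only to close neighbors.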
The second step is to identify the LCP connecting our nodes. The main criterion used in planning linear projects is usually the minimization of earth-moving works. Assuming that the track structure (rails, sleepers and ballast) is the same along the entire length, most of the cost differences arise in the track foundation. Terrain with steeper slopes requires more earth-moving, and construction costs are therefore higher (Pascual 1999, Poveda 2003, Purcar 2007). The limited tractive power of locomotives and the limited adhesion between wheels and rails may be the main reasons why steep gradients were avoided. Slopes over 2% might also require tunnels, cut-and-cover tunnels or even viaducts. The perpendicular slope was also crucial. During construction of a track section, excavation and filling have to be balanced in order to minimize the supply, waste and transport of earth. Today this work is done with bulldozers and trailers, but historically it was done by hand, which implied a direct link between construction costs, wages and the availability of skilled labor. It is commonly accepted in the literature that early railways were constrained by several factors, including soil quality, the need to build tunnels and bridges, and interference with existing structures (buildings and land dispossession). Longitudinal and perpendicular slope were the most significant of these, and we focus on them below.
Slopes are determined using elevation data. We analyzed several digital elevation model (DEM) rasters in preliminary tests, but finally chose the Shuttle Radar Topography Mission (SRTM) data at 90-meter (3 arc-second) resolution. SRTM is a modern dataset, created in 2000 from a radar system on board the Space Shuttle, but it should not differ much from historical topography. The LCP tool calculates the route between an origin and a destination that minimizes the accumulated elevation difference (in our case, the accumulated cost). Our method is based on the ESRI Least-Cost-Path algorithm, although we implemented additional tasks to optimize the results and to offer different scenarios. The input data was the SRTM elevation raster, converted into slope. This conversion was necessary in order to assign different construction costs to different slopes.
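The conversion from elevation to percent slope can be illustrated with NumPy. This is a minimal sketch that uses central differences from `numpy.gradient`. That is a swapped-in technique: ESRI's Slope tool uses its own 3 x 3 neighborhood method, so values will differ somewhat at sharp breaks in the terrain. The 90 m cell size assumes SRTM has been projected to a metric grid, and the function name is hypothetical.

```python
import numpy as np

def slope_percent(dem, cell_size=90.0):
    """Percent slope (100 * rise / run) for each cell of a DEM array.

    dem: 2-D array of elevations in metres on a square grid whose cells are
    cell_size metres wide (90 m is assumed here for projected SRTM data).
    """
    # np.gradient returns derivatives along axis 0 (rows, y) and axis 1 (cols, x).
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    # Magnitude of the steepest gradient, expressed in percent.
    return 100.0 * np.hypot(dz_dx, dz_dy)
```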
The third step is to specify the relationship between construction costs and slope. One
approach is to use the historical engineering literature. Wellington (1877) discusses elevation
slope (i.e. gradients), distance, and operational costs of railways, but this is not ideal as we
are interested in construction costs. We could not find an engineering text that specified
the relationship between construction costs and slopes. As an alternative we use historical
construction cost data. The following details our data and procedure.
A select committee on railways in 1844 published a table on the construction costs of 54 railways.²² Of these, 45 had a clear origin and destination, for which we can measure the total elevation change along the route (details are available). For these 45 railways we calculate
the distance of the railway line in meters and the total elevation change (all meters of ascent
and descent). We then ran the following regression for railway i:
$$ConstructionCosts_i = \alpha \, Distance100Meters_i + \beta \, ElevationChangeMeters_i + \varepsilon_i \qquad (5)$$
where construction costs are measured in pounds. This regression produces unsatisfactory results: total elevation change has a negative coefficient. We think the main reason is that the sample includes railways with London as an origin or destination. Land values in London were much higher than elsewhere, so construction costs were higher there. We therefore omit railways with a London connection. We also think it is important to account for railways in mining areas, as they were typically built to serve freight traffic rather than a mix of passenger and freight traffic.
Our extended model uses construction costs for 36 non-London railways and follows the specification

$$ConstructionCosts_i = \alpha \, Distance100Meters_i + \beta \, ElevationChangeMeters_i + \mu \, MiningRailway_i + \varepsilon_i,$$

where $MiningRailway_i$ indicates a mining railway.

²² See the Fifth report from the Select Committee on Railways; together with the minutes of evidence, appendix and index (BPP 1844 XI). The specific section with the data is appendix number 2, report to the lords of the committee of the privy council for trade on the statistics of British and Foreign railways, pp. 4-5.
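Both construction-cost regressions, as written, are ordinary least squares without an intercept. A minimal NumPy sketch of fitting them follows. The function and argument names are hypothetical, the sketch returns point estimates only (no standard errors), and it is not run here on the 1844 committee data.

```python
import numpy as np

def fit_construction_costs(costs, distance_100m, elev_change_m, mining=None):
    """No-intercept OLS of construction costs (pounds) on distance
    (in 100 m units), total elevation change (m) and, optionally,
    a 0/1 mining-railway indicator.

    Returns the coefficients in the order (alpha, beta[, mu]).
    """
    columns = [distance_100m, elev_change_m]
    if mining is not None:
        columns.append(mining)  # 1 for a mining railway, 0 otherwise
    X = np.column_stack(columns).astype(float)
    y = np.asarray(costs, dtype=float)
    # Least-squares solution of X @ coef ~= y (no constant column added).
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

To fit the extended model, pass the 36 non-London railways with their mining indicator. The coefficients come back as (alpha, beta, mu).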
The results imply that for every 100 meters of distance, construction costs rise by 128.9 pounds (s.e. 45.27). Holding distance constant, construction costs rise by 382.6 pounds (s.e. 274.5) for every 1 meter increase in total elevation change. Construction costs for mining railways are 340,418 pounds lower (s.e. 179,815). For our LCP model we assume a non-mining railway, rescale the figures into construction costs per 100 meters, and normalize so that costs per 100 meters are 1 at zero elevation change. The formula becomes

$$NormalizedCostPer100Meters = 1 + 2.96 \times \frac{ElevationChangeMeters}{Distance100Meters}.$$

Elevation change in meters divided by distance in hundreds of meters is the slope in percent, so our formula becomes $Cost = 1 + 2.96 \times \%slope$. We think this is a reasonable approximation of the relationship between construction costs, distance, and slope.
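The normalization is simple arithmetic, sketched below with the reported (rounded) coefficients. For a non-mining railway, the cost per 100 meters is 128.9 + 382.6 x slope. Dividing by the flat-ground value of 128.9 gives 1 + (382.6/128.9) x slope. With these rounded figures the ratio is about 2.968, slightly above the 2.96 used in the text; the gap may reflect rounding of the published coefficients. The function name is hypothetical.

```python
# Reported non-London estimates in pounds, rounded as published.
ALPHA = 128.9  # cost per 100 m of distance
BETA = 382.6   # cost per metre of total elevation change

def normalized_cost_per_100m(slope_pct, alpha=ALPHA, beta=BETA):
    """Cost per 100 m relative to flat ground for a non-mining railway.

    Metres of elevation change per 100 m of distance is percent slope, so
    cost per 100 m = alpha + beta * slope_pct. The mining indicator is 0,
    so mu drops out.
    """
    return (alpha + beta * slope_pct) / alpha

print(round(BETA / ALPHA, 3))                   # 2.968
print(round(normalized_cost_per_100m(2.0), 3))  # 6.936
```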
For computational purposes it is convenient to divide slope into bins of 0 to 1%, 1 to 2%, and so on. The following table gives the costs over a standardized distance for different slope bins in our preferred specification, labeled scenario 2. For comparison, we also show costs that rise linearly by one unit per percentage point of slope (scenario 1). We also show a graded case in which costs are flat below 2%, rise from the 2-3% bin up to the 6-7% bin, and are constant beyond that (scenario 3).
Slope (%)   Cost, scenario 1   Cost, scenario 2 (preferred)   Cost, scenario 3
0           0                  1                              1
0-1         1                  4                              1
1-2         2                  7                              1
2-3         3                  10                             4
3-4         4                  13                             7
4-5         5                  16                             11
5-6         6                  19                             15
6-7         7                  22                             19
7-8         8                  25                             19
8-9         9                  28                             19
9-10        10                 31                             19
>10         ...                34                             19
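The slope bins in the table amount to a reclassification of the slope raster into a cost surface. Below is a minimal NumPy sketch under two assumptions. First, bins are closed on the right, so a slope of exactly 1% falls in the 0-1% bin; the text does not specify this convention. Second, the function name is hypothetical. Scenario 1 is left out because its entry above 10% is elided in the table. The scenario-2 costs equal 1 + 3 x (upper edge of the bin), which appears to round the 2.96 coefficient to 3.

```python
import numpy as np

# Costs copied from the table. Index 0 is "slope = 0", index k (1..10) is
# the bin (k-1, k] percent, and index 11 is "> 10 percent".
SCENARIO_COSTS = {
    2: np.array([1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34]),
    3: np.array([1, 1, 1, 4, 7, 11, 15, 19, 19, 19, 19, 19]),
}

def reclassify_slope(slope_pct, scenario=2):
    """Map percent slope (scalar or array) to a construction-cost value."""
    slope = np.asarray(slope_pct, dtype=float)
    # ceil() turns (k-1, k] into k; everything above 10% shares index 11.
    bin_index = np.clip(np.ceil(slope), 0, 11).astype(int)
    return SCENARIO_COSTS[scenario][bin_index]
```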
The LCP algorithm is implemented in ESRI's Python environment. Its inputs are the slope raster, the reclassification table of construction costs, and the origin and destination nodes. The cost-distance and back-link rasters are computed using the formulation below:
$$Cost(a,b) = \frac{CostSurface(a) \cdot HF(a) + CostSurface(b) \cdot HF(b)}{2} \cdot SurfaceDistance(ab) \cdot VF(ab)$$
where CostSurface(j) is the cost of travel for cell j, HF(j) is the horizontal factor for cell j, SurfaceDistance(ab) is the surface distance from a to b, and VF(ab) is the vertical factor from a to b. Note that each cell's cost is multiplied by its horizontal factor before the two values are averaged (divided by 2). Finally, we applied the least-cost-path function to obtain the LCP corridors. These corridors were converted to lines, exported, merged and post-processed. Maps of our preferred LCP using scenario 2 are shown in the text.
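The cost-accumulation and back-link logic can be illustrated with Dijkstra's algorithm on an 8-connected grid. This is a simplified stand-in for ESRI's cost-distance and cost-path tools, not the authors' script. It assumes HF = VF = 1 and takes surface distance to be the planar distance between cell centers. The function name is hypothetical.

```python
import heapq
import math

def least_cost_path(cost, start, goal, cell_size=1.0):
    """Cheapest 8-connected route between two cells of a cost grid.

    Moving from cell a to neighbour b costs
    (cost[a] + cost[b]) / 2 * distance(a, b), i.e. the formula above
    with HF = VF = 1. Returns (accumulated_cost, [(row, col), ...]).
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}   # accumulated cost (the cost-distance surface)
    back = {}             # predecessor of each cell (the back-link surface)
    heap = [(0.0, start)]
    while heap:
        acc, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if acc > best[(r, c)]:
            continue  # stale heap entry; a cheaper route was already found
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    step = math.hypot(dr, dc) * cell_size
                    new = acc + (cost[r][c] + cost[nr][nc]) / 2 * step
                    if new < best.get((nr, nc), math.inf):
                        best[(nr, nc)] = new
                        back[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (new, (nr, nc)))
    # Follow the back links from the destination to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(back[path[-1]])
    return best[goal], path[::-1]
```

The `back` dictionary plays the role of the back-link raster. Once the destination has been settled, following it from the destination recovers the cheapest route.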
A.2 Elevation, slope, and ruggedness variables
The aim of this appendix is to explain how we created the elevation variables, including the original sources and the method we followed to estimate them. Several initiatives provide high-resolution elevation raster data across the world. The main differences between these databases lie in their geographical coverage, their precision, and their treatment of urban areas.
To carry out this work, we downloaded several digital elevation model (DEM) rasters covering all of England and Wales, preferring digital terrain models (DTMs). In decreasing order of accuracy, the most precise was LIDAR (5 x 5 m), from the Landmap dataset held in the NEODC Landmap Archive (Centre for Environmental Data Archival). Second, we used EU-DEM (25 x 25 m) from the GMES RDA project, available in the EEA Geospatial Data Catalogue (European Environment Agency). The third dataset was the Shuttle Radar Topography Mission (SRTM, 90 x 90 m), created in 2000 by the National Geospatial-Intelligence Agency (NGA) and NASA from a radar system on board the Space Shuttle Endeavour. Finally, we also used GTOPO30 (1,000 x 1,000 m), developed through a collaborative effort led by staff at the U.S. Geological Survey's Center for Earth Resources Observation and Science (EROS). All of these sources were created from modern remote-sensing data, so all of them describe present-day terrain. The lack of historical elevation data obliges us to use them despite this limitation. This simplification is reasonable for rural places but less so for urban areas, where urbanization altered the original landscape. Even in DTM rasters, the construction of buildings and infrastructure networks substantially changed the surface of the terrain. We conducted several local-scale tests with the different rasters to balance precision against computation time. We compared total file size, computation time, and precision relative to the finest data. On this basis, we chose SRTM90.
As stated in the appendix on mappable units, the spatial units used in this paper are civil parishes, comprising over 9,000 continuous units. We therefore needed a method to obtain a single value of each elevation variable for each unit while keeping the variables comparable across the country. We estimated six variables in total: the mean and standard deviation of elevation, of slope, and of ruggedness. Before creating these variables, we prepared the data. Full coverage of England and Wales with SRTM data required 7 raster tiles. These were merged, projected into the British National Grid, and clipped to the coastline in ArcGIS.
With the elevation raster of England and Wales in hand, we calculated the first two variables: the mean and standard deviation of elevation. We wrote a Python script to split the raster by continuous unit, calculate the mean and standard deviation of all the cells in each sub-raster, and aggregate the results in a text file. These files were then joined to the civil parish shapefile so that the results could be mapped.
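Computing a mean and standard deviation for each unit is a zonal-statistics operation. The hedged NumPy sketch below assumes the parish polygons have already been rasterized to an integer unit-ID grid aligned with the elevation raster; the authors instead split the raster by polygon. The function name and the use of the population standard deviation (ddof = 0) are also assumptions.

```python
import numpy as np

def zonal_mean_std(values, zones):
    """Per-zone mean and standard deviation of raster cells.

    values: 2-D float array (NaN marks cells outside the coastline).
    zones:  2-D integer array of the same shape with one ID per unit.
    Returns {zone_id: (mean, std)} with the population std (ddof=0).
    """
    v = np.asarray(values, dtype=float).ravel()
    z = np.asarray(zones).ravel()
    keep = ~np.isnan(v)  # drop no-data cells
    v, z = v[keep], z[keep]
    # inverse maps every cell to the position of its zone in ids.
    ids, inverse = np.unique(z, return_inverse=True)
    counts = np.bincount(inverse)
    means = np.bincount(inverse, weights=v) / counts
    # Squared deviation of each cell from its own zone's mean.
    sq_dev = (v - means[inverse]) ** 2
    stds = np.sqrt(np.bincount(inverse, weights=sq_dev) / counts)
    return {int(i): (float(m), float(s)) for i, m, s in zip(ids, means, stds)}
```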
Next, we measured the variability of elevation between adjacent cells, using two alternative measures: ruggedness and slope. Ruggedness is a measure of topographical heterogeneity defined by Riley et al. (1999). To calculate the ruggedness index for each unit, we wrote a Python script with five steps. It converts each raster cell into a point that keeps its elevation value, and selects the adjacent points using a distance tool. It then computes the index for every point and spatially joins the points to their units. Finally, it calculates aggregate indicators (mean and standard deviation) for each continuous unit.
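Riley et al.'s terrain ruggedness index (TRI) for a cell is the square root of the summed squared elevation differences between that cell and its eight neighbors. The sketch below computes it with NumPy array shifts rather than the point-and-distance-tool workflow described above; the two should agree for interior cells. Two choices are assumptions: edge cells simply skip the neighbors that fall off the grid, and the function name is hypothetical.

```python
import numpy as np

def terrain_ruggedness_index(dem):
    """TRI = sqrt(sum over the 8 neighbours of (z_neighbour - z_cell)^2)."""
    z = np.asarray(dem, dtype=float)
    rows, cols = z.shape
    # Pad with NaN so that off-grid neighbours can be recognised and ignored.
    padded = np.pad(z, 1, mode="constant", constant_values=np.nan)
    total = np.zeros_like(z)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            neighbour = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            # Off-grid neighbours give NaN, which contributes 0 to the sum.
            total += np.nan_to_num((neighbour - z) ** 2)
    return np.sqrt(total)
```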
Slope is an alternative measure of topographical heterogeneity. To calculate the slope variables for each unit, we wrote a Python script that converts the elevation raster into a slope raster and splits it by continuous unit. The script then calculates the mean and standard deviation of all the cells in each sub-raster and aggregates the results in a text file. The results for both ruggedness and slope are displayed in Figure 6. The two indices differ in scale (by a factor of roughly one to two), but their geographical patterns are similar. We therefore use the slope-based variables in the paper, because they required much less computation time.
Figure 6: Slope and ruggedness measures
A.3 Additional results
Table 9: Different specifications for railway variables
Dep. var.: unit pop. growth 1841 to 1891    (1)        (2)        (3)        (4)
                                            coeff.     coeff.     coeff.     coeff.
Variable                                    (t-stat)   (t-stat)   (t-stat)   (t-stat)
Indicator distance to railway station in 1841
Table 10: Heterogeneity specifications: railway access and 1851 occupational shares
Dep. var.: unit pop. growth 1841 to 1891 (1)
Indicator distance to railway station in 1851