
Infrastructure access and population growth: evidence from nineteenth century England and Wales

Dan Bogart∗, Xuesheng You†, Eduard Alvarez‡, Max Satchell§, and Leigh Shaw-Taylor¶

Draft, September 26, 2018‖

Abstract

The English and Welsh economy underwent a remarkable urbanization in the nineteenth century. We examine how infrastructure access contributed to long-run population change and interacted with local characteristics. Unlike prior works, we examine the effects of being close to several infrastructures, including railway stations, turnpike roads, inland waterways, and ports. We also develop a least cost path instrument to address the endogenous placement of railway stations. Our estimates show that being close to railway stations and other infrastructures significantly increased population growth from 1841 to 1891. We also show the effects of infrastructures were larger for more densely populated localities in 1841 and those with a lower share of male agricultural employment. The results are consistent with economic geography and trade models that emphasize increases in agglomeration with lower transport costs. They also highlight the long-term effects of infrastructure. To illustrate, we show that localities close to infrastructures in the 1840s and 1850s have higher population density in 2011.

Keywords: Economic growth, railways, transport, spatial reorganization

JEL Codes: N4, O18, R11

∗ Corresponding author. Associate Professor, Department of Economics, UC Irvine, [email protected]

† Research Associate, Faculty of History, University of Cambridge, [email protected]

‡ Senior Lecturer, Economics and Business, Universitat Oberta de Catalunya, [email protected]

§ Research Associate, Dept. of Geography, University of Cambridge, [email protected]

¶ Senior Lecturer, Faculty of History, University of Cambridge, [email protected]

‖ Data for this paper was created thanks to grants from the Leverhulme Trust (RPG-2013-093), Transport and Urbanization c.1670-1911, NSF (SES-1260699), Modelling the Transport Revolution and the Industrial Revolution in England, the ESRC (ES 000-23-0131), Male Occupational Change and Economic Growth in England 1750 to 1851, and ESRC (RES-000-23-1579) the Occupational Structure of Nineteenth Century Britain: Grant. We thank Walker Hanlon, Gary Richardson, Petra Moser, Kara Dimitruk, Arthi Vellore, Alan Rosevear, and Elisabet Viladecans Marsal for comments on earlier drafts and seminar participants at UC Irvine, UC San Diego, NYU, Florida State, Trinity College Dublin, Queens Belfast, the University of Los Andes, and EHA Meetings.


1 Introduction

Over the last 50 years there has been a dramatic rise in urbanization. According to the

United Nations, the share of the world's population living in urban areas increased from 30%

in 1950 to 54% in 2014. Much of this urban growth has occurred in developing countries

and beyond the metropolitan areas, which act as the main labor market but not necessarily

the main place of residence (Demographia World Urban Areas 2018).

Currently developed economies underwent a similar process of urbanization in their past.

For example, in England and Wales the urban percentage (people in towns of 5000 or more)

rose from 30% in 1801 to 57% in 1871 (Shaw-Taylor and Wrigley 2014). London accounted

for some urban growth with its percentage of the national population increasing from 11%

in 1801 to 15% in 1871. Much of the remaining urban growth occurred near the northern

industrial towns like Manchester, Leeds, Nottingham, and Birmingham. These places were

already densely populated in 1801 but they became denser by century's end.

History offers a unique perspective to probe the drivers of urbanization. This paper examines

the role of transport infrastructure in creating local differences in population growth

during the nineteenth century and up to the present. The building of railways is perhaps

the most famous transport improvement. Railways rapidly spread through the English and

Welsh (EW) economy in the 1830s, 40s, and 50s. They lowered inland transport costs and

expanded trade and travel between regions. The following fact illustrates the impact of railways.

In 1850, 98.4% of coal imported into London came by sea and 1.6% by rail. By 1870

the majority of coal into London (55.7%) came by rail.1 While railways were revolutionary,

they were not the first major infrastructure improvement in EW. Between 1750 and 1830

there were substantial investments in roads, ports, and canals. They developed urban areas

before railways and provided transport alternatives during the railway era.

We address the following questions: (1) how did the placement of infrastructures within

1 These figures are reported in Hawke (1970, p. 168).

30 minute walking distance of a locality change its population growth relative to other

localities which shared similar geographic and economic characteristics but were farther

away from infrastructures? (2) How did a locality's initial characteristics change the effects

of nearby infrastructures? In order to answer these questions, we introduce a new data set.

First, we create new continuous units which allow for dis-aggregated analyses of population

in every census year from 1801 to 1891 and male occupational structure in 1851 and 1881.

We observe population and occupational data for 9489 units. They are slightly larger than

parishes and townships, but significantly smaller than the 625 registration districts which are

often studied in the literature. Second, we incorporate new GIS data on railway lines and

stations, turnpike roads, inland waterways, and ports. In particular, we are able to measure the

distance between units and each infrastructure type across decades. Third, we add new data

on geographic characteristics, like the location of exposed coalfields, and add existing data

on elevation, soil types, rainfall, and temperature.

The new data set is used to estimate a long-differences specification, where population

growth from 1841 to 1891 is regressed on indicators for being within 2 km of infrastructure

(rail, roads, waterways, and ports) plus geographic and structural controls. Importantly, our

preferred model also includes fixed effects for the registration district. That implies we are

controlling for unobservable factors common to all units in a district (about 250 square km).

While our baseline model includes a rich set of controls, we are still concerned that proximity

to railways is correlated with unobservables contributing to population growth.2 We address

endogeneity in part by employing propensity score matching techniques. More directly,

we construct a Least Cost Path (LCP) connecting large towns in 1801 and incorporating

the added costs of building railways over sloping terrain. Distance to the LCP is a good

instrument for distance to stations because it identifies units that were close to stations

mainly because they were near favorable routes for connecting large towns.3
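
To make the idea of a least cost path concrete, the following is a minimal sketch, not the construction used in this paper. It assumes terrain is stored as a small grid of slope values, that entering a cell costs one unit of distance plus a hypothetical `slope_penalty` times the cell's slope, and that the cheapest route between two towns is found with Dijkstra's algorithm. The function name, the cost formula, and the toy grid are all illustrative assumptions.

```python
import heapq

def least_cost_path(slope, start, goal, slope_penalty=1.0):
    """Dijkstra search on a 4-connected grid.

    Entering a cell costs 1 + slope_penalty * slope[cell]; this cost
    rule is an assumption made for illustration only.
    """
    rows, cols = len(slope), len(slope[0])
    best = {start: 0.0}          # cheapest known cost to reach each cell
    prev = {}                    # back-pointers used to rebuild the route
    heap = [(0.0, start)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if cost > best[cell]:
            continue             # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + 1.0 + slope_penalty * slope[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    # Walk the back-pointers from the goal to the start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[goal]

# Toy terrain: a steep ridge in the middle row, gentle slopes below it.
toy_slope = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [1, 1, 1, 1],
]
lcp_path, lcp_cost = least_cost_path(toy_slope, (1, 0), (1, 3))
```

In this toy example the route detours along the flat top row rather than crossing the ridge. Units lying near such a route would be "close to the LCP" for reasons related to terrain and the location of the towns being connected, rather than to their own growth prospects.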

2 Turnpike roads, inland waterways, and port infrastructures were mostly built in the decades before 1841 and so we are less concerned about their correlation with unobservables causing growth from 1841 to 1891.

3 Our methodology draws on the so-called inconsequential place approach and other studies which use least


We document several new findings. First, our baseline model shows that being within

2 km of a railway station in 1851 increased population growth from 1841 to 1891 by 16

log points. The matching and IV estimates suggest a larger effect and so this estimate

appears to be a lower bound. Combining the growth estimate with the rate of convergence

implies that units within 2 km of stations had 115 log points higher population levels in

1891 compared to units that were farther away.
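
For readers less used to log points, the short sketch below converts a log difference into the implied percentage change, reading "16 log points" as a difference of 0.16 in natural logs. The helper name is hypothetical and the calculation is a standard identity, not a result from this paper.

```python
import math

def log_points_to_percent(log_diff):
    """Convert a natural-log difference into a percentage change."""
    return (math.exp(log_diff) - 1.0) * 100.0

# A 0.16 log difference corresponds to roughly a 17 percent increase.
station_effect_pct = log_points_to_percent(0.16)
```

For small values the two measures are close, which is why log differences are often read as approximate growth rates.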

Second, we show that access to pre-rail infrastructures also contributed to population

growth. Specifically, being within 2 km of inland waterways and turnpikes increased population

growth between 1841 and 1891 by 6 and 4 percentage points respectively. That

translates into population levels that were 40 and 28 log points higher in 1891.

Third, we show the effects of railways and other infrastructures were larger for more

densely populated units in 1841 and units that had a lower share of male agricultural employment.

The latter results are consistent with New Economic Geography (NEG) models

that emphasize increases in agglomeration with lower transport costs. They are also consistent

with the literature on nineteenth century globalization, which emphasizes EW's shift

out of agriculture and into manufacturing as trade costs fell due to railways and steamships

(O'Rourke and Williamson 2002, Pascali 2016).

Our results contribute to the literature studying railways and growth in England and

Wales.4 Previous studies have documented a correlation between proximity to railways and

local population growth, but none has addressed endogeneity and confounding factors in

a systematic manner.5 We introduce new data and estimation techniques, including an

instrumental variable for the location of railway lines. We also add to a broader literature

on English industrialization. A wide range of factors are discussed like endowments, access

cost paths as instruments for infrastructure. See Chandra and Thompson (2000), Michaels (2008), Faber (2014), and Lipscomb et al. (2013).

4 For an introduction see Hawke (1974), Simmons (1986), and Aldcroft and Freeman (1991).

5 See Gregory and Marti Henneberg (2010), Casson (2013), and Alvarez et al. (2013) for existing studies examining spatial impacts of railways in England and Wales.


to markets, human capital, and institutions.6 By building an extensive new data set we are

able to show that infrastructures had significant effects even after controlling for endowments

like coal.

Our paper is also novel by incorporating pre-rail infrastructures. To our knowledge no

study on railways and nineteenth century growth analyzes pre-rail infrastructures in such

a detailed way.7 Why are pre-rail infrastructures important? We show that the effects

of railways are overstated when pre-rail infrastructures are absent from the analysis. Including pre-rail

infrastructures as controls is therefore crucial to identifying the effects of railways.

This paper also complements existing studies which analyze the network effects of transport

infrastructure through market access.8 While network effects are omitted in this study,

it should be stressed that local effects were quite relevant in the nineteenth century as towns

lobbied extensively for and against having railway stations and canals nearby. The local

effects are also relevant in the twenty-first century as the placement of new infrastructure

projects alters local population geography.

Finally our study contributes to the literature seeking to understand urbanization in

contemporary contexts.9 We provide evidence that infrastructure access is one of the key

drivers of long-run outcomes. As one indication we conclude with a `persistence' regression,

where population density in 2011 is regressed on proximity to nineteenth century infrastructures.

We find that being within 2 km of mid-nineteenth century infrastructures is associated

with significantly higher population levels in 2011 even after including a rich set of controls.

Investments in infrastructure influence population geography long into the future.

6 See Crafts and Mulatu (2006), Fernihough and Hjortshøj O'Rourke (2014), Crafts and Wolf (2014), Klein and Crafts (2012), Becker, Hornung, and Woessmann (2011), Heblich and Trew (2017).

7 See Donaldson (2014) for India, Gutberlet (2014) for Germany, Berger and Enflo (2015) for Sweden, Herranz-Loncán (2006) for Spain, Tang (2014, 2017) for Japan, Hornung (2015) for Prussia, and Atack and Margo (2010, 2011) for the US.

8 See Donaldson and Hornbeck (2016) and Jaworski and Kitchens (2017) for two examples.

9 See Redding and Turner (2014) for an overview. Some papers of related interest include Duranton and Turner (2012), Faber (2014), Jedwab et al. (2015), Storeygard (2016), and Baum-Snow et al. (2017).


Figure 1: Evolution and size of infrastructure networks in England and Wales 1700-1890

Sources: see data section.

2 Background on infrastructure

England and Wales (EW) had a well developed transport network long before its railway

network grew. Figure 1 shows the length of turnpike road, inland waterway, and railway

networks from 1700 to 1890. In this background section, we discuss each of these networks

and ports.

Turnpikes are gates that prevent road users from passing without paying a toll. In the

EW context, turnpike trusts were granted rights to finance road improvements by levying

tolls. Their powers came from an act of parliament and they generally managed the main

roads. As figure 1 shows, the turnpike network grew mainly between 1750 and 1830. At

their peak in the mid-nineteenth century there were 38,000 km of turnpike road managed

by 1000 different trusts. The tolls forced road-users to pay for the improvement and upkeep

of roads. But as it turned out, the benefits to road users from improved roads substantially


outweighed the burden of the tolls. Travel times and freight charges declined by over 40 per

cent between 1750 and 1800 (Bogart, 2005a, b).

One important feature of turnpike roads was their local promotion and financing. Landowners

and merchants lobbied parliament to authorize turnpike bills. They also provided the

financing and received no subsidies from the central government. Many had to choose their

turnpike routes carefully in order to avoid financial losses. Turnpike trusts faced severe

competition from railways starting in the 1840s. Most disbanded by the 1870s and 80s,

reducing the number of turnpike roads to near zero. Responsibility for maintaining their

roads passed to rural sanitation districts.

The inland waterway network developed more gradually between 1700 and 1830. Around

1700 EW had a large system of navigable rivers including the Thames, Severn, and Trent

(Willan, 1964). River navigations, or improved rivers which bypassed difficult sections, were

added in the early 1700s. Canals, or artificial waterways, were built starting in the 1760s.

Like turnpike roads, they were promoted by local landowners and merchants. In one of

the most famous examples, the Duke of Bridgewater aimed to connect his coal mines at

Worsley with Manchester. The early canals like the Bridgewater were a financial success

and many joint stock canal companies were proposed in parliament in the 1790s. Many of

these completed their canals in the early 1800s.

In terms of location, there were several long distance canals linking important centers.

One example is the Leeds and Liverpool canal, which connected the woolen textile towns

around Leeds with the cotton textile towns around Manchester and the Atlantic port of Liverpool.

Another example is the Grand Junction canal, which greatly shortened the waterway

distance between London and Manchester and avoided narrow sections. It is worth emphasizing

that canals brought low cost transport to inland regions. They were especially

important in the movement of bulky, low-value goods, like coal. Consider the following fact:

shortly after the completion of the Bridgewater canal in 1761, the price of coal in Manchester


fell by half.

There was significant investment in ports between 1760 and 1830. Much effort went

into building harbors and wet docks in London and Liverpool. But ports were developed

in other areas too. It is estimated there were 391 acres of wet dock space and 50 harbors

were being maintained in 1830. By contrast, England had no wet docks and a handful of

harbors in 1660 (Pope and Swann 1960). Port infrastructures complemented improvements

in shipping technology. One indication is the greater speeds achieved by sailing vessels in

the early 1800s (Solar 2013). Later came the steamships that would revolutionize trade and

travel across the oceans (Armstrong 2009, Pascali 2016).

The first steam powered rail service open to the public came in 1825 in the northern coal

mining region between Stockton and Darlington. In 1830, the Liverpool and Manchester

railway was opened to facilitate passenger traffic. Several other railways connecting large

towns were promoted in the mid 1830s. The organizational form was the same as for canals:

railways were built and operated by joint stock companies. The rail network expanded

dramatically following the `Railway Mania' of the mid-1840s. The significance of the Mania

can be seen in Figure 1 through the growth of track mileage. By 1851 regional rail networks

had formed around the large towns in addition to more trunk lines being opened.

Railways marked a major improvement over the two inland modes of transport. For

example, rail freight rates in 1870 were one-tenth road freight rates in 1800 in real terms.

Railways were 15 times faster than barges on inland waterways (Bogart 2014). Above all,

railways had their greatest impact in passenger travel. They were able to displace long-distance

stage coach services. Shortly after the Mania, between 1845 and 1850, the number

of passenger journeys by rail increased by 117%, and again by 65% between 1850 and 1855.

It is important to note that railway companies were profit-seeking. Like their predecessors

who built canals and roads, their projects were privately financed. As a result, railway

companies selected their routes considering the financial benefits and costs. Their records


suggest a distinction was made between the "original line," which often aimed to connect

large trading towns, and the "branch lines," which linked smaller towns to the original lines.

Railway companies preferred original lines, but were sometimes pressured to build branches

through the parliamentary approval process. One promoter advised the following: "stick to

the original line; keep down the capital, and let competing schemes do their worst" (quoted

in Simmons 1986, p. 271). Railway companies sometimes engaged in strategic building to

maintain their regional dominance.10 As a result, certain branch railway lines were built

even though they were unlikely to be individually profitable. The details of railway building

will be important later as we consider endogeneity concerns.

3 Data

Our population data come from British censuses, available every decade starting in 1801.

They have been digitized at the parish level up to 1911 by Tony Wrigley at the Cambridge

Group for the History of Population and Social Structure (CamPop). The population data

are complemented by similarly detailed occupational data. The census published parish-level

occupational counts starting in the early nineteenth century. The counts for 1851 and 1881

are available through the Integrated Census Microdata (I-CeM) project (see Schürer and

Higgs 2014). We focus on male occupations because there is more agreement in the literature

on their classification.11 Male occupations are classified into employment categories using

the primary, secondary, and tertiary (PST) coding system.12

The administrative units recording population and occupations from 1801 to 1891 are not

always the same across time. The sources report data sometimes in parishes, sometimes in

10 For the literature on the railway mania see Casson (2009), Odlyzko (2010), Campbell and Turner (2012, 2015).

11 See You (2014) for a new analysis of female employment in the 1851-1911 census.

12 Primary normally includes agriculture and mining, but we separate these two. Secondary refers to the transformation of the raw materials produced by the primary sector into other commodities, whether in a craft or manufacturing setting. Tertiary encompasses all services including transport, shop-keeping, domestic service, and professional activities. The PST system is described in detail in Shaw-Taylor et al. (2014) and Wrigley (2015).


townships within parishes, and sometimes in parishes that were later sub-divided. Wrigley,

Satchell, and collaborators at CamPop have created continuous parish units between 1801

and 1891 and linked them with census population data (see Satchell et al. 2016 for details).

Using similar spatial matching techniques, we create a consistent set of boundaries for 9489

units that map population from 1801 to 1891 and male occupations in 1851 and 1881.13 We

call these `units' for short. Units are 15 square km on average and they belong to larger

jurisdictions called registration districts. Districts are about 250 square km on average and

we have 616 registration districts in our data.
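
The logic of building consistent units by transitive closure can be sketched as follows. This is not the Cambridge Group code mentioned in the footnote; it is a minimal illustration that assumes boundary changes are recorded as pairs of place names that shared territory at some census, and that any places connected through a chain of such pairs must be merged into one unit. The function name and the example place names are hypothetical.

```python
def build_consistent_units(links):
    """Group places into units using a union-find (disjoint set) structure.

    `links` is an iterable of (place_a, place_b) pairs, each meaning the two
    places shared territory at some date and so cannot be separated
    consistently over time.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for place in list(parent):
        groups.setdefault(find(place), set()).add(place)
    return sorted(sorted(members) for members in groups.values())

# Made-up example: Township A1 links Parish A and Parish B;
# Parish C never changed and is linked only to itself.
example_links = [
    ("Parish A", "Township A1"),
    ("Township A1", "Parish B"),
    ("Parish C", "Parish C"),
]
consistent_units = build_consistent_units(example_links)
```

Each resulting group is the smallest area whose outer boundary is stable across all census years, which is why such units end up slightly larger than individual parishes and townships.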

This paper also uses new GIS data on infrastructures. These include turnpike roads

in 1800 and 1830, waterways in 1800 and 1830, ports in 1826 and 1842, and railway lines

and stations in every census year starting in 1831.14 All these networks are created using

historical sources, not modern maps. For the analysis a straight line is drawn from the

center of each unit to its nearest infrastructure type. The unit center corresponds to the

market square if it had a town or the centroid if the unit had no town.15
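
The distance measure described above can be illustrated with a minimal sketch. It assumes coordinates are in a projected system measured in metres, that linear infrastructure is represented as straight segments (a station or port can be passed as a zero-length segment), and that the indicator uses a strict 2 km cutoff. The function names and the coordinates are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Straight-line distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)  # segment is a single point
    # Project p onto the line and clamp to the segment's end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_infrastructure(center, segments, threshold_m=2000.0):
    """Return (distance to nearest segment, 1 if within threshold else 0)."""
    nearest = min(point_segment_distance(center, a, b) for a, b in segments)
    return nearest, int(nearest < threshold_m)

# Made-up road running east-west through the origin.
road = [((-1000.0, 0.0), (1000.0, 0.0))]
near_dist, near_flag = nearest_infrastructure((0.0, 1500.0), road)
far_dist, far_flag = nearest_infrastructure((0.0, 2500.0), road)
```

The second return value corresponds to the kind of "within 2 km" indicator used in the regressions, under the assumptions stated above.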

Another key feature of our study is the incorporation of geographic data. For each unit

we create variables for being on exposed coalfields, being on the coast, ruggedness, average

rainfall, average temperature, an index for wheat suitability, and the share of land in different

soil types. We call these first-nature variables following the literature. Coastal units

are identified using shapefiles for parish boundaries in England and Wales. The ruggedness

measures include average elevation in the parish, the average elevation slope in the parish,

13 Ms Gill Newton, of the Cambridge Group, developed the Python code for Transitive Closure as part of the research project `The occupational structure of Britain, 1379-1911' based at the Cambridge Group. Xuesheng You implemented this code for this particular paper.

14 For a description of the turnpike GIS see Rosevear et al. (2017). For a description of the inland waterways data see Satchell (2017). For a description of ports see Alvarez et al. (2017). For railways see del Río, Martí-Henneberg, and Valentín (2008) for an initial description of the railways shapefile data. Additional upgrades were produced by the Cambridge Group for the History of Population and Social Structure (CamPop), see http://www.campop.geog.cam.ac.uk/research/projects/transport/data/railwaystationsandnetwork.html.

15 We identify if a market existed at some point between 1600 and 1850. This applies to 746 of the 9489 units. The centroid is taken as the unit center if there was no market. It should be noted that little error is introduced by using the market or the centroid since units are so small. For a description of towns see http://www.campop.geog.cam.ac.uk/research/projects/transport/data/towns.html.


and the standard deviation in elevation slope. Appendix 3 provides a description of the ruggedness

measures, specially constructed for this paper. The share of parish land in 10 soil types

is from Avery (1980) and Clayden and Hollis (1985) and digitized by the National Soils Map

of England and Wales.16 Rainfall, temperature, and wheat suitability come from FAO.17

Of special significance, Satchell and Shaw-Taylor (2013) have generated a GIS shapefile of

exposed coalfields from British Geological Survey data. It identifies geographic areas in

England and Wales where coal bearing strata are not concealed by rocks laid down during

the Carboniferous Period. In economic terms, they represent the coalfields that were known

in the nineteenth century and could be exploited by contemporary technology.18

We have another set of unit-level variables which we call second nature factors. These

include distance to the nearest major city in 1801. Major cities include the top ten cities in

terms of 1801 population. Also included are 1851 male employment shares. We use 5 main

occupational categories: (1) tertiary, (2) agriculture, (3) secondary, (4) mining/forestry, and

(5) unspecified.

Table 1 reports summary statistics for the main variables. The population variables are

expressed in natural log differences between the years 1841 and 1891 and 1841 and 2011.

These approximate population growth but diminish outliers. As a check we can show that

our data replicates accepted national trends. For example, the share of the population living

in `urban units' (places with at least 400 persons per square km) increased from 42% in 1841

to 68% by 1891. It is thought that population generally declined outside the major towns.

The median population growth from 1841 to 1891 across all our units was -0.09%. Finally,

there is a view that population became more concentrated. The share of the total population

16 The 10 soil categories are (1) Raw gley, (2) Lithomorphic, (3) Pelosols, (4) Brown, (5) Podzolic, (6) Surface-water gley, (7) Ground-water gley, (8) Man made, (9) Peat soils, and (10) other. See http://www.landis.org.uk/downloads/classification.cfm#Clayden_and_Hollis. Brown soil is the most common and serves as the comparison group in the regression analysis.

17 See the Global Agro-Ecological Zones data at http://www.fao.org/nr/gaez/about-data-portal/agricultural-suitability-and-potential-yields/en/.

18 The GIS does not capture a handful of tiny post-Carboniferous coal deposits, such as that at Cleveland (Yorkshire) which was worked in the 19th century. See http://www.campop.geog.cam.ac.uk/research/projects/transport/data/coal.html for more details.


living in the top 1% of units increased from 5.4% in 1841 to 11.3% in 1881.
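
The two descriptive checks above (the urban share and the concentration of population in the largest units) can be sketched as follows. The sketch assumes each unit is a (population, area in square km) pair, uses the 400 persons per square km cutoff from the text, and ranks units by population when computing the top share; that ranking rule and the toy numbers are assumptions for illustration, not the paper's data.

```python
def urban_share(units, density_cutoff=400.0):
    """Share of total population living in units at or above the cutoff density."""
    total = sum(pop for pop, area in units)
    urban = sum(pop for pop, area in units if pop / area >= density_cutoff)
    return urban / total

def top_share(populations, fraction=0.01):
    """Share of total population living in the largest `fraction` of units."""
    ranked = sorted(populations, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

# Made-up units: densities are 800, 25, and 100 persons per square km.
toy_units = [(8000, 10.0), (500, 20.0), (1500, 15.0)]
toy_urban_share = urban_share(toy_units)

# Made-up distribution: one large unit and 99 small ones.
toy_populations = [1000] + [10] * 99
toy_top_share = top_share(toy_populations)
```

Applying the same functions to populations in two census years would give the kind of before-and-after comparison reported in the text.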

The summary statistics indicate that railway access differed across space in 1851. The

mean distance to an 1851 station is 10.4 km and 13.6% of units had a station within 2 km.

It was more common to be within 2 km of inland waterways and turnpike roads. In fact,

66% of units were within 2 km of a turnpike by 1830. As expected only 2.7% of units were

within 2 km of a port.

The summary statistics for second nature controls indicate that the average unit had

a male agricultural employment share equal to 0.55. This figure is much higher than the

aggregate agricultural share reported by Shaw-Taylor and Wrigley (2014). They report that

19% of adult males in England and Wales worked primarily in agriculture in 1871. The

reason is that secondary employment was far more concentrated than agricultural employment.

In our units the top 1% accounted for 57% of male secondary employment in 1851.

Not surprisingly, our average unit was not representative of the EW economy in terms of

occupational structure.

Is there any visual evidence that proximity to railways affected population growth?

Figure 2 provides an affirmative answer. It shows railway lines and stations in 1851 along

with each unit's population growth from 1851 to 1881 (darker is higher growth). It is clear

that many areas which grew rapidly from 1851 to 1881 were close to railway stations. This is

most clear near Manchester, Birmingham, and London. However, the East Anglia region in

the upper right shows that having railway stations nearby did not guarantee high population

growth.

There are two other stylized facts worth mentioning. First, on average units that had

lower population density in the early nineteenth century grew more. However, there is one

density range where this was not true. Medium-to-high density units in the early nineteenth

century grew more than middle-density or very high-density units. The left panel in figure 3

illustrates. It plots the log difference in 1891 and 1841 population density on the y-axis and


Table 1: Summary statistics

Variable                                            Obs.  Mean     Std. Dev. Min      Max

Population growth and occupational change variables
Ln diff. population 1841 to 1891                    9489  0.0100   0.5136    -3.0796  4.8742
Ln diff. population 1841 to 2011                    9488  0.5554   1.1906    -5.0739  5.9113

Infrastructure variables
Distance to rail station in 1851 (km)               9489  10.456   11.065    0.0215   73.129
Distance to GM LCP (km) (IV)                        9489  11.861   16.548    0.0001   116.38
Indicator distance to rail station in 1851 <2km     9489  0.1361   0.3429    0        1
Indicator distance to inland waterway in 1830 <2km  9489  0.2334   0.4230    0        1
Indicator distance to turnpike road in 1830 <2km    9489  0.6628   0.4727    0        1
Indicator distance to port in 1842 <2km             9489  0.0274   0.1632    0        1

First-nature controls
Indicator exposed coal                              9489  0.0802   0.2716    0        1
Indicator coastal unit                              9489  0.1479   0.355     0        1
Elevation                                           9489  89.721   74.025    -1.243   524.38
Average elevation slope within unit                 9489  4.7675   3.6157    0.4849   37.427
SD elevation slope within unit                      9489  3.4324   2.7174    0        23.175
Average rainfall                                    9484  755.71   191.77    555      1424.3
Average temperature                                 9484  8.9582   0.6580    5.5      10
Wheat suitability (low input level rain-fed)        9484  2188.17  273.25    272      2503
Land area in sq. km.                                9484  15.638   22.181    0.0032   499.84
Perc. of land with Raw gley soil                    9489  0.0847   1.3279    0        76.496
Perc. of land with Lithomorphic soil                9489  8.6151   19.830    0        100
Perc. of land with Pelosols soil                    9489  8.2038   20.637    0        100
Perc. of land with Podzolic soil                    9489  4.6249   14.326    0        99.565
Perc. of land with Surface-water gley soil          9489  24.632   29.460    0        100
Perc. of land with Ground-water gley soil           9489  10.187   20.117    0        100
Perc. of land with Man made soil                    9489  0.3638   3.2621    0        94.990
Perc. of land with Peat soil                        9489  1.1875   5.2798    0        91.440
Perc. of other soil                                 9489  0.5354   1.9668    0        65.153

Second nature controls
Ln 1841 population per sq. km                       9489  4.2090   1.3461    0.8052   11.53
Share of male tertiary empl. in 1851                9489  0.1496   0.1095    0        0.941
Share of male secondary empl. in 1851               9489  0.1960   0.1230    0        0.800
Share of male agricultural empl. in 1851            9489  0.5534   0.2278    0        1
Share of male mining & forestry empl. in 1851       9489  0.0254   0.0767    0        0.745
Share of male unspecified empl. in 1851             9489  0.0749   0.0901    0        0.760
Ln distance to major city                           9487  4.7567   0.6203    0.5944   6.037

Sources: see text.


Figure 2: Railways and parish population growth from 1851 to 1881

Sources: see text.

the log of 1841 population density on the x-axis. It also plots the locally weighted curve

fitting this data. There is a `hump' in the curve for medium to high 1841 density units,

roughly in the 75th to 90th percentiles.

The second fact to note is that units with a higher share of male agricultural employment

grew less. The right-hand graph in figure 3 illustrates. It plots the share of male 1851

agricultural employment on the x-axis and the log difference in population on the y-axis.

The locally weighted curve has a negative slope between 0.2 and 0.9 agricultural shares.

Later we will see that the effects of railway stations are related to these two facts.

4 Baseline results

In this section, we examine how population growth was affected by access to infrastructure.

We begin by analyzing the following long differences specification:


Figure 3: Plots of population growth against initial population density and agricultural employment shares

Sources: see text.

$$y_{i,1891} - y_{i,1841} = \beta\, I(\text{Station} < 2\text{km})_{i,1851} + \beta_2\, I(\text{Prerail} < 2\text{km})_{i,1830} + \gamma x_i + d_i + \varepsilon_{ij} \qquad (1)$$

where $y_{i,1891} - y_{i,1841}$ is the natural log difference in 1891 and 1841 population. The initial

year of 1841 is chosen because there were few railway stations open in the census year

1831. The end year of 1891 is chosen because it is the last historical date for which we have data. The

main explanatory variable is the indicator $I(\text{Station} < 2\text{km})_{i,1851}$, equal to one if unit $i$ is

within 2 km distance of a railway station in 1851. 1851 is chosen because the rail network

underwent a major expansion in the 1840s due to the Railway Mania. 2 km is chosen

because it takes approximately 30 minutes to walk 2 km. We think 30 minutes represents

an average commute time for individuals who worked near the station or for firms carting

their goods to the station for quick delivery. Below we also consider greater distances than

2 km and variables for station density to see if the main conclusions change. The summary

variable $I(\text{Prerail} < 2\text{km})_{i,1830}$ includes $I(\text{turnpike} < 2\text{km})_{i,1830}$, $I(\text{waterway} < 2\text{km})_{i,1830}$, and

$I(\text{port} < 2\text{km})_{i,1842}$. They are 3 indicators identifying whether a unit is within 2 km distance

from turnpike roads, canals, and ports c.1840. Together the coefficients measure whether

14

walking distance to stations had a larger e�ect than walking distance to other transport

infrastructures.

There are two sets of control variables included in $x_i$. The first nature controls are listed in the summary statistics and capture geographic endowments. Coal is perhaps the most crucial as it is thought to be linked with industrialization (Wrigley 2010). The second nature controls are also listed in the summary statistics and include the log of population density in 1841 and the log distance to the nearest major city in 1801. The coefficient on the log of 1841 population density tests whether smaller units in 1841 grew faster than larger units over the next 50 years. The second nature controls also include the share of 1851 male occupations in 4 main categories. The omitted group is the share in secondary.

Our list of explanatory variables is large but nevertheless there are some factors that cannot be measured. Therefore it is useful to include district FEs $d_i$ controlling for any unobservables that are specific to the district surrounding the unit. In particular, we are netting out differences in market access that are shared by all units in a district. For example, if a district is close to the main rail line to Manchester or London then all units in the district will have greater market access to a degree. Thus we are identifying the local effect of being near a station.

The main coefficient estimates for equation 1 are shown in table 2. Robust standard errors are reported in the first four columns because they omit district FEs. Column (1) is our most parsimonious specification. It only includes the indicator for units within 2 km of stations in 1851. The estimate shows that units within walking distance of stations in 1851 had 24.5 log points higher population growth from 1841 to 1891. Column (2) is the same as (1) but also includes the indicators for distance to pre-rail infrastructures. Being within 2 km of inland waterways, turnpike roads, and ports had positive and significant effects on population growth equal to 9.1, 6.4, and 13.0 log points respectively. The railway effect falls to 18.6 log points. It is worth emphasizing what is learned from specification (2). The effect of railway station access is overstated without including measures for access to pre-rail infrastructures.

Table 2: Access to railways, infrastructures, and local population growth: baseline estimates

Dep. var.: unit pop. growth 1841 to 1891. Cells report the coefficient with the t-stat in parentheses.

| Variable | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Indicator dist. to rail station in 1851 <2km | 0.245*** (10.62) | 0.186*** (8.03) | 0.173*** (8.53) | 0.199*** (10.00) | 0.159*** (8.06) |
| Indicator dist. to inland waterway in 1830 <2km | | 0.0906*** (6.07) | 0.0807*** (5.69) | 0.0680*** (4.84) | 0.0560** (2.91) |
| Indicator dist. to turnpike road in 1830 <2km | | 0.0642*** (6.65) | 0.0643*** (6.47) | 0.0468*** (4.97) | 0.0387*** (3.77) |
| Indicator dist. to port in 1842 <2km | | 0.130* (2.31) | 0.00922 (0.17) | 0.0411 (0.80) | 0.0949 (1.54) |
| First nature controls | No | No | Yes | Yes | Yes |
| Second nature controls | No | No | No | Yes | Yes |
| District fixed effects | No | No | No | No | Yes |
| N | 9489 | 9489 | 9484 | 9482 | 9482 |

Notes: * p<0.05, ** p<0.01, *** p<0.001. Robust standard errors are reported in columns (1)-(4). The standard errors in column (5) are clustered on the district.

We now consider specifications that include more controls. Column (3) is the same as (2) but includes the first nature controls. The main infrastructure coefficients change little except for ports, whose coefficient becomes close to zero. Examining this specification more closely we find that being coastal is correlated with being within 2 km of a port. The estimates show that being coastal is associated with 25.9 log points more population growth and must capture the effects of port access in part. Column (4) is the same as (3) but includes second nature controls. The main infrastructure coefficients change little. We do find that occupational structure in 1851 mattered. Consistent with the graph above, areas with a higher share of 1851 male agricultural employment grew less.

The specification in column (5) is our preferred one because it adds 616 district fixed effects (FEs). In this model, the standard errors are clustered on registration districts. The coefficients on stations, turnpike roads, and inland waterways decline somewhat. Now being within walking distance of railway stations is estimated to have increased growth by 15.9 log points, while walking distance to inland waterways and turnpike roads is estimated to have increased growth by 5.6 and 3.9 log points. These coefficients can be converted into a level effect by 1891. The coefficient on the log of 1841 population density is -0.138 in model (5). Its absolute value can be interpreted as the rate of convergence. Dividing the station coefficient 0.159 by 0.138 gives a value of 1.15. Thus population levels for units near stations are estimated to have been 115 log points higher in 1891. By the same calculation population levels for units near waterways and turnpikes were roughly 40 and 30 log points higher in 1891.
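The conversion from growth coefficients to level effects is simple arithmetic and can be reproduced as a check. The sketch below uses the column (5) coefficients from table 2 and the coefficient of -0.138 on log 1841 population density quoted in the text; the helper name `level_effect` is ours and purely illustrative.

```python
# Coefficients taken from table 2, column (5), and from the text.
# The helper name is hypothetical; it is not part of the authors' code.
CONVERGENCE = -0.138  # coefficient on ln 1841 population density

GROWTH_EFFECTS = {
    "railway station": 0.159,
    "inland waterway": 0.0560,
    "turnpike road": 0.0387,
}


def level_effect(growth_coef, convergence_coef=CONVERGENCE):
    """Divide a growth coefficient by the absolute convergence rate."""
    return growth_coef / abs(convergence_coef)


for name, coef in GROWTH_EFFECTS.items():
    # Report the implied level effect in log points.
    print(f"{name}: {100 * level_effect(coef):.0f} log points")
```

With these inputs the ratios are about 1.15, 0.41, and 0.28, which the text reports as 115, 40, and 30 log points after rounding.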

Overall the estimates suggest that being within walking distance of railways had larger effects than walking distance to other infrastructures. The magnitudes suggest the effect of proximity to railways was more than twice as large as proximity to waterways and more than three times as large as proximity to turnpike roads. While railways were important, it is nonetheless significant that pre-rail infrastructures mattered for population growth from 1841 to 1891. The latter had been built up decades or even a century before railways.

The conclusions are similar using different indicators for walking distance to railway stations. In the appendix we report models that use indicators for units within 2 km of 1841 or 1861 stations instead of 1851 stations. For context, 4.6% of units had an 1841 railway station within 2 km, 13.6% had an 1851 station within 2 km, and 19.7% had an 1861 station within 2 km. The coefficients on railway station access are similar. It does not seem to matter a great deal whether we use 1841, 1851, or 1861 stations to define our railway treatment group.

Another specification considers whether a unit has any railway line within its boundaries. The same model with all controls and FEs shows that such units had 16.4 log points more growth. One might find it surprising that indicators for being within 2 km of a station and having any railway line have similar estimated effects. We think the high station density and relative uniformity across lines in England and Wales meant that units close to railway lines were generally close to stations. We also check whether station density within a unit matters. We define one indicator for units with exactly one station and another for units with more than one station. The results show that units with more than one station had 27 log points higher growth than units without any stations. This result makes sense since greater station density offered more local and long distance connections.

Our baseline model considers walking distance to infrastructures and treats all other distances the same. Next we consider a model using three distance bins: 0 to 2 km, 2 to 4 km, and 4 to 6 km. These represent approximately 30 minutes, 60 minutes, and 90 minutes walking distance to rail stations, turnpikes, waterways, etc. Some individuals would have been willing to commute 60 or 90 minutes and so population growth could be higher at 4 or 6 km from a station. Also firms with lower value goods might be willing to locate 60 or 90 minutes away from the station because it was not essential to deliver goods quickly.
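To make the three-bin coding concrete, the following minimal Python sketch turns a distance in km into the three mutually exclusive indicators, with units beyond 6 km as the omitted group. The function names and the treatment of distances that fall exactly on a boundary are our assumptions; the text does not specify them.

```python
# Illustrative coding of the distance bins. Names and boundary handling
# (a distance of exactly 2, 4 or 6 km goes to the farther bin) are assumptions.
BINS = ("0-2km", "2-4km", "4-6km")


def distance_bin(dist_km):
    """Return the bin label for a distance; beyond 6 km is the omitted group."""
    if dist_km < 2:
        return "0-2km"
    if dist_km < 4:
        return "2-4km"
    if dist_km < 6:
        return "4-6km"
    return "omitted"


def bin_indicators(dist_km):
    """Return one 0/1 indicator per bin, all zero for the omitted group."""
    label = distance_bin(dist_km)
    return {b: int(label == b) for b in BINS}


print(bin_indicators(1.5))
print(bin_indicators(7.0))
```

Each infrastructure type (stations, waterways, turnpikes, ports) would get its own set of three indicators built this way.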

The results for the main variables are reported in table 3. The omitted group in these regressions is units more than 6 km from infrastructures. The model also includes first and second nature controls and district FEs. We find that being 2-4 and 4-6 km from a railway station increased population growth, but less so compared to the effect of being 0-2 km from stations. In other words, population growth diminished as distance to stations increased up to 6 km. The same pattern is found for waterways, although the effects on population growth are smaller than those of railways in all distance bins. Being 2-4 km from a port has the same effect as being 0-2 km. Interestingly, being 2-4 km from a turnpike significantly decreased population growth relative to areas beyond 6 km. Note that 7% of units were more than 6 km from an 1830 turnpike so this comparison could be biased by small numbers.

Table 3: Population growth at increasing distances from railways and infrastructures

Dep. var.: unit pop. growth 1841 to 1891. Cells report the coefficient with the t-stat in parentheses.

| Distance bin | Rail station in 1851 | Inl. waterway in 1830 | Port in 1842 | Turnpike road in 1830 |
|---|---|---|---|---|
| <2km | 0.216*** (9.38) | 0.0764*** (3.38) | 0.164* (2.34) | -0.0124 (-0.57) |
| >2km & <4km | 0.0999*** (5.74) | 0.0479** (2.88) | 0.165*** (3.64) | -0.0516* (-2.45) |
| >4km & <6km | 0.0350* (2.42) | 0.0361** (2.80) | 0.0155 (0.68) | -0.0343 (-1.78) |

First nature controls: Yes. Second nature controls: Yes. District fixed effects: Yes. N: 9482.

Notes: * p<0.05, ** p<0.01, *** p<0.001. The standard errors are clustered on the district.

Overall these results suggest that being within a 30, 60, or 90 minute walk of a railway station significantly increased growth and generally had larger effects than waterways, turnpikes, and ports. However, because the building of railway stations was more recent relative to 1841, there is a concern that proximity to stations could be correlated with the error term. The next section addresses this endogeneity issue.

5 Addressing endogeneity

What factors influenced the placement of railway stations? The historical literature suggests that railway stations were selected into units based on their level of population in the early nineteenth century (see Simmons 1986). The effect of prior population growth is not discussed, but it is most important for our analysis as growth is our dependent variable. We formally test whether units close to railway stations had higher population growth decades before getting railway stations. A simple difference in means test shows that population growth from 1801 to 1831 was 6.8 log points higher for parish units within 2 km of railway stations in 1851 (p-value is 0.000). The same 'pre-trend' result holds in a regression including district FEs and first and second nature controls. The implication is that our baseline estimates for the effect of station distance are potentially biased.

We address this issue using two approaches. The first applies propensity score matching. The treatment variable is being within 2 km of an 1851 railway station. The outcome variable is population growth from 1841 to 1891. We use a parsimonious set of covariates: (1) population density in 1841, (2) the share of male agricultural employment in 1851, (3) having exposed coal, and (4) population growth from 1801 to 1831. We match exactly one nearest neighbor to the units within 2 km of 1851 stations using the logit model. This set of covariates yields a balanced matched sample. Table 4 shows the standardized differences in the covariate means are close to zero in the matched sample but not in the raw data. The bottom rows of table 4 show the average difference in means for population growth from 1841 to 1891. In the raw data units within 2 km of stations have 24.4 log points higher population growth. In the matched sample they have 20.6 log points higher growth. Thus a parsimonious matching exercise implies slightly larger effects than the regression models using a large number of control variables.
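The two balance diagnostics reported in table 4 can be computed as in the Python sketch below. It assumes the conventional definitions: the standardized difference is the difference in group means divided by the square root of the average of the two group variances, and the variance ratio is the treated variance over the control variance. The text does not state which definitions the authors' software used, and the toy numbers are made up for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(treated, control):
    """Difference in means scaled by the average of the two sample variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def variance_ratio(treated, control):
    """Sample variance of the treated group over that of the control group."""
    return variance(treated) / variance(control)


# Made-up covariate values, not taken from the paper's data.
treated = [2.0, 4.0, 6.0]
control = [1.0, 2.0, 3.0]

print(standardized_difference(treated, control))
print(variance_ratio(treated, control))
```

A well-balanced matched sample would show standardized differences near zero and variance ratios near one, which is the pattern in the matched panel of table 4.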

Table 4: Matching estimator for effect of walking distance to railway stations

Units within 2km 1851 stations (1 vs. 0)

| Covariate | Standardized differences (raw) | Variance ratio (raw) |
|---|---|---|
| Ln pop. per sq. km 1841 | 1.0714 | 8.1842 |
| Has exposed coal | 0.2919 | 2.123 |
| Share of 1851 male emp. in agric. | -1.137 | 1.7532 |
| Ln difference pop. 1831 and 1801 | 0.2301 | 2.5178 |
| N | 9,489 | |

| Covariate | Standardized differences (matched) | Variance ratio (matched) |
|---|---|---|
| Ln pop. per sq. km 1841 | 0.0069 | 1.010 |
| Has exposed coal | -0.0201 | 0.9384 |
| Share of 1851 male emp. in agric. | 0.0045 | 0.9222 |
| Ln difference pop. 1831 and 1801 | -0.0159 | 1.1530 |
| N | 9,485 | |

Units within 2km 1851 stations (1 vs. 0)

| | Av. Ln diff. pop. 1891 and 1841 | Difference in means (raw data) | Difference in means (matched data) |
|---|---|---|---|
| Estimate | 0.01004 | 0.244 | 0.206 |
| (standard error; robust for matched data) | | (0.0151)*** | (0.0218)*** |
| N | | 9,489 | 9,485 |

Notes: * p<0.05, ** p<0.01, *** p<0.001.

Our second and primary approach to the endogeneity of stations uses an instrumental variable derived from the 'inconsequential places' approach.¹⁹ The key assumption is that some inconsequential units became close to railway stations simply because they were on the original line designed to connect larger towns at a low capital cost. In other words, they were not selected because of their potential for future growth. The first step in creating the instrument is to select the towns that will be connected by railways. We start with all English and Welsh towns having a population greater than 5000 in 1801.²⁰ Their larger size meant they were almost certain to get at least one railway line connecting them with another town above 5000. But not all large town-pairs would be connected to each other. One reason is that existing levels of trade and communication were often lower between distant towns or towns of moderate size. Therefore a profit-seeking railway promoter would see little value in building a railway to connect them. We use a simple gravity model (GM) to calculate the relative value of connecting any town pair in which each town has a population above 5000. The equation for the town pair $i$ and $j$ is $GM_{ij} = \frac{Pop_i \, Pop_j}{Dist_{ij}}$, where $Dist_{ij}$ is the straight line distance between towns $i$ and $j$.

¹⁹ See Chandra and Thompson (2000), Michaels (2008), Faber (2014), and Lipscomb et al. (2013).
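The gravity-model screen can be sketched in a few lines of Python. The threshold of 10,000 is the one quoted in the text, but the units of population and distance are not stated in this section, so the town names, populations, and coordinates below are hypothetical values chosen only to show one pair passing the screen and two failing it.

```python
from itertools import combinations
from math import hypot

GM_THRESHOLD = 10_000  # threshold quoted in the text

# Hypothetical towns: name -> (population, x, y). Values and units are made up.
TOWNS = {
    "A": (20_000, 0.0, 0.0),
    "B": (8_000, 30.0, 40.0),
    "C": (5_200, 20_000.0, 0.0),
}


def gravity_value(pop_i, pop_j, dist_ij):
    """Product of the two populations divided by straight line distance."""
    return pop_i * pop_j / dist_ij


def candidate_pairs(towns, threshold=GM_THRESHOLD):
    """Return town pairs whose gravity value exceeds the threshold."""
    pairs = []
    for a, b in combinations(sorted(towns), 2):
        pop_a, xa, ya = towns[a]
        pop_b, xb, yb = towns[b]
        dist = hypot(xa - xb, ya - yb)  # straight line distance
        if gravity_value(pop_a, pop_b, dist) > threshold:
            pairs.append((a, b))
    return pairs


print(candidate_pairs(TOWNS))
```

Only the pairs returned by such a screen would be passed on to the least cost path step.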

Next we identified a least cost path (LCP) connecting town pairs above the threshold $GM_{ij} > 10{,}000$. We assume that in considering their routes, railway companies tried to minimize construction costs, taking distance and elevation slope into account. The baseline model uses construction cost data for railways built in the 1830s and early 1840s. We measure the distance of the lines and total elevation changes between towns at the two ends of the line. The construction cost is then regressed on the distance and the elevation change to identify the parameters (the details are in appendix 2). Based on this analysis we find a baseline construction cost per km when the slope is zero, and for every 1% increase in slope the construction cost rises by three times the baseline ($\mathrm{costperkm} = 1 + 3 \times \mathrm{slope}\%$). Next, we use this formula to identify the least cost path connecting our town pairs. The end result is a network of candidate railway lines linking towns.

²⁰ The data come from Law (1967) and Robson (2006).
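As an illustration of how the cost formula can drive a route search, the Python sketch below runs Dijkstra's algorithm on a small elevation grid. This is not the authors' GIS procedure, which is not described in this section. The grid, the four-neighbour moves, the 1 km cell size, and the definition of slope as rise over run in percent are all assumptions made for the example.

```python
import heapq


def segment_cost(dist_km, rise_m):
    """Cost of one segment: distance times (1 + 3 * slope in percent)."""
    slope_pct = 100 * abs(rise_m) / (dist_km * 1000)
    return dist_km * (1 + 3 * slope_pct)


def least_cost_path(elev, start, goal, cell_km=1.0):
    """Dijkstra search over a grid of elevations in metres."""
    rows, cols = len(elev), len(elev[0])
    best = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best[node]:
            continue  # stale queue entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                step = segment_cost(cell_km, elev[nr][nc] - elev[r][c])
                new_cost = cost + step
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    prev[(nr, nc)] = node
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    # Walk back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], best[goal]


# Hypothetical terrain: a 50 m hill sits between the two end points.
ELEVATION = [
    [0, 50, 0],
    [0, 0, 0],
]
path, cost = least_cost_path(ELEVATION, (0, 0), (0, 2))
print(path, cost)
```

In this toy grid the direct route over the hill costs 16 per segment, so the search prefers the longer flat detour, which is the trade-off between distance and slope that the formula is meant to capture.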

The LCP network is shown in the right hand panel of figure 4. The left hand panel shows the real railway network in 1851. The overlap of the LCP and the 1851 rail network is fairly high. Locations close to the LCP are also generally close to railway stations because they were so numerous along the line.

The instrument must predict proximity to railway stations and it must satisfy the exclusion restriction. First, we show that being within 2 km of the LCP predicts the likelihood of a unit being within 2 km of a railway station. We estimate a regression similar to equation (1) which includes first and second nature controls and district FEs. We find a strong positive relationship between the indicator for being within 2 km of the LCP and the probability of being within 2 km of an 1851 railway station. The coefficient is 0.082 with a t-stat equal to 6.01.

Second, we test whether distance to the LCP is correlated with population growth between 1801 and 1831 once the effects of pre-rail infrastructure distance, geography, and district FEs are accounted for. A significant correlation would raise questions about the exclusion restriction. The results from several specifications are shown in table 5. Note we exclude 364 units within 2 km of the town nodes used to construct the LCP. In column (1), being within 2 km of the LCP is positively and significantly associated with higher population growth from 1801 to 1831. Columns (2) and (3) show the same result holds after including district FEs and first nature controls. However, in columns (4) and (5), which add pre-rail infrastructures and then second nature controls, being within 2 km of the LCP is no longer significantly associated with higher population growth from 1801 to 1831. Why? Distance to the LCP is correlated with distance to turnpikes and inland waterways, which themselves contributed to population growth from 1801 to 1831. Therefore, omitting them leads to a spurious correlation between LCP distance and population growth from 1801 to 1831. Thus the exclusion restriction for being within 2 km of the LCP is only defensible in a model that includes distance to pre-rail infrastructures as controls.

Figure 4: The rail network in 1851 and the least cost path (LCP) network

Sources: see text.

Table 5: Pre-trend tests for the validity of the distance to LCP instrument

Dep. var.: unit pop. growth 1801 to 1831. Cells report the coefficient with the t-stat in parentheses.

| Variable | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Distance to LCP for railways <2km | 0.0303*** (4.85) | 0.0188* (2.53) | 0.0178* (2.32) | 0.00982 (1.24) | 0.00577 (0.81) |
| Indicator dist. to inland waterway in 1830 <2km | | | | 0.0362*** (4.32) | 0.0181* (2.44) |
| Indicator dist. to turnpike road in 1830 <2km | | | | 0.0325*** (5.49) | 0.0111* (2.07) |
| Indicator dist. to port in 1842 <2km | | | | 0.0806** (3.02) | 0.0566* (2.18) |
| Units within 2 km of LCP nodes removed? | Yes | Yes | Yes | Yes | Yes |
| First nature controls | No | No | Yes | Yes | Yes |
| Second nature controls | No | No | No | No | Yes |
| District fixed effects | No | Yes | Yes | Yes | Yes |
| N | 9121 | 9121 | 9116 | 9116 | 9114 |

Notes: * p<0.05, ** p<0.01, *** p<0.001. Robust standard errors are reported in column (1). The standard errors in columns (2)-(5) are clustered on the district.

The IV results are shown in table 6 along with the OLS results for comparison. The OLS model in (2) is identical to table 2 column (5), except that it excludes units within 2 km of the LCP town nodes. The Kleibergen-Paap F statistic is fairly large, indicating that the first stage does not suffer from a weak instruments problem. The IV estimate implies that being within 2 km of a railway station caused population growth to rise by 35.5 log points. In OLS the same estimate is 16.5 log points. Note that the IV estimate is less precise, but it is statistically significant at the 10% level. We think the lower precision in the IV is expected. We need an instrument to predict whether a unit is within 2 km of a station. That is highly demanding.
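The mechanics of an IV estimate with a binary instrument can be shown with the Wald ratio: the difference in mean outcomes by instrument status divided by the difference in mean treatment by instrument status. The Python sketch below is a deliberately stripped-down illustration with no controls or fixed effects, so it is not the specification behind table 6, and the eight observations are synthetic values invented for the example.

```python
from statistics import mean


def group_means(values, z):
    """Mean of values where the instrument is 1 and where it is 0."""
    on = [v for v, zi in zip(values, z) if zi == 1]
    off = [v for v, zi in zip(values, z) if zi == 0]
    return mean(on), mean(off)


def wald_iv(y, x, z):
    """Reduced-form difference divided by first-stage difference."""
    y_on, y_off = group_means(y, z)
    x_on, x_off = group_means(x, z)
    first_stage = x_on - x_off
    reduced_form = y_on - y_off
    return reduced_form / first_stage


# Synthetic data: z = near the LCP, x = near a station, y = log growth.
z = [1, 1, 1, 1, 0, 0, 0, 0]
x = [1, 1, 1, 0, 1, 0, 0, 0]
y = [0.5, 0.5, 0.5, 0.1, 0.5, 0.1, 0.1, 0.1]

print(wald_iv(y, x, z))
```

Because the first-stage difference sits in the denominator, a modest first stage scales up any noise in the reduced form, which is one reason IV estimates tend to be less precise than OLS.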


Table 6: Railway stations and population growth: IV estimates

Dep. var.: unit pop. growth 1841 to 1891. Cells report the coefficient with the t-stat in parentheses.

| Variable | IV (1) | OLS (2) |
|---|---|---|
| Indicator distance to 1851 railway station <2km | 0.355 (1.83) | 0.165*** (8.49) |
| Indicator distance to 1830 inland waterway <2km | 0.0313 (1.18) | 0.0488* (2.57) |
| Indicator distance to 1830 turnpike road <2km | 0.0273* (2.50) | 0.0337*** (3.55) |
| Indicator distance to 1842 port <2km | 0.139* (2.08) | 0.141* (2.04) |
| Kleibergen-Paap rk Wald F statistic | 33.12 | |
| Units within 2 km of LCP nodes removed? | Yes | Yes |
| First nature controls | Yes | Yes |
| Second nature controls | Yes | Yes |
| District fixed effects | Yes | Yes |
| N | 9118 | 9118 |


Why is the IV estimate larger? One speculation is that individuals anticipated the building of railways. In order to gain from increased property values or employment prospects they might have moved to future railway units prior to 1841. In that case, one might expect OLS to yield a downward-biased estimate for population growth from 1841 to 1891. A similar phenomenon has been documented for US railroads (Atack et al. 2011).

6 Heterogeneous effects and mechanisms

The historical literature describes varying effects of railway stations on towns and rural areas in England and Wales. Simmons (1986, p. 16) says "the railway did not necessarily produce growth in population or business. It might take people or business away." Heterogeneity makes sense according to the new economic geography (NEG) literature.²¹ It argues that agglomeration can increase as transport costs fall. The key factor is increasing returns to scale. As an area gets larger it becomes more productive and hence more attractive for consumers and firms. The economy will not completely degenerate to a single location however. Congestion and land constraints provide a check on the size of the largest cities. Applying this framework to England and Wales, being close to railway stations should have increased population growth more in initially dense units, although not the most dense units.

In order to test this hypothesis we estimate the effect of being within 2 km of railway stations depending on a unit's 1841 population density. We estimate the following model.

$$y_{i,1891} - y_{i,1841} = \beta_0 I(\mathrm{Station} < 2\,\mathrm{km})_i + \beta_1 I(\mathrm{Station} < 2\,\mathrm{km})_i \,\mathrm{lnpop41}_i + \beta_2 I(\mathrm{Station} < 2\,\mathrm{km})_i \,(\mathrm{lnpop41}_i)^2 + \gamma x_i + d_j + \varepsilon_{ij} \qquad (2)$$

where the natural log of 1841 population density and its square are interacted with the indicator for being within 2 km of an 1851 station. The quadratic formulation is flexible and allows for non-linear effects. Note that 1841 population density and its square are included as controls in $x_i$, along with district FEs and first and second nature controls.
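Equation (2) implies that the growth effect of station proximity varies with initial density. Writing $\ell_i$ for the log of 1841 population density (our shorthand, not the authors'), the implied effect for unit $i$ and the density at which it peaks follow directly from the interaction terms:

```latex
\Delta_i = \beta_0 + \beta_1 \ell_i + \beta_2 \ell_i^{2},
\qquad
\frac{\partial \Delta_i}{\partial \ell_i} = \beta_1 + 2\beta_2 \ell_i = 0
\;\Longrightarrow\;
\ell^{*} = -\frac{\beta_1}{2\beta_2}
\quad \text{(a maximum if } \beta_2 < 0\text{)}.
```

An interior peak at $\ell^{*}$ requires a negative $\beta_2$; the coefficient values are not given in this section, so the expression is shown in symbols only.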

The estimates reveal an important result: being close to railway stations had a significantly larger growth effect for units with medium to large population density in 1841. To illustrate, we plot our predicted population growth from 1841 to 1891 for units between the 5th and 95th percentiles in 1841 population density. One prediction is for units less than 2 km from 1851 stations and the other is for units more than 2 km from stations (see figure 5). Railways have their largest effect for population densities above the mean, approximately between the 75th and 90th percentiles. Thus our results are consistent with NEG forces, which suggest that by lowering transport costs railways lead to more agglomeration in dense locations.

²¹ See Fujita et al. (2001) and Desmet and Rossi-Hansberg (2014) for details.

Figure 5: Heterogeneity I: initial population density

Sources: see text.

There is additional evidence to support the NEG mechanism. Units with other infrastructures, like inland waterways and turnpikes, also had lower transport costs. Thus if the same mechanism is at work we should also expect that being near inland waterways or being near turnpikes should have increased growth more in units with medium to large population density in 1841. We test this prediction for waterways using the same methodology as equation (2). We interact the log of 1841 pop. density and its square with the indicator for being within 2 km of an inland waterway. The results show a similar pattern. They are summarized in the right hand panel of figure 5.

Next we examine how heterogeneity was related to occupational structure, drawing on the globalization literature. As railways spread through EW, the global economy was becoming more open. England then exported more manufactured goods and imported more agricultural goods. There were many reasons for these changes in trade. One was comparative advantage. England had the most productive manufacturing sector in the world by 1850. Transport improvements were another. Steamships and railways provided better connections between inland areas and the international economy (O'Rourke and Williamson 2002). The connection between railways and grain imports is supported in traffic data. Hawke (1970, p. 128) estimates that imported wheat represented at least half of all wheat hauled by English railways in 1865. All of this suggests being near railways created more growth if the unit was more specialized in the secondary sector and less growth if it was more specialized in agriculture. We test this hypothesis with the following model

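$$y_{i,1891} - y_{i,1841} = \beta_0 I(\mathrm{Station} < 2\,\mathrm{km})_i + \sum_{k=1}^{4} \beta_k I(\mathrm{Station} < 2\,\mathrm{km})_i \,\mathrm{occshare}_{ki} + \gamma x_i + d_j + \varepsilon_{ij} \qquad (3)$$

where $\mathrm{occshare}_{ki}$ is the share of 1851 male occupations in category $k$. Note that occupational shares and pre-rail infrastructure are included as controls in $x_i$.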

The estimates show that being close to railway stations had a significantly lower effect on population growth for units with a higher share of agricultural employment. The coefficient estimates are reported in the appendix and here we illustrate the effects using an example. We consider the occupational structure of the average unit (where average is defined by the mean occupational shares across units) along with hypothetical units more specialized in secondary, agriculture, and tertiary. The occupational shares for each type of unit are shown in table 7. Next we predict population growth from 1841 to 1891 for each hypothetical unit type with and without the treatment of being close to 1851 railway stations. The calculations are reported in table 7. The hypothetical average unit grows about 13 log points more when close to stations, while the agricultural unit grows about 7.5 log points more when close to stations. The secondary unit gains the most from being close to stations, specifically about 19 log points more.

These findings suggest that railways enhanced globalization forces in EW. Units more specialized in the secondary sector had the greatest comparative advantage in world markets. Railways helped these advantaged units to grow more. On the other hand, units more specialized in agriculture had the least comparative advantage. Railways added much less to their growth, and the predicted population of agricultural units fell even near stations, which is consistent with out-migration.

Table 7: Heterogeneity II: occupational structure in 1851

Occ. shares in unit types

| Occupation categories | 'average' | 'agricultural' | 'secondary' | 'tertiary' |
|---|---|---|---|---|
| agriculture | 0.55 | 0.75 | 0.35 | 0.35 |
| secondary | 0.2 | 0.0 | 0.4 | 0.2 |
| tertiary | 0.15 | 0.15 | 0.15 | 0.35 |
| mining/forestry | 0.025 | 0.025 | 0.025 | 0.025 |
| unspecified | 0.075 | 0.075 | 0.075 | 0.075 |
| Predicted pop. growth <2km from stations | 0.1213 | -0.1381 | 0.3808 | 0.3697 |
| (standard error) | (0.0157) | (0.0334) | (0.0328) | (0.0573) |
| Predicted pop. growth >2km from stations | -0.0111 | -0.2128 | 0.1905 | 0.2141 |
| (standard error) | (0.0043) | (0.0196) | (0.0197) | (0.0192) |
| Difference in predicted pop. growth | 0.1324 | 0.0747 | 0.1903 | 0.1556 |
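The last row of table 7 is the difference between the two rows of predicted growth, which can be verified directly. The short Python check below uses only the values printed in the table; the variable names are ours.

```python
# Predicted log population growth from table 7, by hypothetical unit type.
near = {"average": 0.1213, "agricultural": -0.1381,
        "secondary": 0.3808, "tertiary": 0.3697}
far = {"average": -0.0111, "agricultural": -0.2128,
       "secondary": 0.1905, "tertiary": 0.2141}

# Treatment effect of station proximity for each unit type.
diff = {k: round(near[k] - far[k], 4) for k in near}

for unit_type, value in diff.items():
    print(f"{unit_type}: {value:.4f}")
```

The differences match the table's final row, and they rank the unit types as secondary, tertiary, average, and agricultural in order of the gain from station proximity.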

7 Persistence

As a final exercise we examine the current effects of infrastructures formed in the past. Here we merge our historical continuous units with 2011 data on 34,753 Lower Super Output Areas (LSOA). The intersect function in ArcMap is applied to the boundary lines of LSOAs and the boundary lines of units. The result is that we can study the change in population from 1841 to 1891 to 2011 for the same spatial units. We estimate the following 'very' long differences specification.

$$y_{i,2011} - y_{i,1841} = \beta I(\mathrm{Station} < 2\,\mathrm{km})_{i,1851} + \beta_2 I(\mathrm{Prerail} < 2\,\mathrm{km})_{i,1830} + \gamma x_i + d_i + \varepsilon_{ij} \qquad (4)$$

where the dependent variable $y_{i,2011} - y_{i,1841}$ measures population growth from 1841 to 2011.

The OLS results are reported in column (1) of table 8. We find that being close to railway stations in the mid-nineteenth century increases population density in 2011. More strikingly the same is true for being close to turnpike roads and inland waterways around 1830. Also striking is the large long-term effect of turnpike roads. The coefficient on turnpike roads is nearly as large as that on railways. Column (2) shows the IV estimate using the indicator for being within 2 km of the LCP as the instrument. As before the IV estimate for access to stations is much larger. While there are many questions as to how persistence worked, we take this as strong evidence for a long-term effect of infrastructures in the EW economy.

Table 8: Infrastructure access and population growth over the very long run

Dep. var.: unit pop. growth 1841 to 2011. Cells report the coefficient with the t-stat in parentheses.

| Variable | OLS (1) | IV (2) |
|---|---|---|
| Indicator distance to 1851 railway station <2km | 0.292*** (7.79) | 0.947* (2.05) |
| Indicator distance to 1830 inland waterway <2km | 0.134** (3.27) | 0.0614 (1.01) |
| Indicator distance to 1830 turnpike road <2km | 0.207*** (8.26) | 0.177*** (6.75) |
| Indicator distance to 1842 port <2km | 0.115 (1.19) | 0.166 (1.46) |
| Kleibergen-Paap rk Wald F statistic | 33.12 | 33.12 |
| Units within 2 km of LCP nodes removed? | No | Yes |
| First nature controls | Yes | Yes |
| Second nature controls | Yes | Yes |
| District fixed effects | Yes | Yes |
| N | 9481 | 9117 |


8 Conclusion

This paper examines how infrastructure access affected long-run population growth in England and Wales. It introduces a new data set and methodologies to the study of infrastructures and growth in the nineteenth century. It makes several main points. First, it is important to study all infrastructures in the nineteenth century, not just railways. We find that the estimated effects of being close to railway stations are overstated if indicators for being close to other infrastructures like roads and canals are not included. We also find that the validity of least cost path instruments may be undermined if other infrastructures are ignored. Our estimates still show that railways had the largest effect on population growth, but turnpike roads and canals still mattered a great deal.

Second, the effects of being close to railway stations depended on initial population density and occupational structure. Specifically we find that the effects of railways and other infrastructures were larger for more densely populated units in 1841 and units that had a lower share of male agricultural employment. The latter results are consistent with New Economic Geography (NEG) models that emphasize increases in agglomeration with lower transport costs. They are also consistent with the literature on nineteenth century globalization, which emphasizes EW's shift out of agriculture and into manufacturing as trade costs fell.

Third, we argue that the local effects of infrastructures can persist for centuries. We show that the population distribution in 2011 England and Wales is still influenced by infrastructures in the mid-nineteenth century. This implies that the policy decisions made today to promote or manage urbanization will have effects for decades to come, perhaps even centuries.

References

1. Alvarez, Eduard, Xavi Franch, and Jordi Martí-Henneberg. "Evolution of the territorialcoverage of the railway network and its in�uence on population growth: The case of Englandand Wales, 1871�1931." Historical Methods: A Journal of Quantitative and InterdisciplinaryHistory 46.3 (2013): 175-191.

2. Alvarez, E., Dunn, O., Bogart, D., Max Satchell, Leigh Shaw-Taylor, 'Ports of England andWales, 1680-1911', 2017.

3. Armstrong, John. The Vital Spark: The British Coastal Trade, 1700-1930. InternationalMaritime Economic History Association, 2009.

4. Atack, Jeremy, Fred Bateman, Michael Haines, and Robert A. Margo. "Did railroads induceor follow economic growth?." Social Science History 34, no. 2 (2010): 171-197.

5. Atack, Jeremy, and Robert A. Margo. "The Impact of Access to Rail Transportation on Agricultural Improvement: The American Midwest as a Test Case, 1850-1860." Journal of Transport and Land Use 4.2 (2011).

6. Avery, Brian William. Soil classification for England and Wales: higher categories. No. 631.44 A87. 1980.

7. Baines D. Migration in a mature economy: emigration and internal migration in England and Wales 1861-1900. Cambridge University Press; 2002.

8. Baum-Snow, N., Brandt, L., Henderson, J. V., Turner, M. A., & Zhang, Q. (2017). Roads, railroads, and decentralization of Chinese cities. Review of Economics and Statistics, 99(3), 435-448.

9. Becker, Sascha O., Erik Hornung, and Ludger Woessmann. "Education and catch-up in the industrial revolution." American Economic Journal: Macroeconomics (2011): 92-126.

10. Berger, Thor, and Kerstin Enflo. "Locomotives of local growth: The short- and long-term impact of railroads in Sweden." Journal of Urban Economics (2015).

11. Bogart, Dan. "The Transport Revolution in Industrializing Britain," in Floud, Roderick, Jane Humphries, and Paul Johnson, eds. The Cambridge Economic History of Modern Britain: Volume 1, Industrialisation, 1700-1870. Cambridge University Press, 2014.

12. Campbell, Gareth, and John D. Turner. "Dispelling the Myth of the Naive Investor during the British Railway Mania, 1845-1846." Business History Review 86.01 (2012): 3-41.

13. Campbell, Gareth, and John D. Turner. "Managerial failure in mid-Victorian Britain?: Corporate expansion during a promotion boom." Business History 57.8 (2015): 1248-1276.

14. Casson, Mark. The world's first railway system: enterprise, competition, and regulation on the railway network in Victorian Britain. Oxford University Press, 2009.

15. Casson, Mark. "The determinants of local population growth: A study of Oxfordshire in the nineteenth century." Explorations in Economic History 50.1 (2013): 28-45.

16. Chandra, Amitabh, and Eric Thompson. "Does public infrastructure affect economic activity?: Evidence from the rural interstate highway system." Regional Science and Urban Economics 30.4 (2000): 457-490.

17. Clayden, Benjamin, and John Marcus Hollis. Criteria for differentiating soil series. No. Tech Monograph 17. 1985.

18. Cormen, Thomas H., Charles E Leiserson, Ronald L Rivest and Clifford Stein: Introduction to Algorithms, Cambridge, MA, MIT Press (3rd ed., 2009) pp. 695-6.

19. Crafts, Nicholas, and Abay Mulatu. "How did the location of industry respond to falling transport costs in Britain before World War I?." The Journal of Economic History 66.03 (2006): 575-607.

20. Crafts, Nicholas, and Nikolaus Wolf. "The location of the UK cotton textiles industry in 1838: A quantitative analysis." The Journal of Economic History 74.04 (2014): 1103-1139.

21. Del Río, Eloy, Jordi Martí-Henneberg, and Antònia Valentín. "La Evolución de la red ferroviaria en el Reino Unido (1825-2000)." Treballs de La Societat Catalana de Geografia 65 (2008): 654-663.

22. Demographia World Urban Areas, 14th Annual Edition (PDF). April 2018.

23. Desmet, Klaus, and Esteban Rossi-Hansberg. "Spatial development." The American Economic Review 104.4 (2014): 1211-1243.

24. Donaldson, Dave. Railroads of the Raj: Estimating the impact of transportation infrastructure. No. w16487. National Bureau of Economic Research, 2010.

25. Donaldson, Dave, and Richard Hornbeck. "Railroads and American economic growth: A 'market access' approach." The Quarterly Journal of Economics 131.2 (2016): 799-858.

26. Duranton, Gilles, and Matthew A. Turner. "Urban growth and transportation." The Review of Economic Studies 79.4 (2012): 1407-1440.

27. Faber, Benjamin. "Trade integration, market size, and industrialization: evidence from China's National Trunk Highway System." Review of Economic Studies 81.3 (2014): 1046-1070.

28. Fernihough, Alan, and Kevin Hjortshøj O'Rourke. Coal and the European industrial revolution. No. w19802. National Bureau of Economic Research, 2014.

29. Fishlow, Albert. American Railroads and the Transformation of the Ante-bellum Economy. Vol. 127. Cambridge, MA: Harvard University Press, 1965.

30. Fogel, R. "Railways and American Economic Growth." Baltimore: Johns Hopkins Press. (1964).

31. Freeman, Michael J., and Derek H. Aldcroft, eds. Transport in Victorian Britain. Manchester University Press, 1991.

32. Fujita, Masahisa, Paul R. Krugman, and Anthony Venables. The spatial economy: Cities, regions, and international trade. MIT Press, 2001.

33. Gregory, Ian N., and Jordi Martí Henneberg. "The railways, urbanization, and local demography in England and Wales, 1825-1911." Social Science History 34.2 (2010): 199-228.

34. Gutberlet, Theresa. "Cheap Coal versus Market Access: The Role of Natural Resources and Demand in Germany's Industrialization." (2014).

35. Hawke, Gary Richard. Railways and economic growth in England and Wales, 1840-1870. Clarendon Press, 1970.

36. Heblich, Stephan, and Alex Trew. "Banking and Industrialization." (2017).

37. Herranz-Loncán, Alfonso. "Railroad impact in backward economies: Spain, 1850-1913." The Journal of Economic History 66.04 (2006): 853-881.

38. Hornung, Erik. "Railroads and growth in Prussia." Journal of the European Economic Association 13.4 (2015): 699-736.

39. Jedwab, Remi, Edward Kerby, and Alexander Moradi. "History, path dependence and development: Evidence from colonial railroads, settlers and cities in Kenya." The Economic Journal (2015).

40. Jarvis A., H.I. Reuter, A. Nelson, E. Guevara (2008). Hole-filled seamless SRTM data V4, International Centre for Tropical Agriculture (CIAT), available from http://srtm.csi.cgiar.org.

41. Jaworski, Taylor, and Carl T. Kitchens. "National Policy for Regional Development: Historical Evidence from Appalachian Highways." (2017).

42. Klein, Alexander, and Nicholas Crafts. "Making sense of the manufacturing belt: determinants of US industrial location, 1880-1920." Journal of Economic Geography 12.4 (2012): 775-807.

43. Law, Christopher M. "The growth of urban population in England and Wales, 1801-1911." Transactions of the Institute of British Geographers (1967): 125-143.

44. Leunig, Timothy. "Time is money: a re-assessment of the passenger social savings from Victorian British railways." The Journal of Economic History 66.3 (2006): 635-673.

45. Lipscomb, Molly, Mushfiq A. Mobarak, and Tania Barham. "Development effects of electrification: Evidence from the topographic placement of hydropower plants in Brazil." American Economic Journal: Applied Economics 5.2 (2013): 200-231.

46. Long, Jason. "Rural-urban migration and socioeconomic mobility in Victorian Britain." Journal of Economic History (2005): 1-35.

47. Michaels, Guy. "The effect of trade on the demand for skill: Evidence from the interstate highway system." The Review of Economics and Statistics 90.4 (2008): 683-701.

48. Odlyzko, Andrew. "Collective hallucinations and inefficient markets: The British Railway Mania of the 1840s." University of Minnesota (2010).

49. O'Rourke, Kevin H. "The European grain invasion, 1870-1913." The Journal of Economic History 57.4 (1997): 775-801.

50. O'Rourke, Kevin H., and Jeffrey G. Williamson. "When did globalisation begin?." European Review of Economic History 6.1 (2002): 23-50.

51. Pascali, Luigi. "The wind of change: Maritime technology, trade and economic development." American Economic Review (2016).

52. Pascual Domènech, P. (1999). Los caminos de la era industrial: la construcción y financiación de la red ferroviaria catalana, 1843-1898 (Vol. 1). Edicions Universitat Barcelona.

53. Pope, Alexander, and D. Swann. "The pace and progress of port investment in England 1660-1830." Bulletin of Economic Research 12.1 (1960): 32-44.

54. Poveda, G. (2003). El antiguo ferrocarril de Caldas. Dyna, 70 (139), pp. 1-10.

55. Purcar, Cristina. "Designing the space of transportation: railway planning theory in nineteenth and early twentieth century treatises." Planning Perspectives 22.3 (2007): 325-352.

56. Ravenstein, Ernest George. "The laws of migration." Journal of the Statistical Society of London 48.2 (1885): 167-235.

57. Redding, Stephen J., and Matthew A. Turner. Transportation costs and the spatial organization of economic activity. No. w20235. National Bureau of Economic Research, 2014.

58. Redford, Arthur. Labour migration in England, 1800-1850. Manchester University Press, 1976.

59. Riley, S. J., S. D. Gloria, and R. Elliot (1999). A terrain Ruggedness Index that quantifies Topographic Heterogeneity, Intermountain Journal of Sciences, 5(2-4), 23-27.

60. Robson, Brian T. Urban growth: an approach. Vol. 9. Routledge, 2006.

61. Rosevear, A., Satchell, A.E.M., Bogart, D., Shaw Taylor, L., 'Turnpike roads of England and Wales,' 2017.

62. Satchell, A.E.M. 'Navigable waterways and the economy of England and Wales 1600-1835,' 2017.

63. Satchell, A.E.M., Kitson, P.M.K., Newton, G.H., Shaw-Taylor, L., Wrigley E.A., 1851 England and Wales census parishes, townships and places. Working paper (2016).

64. Satchell, A.E.M. and Shaw-Taylor, L., Exposed coalfields of England and Wales (2013).

65. Schurer, K., Higgs, E. (2014). Integrated Census Microdata (I-CeM), 1851-1911. [data collection]. UK Data Service. SN: 7481, http://doi.org/10.5255/UKDA-SN-7481-1.

66. Shaw-Taylor, L. and Wrigley, E. A. "Occupational Structure and Population Change," in Floud, Roderick, Jane Humphries, and Paul Johnson, eds. The Cambridge Economic History of Modern Britain: Volume 1, Industrialisation, 1700-1870. Cambridge University Press, 2014.

67. Simmons, Jack. The railway in town and country, 1830-1914. (1986).

68. Storeygard, Adam. "Farther on down the road: transport costs, trade and urban growth in sub-Saharan Africa." The Review of Economic Studies 83.3 (2016): 1263-1295.

69. Tang, John P. "Railroad expansion and industrialization: evidence from Meiji Japan." The Journal of Economic History 74.03 (2014): 863-886.

70. Tang, John P. "The Engine and the Reaper: Industrialization and mortality in late nineteenth century Japan." Journal of Health Economics 56 (2017): 145-162.

71. United Nations, Department of Economic and Social Affairs, Population Division (2014). World Urbanization Prospects: The 2014 Revision, Highlights (ST/ESA/SER.A/352).

72. Wellington, A.M. The Economic Theory of the Location of Railways: An Analysis of the Conditions Controlling the Laying Out of Railways to Effect the Most Judicious Expenditure of Capital. Ed. J. Wiley & sons, 1877.

73. Willan, Thomas Stuart. River navigation in England, 1600-1750. Psychology Press, 1964.

74. Wrigley, Edward Anthony. Energy and the English industrial revolution. Cambridge University Press, 2010.

75. Wrigley, E. A. "The PST system of classifying occupations," Working paper 2015.

76. You, Xuesheng. Women's employment in England and Wales, 1851-1911, University of Cambridge, unpublished PhD dissertation, 2014.

A Appendices:

A.1 The least cost path instrument

In this section, we describe how we construct the instrument for distance to railway stations. The first step is to select the nodes of the hypothetical network and then decide which nodes will become origins and destinations connected by the least cost path (LCP). The candidate nodes are all the towns with a population over 5,000 inhabitants in 1801. These were the major population centers. Each pair of towns, both with a population above 5,000, is a potential origin and destination for railway lines. A gravitational model selects the origins and destinations that will be connected, based on an approximation for the value of trade between the potential origin and destination. We assume the value of connecting an origin and destination pair is given by $GM_{ij} = \frac{Pop_i Pop_j}{Dist_{ij}}$, where $GM_{ij}$ is the gravitational potential between towns $i$ and $j$, $Pop_i$ is the 1801 population of town $i$, and $Dist_{ij}$ is the straight line distance between $i$ and $j$. We chose the town pair $i$ and $j$ as origins and destinations in our LCP if $GM_{ij} > 10{,}000$.
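The selection rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the towns, their coordinates and the distance unit are made up (the text does not state the units behind the 10,000 threshold), and the function name `select_od_pairs` is hypothetical.

```python
from itertools import combinations
from math import hypot


def select_od_pairs(towns, min_pop=5000, threshold=10_000):
    """Return (town_i, town_j, GM_ij) for pairs with GM_ij above the threshold.

    towns maps a name to (population_1801, x, y). Coordinates are in an
    assumed, unspecified distance unit; GM_ij = Pop_i * Pop_j / Dist_ij.
    """
    # Candidate nodes: towns above the population cut-off.
    candidates = {name: t for name, t in towns.items() if t[0] > min_pop}
    selected = []
    for (ni, (pi, xi, yi)), (nj, (pj, xj, yj)) in combinations(
        sorted(candidates.items()), 2
    ):
        dist = hypot(xi - xj, yi - yj)  # straight line distance
        if dist == 0:
            continue  # skip coincident points to avoid dividing by zero
        gm = pi * pj / dist
        if gm > threshold:
            selected.append((ni, nj, gm))
    return selected


# Hypothetical towns, for illustration only.
towns = {
    "A": (6_000, 0.0, 0.0),
    "B": (8_000, 2_400.0, 3_200.0),   # 4,000 distance units from A
    "C": (4_000, 10.0, 10.0),         # below the population cut-off
    "D": (5_500, 100_000.0, 0.0),     # large enough, but far from A and B
}
pairs = select_od_pairs(towns)
```

In this toy example only the pair A and B is retained: C is excluded by the population cut-off, and D is too distant from the other towns for its gravitational potential to exceed the threshold.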

The second step is to identify the LCP connecting our nodes. The main criterion used to plan linear projects is usually the minimization of earth-moving works. Assuming that the track structure (composed of rails, sleepers and ballast) is the same along the entire length, it is in the track foundation where most differences arise. Terrain with steeper slopes requires more earth-moving and, in consequence, construction costs are higher (Pascual 1999, Poveda 2003, Purcar 2007). The traction power of the locomotives and the potential adherence between wheels and rails could be the main reason. It is also important to highlight that slopes over 2% might require building tunnels, cut-and-cover tunnels or even viaducts. The perpendicular slope was also crucial. During the construction of the track section, excavation and filling have to be balanced in order to minimize provisions, waste and transportation of earth. Nowadays, bulldozers and trailers are used, but historically workers did this manually, which implied a direct link between construction costs, wages and the availability of skilled laborers. In fact, it is commonly accepted in the literature that early railways were highly restricted by several factors, including the quality of the soil, the need to construct tunnels and bridges, and interference with pre-existing property (buildings and land dispossession). Longitudinal and perpendicular slope were the most significant, and we focus on these below.

Slopes are determined using elevation data. Several DEM rasters were analyzed in preliminary tests, but we finally chose the Shuttle Radar Topography Mission (SRTM) data, obtained at 90 meter resolution (3 arc-seconds). Although this is a modern raster data set, created in 2000 from a radar system on board the Space Shuttle, the results it offers should not differ much from the historical reality. The LCP tool calculates the route between an origin and a destination, minimizing the elevation difference (or cost in our case) in cumulative terms. The method we developed was based on the ESRI Least-Cost-Path algorithm, although additional tasks were implemented to optimize the results and to offer different scenarios. The input data was the SRTM elevation raster, converted into slope. This conversion was necessary in order to input different construction costs.
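To illustrate the conversion from elevation to slope, the following minimal sketch computes percent slope from a small elevation grid with NumPy. It assumes square 90 meter cells and uses simple finite differences; GIS slope tools typically use a 3x3 neighbourhood estimator, so this is a simplification and not the exact routine used for the paper. The function name `slope_percent` is hypothetical.

```python
import numpy as np


def slope_percent(elevation, cell_size=90.0):
    """Percent slope (rise over run times 100) for a 2-D elevation array in meters."""
    # np.gradient returns one array per axis: rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(np.asarray(elevation, dtype=float), cell_size)
    return 100.0 * np.hypot(dz_dx, dz_dy)


# Toy surface: a plane rising 0.9 m per 90 m cell towards the east.
elev = np.tile(np.arange(5) * 0.9, (4, 1))
slope = slope_percent(elev)
```

On this toy plane every cell has a slope of 1%, which corresponds to the 0-1% slope bin used later when construction costs are assigned.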

The third step is to specify the relationship between construction costs and slope. One approach is to use the historical engineering literature. Wellington (1877) discusses elevation slope (i.e. gradients), distance, and operational costs of railways, but this is not ideal as we are interested in construction costs. We could not find an engineering text that specified the relationship between construction costs and slopes. As an alternative we use historical construction cost data. The following details our data and procedure.

A select committee on railways in 1844 published a table on the construction costs of 54 railways.[22] There were 45 with a clear origin and destination, for which we can measure total elevation change along the route (details are available). For these 45 railways we calculate the length of the railway line in meters and the total elevation change (all meters of ascent and descent). We then ran the following regression for railway $i$:

$$\text{ConstructionCosts}_i = \alpha\,\text{Distance100Meters}_i + \beta\,\text{ElevationChangeMeters}_i + \varepsilon_i \qquad (5)$$

where construction costs are measured in pounds. This regression produces unsatisfactory results, with total elevation change having a negative sign. We think the main reason is that the sample includes railways with London as an origin or destination. Land values in London were much higher than elsewhere and thus construction costs were higher there. Therefore, we omit railways with a London connection. We also think it is important to account for railways in mining areas, as they were typically built to serve freight traffic rather than a mix of freight and passenger traffic.

[22] See the Fifth report from the Select Committee on Railways; together with the minutes of evidence, appendix and index (BPP 1844 XI). The specific section with the data is appendix number 2, report to the lords of the committee of the privy council for trade on the statistics of British and Foreign railways, pp. 4-5.

Our extended model uses construction costs for 36 non-London railways and has the following specification:

$$\text{ConstructionCosts}_i = \alpha\,\text{Distance100Meters}_i + \beta\,\text{ElevationChangeMeters}_i + \mu\,\text{MiningRailway}_i + \varepsilon_i \qquad (6)$$

The results imply that for every 100 meters of distance construction costs rise by 128.9 pounds (st. err. 45.27) and, holding distance constant, construction costs rise by 382.6 pounds (st. err. 274.5) for every 1 meter increase in total elevation change. Construction costs for mining railways are 340,418 pounds lower (st. err. 179,815). For our LCP model we assume a non-mining railway, re-scale the figures into construction costs per 100 meters, and normalize so that costs per 100 meters are 1 at zero elevation change. The formula becomes

$$\text{NormalizedCostPer100Meters} = 1 + 2.96 \times \frac{\text{ElevationChangeMeters}}{\text{Distance100Meters}}.$$

The elevation change divided by distance can be considered as the slope in percent, in which case our formula becomes $\text{Cost} = 1 + 2.96 \times \%\text{slope}$. We think this is a reasonable approximation of the relationship between construction costs, distance, and elevation slope.
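The normalization can be traced step by step. For a non-mining railway the fitted cost per 100 meters is $\alpha + \beta \times \%\text{slope}$, and dividing by $\alpha$ gives $1 + (\beta/\alpha) \times \%\text{slope}$. The short sketch below reproduces this arithmetic with the reported coefficients; the function and variable names are hypothetical. The ratio of the two reported coefficients is about 2.97, while the text uses 2.96, so the default below follows the text.

```python
ALPHA = 128.9  # pounds per 100 meters of distance (reported coefficient)
BETA = 382.6   # pounds per meter of total elevation change (reported coefficient)


def normalized_cost_per_100m(slope_pct, ratio=2.96):
    """Cost per 100 meters relative to flat terrain; slope in percent."""
    return 1.0 + ratio * slope_pct


ratio_from_coefficients = BETA / ALPHA      # roughly 2.97
cost_flat = normalized_cost_per_100m(0.0)   # normalized to 1 on flat terrain
cost_two_percent = normalized_cost_per_100m(2.0)
```

Under this formula a 2% slope costs roughly seven times as much per 100 meters as flat terrain, which is why the LCP strongly favours low-gradient corridors.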

For computational purposes it is convenient to divide slope into bins of 0 to 1%, 1 to 2%, and so on. The following table gives the costs over a standardized distance for different slope bins in our preferred specification, which is labeled scenario 2. For comparison, we also show parameters assuming a constant unitary linear cost in slope (scenario 1) and a case where slope costs are graded: constant at low slopes, rising from the 2-3% bin up to the 6-7% bin, and constant thereafter (scenario 3).

slope %    cost scenario 1    cost scenario 2 (preferred)    cost scenario 3
0          0                  1                              1
0-1        1                  4                              1
1-2        2                  7                              1
2-3        3                  10                             4
3-4        4                  13                             7
4-5        5                  16                             11
5-6        6                  19                             15
6-7        7                  22                             19
7-8        8                  25                             19
8-9        9                  28                             19
9-10       10                 31                             19
>10        ...                34                             19
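The reclassification in the table can be written as a simple lookup. The sketch below is an illustration only: it assumes each bin includes its upper bound, and it relies on the observation that the tabulated scenario 2 values equal 1 plus 3 times the upper bound of the bin (with all slopes above 10% sharing one bin). The function names are hypothetical.

```python
import math

# Scenario 3 costs for the bins 0, 0-1, 1-2, ..., 6-7; constant afterwards.
SCENARIO3_COSTS = [1, 1, 1, 4, 7, 11, 15, 19]


def scenario2_cost(slope_pct):
    """Scenario 2 (preferred) cost for a slope given in percent."""
    if slope_pct < 0:
        raise ValueError("slope must be non-negative")
    upper = min(math.ceil(slope_pct), 11)  # slopes above 10% share one bin
    return 1 + 3 * upper


def scenario3_cost(slope_pct):
    """Scenario 3 (graded) cost for a slope given in percent."""
    if slope_pct < 0:
        raise ValueError("slope must be non-negative")
    upper = math.ceil(slope_pct)
    return SCENARIO3_COSTS[min(upper, len(SCENARIO3_COSTS) - 1)]


example_costs = [(s, scenario2_cost(s), scenario3_cost(s)) for s in (0, 0.5, 2.5, 6.5, 12)]
```

Applying such a lookup to every cell of the slope raster yields the cost surface over which the least cost path is computed.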

The LCP algorithm is implemented in Python using ESRI tools, taking as inputs the elevation slope raster, the reclassification table of construction costs, and the origin-destination nodes. The cost distance and back-link rasters are computed using the formulation below:

$$\text{CostDistance}(ab) = \frac{\text{CostSurface}(a) \times HF(a) + \text{CostSurface}(b) \times HF(b)}{2} \times \text{SurfaceDistance}(ab) \times VF(ab) \qquad (7)$$

where $\text{CostDistance}(ab)$ is the cost of moving from cell $a$ to the adjacent cell $b$, $\text{CostSurface}(j)$ is the cost of travel for cell $j$, $HF(j)$ is the horizontal factor for cell $j$, $\text{SurfaceDistance}(ab)$ is the surface distance from $a$ to $b$, and $VF(ab)$ is the vertical factor from $a$ to $b$. Note that the division by 2 of the friction of the segments is deferred until the horizontal factor is integrated. Finally, we implemented the least-cost-path function to obtain the LCP corridors. These corridors were converted to lines, exported, merged and post-processed. Maps of our preferred LCP using scenario 2 are shown in the text.
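The logic of accumulating cost between adjacent cells and tracing back the cheapest route can be sketched with a standard Dijkstra search on a small grid. This is not the ESRI implementation: it assumes horizontal and vertical factors equal to 1, a planar distance of 1 between orthogonal neighbours and $\sqrt{2}$ between diagonal neighbours, and a hypothetical function name `least_cost_path`.

```python
import heapq
import math


def least_cost_path(cost, start, goal):
    """Dijkstra search on a grid of cell costs with 8-neighbour moves.

    Moving between adjacent cells a and b costs
    distance(a, b) * (cost[a] + cost[b]) / 2, i.e. equation-style averaging
    with horizontal and vertical factors assumed equal to 1.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}   # accumulated cost (the cost distance raster)
    back = {}             # predecessor of each cell (the back-link raster)
    heap = [(0.0, start)]
    while heap:
        acc, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if acc > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr == 0 and dc == 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                step = math.hypot(dr, dc) * (cost[r][c] + cost[nr][nc]) / 2.0
                new = acc + step
                if new < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = new
                    back[(nr, nc)] = cell
                    heapq.heappush(heap, (new, (nr, nc)))
    # Follow the back-links from the goal to the start.
    path = [goal]
    while path[-1] != start:
        path.append(back[path[-1]])
    return path[::-1], best[goal]


# Toy cost surface: a steep (high-cost) cell in the middle of flat terrain.
cost_surface = [
    [1, 1, 1],
    [1, 34, 1],
    [1, 1, 1],
]
path, total = least_cost_path(cost_surface, (1, 0), (1, 2))
```

In this toy surface the route from the left edge to the right edge goes around the expensive central cell, since two diagonal moves over flat cells are far cheaper than crossing it directly.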

A.2 Elevation, slope, and ruggedness variables

The aim of this appendix is to explain the creation of the elevation variables, including the original sources and the method we followed to estimate them. There are several initiatives working on the provision of high-resolution elevation raster data across the world. The main differences between these databases lie in their geographical coverage, the precision of the data and the treatment of urban surroundings.

To carry out this work, we downloaded several elevation DEM rasters, preferably DTMs, covering the whole of England and Wales. In decreasing order of accuracy, the most precise database was LIDAR (5x5 m), from the Landmap data set contained in the NEODC Landmap Archive (Centre for Environmental Data Archival). Second, we used EU-DEM (25x25 m) from the GMES RDA project, available in the EEA Geospatial Data Catalogue (European Environment Agency). The third data set was the Shuttle Radar Topography Mission (SRTM, 90x90 m), created in 2000 from a radar system on board the Space Shuttle Endeavour by the National Geospatial-Intelligence Agency (NGA) and NASA. Finally, we also used GTOPO30 (1,000x1,000 m), developed in a collaborative effort led by staff at the U.S. Geological Survey's Center for Earth Resources Observation and Science (EROS). All these sources were created using satellite data, which means all of them are based on current data. The lack of historical sources of elevation data obliges us to use them, despite the anachronism involved. This simplification may be considered reasonable for rural places, but it is more problematic in urban surroundings, where the urbanization process altered the original landscape. Even with DTM rasters, the construction of buildings and technical networks involved a severe change in the surface of the terrain. Several tests at a local scale were conducted with the different rasters in order to strike a balance between precision and the operational time spent in the calculations. The total size of the files, the time spent in different calculations and the precision relative to the finest data were some of the comparisons carried out. After these, we opted for SRTM90.

As stated in the appendix on mappable units, the spatial units used as a basis for the present paper were civil parishes, comprising over 9,000 continuous units. We therefore had to provide a method to obtain unique elevation variables for each unit, keeping them comparable across the country. We estimated six variables in total: elevation mean, elevation std, slope mean, slope std, ruggedness mean and ruggedness std. Before creating the variables, some work had to be done to prepare the data. To obtain full coverage of England and Wales with SRTM data, we had to download 7 raster tiles. These images were merged, projected into the British National Grid and clipped to the coastline in ArcGIS software.

With the elevation raster of England and Wales in hand, we proceeded to calculate the first two variables: the elevation mean and its standard deviation. A Python script was written to split the raster using the continuous units, to calculate the raster properties (mean and standard deviation) of all the cells in each sub-raster, and to aggregate the information obtained in a text file. These files were subsequently joined to the previous shapefile of civil parishes, offering the possibility to plot the results.
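The per-unit aggregation amounts to a zonal statistic. A minimal NumPy sketch follows, assuming the units have already been rasterized to the same grid as the elevation data (one integer unit id per cell) and using the population standard deviation; both are assumptions made for illustration, and the function name `zonal_mean_std` is hypothetical.

```python
import numpy as np


def zonal_mean_std(values, zones):
    """Mean and standard deviation of raster cells for each unit id.

    values and zones are arrays of the same shape; zones holds an
    integer unit id for every raster cell.
    """
    stats = {}
    for zone_id in np.unique(zones):
        cells = values[zones == zone_id]
        stats[int(zone_id)] = (float(cells.mean()), float(cells.std()))
    return stats


# Toy raster with two units: unit 1 covers four cells, unit 2 covers two.
elevation = np.array([[10.0, 20.0, 5.0],
                      [30.0, 40.0, 15.0]])
units = np.array([[1, 1, 2],
                  [1, 1, 2]])
stats = zonal_mean_std(elevation, units)
```

The same aggregation applies unchanged to the slope and ruggedness rasters, which is how the six unit-level variables share a common structure.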

A second set of variables derived from the elevation data aimed to capture the variability of elevation between adjacent cells. Two methods were developed to measure this phenomenon: ruggedness and slope. Ruggedness is a measure of topographical heterogeneity defined by Riley et al. (1999). In order to calculate the ruggedness index for each unit, a Python script was written to convert each raster cell into a point keeping the elevation value, to select the adjacent values using a distance tool, to apply the stated equation to every single point, to spatially join the points to their spatial units and to calculate aggregated indicators (mean and standard deviation) for each continuous unit.
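As an illustration of the ruggedness calculation, the sketch below computes a terrain ruggedness index on a raster directly, taking the index as the square root of the summed squared elevation differences between a cell and its eight neighbours. This is our reading of the Riley et al. (1999) index and it skips the point conversion used in the actual script; edge cells are simply left out. The function name is hypothetical.

```python
import numpy as np


def terrain_ruggedness_index(elevation):
    """Ruggedness for interior cells of a 2-D elevation array.

    For each interior cell, sums the squared elevation differences to its
    8 neighbours and takes the square root.
    """
    z = np.asarray(elevation, dtype=float)
    centre = z[1:-1, 1:-1]
    total = np.zeros_like(centre)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # Neighbour array shifted by (dr, dc), aligned with the interior cells.
            neighbour = z[1 + dr : z.shape[0] - 1 + dr, 1 + dc : z.shape[1] - 1 + dc]
            total += (neighbour - centre) ** 2
    return np.sqrt(total)


# Toy example: a 1 m pit surrounded by flat terrain, and a flat surface.
pit = np.ones((3, 3))
pit[1, 1] = 0.0
tri_pit = terrain_ruggedness_index(pit)
tri_flat = terrain_ruggedness_index(np.full((4, 4), 7.0))
```

A flat surface returns zero everywhere, while the single pit returns the square root of 8, since all eight neighbours differ from the centre by one meter.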

Slope was an alternative measure of topographical heterogeneity. In order to calculate the slope variable for each unit, a Python script was written to convert the elevation into a slope raster, to split the raster using the continuous units, to calculate the raster properties (mean and standard deviation) of all the cells in each sub-raster, and to aggregate the information obtained in a text file. The results obtained for both ruggedness and slope are displayed in Figure 6. As the reader will appreciate, the scale of the indices is different (1 - 2 times) but the geographical pattern is rather similar. For the paper we therefore used the variables derived from slope measures, because the time spent in calculations was considerably lower.

Figure 6: Slope and ruggedness measures

A.3 Additional results

Table 9: Different specifications for railway variables

Dep. var.: unit pop. growth 1841 to 1891               (1)         (2)         (3)         (4)
                                                        coeff       coeff       coeff       coeff
variable                                                (t-stat)    (t-stat)    (t-stat)    (t-stat)

Indicator distance to railway station in 1841 <2km      0.186***
                                                        (3.96)
Indicator distance to railway station in 1861 <2km                  0.177***
                                                                    (11.12)
Indicator any rail line in 1851                                                 0.164***
                                                                                (11.34)
Exactly one railway station in 1851                                                         0.168***
                                                                                            (8.11)
More than one railway station in 1851                                                       0.270***
                                                                                            (6.46)

First nature controls                                   Yes         Yes         Yes         Yes
Second nature controls                                  Yes         Yes         Yes         Yes
District fixed effects                                  Yes         Yes         Yes         Yes
N                                                       9482        9482        9482        9482

notes: * p<0.05, ** p<0.01, *** p<0.001. The standard errors are clustered on the district.

Table 10: Heterogeneity specifications: railway access and 1851 occupational shares

Dep. var.: unit pop. growth 1841 to 1891                                                        (1)
variable                                                                                        coeff (t-stat)

Indicator distance to railway station in 1851 <2km                                              0.309* (2.27)
Indicator distance to railway station in 1851 <2km * Share agricultural employment              -0.289* (-2.06)
Indicator distance to railway station in 1851 <2km * Share mining and forestry employment       0.198 (0.67)
Indicator distance to railway station in 1851 <2km * Share tertiary employment                  -0.174 (-0.46)
Indicator distance to railway station in 1851 <2km * Share un-specified employment              0.0433 (0.18)

First nature controls                                                                           Yes
Second nature controls                                                                          Yes
District fixed effects                                                                          Yes
N                                                                                               9482

notes: * p<0.05, ** p<0.01, *** p<0.001. The standard errors are clustered on the district.
