6. Lecture Introduction to Databases
Advances in databases: NoSQL
databases
Cheng Fu
Dept. of Geography, University of Zürich
Rolf Meile
Eidg. Forschungsanstalt für Wald, Schnee und Landschaft
Zhiyong Zhou, Tutor
Dept. of Geography, University of Zürich
kudos to Dr. Haosheng Huang for providing this document
Geo874 | HS20
Universität Zürich
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
2
Summary 5.1
|
ER model Step Relational model
Entity type 1,2 Relation
Binary 1:1 relationship type 3Add PK on the total participation
side to the othe side
Binary 1:N relationship type 4Add PK on the 1-side to the N-
side
Binary M:N relationship type 5 Relation and two FKs
n-ary relationship type 7 Relation with n FKs
Simple attribute 1, ... Attribute
Composite attribute 1, ... Set of simple attributes
Multivalued Attribute 6 Relation and FK
Correspondence between ER- and relational models
Elmasri & Navathe (2014): Ch 9, pp. 287-296.
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
3
So far we covered...
Relational databases– DB Design: Requirements -> conceptual -> logical ->
physical
– Entities and relationships
– Relational model
– SQL – the language of relational databases
| Intro
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
4
So far we covered...
Relational databases– DB design process: requirements -> conceptual ->
logical -> physical design
– Basic assumptions:
• Requirements: Use-cases can be defined and data can be structured at start
• Conceptual: modelling entity types through fixed number of attributes, and their relationships
• Logical: relational model, dense, row-based (tuple) data structures, row-first access, and SQL as access language
• Physical: designed for a single server, ACID (atomicity, consistency, isolation, durability) transactions.
So – what if some of above do not hold...
| Intro
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
5
Learning objectives
You will understand current developments in data collection,
storage and processing – Why NoSQL?
You will understand the basic characteristics of NoSQL
databases
You will be aware of different NoSQL database types: key-
value, document, column, graph
| Intro
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Contents – NoSQL Databases
1. Introduction
2. Characteristics
3. NoSQL database types: key-value, document, column,
graph
6
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
The shift to digital economy
• Economy powered by the Internet and other 21st century technologies – the cloud, mobile, social media and big data
• Key to every digital economy business: Web/mobile-based applications need to– Support large numbers of concurrent users (tens of thousands, millions)
– Deliver highly responsive experiences to a globally distributed base of users
– Be always available – no downtime
– Handle semi- and unstructured data
– Rapidly adapt to changing requirements with new features and frequent updates
• Relational databases: structures and data types are fixed in advance
• NoSQL: Applications can add new fields on the fly.
Relational databases are unable to meet these new
requirements!
7 | Intro
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Big Data – V V V
8 | Intro
Buzzword, but...– Volume: large amounts of data.
• Large: beyond what can reasonably fit into a singlememory/ hard drive
– Velocity: data arriving at a high rate and need to bestored and possibly processed fast
• E.g., streaming data from sensors, from social networkfeeds, at the stock exchange, created on the Web...
• Delays are costly
– Variety/Variability: data change their schema, formats, are inconsistent, and are of different types...
• Need approaches that allow for easy adaptation in whatis stored and how it is processed
– ... Some talk also of Veracity (how trustful the data can be, data quality)
Early talk on Big Data (coining the term): http://www.quora.com/Who-coined-the-
term-big-data (Roger Magoulas, 2009)
http://www.quora.com/Who-coined-the-term-big-data
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Scalability
9 | Intro
– To solve Volume and Velocity we need scalable solutions: systems that can grow at least proportionally to the needs.
– Purpose-built computers are $$$ (scale up, vertical scalability)
– What if the data outgrow it?
– Ability to add more small/cheap resources as requirements grow (scale out, horizontal scalability);
– Allowing partial failure by providing redundancy
– Consequence: databases are partitioned between multiple physical systems (computers).
Relational databases are poor in supporting these!
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
• IBM mainframe: Z server
10
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
NoSQL
• Definition from http://www.nosql-database.org/– Next Generation Databases mostly addressing some of the points: being
non-relational, distributed, open-source and horizontally scalable.
– “not only SQL”
– Typically no good at joins
– Triggered by the needs of web 2.0 companies: Facebook, Google, Amazon.com
– Increasingly used in big data and real-time web applications
– More than 225 NoSQL databases• CouchDB
• MongoDB
• Hadoop & Hbase
• Neo4J
• Cassandra
• …
11 | Intro
http://www.nosql-database.org/
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Contents – NoSQL Databases
1. Introduction
2. Characteristics
3. NoSQL database types: key-value, document, column,
graph
12
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Characteristics of NoSQL databases
• Simplicity of design: schema-less– SQL databases: structures and data types are fixed in advance
– NoSQL databases: dynamic schemas; Applications can add new fields/attributes on the fly; Able to store large volumes of rapidly changing structured, semi-structured (e.g., JSON, XML), and unstructured data (e.g., video, audio, image, books, documents, journals, …)
• Horizontal scaling (distribution) to clusters of machines– SQL databases: vertical scaling (add/remove resources to/from a computer server)
– NoSQL databases: potentially thousands of machines, potentially distributed around the world
• Be always available, highly responsive queries for global users– NoSQL databases: DB servers distributed globally
13 | Characteristics
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Transaction properties in SQL databases: ACID
– Atomicity• The set of operations on the database is either executed in its entirety, or not at all.
• Entire transaction succeeds or fails (all-or-nothing)
• Example: transfer of funds from account A to account B: either both, the debit on A and the deposit into B are executed, or none of the two.
– Consistency• After a transaction has been executed, the integrity constraints have to be satisfied.
• Valid state of DB before AND after transaction
• Example: During the execution, there may be violations, but if they remain until the end, the transaction has to be undone (“aborted”).
– Isolation• Enables transactions to operate independently of and transparently to each other.
• Other transactions cannot access data that has been modified during a transaction that has not yet completed.
– Durability (persistence)• After the successful completion of a transaction, the DBMS commits to make the outcome of
the transaction permanent, even in the presence of concurrency and/or breakdowns
– These are problematic with distributed database systems (NoSQL)!• Partitioning data into different servers
| Transaction
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Partitioning consequences
• Partitioning introduces communication overhead
to assure database consistency after transactions –
during synchronisation or replication process.
• In physically distributed DBs network latency
matters (10km = 67μs = 67x10-6s vs. cache write =10-20μs!)
• Simultaneous updates may occur
• Locking – concurrency control mechanism that
assures exclusive access to a resource (value, row,
table…)
• BUT! Locking impacts on availability (the resource
not available for other users when updated [locked])
15 | Transactions
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Brewer’s CAP Theorem
• CAP theorem: A distributed computer system can provide only two of the following guarantees at the same time.– Consistency (all nodes see the same data at the same time)
– Availability (a guarantee that every request receives a response about whether it succeeded or failed)
– Partition tolerance (the system continues to operate despite partitioning due to network failures)
16 | Transactions
Gilbert, S., & Lynch, N. (2002). Brewer's conjecture and the feasibility of consistent, available,
partition-tolerant web services. ACM SIGACT News, 33(2), 51-59.
C
A
P
X
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
CAP illustration – Consistency violation
17 | Transactions
P1 P2
U1 U2
• T1 V: 1000
• T2 Sync
• T3 V: 1000 1000
• T4 U1: V+500 = 1500
• T5 U2: Read V = 1000
• T6 Sync
No locking:
Accessible
Can be partitioned
Reads can be stale
(old)
AP system:Ensuring Availability and Partition tolerance
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
CAP illustration – Availability violation
18 | Transactions
P1 P2
U1 U2
• T1 V: 1000
• T2 Sync
• T3 V: 1000 1000
• T4 U1: V+500 = 1500 Lock
• T5 Sync
• T6 U2: Read V = 1500
With locking:
Consistent
Partitionable
Temporarily inaccessible
CP system:Ensuring Consistency and Partition tolerance
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
CAP illustration – Partitioning violation
19 | Transactions
P1 P2
U1 U2
• Putting in a mechanism similar to ACID transaction checks in a single
server database and not accounting for failure of the communication
network and its latency
Not resilient
to network
failure
CA system:Ensuring Consistency and Availability
Relational DBs are CA systems
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Transaction in NoSQL
• NoSQL: BASE instead of ACID
– Basically Available: The system does guarantee the availability of the data; there will be a response to any request. But, that response could still be ‘failure’ to obtain the requested data or the data may be in an inconsistent or changing state.
20 | Transactions
– Soft state: The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’
– Eventual consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves onto the next one.
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Contents – NoSQL Databases
1. Introduction
2. Characteristics
3. NoSQL database types: key-value, document, column,
graph
21
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
NoSQL databases types
• Key-value
• Document-based
• Column-based
• Graph-based
22| Database Types
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Key-value database• Key-value database: like a files system where the path acts as the key
and the file acts as the value.
• It pairs keys to values– The key is unique, and can be auto-generated.
– The value can be String, JSON, BLOB (Binary Large OBject), etc.• Key-value DBs just store these values, without caring or knowing what's inside; it's
the responsibility of the application to understand what was stored.
• Key-value DBs often use hash tables to map keys to values
• Example databases: Riak, Redis, Memcached, …
Key Value
“India” {“B-25, Sector-58, Noida, India – 201301”}
“Romania”
{“IMPS Moara Business Center, Buftea No. 1, Cluj-Napoca, 400606”,
“City Business Center, Coriolan Brediceanu No. 10, Building B,
Timisoara, 300011”}
“US” {“3975 Fair Ridge Drive. Suite 200 South, Fairfax, VA 22033”}
23| Database Types
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Document-based database
• Document-based DBs store records as “documents” where a document can generally be thought as a collection of key-value pairs.– Format (encoding) of documents: XML, JSON, BSON (Binary JSON), …
– The documents provide some structure and encoding of the managed data.
• Compared to Key-value DBs, document-based DBs embed attribute metadata associated with stored content, which essentially provides a way to query the data based on the contents.
• In a document DB, each document carries its own schema — unlike a relational database, in which every row in a given table must have the same columns.
• Examples: – MongoDB: used by LinkedIn, Foursquare, eBay, …
– CouchDB
– …
24| Database Types
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
CouchDB JSON example: a ‘customer’ document
{
"_id": "1189802380023",
"_rev": "314159",
"customer": "peter",
"gender": "male",
"likes": ["Biking", "Photography"],
"address": {
"Country": ”Switzerland",
"City": ”Zurich"
}
}
Global Unique Identifier, Passed in or
generated by CouchDB
Revision number, versioning mechanism
Arbitrary tags, schema-less,
could be validated by the
programmers
• Each customer is a “document”.
– A database might have tens of thousands or even millions of “documents”.
• Different documents do not need to have the same structure.
– e.g., another customer document might have a new tag like “telephone”.
– schema-less: different from relational databases
25| Database Types
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Column-based database
• Relational databases store all the data in a particular table’s rows together on-disk, making retrieval of a particular row fast.
• Column-based databases generally store all the values of a particular column together on-disk, which makes retrieval of a large amount of a specific attribute fast.
• This approach is very suitable for aggregate queries and analytics scenarios where you might run range queries over a specific field/attribute.
• Example database: Google’s BigTable, Hadoop’s HBase, …
Id Tweet … …
1 “...”
2 “…”
3 “…”
26| Database Types
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Graph-based database
• Graph-based databases are good at dealing with interconnected data.
• Graph databases consist of connections, or edges, between nodes. Both nodes and
their edges can store additional properties such as key-value pairs.
– Node: an instance of an object
• The strength of a graph database is in traversing the connections between the nodes.
• Graph databases are very suited to problem spaces where we have connected data, such as social networks, routing information for goods and money, recommendation engines
• Example databases: Neo4J, Infinite Graph…
27| Database Types
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
28
https://www.youtube.com/watch?v=jyx8iP5tfCI
Seven Databases in Song
https://www.youtube.com/watch?v=jyx8iP5tfCI
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Summary
• NoSQL databases
– schema-less: no predefined schema
– horizontal scaling (distribution) to clusters of machines
– ensure availability
– Transaction properties: BASE instead of ACID
• SQL database: Atomicity, Consistency, Isolation, Durability
• NoSQL database: Basically Available, Soft state, Eventual consistency
• Different NoSQL database types
– key-value pairs, documents, graphs, ...
– difficult to switch to another NoSQL database (Disadvantage)
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Recap
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
What we covered in this semester
• Why we need databases
• Basics of relational DB
• Relational DB design: requirement analysis conceptual DB design logical DB design physical DB design
• Structured Query Language (SQL), PostgreSQL
• NoSQL
We hope you will find these useful in further studies and career.
Hands-on in Geo 875
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Next lecture session
• Intro to big data processing
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
Later, in lab
• 10:15-12:00
• Y25-J-09, Y25-J-10
• Physical DB design and realization
Geo874 | Intro to Databases | HS20
C. Fu, R. Meile, Uni Zürich
L6 | Advances in Databases
34
Next week – exam!
• About 60 mins
• Stuff covered in lectures and practicals
• Not allowed to take lecture notes
• Pencil and paper
• Friday 30.10.2020 8:40 am - 9:40 am in Y15-G-40
• Be here at 8:30 am!
• No lab on 01.11.2019
• Good Luck!
| Summary
Top Related