alice = Account()
bob = Account()
alice.withdraw(100)
bob.deposit(100)
Now imagine you want Alice to send 100 moneys to Bob.
Maybe your code would look like this.
ACID by example
alice = Account()
bob = Account()
alice.withdraw(100)
bob.deposit(100) # fails!
Now imagine you want Alice to send 100 moneys to Bob.
Maybe your code would look like this.
But something breaks! Maybe you had a bug. Maybe there was a power outage. But Bob doesn't get the money.
What's wrong?
ACID by example
alice = Account()
bob = Account()
deftransaction(alice, bob, amt):
alice.withdraw(amt)
bob.deposit(amt)
previous_state = save_states(alice, bob)
try:
transaction(alice, bob, 100)
except: # but on steroids
restore_states(alice, bob, previous_state)
The solution is to wrap all the changes up in a transaction, where a transaction, if it succeeds, moves the database from one valid state to another valid state.
Why ACID?
It should be clear that ACID is helpful in this model of the "operational database" - a database that is used by software to record, mutate, and display data to users.
This is a paradigm where there are many users at once and generally they record/mutate very few things at a given time. They read few things and what they read must be in perfect shape. Data is sacred and must be kept in perfect shape at all times, so that the software application itself to rely on the data.
The Dominance of Relational Systems
Relational databases were invented in the early 70's to solve a lot of problems of basic systems including:
A declarative language for accessing data (what evolved into SQL) that abstracts from underlying data structures.
A "workable abstraction" of the data into "normalized tables."
ACID, too, became a core feature of these relational systems. These systems completed dominated the world of operational databases for many years and are still dominate today!
3NF for Humans
Third Normal Form (3NF) is what is commonly referred to as "normalized" data. It has some fancy terms if you look it up, but it's actually simple:
Data should be in "long" format (each column one type, each row one value).
Each table has a "primary key". This is a unique identifier and can be a composite of one or more columns.
All columns in a given table should depend directly on the primary key (not indirectly).
Normalizing Examples
Let's define this.
Wide Data
id
name
follows_1
follows_2
follows_3
1
foo
873
738
3098
2
bar
983
999
348
3
baz
2789
389
987
Long Data
userid
name
follows
1
foo
873
1
foo
738
1
foo
3098
2
bar
983
2
bar
999
2
bar
348
Primary Key (simple)
id
userid
name
follows
1
1
foo
873
2
1
foo
738
3
1
foo
3098
4
2
bar
983
5
2
bar
999
6
2
bar
348
Remove Dependencies (simple PK)
id
userid
follows
1
1
873
2
1
738
3
1
3098
4
2
983
5
2
999
6
2
348
userid
name
1
foo
2
bar
Remove Dependencies (composite PK)
userid
follows
1
873
1
738
1
3098
2
983
2
999
2
348
userid
name
1
foo
2
bar
Normalized Data
Normalized data has two advantages:
Saves space.
Makes things easy to update and avoid invalid states, everything exists in only one place!
ACID + Normalization -- why they work well together.
SQL
SQL as a language has proven very effective, even when the underlying system is not a relational database and not normalized, it provides a single language to do many things we want to do with data!
It's important to remember, even though we often refer to these traditional, relational, ACID databases as "SQL" databases, that SQL is nothing more than a language that they happen to use!