Understanding Database Concurrency and Isolation Levels
Databases serve as the foundation for applications, providing persistent storage for the essential data those applications generate. Because many users rely on these applications, the database must handle substantial workloads made up of queries and read/write operations. As multiple users interact with the application, numerous transactions can arrive at the same time. If these transactions were processed strictly one after another, each would have to wait for every earlier transaction to finish. Avoiding that bottleneck is what leads us to the notion of concurrency within databases.
What Constitutes a Database Transaction?
A transaction is defined as a collection of operations executed as a single logical unit of work. Transactions are governed by the principle of atomicity, ensuring that all operations either succeed together or fail as a whole. If any error arises during the execution of a transaction, the entire process is rolled back. Conversely, if the transaction completes successfully, it is committed. In the examples that follow, T1 denotes Transaction 1 and T2 denotes Transaction 2.
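To make the commit-or-rollback idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the account names, and the transfer helper are all made up for illustration; the point is only that the two updates take effect together or not at all.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit of work (hypothetical helper)."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # both updates are rolled back
print(ok, failed, balances(conn))
```

The second transfer would overdraw the source account, so the error undoes both updates and the balances stay exactly as the first transfer left them.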
Concurrency in DBMS
In the context of a Database Management System (DBMS), concurrency refers to the ability to execute multiple transactions at once while keeping each transaction isolated from the others, so that each one can behave as if it were the only transaction running in the database. This approach reduces wait times, improves response times, and makes better use of system resources.
Challenges Associated with Concurrency
Concurrency-related issues arise primarily when multiple transactions attempt to access the same data item. When transactions work on different items, such problems do not arise. However, when the operations of concurrent transactions on the same item interleave in an uncontrolled order, various complications can surface. Database transactions mainly involve two actions: "Read" and "Write." Managing these actions correctly during concurrent execution on the same data item is vital for maintaining data integrity.
The Dirty Read Issue
A dirty read occurs when a transaction reads data that another transaction has altered but has not yet committed. In essence, one transaction accesses uncommitted data from another, potentially resulting in inaccurate or inconsistent outcomes. For example, with T1 and T2 as two transactions and A as a shared data item: T1 updates A without committing, T2 reads the new value of A, and T1 then rolls back. T2 has now acted on a value that was never committed.
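The scenario can be traced with a small in-memory sketch. ToyStore is a made-up class for illustration, not how a real DBMS stores data: it keeps one committed value and one pending (uncommitted) value for the shared item A, and the allow_dirty flag stands in for the reader's isolation level.

```python
class ToyStore:
    """Toy single-item store; NOT a real DBMS, just enough to show a dirty read."""

    def __init__(self, value):
        self.committed = value  # last committed value of A
        self.pending = None     # uncommitted write by T1, if any

    def write(self, value):
        """T1 writes a new value without committing it."""
        self.pending = value

    def commit(self):
        """T1 commits: the pending value becomes the committed value."""
        if self.pending is not None:
            self.committed, self.pending = self.pending, None

    def rollback(self):
        """T1 aborts: the pending value is discarded."""
        self.pending = None

    def read(self, allow_dirty):
        """T2 reads A; with allow_dirty=True it may see uncommitted data."""
        if allow_dirty and self.pending is not None:
            return self.pending
        return self.committed

store = ToyStore(100)                        # A starts at 100
store.write(50)                              # T1 updates A but has not committed
dirty_value = store.read(allow_dirty=True)   # T2 sees the uncommitted 50
clean_value = store.read(allow_dirty=False)  # a committed-only read still sees 100
store.rollback()                             # T1 aborts
final_value = store.read(allow_dirty=True)   # A is 100 again
print(dirty_value, clean_value, final_value)
```

The dirty read returned 50 even though that value was discarded by the rollback, which is exactly the inconsistency described above.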
The Phantom Read Problem
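A phantom read occurs when a transaction runs the same query with a search condition twice and gets a different set of rows the second time, because another transaction has inserted or deleted rows matching that condition and committed in between. The rows that appear or disappear are the "phantoms." For example, with T1 and T2 as two transactions: T1 selects all rows matching a condition, T2 inserts a new matching row X and commits, and T1 repeats the query and now sees X as well.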
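As a rough illustration, the sketch below models a table as a plain Python list and T1's query as a filter over it. The orders data and the large_order_ids function are invented for this example, and T2's insert is applied directly to the list, as if no isolation were in place.

```python
# Toy table of orders; names and data are made up for illustration.
orders = [
    {"id": 1, "amount": 150},
    {"id": 2, "amount": 40},
    {"id": 3, "amount": 300},
]

def large_order_ids(table):
    """T1's query: ids of all orders with amount > 100."""
    return [row["id"] for row in table if row["amount"] > 100]

first_result = large_order_ids(orders)    # T1 runs the query
orders.append({"id": 4, "amount": 500})   # T2 inserts a matching row and commits
second_result = large_order_ids(orders)   # T1 runs the same query again

# Rows present only in the second result are the phantoms.
phantoms = sorted(set(second_result) - set(first_result))
print(first_result, second_result, phantoms)
```

No existing row changed between the two queries; the result differs only because a new row now satisfies the condition.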
The Non-Repeatable Read Problem
A non-repeatable read happens when a transaction retrieves the same row multiple times and encounters different values due to updates from other transactions. Essentially, it involves reading committed data altered by another transaction in between the two reads. For example, with T1 and T2 as two transactions and X as a shared data item: T1 reads X, T2 updates X and commits, and T1 reads X again and gets a different value.
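The same sequence can be sketched in a few lines. The committed dictionary and the two helper functions are hypothetical stand-ins for a real database; T1 always reads the latest committed value, which is what lets the anomaly through.

```python
# Toy shared state standing in for committed data; not a real database.
committed = {"X": 10}

def t1_read():
    """T1 reads the latest committed value of X."""
    return committed["X"]

def t2_update_and_commit(new_value):
    """T2 changes X and commits immediately."""
    committed["X"] = new_value

first_read = t1_read()      # T1 reads X
t2_update_and_commit(20)    # T2 updates X and commits
second_read = t1_read()     # T1 reads X again within the same transaction
print(first_read, second_read)
```

Unlike the dirty read, every value T1 saw here was committed; the problem is only that the two reads disagree.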
These issues stem from concurrency in DBMS. The subsequent discussion will focus on resolving these problems through transaction isolation levels and concurrency control mechanisms that ensure database consistency.
Types of Isolation Levels
The SQL standard defines four transaction isolation levels, listed here from weakest to strongest, to mitigate the issues arising from concurrency:
READ_UNCOMMITTED
In the READ_UNCOMMITTED isolation level, transactions can access uncommitted data from other transactions. This is the weakest level of isolation, allowing reads without locking mechanisms. As a result, transactions may encounter dirty reads, as well as non-repeatable reads and phantom reads.
Example:
In two terminals with the isolation set to READ_UNCOMMITTED, if Terminal 1 updates a row but does not commit, Terminal 2 can read this uncommitted change, leading to potential inconsistencies if the transaction in Terminal 1 is rolled back.
READ_COMMITTED
This isolation level prevents dirty reads by permitting transactions to read only committed data from other transactions. It offers a higher level of isolation than READ_UNCOMMITTED but is less strict than REPEATABLE_READ. Under this level, running the same query twice in a single transaction can return different results if another transaction commits a change in between, so non-repeatable reads and phantom reads remain possible.
Example:
With the isolation level set to READ_COMMITTED, if Terminal 1 updates a row but does not commit, Terminal 2 will not see the change until it is committed, so Terminal 2 never acts on data that might later be rolled back.
REPEATABLE_READ
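This level prevents both dirty and non-repeatable reads. REPEATABLE_READ is the default setting in MySQL's InnoDB engine, which implements it with a consistent snapshot established by the first read in the transaction. Plain SELECT queries within the transaction keep reading from that snapshot and return the same data, unaffected by concurrent commits. Note that the SQL standard still permits phantom reads at this level.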
Example:
If Terminal 2 has already read the table inside its open transaction and Terminal 1 then deletes a row and commits, Terminal 2 will still see the row, because it keeps reading from the snapshot established by its first read.
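The difference between READ_COMMITTED and REPEATABLE_READ can be sketched with a toy multi-version table. VersionedTable and Transaction below are invented for illustration and are far simpler than InnoDB's actual implementation; the assumption is only that every committed change gets an increasing version number and that a REPEATABLE_READ transaction fixes its view at its first read.

```python
class VersionedTable:
    """Toy multi-version table: each committed change gets a new version number."""

    def __init__(self):
        self.version = 0
        self.history = {}  # row id -> list of (version, value); None means deleted

    def commit_write(self, row_id, value):
        """Commit a new value for a row (value=None marks the row as deleted)."""
        self.version += 1
        self.history.setdefault(row_id, []).append((self.version, value))

    def read(self, row_id, as_of):
        """Return the newest value committed at or before version `as_of`."""
        visible = [value for ver, value in self.history.get(row_id, []) if ver <= as_of]
        return visible[-1] if visible else None

class Transaction:
    """Toy reader whose visibility rule depends on its isolation level."""

    def __init__(self, table, level):
        self.table = table
        self.level = level
        self.snapshot = None  # fixed at the first read under REPEATABLE READ

    def read(self, row_id):
        if self.level == "READ COMMITTED":
            as_of = self.table.version  # fresh view for every read
        else:
            if self.snapshot is None:
                self.snapshot = self.table.version
            as_of = self.snapshot       # same view for the whole transaction
        return self.table.read(row_id, as_of)

table = VersionedTable()
table.commit_write(1, "widget")

rc = Transaction(table, "READ COMMITTED")
rr = Transaction(table, "REPEATABLE READ")
rc_before, rr_before = rc.read(1), rr.read(1)  # both see the row

table.commit_write(1, None)                    # Terminal 1 deletes the row and commits

rc_after, rr_after = rc.read(1), rr.read(1)
print(rc_after, rr_after)
```

After the delete, the READ_COMMITTED reader no longer finds the row, while the REPEATABLE_READ reader still gets it from its snapshot.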
SERIALIZABLE
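This is the most stringent isolation level, preventing dirty reads, non-repeatable reads, and phantom reads. While SERIALIZABLE gives the strongest consistency, it does so at the cost of concurrency. In MySQL's InnoDB engine, for example, plain SELECT statements are implicitly converted to locking reads when autocommit is disabled, so readers and writers can block one another.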
Example:
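In two terminals set to SERIALIZABLE, both can run a SELECT on the same rows concurrently, but an UPDATE or DELETE on those rows will block until the other transaction holding the lock commits or rolls back. If the wait lasts too long, the statement fails with a lock wait timeout error.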
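The two-terminal behavior can be approximated with a toy shared/exclusive lock. RowLock is a hypothetical class written for this article, not a MySQL API; instead of actually blocking, its try_ methods return False when a real database would make the caller wait.

```python
class RowLock:
    """Toy shared/exclusive lock on one row; returns False where a DBMS would wait."""

    def __init__(self):
        self.readers = set()  # transactions holding a shared (read) lock
        self.writer = None    # transaction holding the exclusive (write) lock

    def try_shared(self, txn):
        """Readers can share the row unless another transaction is writing it."""
        if self.writer not in (None, txn):
            return False
        self.readers.add(txn)
        return True

    def try_exclusive(self, txn):
        """A writer needs the row to itself: no other readers and no other writer."""
        other_readers = self.readers - {txn}
        if other_readers or self.writer not in (None, txn):
            return False
        self.writer = txn
        return True

    def release(self, txn):
        """Commit or rollback: drop every lock the transaction holds."""
        self.readers.discard(txn)
        if self.writer == txn:
            self.writer = None

lock = RowLock()
t1_select = lock.try_shared("T1")           # Terminal 1 reads the row
t2_select = lock.try_shared("T2")           # Terminal 2 reads it concurrently
t2_update_first = lock.try_exclusive("T2")  # would wait: T1 still holds a read lock
lock.release("T1")                          # Terminal 1 commits
t2_update_retry = lock.try_exclusive("T2")  # now the update can proceed
print(t1_select, t2_select, t2_update_first, t2_update_retry)
```

Both reads succeed side by side, and the write only goes through once the other reader has finished, mirroring the wait described in the example.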
I hope you found this article enlightening. Thank you for taking the time to read it. I regularly publish articles on technology topics, so feel free to explore my profile for more content. Happy coding!