Up to now I’ve been concentrating on saving a few bytes here and there, but we still have those unfortunate update blocks and the need to traverse every update block on a commit. In a sense, this is doubling the amount of work that you have to do for an update because you have to write each update value twice --once to the update block and then once to the cell in the row. And rolled-back updates are not much cheaper than committed updates since both involve that expensive traversal of history. Can we avoid some of that work?
Getting Rid of Update Blocks
Instead of creating an update block, the writer could just write the new value in place. Then there would be no need to traverse the list of update blocks on a commit. Instead, new transactions that start after the commit would see that they don’t want the historic value so they would just use the value in the cell.
Safe multithreaded access would work like this: the writer first puts the history in place, then updates the cell to the new value. A reader first reads the value and then looks for history. If there is no history then the reader has the current value. If there is history, then the reader traverses the history list to decide whether to use a historic value or the value in the cell. Since the history was set before the value was changed, the reader is ensured not to pick up an uncommitted value and miss the history.
There is a problem with the above approach and that is that when the reader has a start time later than any of the history, then it has to know whether to use the value in the cell or the latest historic value. But that decision depends on whether the transaction that put the new value in the cell committed before the reader’s transaction started. If there is no update block then there is no information telling the reader whether the new value is committed or not.
To fix this, the history information needs to contain a transaction ID for the transaction that did the latest update. The reader then checks to see if it started after that transaction committed or not. If the reader started after the commit, then it uses the value in the cell, otherwise it uses a historic value.
Since ADS had update blocks, it could go through and touch each updated row as part of a commit. In ADS if there is no timestamp on the update block, then the update is not committed. The new scheme can’t use that method (since the whole point is to avoid that traversal) so the transaction ID has to be a pointer to a transaction descriptor or an index into an array of transaction descriptors. To find whether the transaction has committed and to find the commit timestamp, you have to access the transaction descriptor through the transaction ID.
Avoiding the clean up of history blocks
In ADS, each history block contained a timestamp. When there were no longer any running transactions with a start time earlier than that timestamp, then the history block could be cleaned up. History blocks were all linked together in a huge linked list and the cleanup process had to traverse the entire list removing old history blocks and leaving the new ones.
If instead we used arena allocation for history blocks, then all of the blocks could be freed in one chunk, saving all of the work of a list traversal. The reason ADS could not do that is because it would leave dangling pointers in the history lists that readers would then try to follow. For ADS, the timestamp in a history block is the the timestamp of the commit of the transaction that replaced the value.
Consider the following history chain (where T1<T2):
The start time of the reader is t. If t<T1 then the reader is younger than the transaction that last updated the value so it should use the current committed value, 15. If t>T1 and t<T2 then the reader is older than the transaction that changed the value from 10 to 5, but younger than the transaction that changed the value from 5 to 10, so it should use the middle value, 10. If t>T2 then the reader is older than the transaction that changed the original 5 to a 10, so the reader should use the original value 5.
Notice, though that the reader can’t decide to use the value for T1 until it goes to the next history block and sees the timestamp T2 in that history block. The reader in general has to read one history block past the one that it eventually uses. This is why even though there may be no transactions left that are older than T2, ADS can’t just drop the history block for T2 because readers are still going to look at it.
But suppose that we shift all of the timestamps up one cell in the history:
T1 (current committed)
older timestamp or 0
In other words, we label each history block, not with the transaction that created the history, but with the transaction that originally created the value. We use the special timestamp 0 when the value is so old that we no longer know what transaction created it.
With this structure, the reader does the following: if t<T1 then use the current value, 15. Otherwise if t<T2 then use the middle value, 10. Otherwise use the original value, 5. There is never a need to traverse past the value that the reader wants, so as soon as there is no transaction left older than T2, it is safe to drop the history block for the original value, because no reader will ever traverse a chain that far any more.