Chapter Summary

This is an interesting and exciting chapter! Finally the book starts to talk about data systems. The discussion begins with the illustration of how data store can be designed using a simple text file. Extending from the concept, a simple storage engine with append-only log is introduced - hash index. The basic knowledge about hash index makes the understanding of the subsequent classic LSM-tree storage engine easy. Complement to the LSM-tree storage engine, the classic B-tree storage engine is also explained in detail.

In the second section of the chapter, the difference between OLAP and OLTP databases is discussed. Diving deeper into the OLAP and data warehouse concept, column-oriented data storage engine is introduced.

Mind Map

Questions Summary

💡 How does Hash Indexes work?

Key-value stores are used (similar to the hashmap data structure). The key is the index key and the value is byte offset of the value in the data file. The hash index is stored in memory for fast access. When reading the data from the index, we can read the byte offset of the key from the index, and we only need to load the value data from the disk with just one disk seek. The data file is append only and compaction and segment merging are performed to reduce the storage

💡 What is compaction (of a key-value update log)?

For a log-structured database, compaction on the log file segments means throwing away duplicated keys in the log and keeping only the most recent update for each key.

💡 How to handle partially written records in hash index?

Use checksums to delete the corrupted parts of the log

💡 How to handle the data deleting in hash index?

Append a special deletion record to the data file (called tombstone). When log segments are merged, the tombstone tells the merging process to discard any previous value for the deleted key

💡 What is checksums?

A checksum is an alphanumeric value that uniquely represents the content of a file. Checksums are often used to verify the integrity of files downloaded from external source.

💡 What are the limitations of hash index?

💡 What are the two ways to solve the issue of a key-value index where the key is not unique?

💡 What is the reason that in-memory data storage performs faster than disk-based data storage?

The overhead resides mainly in the data encoding and decoding process as data stored in very different structure in memory and in disk. Surprisingly the read speed to memory and disk is not the deciding factor although they are in huge difference. (Anyway, the disk-based storage could avoid read from disk given enough memory provided and data cached)

💡 What is LSM tree?

Log-structured Merge-tree. Storage engines that are based on the principle of merging and compacting sorted files are often called LSM storage engines.

💡 How does the LSM-tree storage engine works (SSTable + Memtable)?

💡 What is a memtable?

It is a in-memory cache of the latest set of record writes applied to the database. It is normally in a data structure (e.g. red-black tree) that supports log(n) time for insert and lookup. The memtable will be write to disk as an SSTable after the size passes some threshold.

💡 How does the LSM-tree storage engine improve the read efficiency for non-existing key?

Use a bloom filter to check if key exists before searching for the keys in memtable and SSTables

💡 How does a LSM-tree storage engine handle server crash?

The server crash will cause the lost of memtable. In this case, the service will keep a separate append-only log file in disk to record each operation write to the memtable. This is only used to recover the memtable from crash.

💡 How does a B-tree storage engine work?

💡 What is a heap file?

When the db index stores the value’s identifier, the actual data of the value is stored in a place called heap file. Often it stores data in no particular order. It is common because it avoids duplicating data when multiple secondary indexes are present.

💡 What is a clustered index?

It is refer to the type of index that the value stored directly with the key in the index, instead of an identifier of the value stored. This will ensure a better performance than the later. One example is that MySQL InnoDB storage engine, the primary key of a table is always a clustered index, and the secondary indexes refer to the primary key (not using heap file)

💡 Compare B-trees and LSM-trees?

💡 Why some wide-column database e.g. BigTable, Cassandra, Hbase are not column-oriented database?

These database are not column stores in the original sense of the term, since their two-level structures do not use a columnar data layout. They introduces the column-family concept. Within each column family, they store all columns from a row together, along with a row key, and they do not use column compression.

💡 What are OLTP and OLAP? What are the main differences?

Online transactional processing vs online analytical processing.

Highlighted Topics

I find the discussion about the LSM-tree storage engine is really interesting and inspiring in this chapter. Here I use illustrations to describe how a simplest storage solution (a plain file) can be extend and evolved to an efficient and full-functional storage engine.

Simple Storage

A plain text file as a simple storage engine is illustrated below. The write operations (insert/update/delete) are always to append to the end of the log file. The read operation is always start from the beginning and scan to the end. This results to a simple structure, extremely terrible read performance and unbeatable efficient write performance.

Hash Index Storage

To improve the read efficiency, the concept of hash index is introduced. It is a in-memory hash map to store the key and the offset to the log file where the value is appended. Use the offset information, one disk seek is needed to fetch the value. Compaction and merging strategy is used to reduce the log file size resulting from the append-only write strategy.

LSM Tree storage

One problem of hash index storage engine is that the hash index has to fit into the memory. Memtable and SSTable are introduced in LSM-tree storage to solve the problem. Sorted key concept here is very important. It makes the searching and merging efficient with the sorted nature of the key. Bloom filter and write-ahead log (WAL) are some common technics to improve the performance and reliability.

Extended Topics

B-tree and B+tree

B-tree is the most widely used indexing structure. It was introduced in 1970 and have stood th test of time. It remains the standard index implementation in almost all relational database, and many nonrelational databases too (one of the popular - MongoDB). How B-tree works?

To better visualize:

What is B+tree and what is the advantage of that?

B+tree has almost the same structure as B-tree, but it has the leave nodes linked to each with pointers. This gives a huge advantage in range-search since we could traverse through the leave nodes directly instead of doing a backtracking search for each leave node within the search range.

Buy Me A Coffee