The book ‘Design Data Intensive Application’ is a really popular book recently years. Why? The reason is that it is relatively ‘easy’ for the inexperience application software engineers to follow and get a glance of what distributed world looks like. This is my second time to read this book. The first time reading was kind of a casual glance through and I gave up halfway. After I reflected on my learning strategy, I come up with a plan for me to read and digest this book.

I will share the learning notes in a series of blogs. In each post, I will start with a brief Chapter Summary, followed up a Mind Map that summarizes the main topics and its relations. The topics in the mind map are attached with a list of questions what can be answered by the knowledge in the book. The questions and answers are listed in plain text in the Questions Summary section. In the Highlighted Topics section, a few interesting topics in the chapter are picked up to explain in details. In the last Extended Topics section, one or two topics that is not covered in detail in the book is briefly discussed and explored.

This is the third post of the series.

Chapter Summary

This is an interesting and exciting chapter! Finally it starts to talk about data systems. The discussion starts from the illustration of how data store can be done with a simple text file. Extending from the concept, a simple storage engine with appending-only log is introduced - hash index. The basic knowledge makes the understanding of the classic LSM-tree storage engine easy. Complement to the LSM-tree storage engine, another classic B-tree storage engine is discussed in detail.

In the second section of the chapter, the difference between OLAP and OLTP databases is discussed. Diving deeper into the OLAP and data warehouse concept, column-oriented data storage engine is introduced.

Mind Map

Questions Summary

1. How does Hash Indexes work?

Key-value stores are used (similar to the hashmap data structure). The key is the index key and the value is byte offset of the value in the data file. The hash index is stored in memory for fast access. When reading the data from the index, we can read the byte offset of the key from the index, and we only need to load the value data from the disk with just one disk seek. The data file is append only and compaction and segment merging are performed to reduce the storage

2. What is compaction (of a key-value update log)?

For a log-structure database, compaction on the log file segments means throwing away duplicate keys in the log and keeping only the most recent update for each key.

3. How to handle partially written records in hash index?

Use checksums to delete the corrupted parts of the log

4. How to handle the data deleting in hash index?

Append a special deletion record to the data file (called tombstone). When log segments are merged, the tombstone tells the merging process to discard any previous value for the deleted key

5. What is checksums?

A checksum is an alphanumeric value that uniquely represents the contents of a file. Checksums are often used to verify the integrity of files downloaded from external source.

6. What are the limitations of hash index?
7. What are the two ways to solve the issue of a key-value index where the key is not unique?
8. What is the reason that in-memory data storage performs faster than disk-based data storage?

The overhead resides mainly in the data encoding and decoding process as data stored in very different structure in memory and in disk. Surprisingly the read speed to memory and disk is not the deciding factor although they are in huge difference. (Anyway, the disk-based storage could avoid read from disk given enough memory provided and data cached)

9. What is LSM tree?

Log-structured Merge-tree. Storage engines that are based on the principle of merging and compacting sorted files are often called LSM storage engines.

10. How does the LSM-tree storage engine works (SSTable + Memtable)?
11. What is a memtable?

It is a in-memory cache of the latest set of record writes applied to the database. It is normally in a data structure (e.g. red-black tree) that supports log(n) time for insert and lookup. The memtable will be write to disk as an SSTable after the size passes some threshold.

12. How does the LSM-tree storage engine improve the read efficiency for non-existing key?
13. How does a LSM-tree storage engine handle server crash?

The server crash will cause the lost of memtable. In this case, the service will keep a separate append-only log file in disk to record each operation write to the memtable. This is only used to recover the memtable from crash.

14. How does a B-tree storage engine work?
15. What is a heap file?

When the db index stores the value’s identifier, the actual data of the value is stored in a place called heap file. Often it stores data in no particular order. It is common because it avoids duplicating data when multiple secondary indexes are present.

16. What is a clustered index?

It is refer to the type of index that the value stored directly with the key in the index, instead of an identifier of the value stored. This will ensure a better performance than the later. One example is that MySQL InnoDB storage engine, the primary key of a table is always a clustered index, and the secondary indexes refer to the primary key (not using heap file)

17. Compare B-trees and LSM-trees?
18. Why some wide-column database e.g. BigTable, Cassandra, Hbase are not column-oriented database?

These database are not column stores in the original sense of the term, since their two-level structures do not use a columnar data layout. They introduces the column-family concept. Within each column family, they store all columns from a row together, along with a row key, and they do not use column compression.

19. What are OLTP and OLAP? What are the main differences?

Online transactional processing vs online analytical processing.

Highlighted Topics

I find the discussion about the LSM-tree storage engine is really interesting and inspiring in this chapter. Here I use illustrations to describe how a simplest storage solution (a plain file) can be extend and evolved to an efficient and full-functional storage engine.

Simple Storage

A plain text file as a simple storage engine is illustrated below. The write operations (insert/update/delete) is always append to the end of the log file. The read operation is always start from the beginning and scan to the end. This results to a simple structure, extremely terrible read performance and unbeatable efficient write performance.

Hash Index Storage

To improve the read efficiency, the concept of hash index is introduced. It is a in-memory hash map to store the key and the offset to the log file where the value is appended. Use the offset information, one disk seek is required to fetch the value. Compaction and merging strategy is used to reduce the log file size resulting from the append-only write strategy

LSM Tree storage

One of the problem of hash index storage engine is the hash index has to be fit into the memory. Memtable and SSTable are introduced in LSM-tree storage to solve the problem. Sorted key concept here is very important. It makes the searching and merging efficient with the sorted nature of the key. Bloom filter and write-ahead log are some common technics to improve the performance and reliability.

Extended Topics

B-tree and B*-tree storage engines

TODO; just want to publish this post sooner to motivate myself to continue to write