The book ‘Design Data Intensive Application’ is a really popular book recently years. Why? The reason is that it is relatively ‘easy’ for the inexperience application software engineers to follow and get a glance of what distributed world looks like. This is my second time to read this book. The first time reading was kind of a casual glance through and I gave up halfway. After I reflected on my learning strategy, I come up with a plan for me to read and digest this book.
I will share the learning notes in a series of blogs. In each post, I will start with a brief Chapter Summary, followed up a Mind Map that summarizes the main topics and its relations. The topics in the mind map are attached with a list of questions what can be answered by the knowledge in the book. The questions and answers are listed in plain text in the Questions Summary section. In the Highlighted Topics section, a few interesting topics in the chapter are picked up to explain in details. In the last Extended Topics section, one or two topics that is not covered in detail in the book is briefly discussed and explored.
This is the first post of the series.
This is the first chapter of the book. It explains the fundamental concepts of application design. In designing an application, we need to consider two aspects - functional requirement and non-functional requirement. Functional requirements vary case by case, however non-functional requirements can be generalized. In this Chapter, three most commonly discussed non-functional requirements of data application are explained in detail - reliability, scalability and maintainability.
1. What does scalability of software mean?
The ability of the application to be able to handle increasing load without downgrading performance
2. What does reliability mean for software? List four main descriptions
Generally means that the software is continuing to work correctly, even when things go wrong.
- able to perform the functions as required
- able to perform with the expected load
- tolerant user mistake (error handling)
- prevent unauthorized access (security)
3. What does an ‘elastic’ system mean?
A system that is able to automatically scale up or down by detecting the load change
4. What does ‘tail latency’ mean and why is it important?
Tail latency refer to the high percentile response time (p95, p99 and p999). It is important as they affect the users’ experience of service, most often the more valuable customers’ experience as they are more likely to experience higher latency with large data volume or high throughput
5. What does ‘fan-out’ mean in the software application context?
In the transaction processing system, we use it to describe the number of requests to other services that we need to make in order to serve one incoming request. (This is a term borrowed from electronic engineering, where it describes the number of logic gates inputs that are attached to another gate’s output. The output needs to supply enough current to drive all the attached inputs)
6. What are the three major types of errors that cause reliability issues of a software?
- hardware fault: the failure of the computer hardware, it is always happening but the well-designed application normally should be able to cope with the single hardware instance failure
- human operation mistake: this is the leading cause of outages according to research
- software bugs or performance issues: it is much harder to be detected and prevented
7. What are the three main design principles for a maintainable software?
- operability: easy for operation teams to keep the system running
- simplicity: easy for new engineers to understand and contribute
- evolvability: easy for engineers to make changes in the future
8. What are the examples of new tools that do not fit exactly into the traditional data systems (databases, queues, caches)
- redis: a data stores that can be used as message queues
- kafka: a mq that has database-like durability
9. What are the common functionalities of a data-intensive application?
- store the data for later retrieval (database)
- remember the result of expensive operation to speed up read (cache)
- allow users to search data by keyword and filters (search index)
- stream data to other process for it to be handled asynchronously (stream processing)
- periodically crunch a large amount of data (batch processing)
10. What are some of the commonly used load parameters when describing the load of an application?
- query per second of web service (qps)
- ratio of reads to write (read/write ratio) of database
- hit rate on a cache (cache hit ratio)
- number of simultaneous users (concurrency / throughput)
11. How to define the “percentiles” of response time?
- p50 (50th percentile) refers to the halfway point of the response time after sorting the response times from fastest to slowest. This means that half of the users experience the response time less than the p50 response time.
- similarly p95, p99, p999 refers to the 95%, 99%, and 99.95% percentile response time of the system
Twitter’s timeline fan-out architecture
A classic architecture design problem is discussed in this chapter to demonstrate how an architecture decision should be made based on the load pattern of a service. The example is the Twitter’s home timeline service.
There’s two most commonly used operations by twitter users - post tweet and load the home timeline. The load to these two operations are (in 2012) are 4.6k qps (12k peak) for post write and 300k qps for home timeline read. From the data we can observe that the read operation has two orders of magnitude higher than the write. There are two approaches to support the home timeline read:
1. Query on read
This approach queries the home timeline post when read request is received. The querying logic is represented in the relational database tables as shown below.
2. Cache on write
Another approach is to cache the home timeline on write. In this architecture, the system maintains a home timeline cache for each user which works like a mailbox. When a user post a tweet, the system will fan out the tweet (typical a tweet id) to the cache of all the followers of this user. In this case, the heavy computation is shifted to the write operation, whereas the read operation is just a simple query to the cache.
Since the application is ready-heavy, it is better to do more work on write and less on read. In this case, approach 2 is preferable. However the approach 2 has its own constraint that in cases that a user with large number of followers (e.g. celebrities), a single tweet post write will generates a massive number of following writes to the cache, results a huge spike of load to the cache. So twitter had worked on approaches that combines approach 1 and approach 2 to achieve optimized result.
There’s some more interesting topics covered in the infoQ talk Timelines at Scale, where Raffi Krikorian explains design trade offs in time timeline fan-out architecture, compared it with the search architecture and the possibility of merge them to support challenging use cases. See some notes I summarized from the talk in this section.
Percentiles of response times
Percentiles is a very important concept in describing the performance of a real-time service. Since the response time of a real-time service varies due to a lot of random parameters (disk mechanical vibration, network vibration, GC and software bugs), it is not possible to user a single number to describe the performance of such a service. In this case, a percentiles is a very useful term - sort the response time from fastest to slowest, the median in the sorted list is the 50th percentile (p50), similar, the p95, p99 and p999 are the 95th, 99th and 99.9th positioned data in the list. This number tells us how well the end users’ experience is.
In reality, tail percentiles (p95, p99, p999) should be seriously concerned. Because normally the clients with large data volume or higher throughput are more likely to experience higher latency, while these clients are those that are more valuable to business. In production practice, we are often monitoring and alerting on p99 or p999 latency based on how we define our SLA and SLO.
More on Twitter’s architecture
Notes from Raffi Krikorian’s infoQ talk Timelines at Scale
1. How timeline is supported
- user timeline stores in database, but home timeline is stored in cache only. The cache mechanism is fan-out on write, the fan-out happens asynchronously
- product trade-off: the timelines stops at 800 tweets because the fan-out service push the post to redis for cache. The 800 is a product trade-off that limited by the technical constraint
- fan-out only cache for active users
See the simple architecture diagram shared in the talk
- Gizmoduck: the user service. It has its own cache. It actually has the entire user base in memcached cluster
- TweetyPie: tweet post service, it has about one month and half data in cache in memcached cluster
2. How search is supported
- three tiers: ingester -> earlybird -> blender
- ingester: tokenization and construct the content to be indexed
- earlybird: a modified Lucene to support the index, replicated to balance the load. The Lucene cluster stores in ram
- blender: the search service which queries each shard of the earlybird and construct the query result through sorting, filtering and reranking.
3. Compare timeline and search
Fan-out and search are two paths that are behave opposite to each other, as shown
To solve the expensive fan-out cost of high value users (users has massive number of followers), the experiment being done is to stop fanning out the tweets from those users, instead fanning out only the tail users. When querying home timeline of a user, the user’s home timeline data will be merged with the high value user’s user timeline and delivered.