[DDIA] Chapter 2 - Data Models and Query Languages

Chapter Summary

This chapter briefly discusses three types of data models: relational, document and graph database. Some histories about database are provided to give us a better ideas why these databases are developed and what types of problems they solve.

Personally I think this chapter is loosely structured and provides too much detail about the query languages. Advice given to anyone who read this chapter: do not spend much time in understanding the query language syntax here, because the information given in the book will not help anyone to master a particular language (if you want to get into one query language, follow a well designed tutorial or a specific book). Instead, you should try to understand the philosophy of each type of data model and identify the scenarios to apply them.

Mind Map

Questions Summary

💡 1. What is ORM frameworks?

Object-relational mapping framework, e.g. Hibernate

💡 2. What is data locality?

Data locality is the process of moving computation to the node where that data resides, rather than moving data to save bandwidth. This helps to minimize network congestion and improve computation throughput.

In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time.

💡 3. What is RDBMS?

Relational database management system

💡 4. What are three types of ways to store one-to-many relationship in database?

normalized representation of the data: to store the data to multiple tables and use foreign key reference to link the data from one source to multiple others
store the data in the predefined data structure that supported by the data store engine, e.g. json format supported by MySQL or PostgreSQL. The data is being able to be indexed and queried.
store the data as pure text field and allow the application to interpret the data. In this case, the data will not be able to be indexed.

💡 5. What are the typical use cases of graph-like data models?

social media
web graph
road or rail networks
recommendation engine

💡 6. What are the difference between JPA and Hibernate and Spring Data JPA?

JPA (Java persistence API, nothing to do with Spring) is an interface (specification).

Hibernate is a JPA provider or the implementation.

Spring Data JPA is either an interface nor implementation, it is an abstraction used to reduce the amount of boilerplate code required to implement data access layers. Spring Data JPA cannot work without a JPA provider.

💡 7. What are schema-on-write and schema-on-read?

Schema-on-write refers to the data systems that requires the explicit definition of data schema. When data is write to the database, it will be verified against the defined schema, any content that does not conforms to the schema will not be allowed to write.

Schema-on-read refers to the data systems that does not require the explicit definition of data schema. The data schema will be interpreted from the data when it is read.

💡 8. What are NoSQL databases? What are the different types of NoSQL databases?

NoSQL database is a non-relational DBMS that does not require a fixed schema, avoid joins and easy to scale. There are four types of NoSQL database: Document databases (MongoDB), key-value stores (DynamoDB, Redis), wide-column database (Bigtable, Cassandra, Hbase) and Graph database (Neo4j).

💡 9. In a property graph database model, what are the two data concepts? And what does each of them consists of?

node (vertex) and relationship (edge)

node consists of labels and properties
relationship consists of a source node, a target node, a type and properties

💡 10. What does ployglot persistence mean?

When design an application data storage, choose multiple types of storage based on the different use cases of the components of the application.

💡 11. What is impedance mismatch?

This is the term used to refer to the problems that object-oriented programming language’s data model has a mismatch of the SQL database model, that an awkward translation is required between them.

ORM frameworks like Hibernate reduce the amount of boilerplate code for this translation layer.

Highlighted Topics

Property Graph model (Neo4j and Cypher) and Triple-store model (SPARQL) are two classic graph data models that is widely used. Here’s a brief summary and comparison of these two data model.

Extended Topics

I had some hands on with Neo4j beginner’s tutorials, however I felt little I can learn and master without a practical project to work on. Unfortunately I have not got a chance to use graph database in production level projects yet. It is just valuable additions to my toolbox for building applications in the future.