I picked this book up after reading Fred Brooks’ The Design of Design. I had posted a review previously. My intention was to see database design from a beginner’s perspective. I had been exposed to C.J.Date‘s An Introduction to Database Systems, Handbook of Relational Database Design, and Applied Mathematics for Database Professionals. The latter is supposed to be the follow-on book from Churcher’s book according to the APress RoadMap. I think the gap between these two (2) books is just too great.
While I was reading Churcher’s book at the local coffee shop, the owner came up to me and started talking about database design. He had done some design himself in payroll systems. He did not elaborate. He was fascinated by the concept of Business Intelligence. He was enthralled of a TV show called Numb3rs in which the stories about determining human behaviour from numbers. I was dubious because I have done several modelling experiments myself and realise the difficulties in determining the model. But it is so hard to counter unbridled enthusiasm for just pumping numbers into a computer and getting a magical answer out. So I did not try.
Churcher starts straight off on why getting the database design wrong is so bad:
- “…the inability to make the best use of valuable data.” (p. xxii)
- “…inaccuracies in the data.” (p. xxii)
She says that:
The first thoughts about how to design a database may be influenced by a particular report or by a particular method of input. This can lead to a design that cannot cope with different requirements later on. It is important to think about the underlying data and design the database to reflect the information being stored rather than what you might want to do with the data in the short term. (p. 9)
The last database project I had to install in a production environment had this problem being partially designed. It was designed beautifully for the loading of data from multiple data sources. The problem was that the enquiry of data was a nightmare. Simple queries involved eleven (11) and nine (9) table joins! That alone told me that the data model was stuffed. I think the data extraction from the loaded data was being done on the fly.
There are really four (4) data models involved in this project:
- data source;
- data cleansing;
- reference data; and
- data enquiry.
The data sources have their own data models. It is usually designed for rapid data collection. The cleansed data is the collected data corrected for inaccuracies and inconsistencies against the reference data. The enquiry data model is based around how the end-user perceives the data. The problem is how to roll all of these competing data models into one data model. It is this goal that creates so many problems for data modellers.
This model is mine, not Churcher’s, and derives from a combination of the ETL (Extract, Transform, and Load) and data marts. All databases are really data warehouses. Only the size varies. The thinking behind the design should reflect this.
On p. 12, she provides a simple model of her process. This is a framework for doing the database design around the three (3) steps (p. 29) of
This is really a follow-on from Yourdon and Constantine‘s Structured Design which evolved into the Waterfall Model. All of the Agile Developers are going to freak out at this stage. I think the problem is that people tend to view the database design as a monolith.
The approach taken by Churcher is to follow the Object Analysis path and create a relational model from that. This approach is fraught with danger as the object model is hierachical while the relational model is democratic.
In Chapter 8: Normalization, she says that:
- “Each class is represented by a table.
- “Each object becomes a row in a table.
- “For each table, we determine a primary key, which is a field(s) that uniquely identifies each row.
- “We use the primary key field(s) to represent relationships between tables by way of foreign keys.” (p. 139)
I think Churcher does a good of explaining Normal Forms (pp. 145-155), Functional Dependencies (pp.151-152), and Join Dependencies (pp. 153-155). The latter two concepts are usually found in advanced database theory books such as Date’s one.
Churcher does a quick overview of the implementation process which I think is good enough for beginners.
Now, my quibbles are:
- No direction for readers to develop their knowledge further. An annotated bibliography would have been nice.
- The use of Selection (pp. 173-174) when the correct technical term is restriction. This will make further reading difficult as those who want to develop their knowledge will be faced with a problem in terminology.