Reflections on NULLs in Response to “Much Ado About Nothing” by van Wijk


From @eddieawad, the unending DBA controversy about NULLs continues at Much Ado About Nothing? by Rob van Wijk.

My quick response is that NULLs do NOT belong in the logical design (or data model) of a database, but they are useful in the implementation.

Several commentators tried to emphasise that NULL means “Unknown”. My response is good luck with that because I have seen NULLs mean, as well, “Missing” and “Not Applicable”.

The example, by Rob van Wijk, is based on Hugh Darwen’s paper, How To Handle Missing Information Without Using NULL. I think van Wijk misses the distinction between data model and implementation when he writes that:

On the other hand, the theorists don’t have to build database applications for end users who like reasonable response times, and I do. Avoiding nulls at all cost typically leads to a data model that has more tables than needed, requiring more joins and therefore making queries slower. So I have to make a trade off. In general I try to avoid nullable columns as much as possible, for example by chosing subtype implementations instead of supertype implementations, and by modelling entity subtypes in the first place, but I will never let it noticeably slow down my application. At my current job, I’m making a data model right now. Having read all use cases, I know how the data will be used and so I know where in the model there is room to avoid an extra nullable column. One thing I’ll never voluntarily do though, is make up strange outlier values just to get rid of the null.

Emphasis Mine

Darwen (2006) proposed a data model, while van Wijk is talking about the implementation. I am following the definitions from Date (2004, p.52):

Definition: A Data Model (first sense) is an abstract, self-contained, logical definition of the objects, operators, and so forth. that together constitute the abstract machine with which the users interact. The objects allow us to model the structure of data. The operators allow us to model its behaviour.

Definition: An implementation of a given data model is a physical realization on a real machine of the components of the abstract machine that together constitute that model.

(Italics and emphasis in original)

Date (2004, p.66) appears to criticise SQL products by promoting this confusion when SQL statements, such as CREATE TABLE, conflates the logical structure of a table with its physical implementation—he calls it a direct-image system. He says that [t]hose direct image systems suffer from a huge number of problems, including serious performance and complexity problems.

When I get time, I should repeat van Wijk’s experiment with my own implementation.

Bibliography

Darwen, H. (2006), How To Handle Missing Information Without Using NULL, http://www.dcs.warwick.ac.uk/%7Ehugh/TTM/Missing-info-without-nulls.pdf, viewed 30 May 2012.

Date, C.J. (2004), ‘On the Logical Difference Between Model and Implementation’, in Date, C.J. (Ed.), Date on Database: Writings 2000-2006 (2006), APress: USA.

van Wijk, R. (2012), Much Ado About Nothing?, http://rwijk.blogspot.com.au/2012/05/much-ado-about-nothing.html, viewed 30 May 2012.

Advertisements

One thought on “Reflections on NULLs in Response to “Much Ado About Nothing” by van Wijk

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s