Thursday, January 16, 2014

The myth of the new and other fallacies

In Kohelet (aka Ecclesiastes) King Solomon of ancient Israel wrote:
What has been is what will be, and what has been done is what will be done, and there is nothing new under the sun.
No, I haven't gone all proselytizing on you, I just want to make a point.  Most of the "new" technologies that everyone is so hot to get into to save the world are not new at all.  Some are even very old.  Many were tried in the past and while some were good ideas and successful in their first incarnations, many were either not successful at all or were later supplanted by other technologies that got the job done better.  A few examples:

Object Oriented Languages

I know, '80s technology, not new anymore.  But they are still 'hot' and coders coming up today don't know their history, so here's some.  Object oriented languages like C++, C#, Java, etc. encapsulate the operations associated with a data structure together with the structure itself.  In the 1960s, two new programming paradigms emerged.  These were Structured Programming and Modular Programming.  Structured programming concentrated on eliminating non-linear controls from software code.  It was characterized most simply as "No goto statements, anywhere, anytime!" and was championed by such notables as C. Böhm, G. Jacopini, Edsger Dijkstra, Michael Jackson (no, not the singer), and Isaac Nassi and Ben Shneiderman (who together developed the Structure Chart or Nassi-Shneiderman Chart, sometimes called a Chapin Chart).

At around the same time other notables like Niklaus Wirth (the developer of Pascal, Modula2, Oberon, and the Lilith computer system) were proposing the concept of Modular programming.  The idea was to encapsulate distinct operations into subroutines and to gather subroutines with similar or related functionality and/or purpose into reusable concrete units or 'modules'.  Wirth went so far as to develop a programming language that supported both structured programming and modular programming techniques (and did not include a GOTO statement at all!).  To this day, Modula2 is still my favorite of the dozen or more programming languages I have learned and used.

These two paradigms set out to reduce the complexity and redundancy in large software projects and to permit larger groups of programmers to cooperate in the production of complex projects, and they succeeded.  Combine Structured Programming with Modular Programming, add language support for their concepts, and voilà, you have an Object Oriented Language.
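To make the comparison concrete, here is a minimal sketch in Python (not a language from the period, and not anyone's historical code).  It shows the same stack written first as a "module" of related subroutines and then as a class.  All of the names are hypothetical and chosen only for illustration.

```python
# Modular style: a data structure plus a group of related subroutines
# that operate on it. The caller passes the structure to each routine.
def stack_new():
    return []

def stack_push(stack, item):
    stack.append(item)

def stack_pop(stack):
    return stack.pop()

# Object oriented style: the same operations, bundled together with
# the data they act on by the language itself.
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

# Both versions behave identically from the caller's point of view.
s1 = stack_new()
stack_push(s1, "a")
stack_push(s1, "b")
modular_result = stack_pop(s1)

s2 = Stack()
s2.push("a")
s2.push("b")
object_result = s2.pop()
```

The only real difference is who carries the data structure around: the caller in the modular version, the object in the class version.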

Columnar Database

In the early 1980s I needed to write a custom database for an application.  After research I found something that matched the requirements of the project perfectly.  It was called, variously, an Inverted Index, Inversion Table (no, not the exercise/torture device), or Inversion Database.  The idea was to build indexes of all of the fields of a set of data records without having the entire record present at all, only the indexes.  This meant that as long as you only wanted the values from a subset of the fields in a record, you could access the data very quickly and with reduced IO, which was very expensive at the time.  Can you say Columnar Database?
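A rough sketch of the idea in Python, under my own assumptions: the sample records, field names, and helper functions below are hypothetical and are not the database I wrote back then.  Each field gets its own index mapping a value to the record numbers that hold it, and a query for a subset of fields touches only those indexes.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "name": "Ada", "city": "London", "dept": "R&D"},
    {"id": 2, "name": "Ken", "city": "Murray Hill", "dept": "Unix"},
    {"id": 3, "name": "Niklaus", "city": "Zurich", "dept": "R&D"},
]

def build_field_indexes(rows):
    """Build one index per field: value -> list of record numbers."""
    indexes = defaultdict(lambda: defaultdict(list))
    for row_number, row in enumerate(rows):
        for field, value in row.items():
            indexes[field][value].append(row_number)
    return indexes

def read_fields(indexes, row_count, fields):
    """Rebuild only the requested fields, reading only their indexes."""
    result = [{} for _ in range(row_count)]
    for field in fields:
        for value, row_numbers in indexes[field].items():
            for row_number in row_numbers:
                result[row_number][field] = value
    return result

indexes = build_field_indexes(records)

# From here on the original records are not consulted at all.
names_and_depts = read_fields(indexes, len(records), ["name", "dept"])
rd_rows = indexes["dept"]["R&D"]  # record numbers whose dept is "R&D"
```

Asking for `name` and `dept` never reads the `city` or `id` indexes, which is where the IO savings came from.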

Key:Value Store or Database

In 1979, Ken Thompson of UNIX fame wrote a database library for UNIX called dbm, which used a hash table of key values to index data records.  The records had no particular structure.  Any interpretation of the contents of a record was application dependent.
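Python's standard library still ships a `dbm` module that exposes this family of libraries, so the model is easy to try.  The sketch below is only an illustration: the keys, the record layouts, and the file name are made up, and which dbm implementation actually backs the file depends on what your platform provides.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example")  # hypothetical database name

    # "c" opens the database for read/write, creating it if needed.
    with dbm.open(path, "c") as db:
        # Keys and values are just bytes. The store knows nothing about
        # their structure; these two records use different layouts.
        db[b"user:42"] = b"Ken|Murray Hill|1979"
        db[b"user:43"] = b'{"name": "Margo"}'

    # Reopen read-only and fetch a record by its key.
    with dbm.open(path, "r") as db:
        raw = db[b"user:42"]
        has_missing = b"user:99" in db

# Interpreting the record is entirely up to the application.
fields = raw.decode().split("|")
```

Notice that the database hands back opaque bytes, and the pipe-delimited parsing at the end is the application's business, not the library's.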

In 1986 the Berkeley UNIX team developed ndbm which added the capability to have more than one database open in an application.  Unlike dbm, ndbm stores the data value in the hash table directly which limits the size of the key:value pair and of the database as a whole.

In the late 1980s and early '90s Margo Seltzer and Ozan Yigit worked to improve on dbm and ndbm and created Berkeley DB.  BDB was still a hash managed key:value store.  Its main innovation was the ability to be accessed concurrently by multiple users without corrupting the database.

This is virtually the same as the "ground breaking" technology in MongoDB, Cassandra, Virtuoso, etc.

Graph Database

In 1969 Charles Bachman published a paper at the Conference on Data Systems Languages (CODASYL) describing a new database model to replace the popular hierarchical model.  He named it the Network model.  Wikipedia defines the network model as a database model conceived as a flexible way of representing objects and their relationships.  Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.

Wikipedia defines a graph database as a database that uses graph structures with nodes, edges, and properties to represent and store data.  By definition, a graph database is any storage system that provides index-free adjacency.  This means that every element contains a direct pointer to its adjacent element and no index lookups are necessary.

Sounds awful familiar to me.  By the late 1980s, network and hierarchical databases had largely been replaced for general purpose database use by databases that ostensibly followed the Relational Model described by Edgar F. Codd in a paper published in 1969.  The reason was that the schemas of the older databases were fixed in the physical implementation of the data, making it virtually impossible to change the schema as applications evolved.
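Here is a minimal in-memory sketch of index-free adjacency in Python.  It is not the storage format of any CODASYL or graph product; the `Node` class and the sample data are hypothetical.  Each element holds direct references to its neighbors, so traversal just follows pointers.

```python
class Node:
    """A graph element holding direct references to its neighbors."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.edges = []  # list of (relationship, Node) pairs

    def connect(self, relationship, other):
        self.edges.append((relationship, other))

    def neighbors(self, relationship):
        # Follow the stored references; no separate index is consulted.
        return [node for rel, node in self.edges if rel == relationship]

customer = Node("Acme", kind="customer")
order1 = Node("Order-1", total=100)
order2 = Node("Order-2", total=250)
part = Node("Widget", kind="part")

customer.connect("placed", order1)
customer.connect("placed", order2)
order1.connect("contains", part)
order2.connect("contains", part)

# Walk customer -> orders -> parts purely by pointer chasing.
placed = [order.name for order in customer.neighbors("placed")]
parts = {
    p.name
    for order in customer.neighbors("placed")
    for p in order.neighbors("contains")
}
```

It also shows the weakness described above: the relationships are baked into the stored structure itself, so reshaping the schema means rewriting the data.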

Getting to the Point

The point of this blog post is that there were reasons that we used these technologies "back in the day" (as my favorite reality TV stars say) and there were reasons that we abandoned them and moved on.  Some of these technologies never went away because they have good and valid applications to solve specific data storage and retrieval problems.  But these were not the technologies that conquered the database world.  The Relational Model, or as several friends of mine insist, a poor implementation of something that vaguely resembles the Relational Model and is more properly called the SQL Model, won that battle and for good reason.  So, let's not all jump on the same bandwagon without doing due diligence and examining in detail what technology best fits the problem.  

One of my favorite sayings is "Use the right tool for the right job!"  I am also a big proponent of avoiding the Yankee Adjuster Problem where, as the saying goes, "When all you have is a hammer, everything starts to look like a nail!"

So, let me finish as I began: there is nothing new under the sun!  Or as I am fond of replying when asked "What's new?" - "There's nothing new, just the old stuff come back to haunt us!"