IIUG World 2019

IIUG 2020 Online World Conference Presentations will begin soon!

Wednesday, November 19, 2014

On Hybrid Database Development

At the recent IBM Insight conference in Las Vegas I presented a session entitled: 

"My Data is Relational But My Coders Want to Use JSON!  Help!"

Over 30 development managers and developers attended.  When asked most said they attended because this was a common problem that they are all either dealing with already or expect to have to deal with soon.  

Here, as well as I can translate a presentation to text, is what I had to say:

I assert that JSON and other semi-structured data formats present unique challenges for organizations. In order to take advantage of these new structures and the development paradigms that they support and encourage and integrate them into our existing systems, we wrestle with three alternatives:

  1. Keep these new data and the apps that utilize them independent of our legacy data and keep our legacy apps independent of this new data
  2. Perform ETL/ELT frequently to maintain all of this data in both relational and semi-structured databases.
  3. Develop applications that can access multiple data sources concurrently.
I maintain that none of these alternatives is acceptable and I want to propose a better path to future application and database development. First a digression to define terms which you can feel free to skip.

There are many classes of data:

  • Structure data
  • Unstructured data
  • Semi-structured data
  • Time stamped data composed from any data class
  • Geographically located data composed from any data class
  • Spatially located data composed from any data class
Structured Data refers to the data we have in relational databases for the most part. Despite the predictions of pundits relational databases are here to stay.  There are several well defined application segments for which relational databases are the best tool for the job.

Semi-Structured Data is data that may have a natural structure to it but where the structure may vary from one data element to another.  This can also include elements where part of the information is structured while other sub-elements are unstructured.

Currently the most popular format for semi-structured data is known as JSON for JavaScript Object Notation.  JSON is a key:value record or document format that is quickly becoming a new standard for data interchange replacing CSV, XML, and other interchange formats.  JSON is the basis for the RESTFUL web service protocol.  It is more space efficient than XML and easier to parse.  In addition, it is supported by a binary format, BSON, that is even faster to parse and extract individual sub-elements.  JSON is a dynamic schema format where each record, or document in JSON terminology, contains the schema details for the document which can be different from one document to another.  This promotes and supports RAD or Rapid Application Development.

MongoDB has become the most popular of the JSON databases or stores.  It saves JSON documents in BSON format and produces JSON output to consuming clients.  There are several sets of interoperable development tool sets known as "stacks" the most popular of which is call the MEAN stack as well as client libraries for most common languages.

MEAN stands for MongoDB, ExpressJS, AngularJS, and Node.js.

Class time over:

On to the problem.  We all have TBs of relational data.  Our experienced developers already know SQL and our schema. Some new development still fits with structured data. There is no sense in moving OLTP or structured DW data out of relational systems.  There is no financially supportable way to rewrite all of our existing systems from scratch just to take advantage of new paradigms.  Many new applications written primarily to use semi-structured data will need to present the relational data we've been collecting for the past 30+ years.

Many of us are capturing TBs of semi-structured data already. New developers that we want to hire only want to work in the NoSQL space.  Some new applications are natural fits for semi-structured data. RAD techniques require schema flexibility to succeed.  Many new and existing relational based applications could take advantage of all of that semi-structured data to expose new features without a massive redevelopment effort.

How to get relational data into MEAN stack applications and JSON data into C/C++/C# applications is the problem.  Traditionally when faced with data in multiple silos we would either use ETL techniques to copy the data from one silo to another or we would write data access layers using multiple libraries.  

ETL is time consuming and error prone.  We may have to sacrifice precision or internal relationships in the target system. Maintaining a timely, consistent, view is a major constant effort and expense.  Data duplication is itself an expense.  Fast storage is not cheap!

We can't just move everything into MongoDB either.  MongoDB like most NoSQL stores do not support multiple operation transactions, inter-object relationships (references and sub-documents are not relationships), XA transactions (so they cannot participate in transactions across multiple silos).  

The solution:

The solution is Hybrid Application Development using a Hybrid Database system using both traditional and RAD development tools.  Here is what is needed from a hybrid database for this to work:
  • Access from traditional development stacks (C, C++, C#, Perl, etc.)
  • Access from MEAN and other RAD development stacks
  • Full ACID compliant transaction support for all data
  • Full relational integrity and data normalization support for all data 
  • Ability to store structure data with a predefined schema
  • Ability to store semi-structured data with a dynamic schema
  • Ability to present both types of data to both stacks in the native format of the data
    • JSON as JSON
    • Tables as tables
  • Ability to present both types of data to each stack in its native format
    • JSON as table data for traditional tools
    • Table data as JSON for RAD tools
Some NoSQL stores can handle ACID consistency but most can only manage "eventually consistent" transactions.  Most, like MongoDB, can only guarantee consistent transactions and rollbacks for a single document, not for multiple documents in a collection nor for transactions that span multiple collections. Some NoSQL stores can support SQL or an SQL-like query language, however, they do not return data as rows and columns which is what is expected by SQL database access code. 

Some RDBMS systems can:
  • Store JSON (or a binary form of it)
  • Store a collection as a collection (rather than as a column type)
  • Manipulate JSON fields within a document
  • Create indexes on JSON fields within the documents in a collection
  • Support referential integrity between JSON document fields and relational columns
  • Join JSON documents to relational tables
But most cannot do it all.  I believe that these capabilities are the future of database and application development. Today there is one RDBMS product that CAN do it all:
  • Accept connections from MEAN stack and other MongoDB clients without modifying the application code
  • Accept connections from relational SQL clients
  • Support transactions on relation tables and JSON collections that include multiple tables/collections and multiple rows/documents
  • Allow joins between JSON collections and other JSON collections or relational tables
  • Enhance MEAN applications to support issuing SQL statements and return data in JSON format for those clients
  • Permit SQL clients to treat fields in JSON collection documents as ordinary columns
  • Support JSON and BSON as class 1 native data types for use as columns in relational tables or as a type to define a collection
  • Direct RESTFUL interface to expose all data as web services without middleware
  • Store JSON collections as collections
  • Permit any MongoDB aware client to treat relational tables collections
  • Support most MongoDB DDL and server management commands (including sharding)
  • Auto create databases and collections just as MongoDB does, on the fly
I know I'm sounding like a salesman, but I have nothing to sell you other than an idea and my help to achieve it in your organizations, so, bear with me.  What is this magical database system?  It is the latest incarnation of one of the first RDBMS products in the marketplace:

Informix Dynamic Server v12.10.xC4 from IBM

Informix, aka IDS, has all of the required features I mention above but it brings much more to the table:
  • Larger documents than MongoDB
  • Larger databases than MongoDB
  • Industry leading data replication technology
  • Four classes of data replication that can all work together
  • More capable time proven sharding technology than MongoDB
  • Data distribution and centralization capability
  • Support for Timeseries data
  • Support for Geospatial data
  • The only database system that can combine Timeseries and Geospatial data to track your IoT data through SpaceTime
  • User defined data types that perform as well as native types
  • World class OLTP transaction rates
  • Traditional and MEAN stack applications can connect and use both JSON and SQL data in their own native formats
  • Centralized databases up to 128PB
  • Distributed databases without size limits
  • Server fail over and load balancing fully configurable using SLA specifications
  • Five 9's reliability
  • Near Zero downtime
  • Upgrade server versions without downtime
  • Instantly bring up additional servers to load balance during peak periods
  • Support large numbers of concurrent transactions
  • Row level locking (MongoDB locks entire collections only)
  • Extremely tunable engine
  • Autonomic features for low maintenance overhead and to permit unmonitored operations
  • Highly embeddable
  • Optional compression of data and index keys
  • Advanced query optimizer
  • Multiple tenancy
  • Informix Warehouse Accelerator returns data up to 1200X faster than the base server for complex queries over huge data sets
  • Timeseries in JSON
  • GeoJSON support in addition to native GeoSpacial support
  • Lucene text search on JSON documents and relational tables
  • High speed data loading technology for faster loads of streaming data
  • MQTT integration to link to IoT devices

Wrap up:

The future of software development will require hybrid applications!
The future of software development will require hybrid databases!

The future is here now!
Why wait?

Monday, November 3, 2014

China is standardizing all new database applications on Informix!

Here's the "world" shaking news I promised. IBM and China have come to an agreement whereby IBM will share the code for Informix with a Chinese software house, GBASE, who will modify its security code to conform to Chinese government standards. In turn China will be building all future database projects using Informix.

The agreement allows for sharing innovations to the Informix code made by either party to be shared by both. If anyone wanted proof that Informix is here to stay and is NEVER going away, this is it. China is the world's second largest market and probably the fastest growing technology market. The government of China does not want any software or hardware used for government projects or enterprises that it does not control, fearing Western spying through the systems it uses. So, they are standardizing on Informix. Cool!

IBM Press release dated: October 29, 2014

  GBASE and IBM to collaborate on locally innovated database in China 

IBM (NYSE: IBM) and General Data Technology Co.,Ltd, known as GBASE announces an agreement today, to create a locally innovated database product for the China marketplace. In support of the China government's national technology agenda, GBASE will develop and sell this local innovation based on IBM Informix technology. The agreement extends the reach of IBM’s Informix database technology into this market by providing a localized solution to better address the rapidly evolving needs of Chinese businesses. The agreement permits GBASE to build and license their own version of IBM Informix database, which will be offered to clients in the Chinese market as a stand alone product solution. The China market for database technology is estimated to be in excess of $700m according to IDC. The partnership between IBM and GBASE can fuel the growth of this market by creating a best in class database solution tailored to the unique requirements of the China marketplace. “This agreement confirms IBM’s innovation and commitment to growth in emerging markets in general and China specifically”, states Sean Poulley, Vice President, IBM Databases and Database Management. "Our intent is to help partners and clients gain competitive edge and transform their business with innovative database technologies in China, with China, for China”. Informix is one of the world’s most widely used databases, supporting clients who range from the largest multinational corporations to many regional and local small businesses. It is widely deployed in the retail, banking, manufacturing and government segments, all of which are important growth areas in China today. Informix is well known for its innovative design, which enables a single platform that powers both OLTP and OLAP workloads for real-time analytics, scales easily for cloud environments and provides continuous availability. It's renowned for extremely high levels of performance and availability, distinctive capabilities in data replication and scalability, and minimal administrative overhead. With this partnership agreement these advantages will be made more readily available to the fast-growing Chinese market.

Sunday, November 2, 2014

News from the Informix World

And I do mean world. Well it's been a long time since my last post. Over six months. I just got back from the IBM Insight Conference in Las Vegas NV (formerly the IBM Information On Demand Conference) and there is some big news. IBM announced their new vertical cloud development stacks called BlueMix. There is a BlueMix for timeseries data, BlueMix for mobile apps, BlueMix for in-memory Warehouse data analytics, BlueMix for Watson, BlueMix for ... well you get the idea. The stacks include access to a cloud server with an application appropriate database system (Cloudant, DB2 BLU Accelerator, Informix, etc.), development tools, access to IBM and some third party analytics tools, etc. You can create a BlueMix account for free and try it out. If you like it and develop something on it for yourself, your company, or for distribution then you can work out a commercial deal with IBM for unlimited access and more resources. "Wait", you say, "did you say Informix as part of an IBM Cloud service offering? They just slapped that in there to make us noisy Informix grumps happy. Right?" Nope. If you set up a BlueMix for Timeseries Data configuration you get a fully functional Informix database server (not just timeseries support, though it's not clear if this includes IWA). While DB2 is available, you have to ask for it, the only full function RDBMS with a BlueMix configuration ready to go is Informix. Very cool! More soon when I will address the "world" teaser.