
Wednesday, November 19, 2014

On Hybrid Database Development

At the recent IBM Insight conference in Las Vegas I presented a session entitled: 

"My Data is Relational But My Coders Want to Use JSON!  Help!"

Over 30 development managers and developers attended.  When asked, most said they attended because this is a common problem that they are either dealing with already or expect to have to deal with soon.

Here, as well as I can translate a presentation to text, is what I had to say:

I assert that JSON and other semi-structured data formats present unique challenges for organizations. To take advantage of these new structures and the development paradigms that they support and encourage, and to integrate them into our existing systems, we wrestle with three alternatives:

  1. Keep these new data and the apps that utilize them independent of our legacy data, and keep our legacy apps independent of this new data.
  2. Perform ETL/ELT frequently to maintain all of this data in both relational and semi-structured databases.
  3. Develop applications that can access multiple data sources concurrently.
I maintain that none of these alternatives is acceptable, and I want to propose a better path for future application and database development. First, a digression to define terms, which you can feel free to skip.

There are many classes of data:

  • Structured data
  • Unstructured data
  • Semi-structured data
  • Time stamped data composed from any data class
  • Geographically located data composed from any data class
  • Spatially located data composed from any data class
Structured Data refers, for the most part, to the data we have in relational databases. Despite the predictions of pundits, relational databases are here to stay.  There are several well-defined application segments for which relational databases are the best tool for the job.

Semi-Structured Data is data that may have a natural structure to it but where the structure may vary from one data element to another.  This can also include elements where part of the information is structured while other sub-elements are unstructured.

Currently the most popular format for semi-structured data is JSON, the JavaScript Object Notation.  JSON is a key:value record or document format that is quickly becoming a new standard for data interchange, replacing CSV, XML, and other interchange formats.  JSON is also the most common payload format for RESTful web services.  It is more space efficient than XML and easier to parse.  In addition, it is supported by a binary format, BSON, that is even faster to parse and from which individual sub-elements can be extracted quickly.  JSON is a dynamic schema format: each record, or document in JSON terminology, carries its own schema details, which can differ from one document to another.  This promotes and supports RAD, or Rapid Application Development.
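For example, two hypothetical customer documents in the same collection can carry different fields (the names here are invented purely for illustration), and a JSON store accepts both without any schema change:

{ "cust_id": 101, "name": "Acme Corp", "phone": "703-555-0100" }
{ "cust_id": 102, "name": "Widget Ltd", "loyalty_level": "gold",
  "contacts": [ { "type": "email", "value": "sales@widget.example" } ] }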

MongoDB has become the most popular of the JSON databases or stores.  It saves JSON documents in BSON format and produces JSON output for consuming clients.  There are several sets of interoperable development tools, known as "stacks", the most popular of which is called the MEAN stack, as well as client libraries for most common languages.

MEAN stands for MongoDB, ExpressJS, AngularJS, and Node.js.

Class time over:


On to the problem.  We all have TBs of relational data.  Our experienced developers already know SQL and our schema. Some new development still fits with structured data. There is no sense in moving OLTP or structured DW data out of relational systems.  There is no financially supportable way to rewrite all of our existing systems from scratch just to take advantage of new paradigms.  Many new applications written primarily to use semi-structured data will need to present the relational data we've been collecting for the past 30+ years.

Many of us are capturing TBs of semi-structured data already. New developers that we want to hire only want to work in the NoSQL space.  Some new applications are natural fits for semi-structured data. RAD techniques require schema flexibility to succeed.  Many new and existing relational based applications could take advantage of all of that semi-structured data to expose new features without a massive redevelopment effort.

How to get relational data into MEAN stack applications and JSON data into C/C++/C# applications is the problem.  Traditionally when faced with data in multiple silos we would either use ETL techniques to copy the data from one silo to another or we would write data access layers using multiple libraries.  

ETL is time consuming and error prone.  We may have to sacrifice precision or internal relationships in the target system. Maintaining a timely, consistent view is a constant major effort and expense.  Data duplication is itself an expense.  Fast storage is not cheap!

We can't just move everything into MongoDB either.  MongoDB, like most NoSQL stores, does not support multi-statement transactions, inter-object relationships (references and sub-documents are not relationships), or XA transactions (so it cannot participate in transactions that span multiple silos).

The solution:


The solution is Hybrid Application Development against a Hybrid Database system, using both traditional and RAD development tools.  Here is what is needed from a hybrid database for this to work:
  • Access from traditional development stacks (C, C++, C#, Perl, etc.)
  • Access from MEAN and other RAD development stacks
  • Full ACID compliant transaction support for all data
  • Full relational integrity and data normalization support for all data 
  • Ability to store structured data with a predefined schema
  • Ability to store semi-structured data with a dynamic schema
  • Ability to present both types of data to both stacks in the native format of the data
    • JSON as JSON
    • Tables as tables
  • Ability to present both types of data to each stack in its native format
    • JSON as table data for traditional tools
    • Table data as JSON for RAD tools
Some NoSQL stores can handle ACID consistency, but most can only manage "eventually consistent" transactions.  Most, like MongoDB, can only guarantee consistent transactions and rollbacks for a single document, not for multiple documents in a collection nor for transactions that span multiple collections. Some NoSQL stores support SQL or an SQL-like query language; however, they do not return data as rows and columns, which is what SQL database access code expects.

Some RDBMS systems can:
  • Store JSON (or a binary form of it)
  • Store a collection as a collection (rather than as a column type)
  • Manipulate JSON fields within a document
  • Create indexes on JSON fields within the documents in a collection
  • Support referential integrity between JSON document fields and relational columns
  • Join JSON documents to relational tables
But most cannot do it all.  I believe that these capabilities are the future of database and application development. Today there is one RDBMS product that CAN do it all (a small SQL sketch of the idea follows this list):
  • Accept connections from MEAN stack and other MongoDB clients without modifying the application code
  • Accept connections from relational SQL clients
  • Support transactions on relational tables and JSON collections that include multiple tables/collections and multiple rows/documents
  • Allow joins between JSON collections and other JSON collections or relational tables
  • Allow MEAN applications to issue SQL statements and return the results in JSON format to those clients
  • Permit SQL clients to treat fields in JSON collection documents as ordinary columns
  • Support JSON and BSON as class 1 native data types for use as columns in relational tables or as a type to define a collection
  • Expose all data as web services through a direct RESTful interface, without middleware
  • Store JSON collections as collections
  • Permit any MongoDB-aware client to treat relational tables as collections
  • Support most MongoDB DDL and server management commands (including sharding)
  • Auto-create databases and collections on the fly, just as MongoDB does
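To make those last points concrete, here is a minimal sketch of the kind of SQL this implies. The table, the collection, and the bson_value_int() field-extraction function are all illustrative assumptions on my part, not a definitive syntax; the exact function names vary by product, so check the documentation:

-- Hypothetical relational table plus a JSON "collection" stored as a BSON column
CREATE TABLE customer (customer_num INT PRIMARY KEY, name VARCHAR(60));
CREATE TABLE web_events (doc BSON);

-- Treat a field inside the JSON documents as an ordinary column and
-- join the collection to the relational table
SELECT c.name, COUNT(*) AS events
  FROM customer AS c, web_events AS e
 WHERE c.customer_num = bson_value_int(e.doc, 'customer_num')
 GROUP BY c.name;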
I know I'm sounding like a salesman, but I have nothing to sell you other than an idea and my help to achieve it in your organizations, so, bear with me.  What is this magical database system?  It is the latest incarnation of one of the first RDBMS products in the marketplace:

Informix Dynamic Server v12.10.xC4 from IBM


Informix, aka IDS, has all of the required features I mentioned above, but it brings much more to the table:
  • Larger documents than MongoDB
  • Larger databases than MongoDB
  • Industry leading data replication technology
  • Four classes of data replication that can all work together
  • More capable, time-proven sharding technology than MongoDB
  • Data distribution and centralization capability
  • Support for Timeseries data
  • Support for Geospatial data
  • The only database system that can combine Timeseries and Geospatial data to track your IoT data through SpaceTime
  • User defined data types that perform as well as native types
  • World class OLTP transaction rates
  • Traditional and MEAN stack applications can connect and use both JSON and SQL data in their own native formats
  • Centralized databases up to 128PB
  • Distributed databases without size limits
  • Server fail over and load balancing fully configurable using SLA specifications
  • Five 9's reliability
  • Near Zero downtime
  • Upgrade server versions without downtime
  • Instantly bring up additional servers to load balance during peak periods
  • Support large numbers of concurrent transactions
  • Row level locking (MongoDB locks entire collections only)
  • Extremely tunable engine
  • Autonomic features for low maintenance overhead and to permit unmonitored operations
  • Highly embeddable
  • Optional compression of data and index keys
  • Advanced query optimizer
  • Multiple tenancy
  • Informix Warehouse Accelerator returns data up to 1200X faster than the base server for complex queries over huge data sets
  • Timeseries in JSON
  • GeoJSON support in addition to native Geospatial support
  • Lucene text search on JSON documents and relational tables
  • High speed data loading technology for faster loads of streaming data
  • MQTT integration to link to IoT devices

Wrap up:


The future of software development will require hybrid applications!
The future of software development will require hybrid databases!

The future is here now!
Why wait?


Monday, November 3, 2014

China is standardizing all new database applications on Informix!

Here's the "world"-shaking news I promised. IBM and China have come to an agreement whereby IBM will share the code for Informix with a Chinese software house, GBASE, which will modify its security code to conform to Chinese government standards. In turn, China will be building all future database projects using Informix.

The agreement allows innovations to the Informix code made by either party to be shared by both. If anyone wanted proof that Informix is here to stay and is NEVER going away, this is it. China is the world's second largest market and probably the fastest growing technology market. The government of China does not want any software or hardware used for government projects or enterprises that it does not control, fearing Western spying through the systems it uses. So, they are standardizing on Informix. Cool!

IBM Press release dated: October 29, 2014

  GBASE and IBM to collaborate on locally innovated database in China 

IBM (NYSE: IBM) and General Data Technology Co., Ltd, known as GBASE, announces an agreement today to create a locally innovated database product for the China marketplace. In support of the China government's national technology agenda, GBASE will develop and sell this local innovation based on IBM Informix technology. The agreement extends the reach of IBM's Informix database technology into this market by providing a localized solution to better address the rapidly evolving needs of Chinese businesses. The agreement permits GBASE to build and license their own version of the IBM Informix database, which will be offered to clients in the Chinese market as a stand-alone product solution.

The China market for database technology is estimated to be in excess of $700m according to IDC. The partnership between IBM and GBASE can fuel the growth of this market by creating a best-in-class database solution tailored to the unique requirements of the China marketplace.

"This agreement confirms IBM's innovation and commitment to growth in emerging markets in general and China specifically," states Sean Poulley, Vice President, IBM Databases and Database Management. "Our intent is to help partners and clients gain competitive edge and transform their business with innovative database technologies in China, with China, for China."

Informix is one of the world's most widely used databases, supporting clients who range from the largest multinational corporations to many regional and local small businesses. It is widely deployed in the retail, banking, manufacturing and government segments, all of which are important growth areas in China today. Informix is well known for its innovative design, which enables a single platform that powers both OLTP and OLAP workloads for real-time analytics, scales easily for cloud environments and provides continuous availability. It is renowned for extremely high levels of performance and availability, distinctive capabilities in data replication and scalability, and minimal administrative overhead. With this partnership agreement these advantages will be made more readily available to the fast-growing Chinese market.

Sunday, November 2, 2014

News from the Informix World

And I do mean world. Well, it's been a long time since my last post; over six months. I just got back from the IBM Insight Conference in Las Vegas, NV (formerly the IBM Information On Demand Conference) and there is some big news.

IBM announced their new vertical cloud development stacks called BlueMix. There is a BlueMix for timeseries data, BlueMix for mobile apps, BlueMix for in-memory warehouse data analytics, BlueMix for Watson, BlueMix for ... well, you get the idea. The stacks include access to a cloud server with an application-appropriate database system (Cloudant, DB2 BLU Accelerator, Informix, etc.), development tools, access to IBM and some third-party analytics tools, and more. You can create a BlueMix account for free and try it out. If you like it and develop something on it for yourself, your company, or for distribution, then you can work out a commercial deal with IBM for unlimited access and more resources.

"Wait", you say, "did you say Informix as part of an IBM Cloud service offering? They just slapped that in there to make us noisy Informix grumps happy. Right?" Nope. If you set up a BlueMix for Timeseries Data configuration you get a fully functional Informix database server (not just timeseries support, though it's not clear if this includes IWA). While DB2 is available (you have to ask for it), the only full-function RDBMS with a BlueMix configuration ready to go is Informix. Very cool! More soon, when I will address the "world" teaser.

Thursday, April 17, 2014

Heading down to Miami soon!

It is the middle days of Passover week and I am trying to squeeze some work in between the first Seder days and the coming Sabbath. Part of that is making final preparations to go down to Miami for the 2014 IIUG Conference. As others have said, this is the premiere conference for users of IBM's Informix database systems. I am excited about the coming conference, not just because I get to participate in one of my favorite activities - talking - by presenting two sessions and helping out with a third, but because I am hoping to hear about new Informix features for the next release. Already this year we have gotten some neat stuff in v12.10 through the first three releases:

  • Support for JSON and BSON data types.
  • Support for developing hybrid applications that incorporate both structured and semi-structured data together.
  • The ability to dynamically expand the buffer pools to handle unexpected data demands.
  • Improved dynamic logical log management.
  • Automatic management of the number of CPU VPs.
  • Automatic expansion of the physical log to handle unexpected transaction volumes.
  • The ability to specify that new tables should be placed in a specific dbspace, or automatically partitioned across a number of dbspaces, instead of defaulting to the dbspace in which the database was created.
  • SHMADD now automatically doubles every time 16 new segments have been added to the server.
  • The client-side environment variables INFORMIXCONTIME and INFORMIXCONRETRY are now also ONCONFIG parameters that set the server's default connection timeout behavior.
  • The CPU VP private memory cache is now dynamic, using VP_MEMORY_CACHE_KB as a starting value. The optional keyword STATIC can be added to the end of the size to revert to the previous fixed-size behavior.
  • New onstat -g bth and -g BTH reports to better monitor resource contention causing blocked sessions.
  • Compression for indexes.
  • Automated compression for new tables.

I hope to see more in Miami. I hope to see you all there as well!

Thursday, January 16, 2014

The myth of the new and other fallacies

In Kohelet (aka Ecclesiastes) King Solomon of ancient Israel wrote:
What has been is what will be, and what has been done is what will be done, and there is nothing new under the sun.
No, I haven't gone all proselytizing on you, I just want to make a point.  Most of the "new" technologies that everyone is so hot to get into to save the world are not new at all.  Some are even very old.  Many were tried in the past and while some were good ideas and successful in their first incarnations, many were either not successful at all or were later supplanted by other technologies that got the job done better.  A few examples:

Object Oriented Languages

I know, '80s technology, not new anymore.  But they are still 'hot' and coders coming up today don't know their history, so here's some.  Object oriented languages like C++, C#, Java, etc. encapsulate the operations associated with a data structure together with the structure itself.  In the 1960s, two new programming paradigms emerged: Structured Programming and Modular Programming. Structured programming concentrated on eliminating non-linear control flow from software code.  It was characterized most simply as "No goto statements, anywhere, anytime!" and was championed by such notables as Corrado Böhm, Giuseppe Jacopini, Edsger Dijkstra, Michael Jackson (no, not the singer), and Isaac Nassi and Ben Shneiderman (who together developed the Nassi-Shneiderman diagram, sometimes called a Chapin chart).

At around the same time other notables like Niklaus Wirth (the developer of Pascal, Modula, Modula-2, Oberon, and the Lilith computer system) were proposing the concept of Modular Programming.  The idea was to encapsulate distinct operations into subroutines and to gather subroutines with similar or related functionality and/or purpose into reusable concrete units, or 'modules'.  Wirth went so far as to develop a programming language that supported both structured programming and modular programming techniques (and did not include a GOTO statement at all!).  To this day, Modula-2 is still my favorite of the dozen or more programming languages I have learned and used.

These two paradigms sought, and succeeded, to reduce the complexity and redundancy in large software projects and to permit larger groups of programmers to cooperate in the production of complex projects.  Combine Structured Programming with Modular Programming, add language support for their concepts, and voila: you have an Object Oriented Language.

Columnar Database

In the early 1980's I needed to write a custom database for an application.  After some research I found something that matched the requirements of the project perfectly.  It was called, variously, an Inverted Index, an Inversion Table (no, not the exercise/torture device), or an Inversion Database.  The idea was to build indexes on all of the fields of a set of data records without storing the entire record at all, only the indexes.  This meant that as long as you only wanted the values from a subset of the fields in a record you could access the data very quickly with reduced IO, which was very expensive at the time.  Can you say Columnar Database?

Key:Value Store or Database

In 1979, Ken Thompson of UNIX fame wrote a database library for UNIX called dbm, which used a hash table of key values to index data records.  The records had no particular structure; any interpretation of the contents of a record was application dependent.  This library became the standard UNIX dbm database library.

In 1986 the Berkeley UNIX team developed ndbm which added the capability to have more than one database open in an application.  Unlike dbm, ndbm stores the data value in the hash table directly which limits the size of the key:value pair and of the database as a whole.

In the late 1980's and early '90s Margo Seltzer and Ozan Yigit worked to improve on dbm and ndbm and created Berkeley DB.  BDB was still a hash-managed key:value store.  Its main innovation was the ability to be accessed concurrently by multiple users without corrupting the database.

This is virtually the same as the "ground breaking" technology in MongoDB, Cassandra, Virtuoso, etc.

Graph Database

In 1969 Charles Bachman published a paper at the Conference on Data Systems Languages (CODASYL) describing a new database model to replace the then-popular hierarchical model.  He named it the Network Model.  Wikipedia defines the network model as a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.

Wikipedia defines a graph database as a database that uses graph structures with nodes, edges, and properties to represent and store data. By definition, a graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent elements and no index lookups are necessary.

Sounds awfully familiar to me.  By the late 1980's, network and hierarchical databases had largely been replaced for general purpose database use by databases that ostensibly followed the Relational Model described by Edgar F. Codd in a paper published in 1969, because their schemas were fixed in the physical implementation of the data, making it virtually impossible to change the schema as applications evolved.

Getting to the Point

The point of this blog post is that there were reasons that we used these technologies "back in the day" (as my favorite reality TV stars say) and there were reasons that we abandoned them and moved on.  Some of these technologies never went away because they have good and valid applications to solve specific data storage and retrieval problems.  But these were not the technologies that conquered the database world.  The Relational Model, or as several friends of mine insist, a poor implementation of something that vaguely resembles the Relational Model and is more properly called the SQL Model, won that battle and for good reason.  So, let's not all jump on the same bandwagon without doing due diligence and examining in detail what technology best fits the problem.  

One of my favorite sayings is "Use the right tool for the right job!"  I am also a big proponent of avoiding the Yankee Adjuster Problem where, as the saying goes, "When all you have is a hammer, everything starts to look like a nail!"

So, let me finish as I began, there is nothing new under the sun!  Or as I am fond of replying when asked "What's new?" - "There's nothing new, just the old stuff come back to haunt us!"

Tuesday, November 12, 2013

Folks,

Everyone is invited to attend our next Webcast on Tuesday December 17, 2013 at 2PM EST.  
Please share this with your other Informix DBA's.

I will present an in-depth look at UPDATE STATISTICS, dostats, and the other free utilities for monitoring and tuning an Informix database in my utils2_ak package. Topics covered will include:

- Table Level statistics and data distributions
- Fragment Level statistics and data distributions
- Update statistics for procedure/function
- Providing data distributions for User Defined Types
- Dostats and the art of balancing distribution quality with runtime performance
- Other useful utilities in the utils2_ak package

Click here to register - http://www.advancedatatools.com/Informix/NextWebcast.html

To view our past Informix Performance Tuning Webcasts visit http://www.advancedatatools.com/Informix/Webcasts.html

Wednesday, September 25, 2013

Informix 12.10.xC2 Rules - all others drool!

If you missed the version 12.10.xC2 Beta EVP and the IBM Informix Chat with the Labs on 9/25 then you are in for a shock!

IBM has been talking about Big Data in reference to Informix for several years, ever since someone realized that the TimeSeries DataBlade capabilities of Informix, developed in the mid '90s, are quite literally about very big data.  However, some members of the Informix user community have taken IBM to task, pointing out that when folks talk about Big Data they are usually discussing unstructured data stored in NoSQL databases like Hadoop, MongoDB, and others.  So, what did you miss?

Informix is now the ONLY database that successfully integrates unstructured and structured data together in a single database platform.  Working with the developers of MongoDB, IBM has integrated MongoDB's BSON data type into IDS as a native type and implemented  the MongoDB interface protocols within Informix.  You can now do the following:

  • Store MongoDB collections in Informix as BSON types.
  • Query both MongoDB collections and Informix relational tables in Informix from both MongoDB clients and Informix clients.
    • Mongo clients see collection formatted key:value data from collections and from tables.
    • Informix clients see data rows from tables and collections.  Missing collection fields are mapped to NULL values.
    • Dynamic Informix clients can determine the current schema of collection data.
  • Shard tables and collections across multiple servers.  Not only can you query the data across the shard farm, but when you insert data into sharded tables or collections the local server determines which shard the data should live on, based on the definition of the various shards.  This is VERY similar to the horizontal partitioning that Informix's old XPS engine provided.
  • All data, including Mongo collection data, is included in Informix's ACID-compliant transaction handling!  No one else can guarantee consistency in a NoSQL data store!  (A small SQL sketch follows this list.)
  • Informix compression works with MongoDB data. Because of the redundant key names inherent in the key:value nature of collection data, compression rates are even more impressive than for table data!  Only Informix can save you 70-90% of the storage you would otherwise use for unstructured data in MongoDB or any other similar NoSQL database!
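To illustrate that transaction point, here is a minimal sketch of a mixed transaction, assuming a hypothetical relational orders table and an order_events collection stored as a BSON column; the string-to-JSON-to-BSON cast shown is my assumption about the syntax, so verify it against the 12.10 documentation before relying on it:

BEGIN WORK;

-- A relational insert and a JSON document insert in the SAME transaction
INSERT INTO orders (order_num, customer_num, order_date)
     VALUES (1001, 101, TODAY);

INSERT INTO order_events (doc)
     VALUES ('{ "order_num": 1001, "event": "created" }'::JSON::BSON);

-- Either both changes commit, or a ROLLBACK WORK undoes both
COMMIT WORK;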
Combine this with features that Informix has had for a long time:
  • Timeseries data
  • Geospatial data
  • BTS web-like text search capability
  • Informix Warehouse Accelerator to improve complex query times by 100x to 4000x!
  • Hierarchical Data Replication Secondary server (active or passive).
  • Multiple Remote Secondary replicated servers (active or passive).
  • Active or passive Shared Disk Secondary servers for load balancing, application isolation, and nearly instantaneous failover.
  • Transaction survival - failover of primary server functions to a secondary without loss of running transactions.  This means absolutely uninterruptible transactions!
  • Enterprise Replication
    • Multiple peer replicated servers on heterogeneous hardware
    • Master-Slave replication in both Scatter and Gather configuration
    • Combine with Sharded Query capability to maintain local data on local servers but query from a central location without a master copy
I don't think I've run out of features, but I have run out of the time I've allotted to myself for this post.

To wrap up: Informix is now positioned to be what its users have always considered it to be: The best damn database product on the market!  This should finally satisfy a certain community member based in Chicago who is usually rather vocal about Big Data.



Monday, April 8, 2013

What's the hottest new feature in Informix 12.10?

In the next edition of IBM Data Mag (http://ibmdatamag.com/), Lester Knutsen gives us a quick overview of the new Informix version 12.10 Grid Query feature. This, I think, is the hot new feature!  Lester traces the feature back to a similar XPS feature that allowed one to distribute a database across many servers and issue queries to the entire group of servers, treating the entire distributed data set as a single database.  "Can't I do that with any RDBMS that supports distributed queries?", you may ask.  The simple answer is, "Yes, you can."  But it is not simple.

What every serious RDBMS can do

All major RDBMS's allow you to query multiple servers using a UNION or UNION ALL to gather the data from each of them.  So, what's so new in Informix v12.10? Go read Lester's article when it comes out in a couple of weeks for the details.  It is really cool.
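For reference, the traditional approach looks roughly like this in Informix, using its database@server:table notation for remote tables; the database, server, and table names below are hypothetical:

-- Manually gather the same table from three regional servers
SELECT order_num, order_total FROM sales@east_srv:orders
UNION ALL
SELECT order_num, order_total FROM sales@central_srv:orders
UNION ALL
SELECT order_num, order_total FROM sales@west_srv:orders;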

Why do we care?

In 2000, Informix Corp. leaked that it was working on a project to merge the high transaction rates and full ACID support of Informix Dynamic Server 7.31 and the object-relational and extensibility features of Informix Universal Server 9.30 with the distributed, nearly unlimited database and shared-nothing distributed query capabilities of Informix eXtended Parallel Server (XPS) to create the ultimate database server, in a project that Informix called Arrowhead. It didn't happen.  It didn't happen not because the technology and the expertise were not there, but because a year later, in late 2001, Informix Corporation's database assets were purchased by IBM and the Arrowhead project was squashed.

Since then, for nearly twelve years now, IBM, much to its own confusion, has been diligently improving the Informix database server.  For many of those twelve years Informix fans have asked IBM repeatedly when they might resurrect Arrowhead.  The answer has always been the same, "Never.  It's never going to happen."

Well, good news.  Like it or not, Grid Queries may not be the original technical design of Arrowhead, but along with the features added in Informix v11.xx that had long existed in XPS, conceptually and usability-wise, Arrowhead is alive and well and it is here today!

This is a game changer.  Now transactions against OLTP tables can happen locally on a local server; only the tables that affect other locations (lookup tables, inventory and warehouse levels, shipping manifests, etc.) have to be replicated using ER/GRID; and the national, regional, and company-wide reporting that requires access to all of the data can get it simply and quickly.

With the Informix Warehouse Accelerator included in Informix Advanced Enterprise Edition, even complex Business Intelligence/Analytical processing can be performed on these massive distributed data sets using up-to-the-minute data (no ELT/ETL, cube loading, or data summarization delay) gathered simply from the entire corporate server grid.  Add to that Informix Cluster Grid's ability to cluster, replicate, and distribute data across heterogeneous servers, and we have access to power and technology that no organization not using Informix can get.  Not from anyone else, anywhere!  It will take decades before the industry Ninja will be able to tack on anything remotely like IBM Informix's Grid Query technology, and even then it will be a massive kludge.

I'm stoked!  See you all in San Diego in two weeks to hear more about all this!

Thursday, October 4, 2012

Server Performance Improvement and More

Today's post is as close to a shameless plug for my consulting services, and other services provided by Advanced DataTools, as I am ever likely to get so don't feel that you have to read it if you do not want to read an advertisement.

  • Ever wonder if your server could be performing better?
  • Want to know when conditions in your database server will affect production before it happens?
  • Ever wonder if you will be able to handle the load next year as the business grows?
  • Got only one DBA (or you ARE the one DBA) and you are wondering how your company will get through vacations?  DBA illnesses?  DBA injuries?
  • Do you have new requirements in the pipeline?  Do you need to know if you can handle them with your current configuration or if you will need to upgrade equipment and software?
  • Do you need a world class database architect to help design that complex new schema in a way that will minimize the number of application gotcha's you will run into later?
  • Have very old applications and database versions and want to consider getting up-to-date?
  • Want to modernize your old 4GL applications so you don't have to spend $$$$$$ to rewrite them so they can run on the web?  On smart phones?  How about saving on your 4GL development and run-time licensing at the same time?
  • Do you have a need to implement analytical query support but don't know enough about Big Data, Data Marts, and Data Warehouses?
  • Want to perform analysis on your production database without affecting production performance?
  • Have a newbie DBA or database application developer who needs to be brought up to speed quickly?
We can help.  Check out the range of services posted on our web site www.advancedatatools.com or better yet, just give us a call (703-256-0267 or 800-807-6732, select extension 101 and leave a voice mail) or email me (art@advancedatatools.com).  We will call you back and set up a conference call to discuss your requirements.

Monday, September 24, 2012

In search of rownum

Often people ask about various features supported by other RDBMS systems that Informix does not support.  I am rarely bothered by these issues, not because I don't use Oracle, MySQL, PostgreSQL, or MS SQL Server - I don't when I can help it - but rather because for the most part, the required functionality is either there in another form, is not really necessary, or can be emulated easily enough.

Examples are the INTERSECT and EXCEPT set operators.  Informix does not support either syntax. However, the results of an INTERSECT are identical to the results of a normal INNER join with only the columns from one of the tables included in the projection clause, which is of course supported. The results of an EXCEPT are identical to those of a LEFT OUTER JOIN that returns only those values from the left table that do not match any rows in the right table.

INTERSECT example - suppliers against whom at least one order has been processed:
select distinct supplier_id
from suppliers
INTERSECT
select distinct supplier_id
from orders;
returns the same result set as:
select distinct orders.supplier_id
from orders, suppliers
where orders.supplier_id = suppliers.supplier_id;
EXCEPT example - suppliers against whom no orders have been processed:
select distinct supplier_id
from suppliers
except
select distinct supplier_id
from orders;

returns the same result set as:

select suppliers.supplier_id
from suppliers
left outer join orders
on suppliers.supplier_id = orders.supplier_id
where orders.supplier_id is null;

Another is the Oracle ROWNUM pseudo column.  It returns a sequential number, starting from 1, indicating the order in which a row was processed by the server.  Most commonly it is used in the WHERE clause of a query to limit the number of rows returned or to walk through a result set in discrete chunks. 
Examples:

select * from sometable where rownum <= 100;
select * from sometable where rownum > 100 and rownum <= 200;

These are easy to emulate using the FIRST and SKIP keywords, which Informix does support:
select first 100 * from sometable;
select skip 100 first 100 * from sometable;
However, sometimes ROWNUM is used in the projection clause of a query, where it represents the selection order as a computed column.  To emulate this use, we need to create two SPL routines.  The procedure init_rownum() below resets the counter so that the next call to the rownum() function returns '1'; it is needed if rownum() is going to be used more than once in a session and is expected to return values starting with '1' each time.  Here are the routine definitions:

CREATE FUNCTION rownum () returning int as rownum;
    define global counter int default 0;
    let counter = counter + 1;
    return counter;
end function;

CREATE PROCEDURE init_rownum ();
    define global counter int default 0;
    let counter = 0;
end procedure;
In use these routines look like this:

execute procedure init_rownum();
select rownum() as rownum, tabname from systables;
Now suppose you want to sort the tabnames by name but return the rownum of the original order in which the rows were processed by the engine:
select *
from (select rownum() as rownum, tabname from systables)
order by tabname;
That's it, nothing profound today, just this little tidbit of technique.  Oh!  Don't forget to check out my TechTips article about Derived Tables in McPress at:

http://www.mcpressonline.com/sql/techtip-combining-aggregate-and-detail-in-the-same-sql-result-set.html