
Thursday, April 17, 2014

Heading down to Miami soon!

It is the middle days of Passover week and I am trying to squeeze some work in between the first Seder days and the coming Sabbath.  Part of that is making final preparations to go down to Miami for the 2014 IIUG Conference.  As others have said, this is the premier conference for users of IBM's Informix database systems.  I am excited about the coming conference, not just because I get to participate in one of my favorite activities - talking, by presenting two sessions and helping out with a third - but because I am hoping to hear about new Informix features for the next release.  Already this year we have gotten some neat stuff in v12.10 through the first three releases:

- Support for JSON and BSON data types.
- Support for developing hybrid applications that incorporate both structured and semi-structured data together.
- The ability to dynamically expand the buffer pools to handle unexpected data demands.
- Improved dynamic logical log management.
- Automatic management of the number of CPU VPs.
- Automatic expansion of the physical log to handle unexpected transaction volumes.
- The ability to specify that new tables should be placed in a specific dbspace, or automatically partitioned across a number of dbspaces, instead of defaulting to the dbspace in which the database was created.
- SHMADD now automatically doubles every time 16 new segments have been added to the server.
- The client-side environment variables INFORMIXCONTIME and INFORMIXCONRETRY are now also ONCONFIG parameters that set the server's default connection timeout behavior.
- The CPU VP private memory cache is now dynamic, using VP_MEMORY_CACHE_KB as a starting value.  The optional parameter STATIC can be added to the end of the size to revert to the previous fixed-size behavior.
- New onstat -g bth and onstat -g BTH reports to better monitor the resource contention that causes blocked sessions.
- Compression for indexes.
- Automated compression for new tables.

I hope to see more in Miami.  I hope to see you all there as well!

Thursday, January 16, 2014

The myth of the new and other fallacies

In Kohelet (aka Ecclesiastes), King Solomon of ancient Israel wrote:
What has been is what will be, and what has been done is what will be done, and there is nothing new under the sun.
No, I haven't gone all proselytizing on you, I just want to make a point.  Most of the "new" technologies that everyone is so hot to get into to save the world are not new at all.  Some are even very old.  Many were tried in the past and while some were good ideas and successful in their first incarnations, many were either not successful at all or were later supplanted by other technologies that got the job done better.  A few examples:

Object Oriented Languages

I know, '80s technology, not new anymore.  But they are still 'hot' and coders coming up today don't know their history, so here's some.  Object oriented languages like C++, C#, Java, etc. encapsulate the operations associated with a data structure together with the structure itself.  In the 1960s, two new programming paradigms emerged.  These were Structured Programming and Modular Programming.  Structured programming concentrated on eliminating non-linear control flow from software code.  It was characterized most simply as "No goto statements, anywhere, anytime!" and was championed by such notables as C. Bohm, G. Jacopini, Edsger Dijkstra, Michael Jackson (no, not the singer), and Isaac Nassi and Ben Shneiderman (who together developed the Structure Chart or Nassi-Shneiderman Chart, sometimes called a Chapin Chart).

At around the same time other notables like Niklaus Wirth (the developer of Pascal, Modula, Modula-2, Oberon, and the Lilith computer system) were proposing the concept of Modular Programming.  The idea was to encapsulate distinct operations into subroutines and to gather subroutines with similar or related functionality and/or purpose into reusable concrete units or 'modules'.  Wirth went so far as to develop a programming language that supported both structured programming and modular programming techniques (and did not include a GOTO statement at all!).  To this day, Modula-2 is still my favorite of the dozen or more programming languages I have learned and used.

These two paradigms sought, and succeeded, to reduce the complexity and redundancy in large software projects and to permit larger groups of programmers to cooperate in the production of complex projects.  Combine Structured Programming with Modular Programming, add language support for their concepts, and voila: you have an Object Oriented Language.

Columnar Database

In the early 1980's I needed to write a custom database for an application.  After some research I found something that matched the requirements of the project perfectly.  It was called, variously, an Inverted Index, an Inversion Table (no, not the exercise/torture device), or an Inversion Database.  The idea was to build indexes on all of the fields of a set of data records without having the entire record present at all, only the indexes.  This meant that as long as you only wanted the values from a subset of the fields in a record, you could access the data very quickly with reduced I/O, which was very expensive at the time.  Can you say Columnar Database?

Key:Value Store or Database

In 1979, Ken Thompson of UNIX fame wrote a database library for UNIX called dbm, which used a hash table of key values to index data records.  The records had no particular structure; any interpretation of the contents of a record was application dependent.  This library became the standard UNIX dbm database library system.

In 1986 the Berkeley UNIX team developed ndbm which added the capability to have more than one database open in an application.  Unlike dbm, ndbm stores the data value in the hash table directly which limits the size of the key:value pair and of the database as a whole.

In the late 1980's and early 90's Margo Seltzer and Ozan Yigit worked to improve on dbm and ndbm and created Berkeley DB.  BDB was still a hash-managed key:value store.  Its main innovation was the ability to be accessed concurrently by multiple users without corrupting the database.

This is virtually the same as the "ground breaking" technology in MongoDB, Cassandra, Virtuoso, etc.

Graph Database

In 1969 Charles Bachman published a paper describing a new database model to replace the popular hierarchical model at the Conference on Data Systems Languages (CODASYL).  He named it the Network model.  Wikipedia defines the network model as "a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice."

Wikipedia defines a graph database as "a database that uses graph structures with nodes, edges, and properties to represent and store data. By definition, a graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent element and no index lookups are necessary."

Sounds awfully familiar to me.  By the late 1980's, network databases and hierarchical databases had largely been replaced for general purpose database use by databases that ostensibly followed the Relational Model described by Edgar F. Codd in a paper published in 1969.  The older models lost out because their schemas were fixed in the physical implementation of the data, making it virtually impossible to change the schema as applications evolved.

Getting to the Point

The point of this blog post is that there were reasons that we used these technologies "back in the day" (as my favorite reality TV stars say) and there were reasons that we abandoned them and moved on.  Some of these technologies never went away because they have good and valid applications to solve specific data storage and retrieval problems.  But these were not the technologies that conquered the database world.  The Relational Model, or as several friends of mine insist, a poor implementation of something that vaguely resembles the Relational Model and is more properly called the SQL Model, won that battle and for good reason.  So, let's not all jump on the same bandwagon without doing due diligence and examining in detail what technology best fits the problem.  

One of my favorite sayings is "Use the right tool for the right job!"  I am also a big proponent of avoiding the Yankee Adjuster Problem where, as the saying goes, "When all you have is a hammer, everything starts to look like a nail!"

So, let me finish as I began, there is nothing new under the sun!  Or as I am fond of replying when asked "What's new?" - "There's nothing new, just the old stuff come back to haunt us!"

Tuesday, November 12, 2013

Folks,

Everyone is invited to attend our next Webcast on Tuesday, December 17, 2013 at 2PM EST.  
Please share this with your other Informix DBAs.

I will present UPDATE STATISTICS in depth, dostats, and my other free utilities for monitoring and tuning an Informix database in my utils2_ak package. Topics covered will include (a small syntax sampler follows the list):

- Table Level statistics and data distributions
- Fragment Level statistics and data distributions
- Update statistics for procedure/function
- Providing data distributions for User Defined Types
- Dostats and the art of balancing distribution quality with runtime performance
- Other useful utilities in the utils2_ak package
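To give a flavor of what "in depth" means here, these are the kinds of statements we will be pulling apart in the session.  This is a minimal, illustrative sketch only - the table, column, and procedure names are placeholders, not anything from a real schema:

-- refresh table and index statistics and drop any stale column distributions
UPDATE STATISTICS LOW FOR TABLE customer DROP DISTRIBUTIONS;

-- build sampled data distributions for every column of the table
UPDATE STATISTICS MEDIUM FOR TABLE customer DISTRIBUTIONS ONLY;

-- build exact distributions for the column(s) that lead an index
UPDATE STATISTICS HIGH FOR TABLE customer (customer_num);

-- force reoptimization of a stored procedure's query plans
UPDATE STATISTICS FOR PROCEDURE order_totals;

Dostats essentially automates choosing which of these to run, and at what level, for each table and column.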

Click here to register - http://www.advancedatatools.com/Informix/NextWebcast.html

To view our past Informix Performance Tuning Webcasts visit - http://www.advancedatatools.com/Informix/Webcasts.html

Wednesday, September 25, 2013

Informix 12.10.xC2 Rules - all others drool!

If you missed the version 12.10.xC2 Beta EVP and the IBM Informix Chat with the Labs on 9/25 then you are in for a shock!

IBM has been talking about Big Data in reference to Informix for several years, ever since someone realized that the TimeSeries DataBlade capabilities of Informix, developed in the mid '90s, are quite literally about very big data.  However, some members of the Informix user community have taken IBM to task, pointing out that when folks talk about Big Data they are usually discussing unstructured data stored in NoSQL databases like Hadoop, MongoDB, and others.  So, what did you miss?

Informix is now the ONLY database that successfully integrates unstructured and structured data together in a single database platform.  Working with the developers of MongoDB, IBM has integrated MongoDB's BSON data type into IDS as a native type and implemented the MongoDB interface protocols within Informix.  You can now do the following (a small SQL sketch follows the list):

  • Store MongoDB collections in Informix as BSON types.
  • Query both MongoDB collections and Informix relational tables in Informix from both MongoDB clients and Informix clients.
    • Mongo clients see collection formatted key:value data from collections and from tables.
    • Informix clients see data rows from tables and collections.  Missing collection fields are mapped to NULL values.
    • Dynamic Informix clients can determine the current schema of collection data.
  • Shard tables and collections across multiple servers.  Not only can you query the data across the shard farm, but when you insert data into sharded tables or collections, the local server determines which shard the data should live on based on the definition of the various shards.  This is VERY similar to the horizontal partitioning that Informix's old XPS engine provided.
  • All data, including Mongo collection data, is included in Informix's ACID-compliant transaction handling!  No one else can guarantee consistency in a NoSQL data store!
  • Informix compression works with MongoDB data.  Because of the redundant key names inherent in the key:value nature of collection data, compression rates are even more impressive than for table data!  Only Informix can save you 70-90% of the storage you would otherwise use for unstructured data in any other MongoDB or similar NoSQL database!
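To make that a little more concrete, here is a minimal sketch of what hybrid access looks like from the SQL side.  The table, column, and key names are invented for illustration, and I am assuming the JSON/BSON casts and the BSON_VALUE_* accessor functions documented for 12.10 - check the manuals for the full list before copying any of this:

-- a relational table holding whole JSON documents as native BSON
CREATE TABLE sensor_docs (id SERIAL, doc BSON);

-- load a document from the SQL side (a Mongo client could just as easily
-- insert into the equivalent collection through the wire listener)
INSERT INTO sensor_docs (doc)
  VALUES ('{"sensor":"t17","reading":21.5,"site":"Miami"}'::JSON);

-- pull individual fields back out relationally
SELECT id,
       BSON_VALUE_VARCHAR(doc, 'site')   AS site,
       BSON_VALUE_DOUBLE(doc, 'reading') AS reading
  FROM sensor_docs
 WHERE BSON_VALUE_VARCHAR(doc, 'sensor') = 't17';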
Combine this with features that Informix has had for a long time:
  • Timeseries data
  • Geospatial data
  • BTS web-like text search capability
  • Informix Warehouse Accelerator to improve complex query times by 100x to 4000x!
  • High-availability Data Replication (HDR) secondary server (active or passive).
  • Multiple Remote Secondary replicated servers (active or passive).
  • Active or passive Shared Disk Secondary servers for load balancing, application isolation, and nearly instantaneous failover.
  • Transaction survival - failover of primary server functions to a secondary without loss of running transactions.  This means absolutely uninterruptible transactions!
  • Enterprise Replication
    • Multiple peer replicated servers on heterogeneous hardware
    • Master-Slave replication in both Scatter and Gather configuration
    • Combine with Sharded Query capability to maintain local data on local servers but query from a central location without a master copy
I don't think I've run out of features, but I have run out of the time I've allotted to myself for this post.

To wrap up: Informix is now positioned to be what its users have always considered it to be: The best damn database product on the market!  This should finally satisfy a certain community member based in Chicago who is usually rather vocal about Big Data.



Monday, April 8, 2013

What's the hottest new feature in Informix 12.10?

In the next edition of IBM Data Mag (http://ibmdatamag.com/), Lester Knutsen gives us a quick overview of the new Informix version 12.10 Grid Query feature. This, I think, is the hot new feature!  Lester refers the feature back to a similar XPS feature that allowed one to distribute a database across many servers and issue queries to the entire group of servers treating the entire distributed data set as a single database.  "Can't I do that with any RDBMS that supports distributed queries?", you may ask.  The simple answer is, "Yes, you can."  But it is not simple.  

What every serious RDBMS can do

All major RDBMSs let you query multiple servers by using a UNION or UNION ALL to gather the data from each server into a single result set.  So, what's so new in Informix v12.10?  Go read Lester's article when it comes out in a couple of weeks for the details.  It is really cool.
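For the record, the brute-force version looks something like the sketch below, using Informix's database@server:table syntax for remote tables (the database, server, and table names are made up for illustration).  Every time a server is added to or removed from the farm, a query like this has to be edited and redeployed on every site that runs it - which is exactly the pain a grid query removes:

SELECT order_num, store_id, amount FROM sales@srv_east:orders
UNION ALL
SELECT order_num, store_id, amount FROM sales@srv_central:orders
UNION ALL
SELECT order_num, store_id, amount FROM sales@srv_west:orders;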

Why do we care?

In 2000, Informix Corp. leaked that it was working on a project to merge the high transaction rates and full ACID support of Informix Dynamic Server 7.31 and the Object Relational and extensibility features of Informix Universal Server 9.30 with the nearly unlimited database size and shared-nothing distributed query capabilities of Informix eXtended Parallel Server (XPS) to create the ultimate database server, in a project that Informix called Arrowhead.  It didn't happen.  It didn't happen not because the technology and the expertise were not there, but because a year later, in late 2001, Informix Corporation's database assets were purchased by IBM and the Arrowhead project was squashed.

Since then, for nearly twelve years now, IBM, much to its own confusion, has been diligently improving the Informix database server.  For many of those twelve years Informix fans have asked IBM repeatedly when they might resurrect Arrowhead.  The answer has always been the same, "Never.  It's never going to happen."

Well, good news.  Like it or not, Grid Queries may not be the original technical design of Arrowhead, but along with the features added in Informix v11.xx that had long existed in XPS, conceptually and usability-wise, Arrowhead is alive and well and it is here today!

This is a game changer.  Now transactions against OLTP tables can happen locally on a local server; only the tables that affect other locations (lookup tables, inventory and warehouse levels, shipping manifests, etc.) have to be replicated using ER/GRID; and the national, regional, and company-wide reporting that requires access to all of the data can get it simply and quickly.

With the Informix Warehouse Accelerator included in Informix Advanced Enterprise Edition, even complex Business Intelligence/Analytical processing can be performed on the massive distributed data sets using up-to-the-minute data (no ELT/ETL, cube loading, or data summarization delay) gathered simply from the entire corporate server grid.  Add to that Informix Cluster Grid's ability to cluster, replicate, and distribute data across heterogeneous servers, and we have access to power and technology that no organization not using Informix can get.  Not from anyone else anywhere!  It will take decades before the industry Ninja will be able to tack on anything remotely like IBM Informix's Grid Query technology, and even then it will be a massive kludge.

I'm stoked!  See you all in San Diego in two weeks to hear more about all this!

Thursday, October 4, 2012

Server Performance Improvement and More

Today's post is as close to a shameless plug for my consulting services, and other services provided by Advanced DataTools, as I am ever likely to get, so don't feel that you have to read it if you do not want to read an advertisement.

  • Ever wonder if your server could be performing better?
  • Want to know when conditions in your database server will affect production before it happens?
  • Ever wonder if you will be able to handle the load next year as the business grows?
  • Got only one DBA (or you ARE the one DBA) and you are wondering how your company will get through vacations?  DBA illnesses?  DBA injuries?
  • Do you have new requirements in the pipeline?  Do you need to know if you can handle them with your current configuration or if you will need to upgrade equipment and software?
  • Do you need a world class database architect to help design that complex new schema in a way that will minimize the number of application gotchas you will run into later?
  • Have very old applications and database versions and want to consider getting up-to-date?
  • Want to modernize your old 4GL applications so you don't have to spend $$$$$$ to rewrite them so they can run on the web?  On smart phones?  How about saving on your 4GL development and run-time licensing at the same time?
  • Do you have a need to implement analytical query support but don't know enough about Big Data, Data Marts, and Data Warehouses?
  • Want to perform analysis on your production database without affecting production performance?
  • Have a newbie DBA or database application developer who needs to be brought up to speed quickly?
We can help.  Check out the range of services posted on our web site www.advancedatatools.com or better yet, just give us a call (703-256-0267 or 800-807-6732, select extension 101 and leave a voice mail) or email me (art@advancedatatools.com).  We will call you back and set up a conference call to discuss your requirements.

Monday, September 24, 2012

In search of rownum

Often people ask about various features supported by other RDBMS systems that Informix does not support.  I am rarely bothered by these issues, not because I don't use Oracle, MySQL, PostgreSQL, or MS SQL Server - I don't when I can help it - but rather because for the most part, the required functionality is either there in another form, is not really necessary, or can be emulated easily enough.

Examples are the INTERSECT and EXCEPT set operators.  Informix does not support either syntax.  However, the results of an INTERSECT are identical to the results of a normal INNER join with only the columns from one of the tables included in the projection clause, which is of course supported.  The results of an EXCEPT are identical to those of a LEFT OUTER JOIN that returns only those values from the left table that do not match any rows in the right table.

INTERSECT example - suppliers against whom at least one order has been processed:
select distinct supplier_id
from suppliers
INTERSECT
select distinct supplier_id
from orders;
returns the same result set as:
select distinct orders.supplier_id
from orders, suppliers
where orders.supplier_id = suppliers.supplier_id;
EXCEPT example - suppliers against whom no orders have been processed:
select distinct supplier_id
from suppliers
except
select distinct supplier_id
from orders;

returns the same result set as:

select suppliers.supplier_id
from suppliers
left outer join orders
on suppliers.supplier_id = orders.supplier_id
where orders.order_id is null;

Another is the Oracle ROWNUM pseudo column.  It returns a sequential number, starting from 1, indicating the order in which a row was processed by the server.  Most commonly it is used in the WHERE clause of a query to limit the number of rows returned or to walk through a result set in discrete chunks. 
Examples:

select * from sometable where rownum <= 100;
select * from sometable where rownum > 100 and rownum <= 200;

These are easy to emulate using the FIRST and SKIP keywords, which Informix does support:
select first 100 * from sometable;
select skip 100 first 100 * from sometable;
However, sometimes ROWNUM is used in the projection clause of a query where it represents the selection order as a computed column.  To emulate this use, we need to create two SPL routines.  The procedure init_rownum() below initializes the value returned by the next call to the rownum() function to '1' and is needed if rownum() is going to be used more than once in a session and is expected to return values starting with '1' each time.  Here are the routine definitions:

-- rownum(): returns the next sequential number; the GLOBAL counter persists
-- for the life of the session
CREATE FUNCTION rownum () RETURNING INT AS rownum;
    DEFINE GLOBAL counter INT DEFAULT 0;
    LET counter = counter + 1;
    RETURN counter;
END FUNCTION;

-- init_rownum(): resets the counter so the next call to rownum() returns 1
CREATE PROCEDURE init_rownum ();
    DEFINE GLOBAL counter INT DEFAULT 0;
    LET counter = 0;
END PROCEDURE;

In use these routines look like this:

execute procedure init_rownum();
select rownum() as rownum, tabname from systables;
Now suppose you want to sort the tabnames by name but still return the rownum reflecting the original order in which the rows were processed by the engine:
select *
from (select rownum() as rownum, tabname from systables)
order by tabname;
That's it, nothing profound today, just this little tidbit of technique.  Oh!  Don't forget to check out my TechTips article about Derived Tables in McPress at:

http://www.mcpressonline.com/sql/techtip-combining-aggregate-and-detail-in-the-same-sql-result-set.html


Thursday, September 6, 2012

On Becoming an Informix Tuning Genius

If Apple can promote a bunch of inexperienced, college-aged nerds to "Apple Genius" by giving them a couple of days of training and hanging a badge around their necks, why can't I create a cadre of "Informix Tuning Geniuses" by giving a class for fully adult, experienced DBAs and hanging a certificate on each attendee's wall?  The answer, of course, is "I can!"  I just haven't called it that.  Until now.

I am often asked "How did you get to be so good at tuning Informix engines?"  When I hear questions like that my normal human modesty and my massive personal ego go to war.  Modesty wants to answer, softly, "Aw shucks, I just pay attention to what's going on is all."  Meanwhile, my admittedly huge ego wants to scream, "Yeah man. I am awesome!  You'd have to work hard for a long time to get where I am!"  Unfortunately for my ego, my modesty's take on the world is far closer to the truth.  When I can hear myself over my ego's protests, I have to admit that I credit several things:
  1. Paying attention really is a big part of what I do; the hard part is knowing what to pay attention to.  In general:
    1. RTFM - Yes, I really do Read The Fine Manuals and pay attention to what I read.
    2. Pay attention to other Informix users and users of other RDBMS products (yes Virginia, even Oracle users know a thing or two about managing a database system).
    3. Ask questions of IBM developers and support people, and pay attention to the answers.  Try to understand how Informix works so what I see in #2 below comes together.
  2. Look at the data that's available from the OS and from the server and dig down until everything makes sense. - More paying attention I guess - Hmm a theme is developing.
  3. Try things out.  Play.  We are Informix DBAs not Oracle DBAs.  We have time to try things out and test since we are not busy 24x7 keeping the damn server online.  Oh, and pay attention to the results.
  4. This was a journey that started for me with a single number on an onstat -p report (bufwaits) that didn't make sense.  I poked and analyzed and spoke to Informix developers until it did.  So, dogged determination is also a part of it.
Sometimes the questions I get are more like "How can I become as good as you?" or at least that's how my ego hears it. ;-)  Here I have a good answer.  It relates to 1.2 and 1.3 above.  Learn from others to kickstart your own journey.  For this I have a good solution, and yes it is a plug, or, actually, two plugs.

Plug #1:
Contract with me to perform a Comprehensive HealthCheck on your server's performance.  I can do that remotely, and that has its benefits, but if you have me come on-site to perform the HealthCheck in your facility, we can work on it together and I will show you what I do and how my thought processes work to pull the information out of the data and arrive at whatever recommendations I make.

Plug #2:
At Advanced DataTools Corp. we run periodic classes including, specific to this discussion, the Advanced Informix Performance Tuning course.  In one of these courses you will learn from two of the best, myself and Lester Knutsen, how to tune your Informix server to scream through transactions - by doing just that.  You will work hand-in-hand with Lester and me to tune two different database scenarios.  In one you will learn how to get 40 hours worth of data loading and reporting completed in under 5 minutes.  In the other you will tune an Informix instance that, by default, can only complete about 4,000 transactions per minute until it can rip through almost 60,000 transactions per minute.  Between test runs we will review the relevant portions of the Informix manuals and learn what is important to monitor and how to interpret what you are seeing in the data.

The next Advanced Performance Tuning course is scheduled for December 3-6, 2012 in our Annandale, Virginia training facility.  But you don't have to travel all the way from Albuquerque or Mumbai to the US capital region; you can attend virtually over WebEx.  Currently about half of a typical class consists of remote attendees.  Class size is small.  We will only serve up to seven students at a time.  In addition, if you have three or more people from your company that you want to train, we can bring the course to you on your schedule and customize it for your environment.

Final bit of plugging: here is the URL to ADTC's web site page that discusses our training offerings:
http://www.advancedatatools.com/Training/InformixTraining.html

So, do you want to become an Informix Genius?  Apple Geniuses learn from each other once they get behind that counter and are hands-on.  If you are already an Informix DBA or a Developer/Jack-of-All-Informix, are you already behind the counter but you don't have the Genius Badge yet?  Learn from me.  Learn from Lester.  Give us a call or send us an email and we will schedule you into a course. 



Wednesday, June 27, 2012

What filesystem type should I use for my Informix database on Linux?


The post below is a translation and repost of an article by Eric Vercelletto on his Le Village Informix blog last August.  Here is the link to the original for my French speaking friends:

http://levillageinformix.blogspot.ca/2011/08/choix-du-type-de-file-system-sur-linux.html

This is not a literal translation, but I hope it is faithful to Eric's meaning and intent.  Thanks to Sonny Bernier for showing the original to me and many thanks to Eric for permitting me to reprint it here and for all that he does for the Informix community in France and around the world.

Choosing the type of file system on Linux: if you have any doubt ...


Hi all,  It is under a threatening sky that I take up my pen to dig into a test I have wanted to redo for some time:

    That is, the impact of the choice of file system type on which to implement your chunks for Informix Dynamic Server on Linux.

     As you probably know, raw devices have for some time been considered obsolete technology on most Linux distributions.  It is typical of the conflict of interest between the database vendors, who advocate their use, and the OS vendors and hardware manufacturers, who push in the opposite direction.

    How you implement chunks in IDS is not without consequence, insofar as it directly affects the performance of input-output, and so the overall performance of the Informix server.  As a reminder, it is always good to divide up the data over the maximum number of disks, and to mirror the physical devices.  That is why it is better to create the chunks on disks configured with some redundancy (e.g. RAID 0/1, RAID 5, a journaled file system, etc.), but there are costs in terms of performance associated with doing so.

     I hold that it is obvious that there is no reason to use redundant storage for your data dbspaces unless your databases are not logged or you do not back up your logical logs.  However, in both of these cases you should certainly revise your strategy, unless the loss of a day's work or more would not cause you problems.

    As far as the security of your data is concerned, IDS protects you very well.  With the ability to back up the logical logs continuously to an external device, and of course the ability to perform regular storage space backups with ontape or onbar, you can face a total crash with a high degree of confidence about the sustainability of your employment.

    The big advantage of this system is that IDS ensures data consistency (through transactions) and the restoration of your data - after the physical loss of a server - and of completed transactions, based on information kept about those same transactions in the logical logs and their backups.  The actual interval between each logical log backup to an external device is important, since it determines how much work can be lost if all of the disks of the instance crash and the transaction details since the last logical log backup are lost.

    In short, it is unnecessary to make the file system hosting your data redundant; it does not help much, except possibly for the rootdbs and/or the dbspace that contains the physical and logical logs, to ensure availability (through RAID 0/1 for example).  (Editor's note: Eric holds that the server archives and logical log backups are sufficient to prevent data loss, removing the need for redundant storage.  Those who know me know Eric and I disagree on this point, but on the balance of his post we are aligned. -- Art)

     So, to determine which is the best filesystem type for Informix dbspace chunks, I performed a test which consisted of the dbimport of a database of small size (1.7 Gb), but which has very complex indexing. 

   The test plan was as follows:
  1. Creation of a dbspace on a Linux ext4 file system (journaled), with the onconfig parameter DIRECT_IO set to 0, then dbimport of the data into a non-logged database.
  2. Creation of the dbspace on a Linux ext4 file system (journaled), with the onconfig parameter DIRECT_IO set to 1, then dbimport of the data into a non-logged database.
  3. Creation of the dbspace on a Linux ext2 file system (not journaled), with the onconfig parameter DIRECT_IO set to 0, then dbimport of the data into a non-logged database.
  4. Creation of the dbspace on a Linux ext2 file system (not journaled), with the onconfig parameter DIRECT_IO set to 1, then dbimport of the data into a non-logged database.

    I can hear the question: "What is DIRECT_IO?"  It is an onconfig parameter that allows Informix to bypass the caching layer of the file system, so IDS writes and reads using Kernel Asynchronous I/O, bypassing the filesystem's caching mechanism.  This offers virtually the same benefits as using raw devices, namely safe writes directly to the disks and a performance gain from using kernel I/O.  This parameter is available with the Growth and Enterprise Editions (Ed: and in the new Growth Warehouse and Ultimate Warehouse Editions), however, unfortunately not :-( in Innovator-C Edition.


    To make this short: 
Test 1: execution time =   91m 41s 
Test 2: execution time = 195m 57s 

Test 3: execution time =   54m 49s 
Test 4: execution time =   49m 54s 

    The final winner is the EXT2 file system with direct_io activated.

   The surprise was the test with ext4 with direct_io activated.

   This turns out to be a combination to be avoided. 

   Note that each test was repeated two times, taking care to restart the Informix instance in between.  The execution times for each test were very consistent every time.

    You now have the elements needed to make decisions for your future implementations.  See you back here for new challenges.

Posted by Eric Vercelletto, August 2011


    On the subject of the need or lack thereof for redundant storage, it's not so much that I think Eric is wrong.  Essentially, he is correct.  Given Informix's famous archive and logical log recovery mechanisms, it is certainly possible to completely recover from a hard crash with minimal data loss.  Where we differ is that I am too lazy to want to go through a full restore like that if there is any way that I can avoid it.  Fully redundant disk storage (i.e. RAID1 or RAID10) is one such way, and I minimize the possibility of ANY data loss that way.  Eric, thanks for a great post and thanks for the effort of performing the testing.


    I did some similar testing and reported very similar results during one of my sessions at the IIUG Conference in San Diego.  I would include EXT3 along with EXT4 as being slower than EXT2 unless you turn off data journaling, leaving only meta-data journaling active.  In addition, note that EXT4 and EXT3 with journaling set to write-back mode are unsafe (see my presentation slides for details).

Thursday, March 15, 2012

Bitemporal Data - Is this the next big thing?

I know you.  You are all thinking "What the heck is that?  What's Art babbling about now?" The concept of bitemporal data deals with the temporal state of our data from two separate perspectives.  Hang in there, I've got to spew the buzz words before I can explain them.  Here is what that all means.  

Data changes over time.  But there are two separate timescales.  The first is the date/time at which a value was modified, and, as part of that, what the value was/is before and after that transition time.  The second timescale or perspective answers the question: For what range of dates/times was each version of the value valid?  Here's an example that I encountered once, in a previous lifetime: sales teams.  I had to build a reporting system for sales teams that could cope with multiple levels of reporting (I won't even deal here with the fact that the number of reporting levels was different from one division to another - that's a separate article).  The composition of the teams, the sales divisions each team belongs to, the sales region that owns each group of divisions, and who manages each level - all of those elements were changing periodically.

What happens when a salesperson moves from one team to another in the middle of a reporting period?  How do you account for the change if the reports have to show sales earned while a member of each team attributed to that team's manager and to the correct division?  Meanwhile, the individual's sales need to all be attributed to that one salesperson.  To make matters worse, the team change was made on the 15th of the month retroactive to the 15th of the previous month, and the reporting period is from the 1st of the month to its end.  Compounding it all, last month's reports have to be recalculated.  Lastly, what happens next year when the auditors want to know why the sales figures for the two divisions were recalculated after the fact and want to see an audit trail and the original figures?

Honestly, when I did this, management pooh-poohed the most difficult parts of this whole scenario and produced simplified rules that got me off the hook of doing it right.  But in general, that may not happen to you if you are hit with something similar.  So, what's the solution?  Bitemporal data versioning.  The ISO/IEC (JTC1 SC32 WG3) SQL committee recently published SQL extensions to cope with bitemporal data.  MS SQL Server, Oracle, and DB2 are each partially compliant, since they have had some supporting features for a little while (not long actually), but we poor Informix zlubs got nothing.  OK, end of complaint, which isn't the purpose of this post anyway.

You can implement BTD (getting tired of typing it all out so BTD will have to do now) yourself without server support, though support makes things SO much easier, especially at query time.  Each table needs two pairs of date columns:
  • First_Effective_Date
  • Last_Effective_Date
  • Date_Added
  • Date_Invalidated
With these columns in place, and the Date_Added and First_Effective_Date made part of the primary key of the table, you can implement my nightmare scenario and survive the exercise.
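Here is a bare-bones sketch of what that looks like for my sales team nightmare.  This is illustrative only - the table and column names are invented, NULL is used to mean "still in effect" or "still the current version", and a real design would also carry keys up to the division and region levels:

CREATE TABLE team_membership (
    salesperson_id       INTEGER NOT NULL,
    team_id              INTEGER NOT NULL,
    -- valid time: when the assignment is in force in the real world
    first_effective_date DATE NOT NULL,
    last_effective_date  DATE,                    -- NULL = still in effect
    -- transaction time: when this version of the row was recorded/superseded
    date_added           DATETIME YEAR TO SECOND NOT NULL,
    date_invalidated     DATETIME YEAR TO SECOND, -- NULL = current version
    PRIMARY KEY (salesperson_id, first_effective_date, date_added)
);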

"Wait", you say, "I get the effective dates, why the added and invalidated dates?"  Well, that implements the 'B' part of BTD.  What if on the 15th we change that salesman's team to team Victor.  Then on the 1st of next month we generate the sales report for this last month and regenerate the previous month's adjusted sales report.  Senior Management looks at the report at noon on the 2nd but accounting updated their reports at 9AM.  At 12:15 the Sales Manager calls his assistant and says: "Dude, you screwed up!  I said move John K. from Team Jones to Team Hector not Team Victor.  Go fix this and get me new reports!"

So by the end of the day on the 2nd there will be three versions of the sales reports for the prior month and two versions of the reports for last month.  They all have to be reproducible for auditing purposes at any time.  That's why we need the four dates.  Yes, we could live with just one effective date and one valid date, but the queries to determine which row is the one we need to report from would get VERY hinky indeed, as you would have to look ahead for future effective and valid dates that might invalidate the version you are looking at.
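To make the auditors happy, a point-in-time query against the sketch above has to pin down both timelines at once: which rows were in effect for the reporting date, as the database knew them at the moment a particular report was run.  Roughly (the dates and names are again just illustrative):

-- team memberships in effect on the 15th, as known at 9AM on the 2nd
-- of the following month when accounting produced its reports
SELECT salesperson_id, team_id
  FROM team_membership
 WHERE first_effective_date <= MDY(2,15,2012)
   AND (last_effective_date IS NULL OR last_effective_date >= MDY(2,15,2012))
   AND date_added <= DATETIME (2012-03-02 09:00:00) YEAR TO SECOND
   AND (date_invalidated IS NULL
        OR date_invalidated > DATETIME (2012-03-02 09:00:00) YEAR TO SECOND);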

Is this all immensely complex and immensely confusing?  You bet.  Would an implementation of the BTD SQL standard in Informix ease the pain?  Probably.  Do you  REALLY care about BTD?  Well, you should because one day, when you least expect it, BTD is going to jump out and bite you on your... ankle. ;-)  

There is a LinkedIn group called Temporal Data (which includes BTD: http://www.linkedin.com/groups?gid=3885228&trk=myg_ugrp_ovr) that you can join if you want to learn more and join in the fun.  There are several experts who post and lurk there, and if you look through the posting history you will find links to blog posts and other documents explaining the issues and solutions more thoroughly.