Thursday, October 4, 2012

Server Performance Improvement and More

Today's post is as close to a shameless plug for my consulting services, and other services provided by Advanced DataTools, as I am ever likely to get, so don't feel that you have to read it if you do not want to read an advertisement.

  • Ever wonder if your server could be performing better?
  • Want to know when conditions in your database server will affect production before it happens?
  • Ever wonder if you will be able to handle the load next year as the business grows?
  • Got only one DBA (or you ARE the one DBA) and you are wondering how your company will get through vacations?  DBA illnesses?  DBA injuries?
  • Do you have new requirements in the pipeline?  Do you need to know if you can handle them with your current configuration or if you will need to upgrade equipment and software?
  • Do you need a world class database architect to help design that complex new schema in a way that will minimize the number of application gotchas you will run into later?
  • Have very old applications and database versions and want to consider getting up-to-date?
  • Want to modernize your old 4GL applications so you don't have to spend $$$$$$ to rewrite them so they can run on the web?  On smart phones?  How about saving on your 4GL development and run-time licensing at the same time?
  • Do you have a need to implement analytical query support but don't know enough about Big Data, Data Marts, and Data Warehouses?
  • Want to perform analysis on your production database without affecting production performance?
  • Have a newbie DBA or database application developer who needs to be brought up to speed quickly?
We can help.  Check out the range of services posted on our web site www.advancedatatools.com or better yet, just give us a call (703-256-0267 or 800-807-6732, select extension 101 and leave a voice mail) or email me (art@advancedatatools.com).  We will call you back and set up a conference call to discuss your requirements.

Monday, September 24, 2012

In search of rownum

Often people ask about various features supported by other RDBMS systems that Informix does not support.  I am rarely bothered by these issues, not because I don't use Oracle, MySQL, PostgreSQL, or MS SQL Server - I don't when I can help it - but rather because for the most part, the required functionality is either there in another form, is not really necessary, or can be emulated easily enough.

Examples are the INTERSECT and EXCEPT set operators.  Informix supports neither syntax. However, the results of an INTERSECT are identical to the results of a normal INNER join with only columns from one of the tables included in the projection clause, which is of course supported. The results of an EXCEPT are identical to those of a LEFT OUTER JOIN that returns only the rows of the left table that match no row in the right table.

INTERSECT example - suppliers against whom at least one order has been processed:
select distinct supplier_id
from suppliers
INTERSECT
select distinct supplier_id
from orders;
returns the same result set as:
select distinct orders.supplier_id
from orders, suppliers
where orders.supplier_id = suppliers.supplier_id;
EXCEPT example - suppliers against whom no orders have been processed:
select distinct supplier_id
from suppliers
except
select distinct supplier_id
from orders;

returns the same result set as:

select suppliers.supplier_id
from suppliers
left outer join orders
on suppliers.supplier_id = orders.supplier_id
where orders.supplier_id is null;
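
The same anti-join can also be written with a NOT EXISTS subquery, which some find easier to read.  A quick sketch against the same tables:

select supplier_id
from suppliers s
where not exists
    (select o.supplier_id
     from orders o
     where o.supplier_id = s.supplier_id);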

Another is the Oracle ROWNUM pseudo column.  It returns a sequential number, starting from 1, indicating the order in which a row was processed by the server.  Most commonly it is used in the WHERE clause of a query to limit the number of rows returned or to walk through a result set in discrete chunks. 
Examples:

select * from sometable where rownum <= 100;
select * from sometable where rownum > 100 and rownum <= 200;

(Strictly speaking, the second form returns no rows in Oracle, because ROWNUM is assigned as rows pass the WHERE filter; real paging there requires wrapping the query in a subquery.  It illustrates the intent, though.)

These are easy to emulate using the SKIP and FIRST keywords, which Informix does support:
select first 100 * from sometable;
select skip 100 first 100 * from sometable;
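
One caveat worth remembering: without an ORDER BY, the engine is free to return rows in whatever order it likes, so paging in discrete chunks is only repeatable against a sorted result.  A sketch, where some_key stands in for whatever column defines your paging order:

select skip 100 first 100 *
from sometable
order by some_key;
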
However, sometimes ROWNUM is used in the projection clause of a query, where it represents the selection order as a computed column.  To emulate this use, we need to create two SPL routines.  The function rownum() returns the next value of a session-global counter.  The procedure init_rownum() resets that counter so that the next call to rownum() returns '1'; it is needed if rownum() is going to be used more than once in a session and is expected to return values starting with '1' each time.  Here are the routine definitions:

CREATE FUNCTION rownum() returning int as rownum;
    -- GLOBAL variables persist for the session and are shared among
    -- SPL routines, which is what lets init_rownum() reset this counter
    define global counter int default 0;
    let counter = counter + 1;
    return counter;
end function;

CREATE PROCEDURE init_rownum();
    define global counter int default 0;
    let counter = 0;  -- the next call to rownum() will return 1
end procedure;

In use these routines look like this:

execute procedure init_rownum();
select rownum() as rownum, tabname from systables;
Now suppose you want to sort the tabnames by name but return the rownum reflecting the original order in which the rows were processed by the engine:
select *
from (select rownum() as rownum, tabname from systables)
order by tabname;
That's it, nothing profound today, just this little tidbit of technique.  Oh!  Don't forget to check out my TechTip article about Derived Tables on MC Press Online at:

http://www.mcpressonline.com/sql/techtip-combining-aggregate-and-detail-in-the-same-sql-result-set.html


Thursday, September 6, 2012

On Becoming an Informix Tuning Genius

If Apple can promote a bunch of inexperienced college-age nerds to "Apple Genius" by giving them a couple of days of training and hanging a badge around their necks, why can't I create a cadre of "Informix Tuning Geniuses" by giving a class for fully adult, experienced DBAs and hanging a certificate on each attendee's wall?  The answer, of course, is "I can!" I just haven't called it that.  Until now.

I am often asked "How did you get to be so good at tuning Informix engines?"  When I hear questions like that, my normal human modesty and my massive personal ego go to war.  Modesty wants to answer, softly, "Aw shucks, I just pay attention to what's going on is all."  Meanwhile, my admittedly huge ego wants to scream, "Yeah man, I am awesome!  You'd have to work hard for a long time to get where I am!"  Unfortunately for my ego, my modesty's take on the world is far closer to the truth.  When I can hear myself over my ego's protests, I have to admit that I credit several things:
  1. Paying attention really is a big part of what I do; the hard part is knowing what to pay attention to.  In general:
    1. RTFM - Yes, I really do Read The Fine Manuals and pay attention to what I read.
    2. Pay attention to other Informix users and users of other RDBMS products (yes Virginia, even Oracle users know a thing or two about managing a database system).
    3. Ask questions of IBM developers and support people, and pay attention to the answers.  Try to understand how Informix works so what I see in #2 below comes together.
  2. Look at the data that's available from the OS and from the server and dig down until everything makes sense.  More paying attention, I guess.  Hmm, a theme is developing.
  3. Try things out.  Play.  We are Informix DBAs not Oracle DBAs.  We have time to try things out and test since we are not busy 24x7 keeping the damn server online.  Oh, and pay attention to the results.
  4. This was a journey that started for me with a single number on an onstat -p report (bufwaits) that didn't make sense.  I poked and analyzed and spoke to Informix developers until it did.  So, dogged determination is also a part of it.
Sometimes the questions I get are more like "How can I become as good as you?" or at least that's how my ego hears it. ;-)  Here I have a good answer.  It relates to 1.2 and 1.3 above.  Learn from others to kickstart your own journey.  For this I have a good solution, and yes it is a plug, or, actually, two plugs.

Plug #1:
Contract with me to perform a Comprehensive HealthCheck on your server's performance.  I can do that remotely, and that has its benefits, but if you have me come on-site to perform the HealthCheck in your facility, we can work on it together and I will show you what I do and how my thought processes work as I pull the information out of the data and arrive at whatever recommendations I may make.

Plug #2:
At Advanced DataTools Corp. we run periodic classes including, specific to this discussion, the Advanced Informix Performance Tuning course.  Attending one of these courses, you will learn from two of the best, myself and Lester Knutsen, how to tune your Informix server to scream through transactions by doing just that.  You will work hand-in-hand with Lester and me to tune two different database scenarios.  In one you will learn how to get 40 hours' worth of data loading and reporting completed in under 5 minutes.  In the other you will tune an Informix instance that, by default, can only complete about 4,000 transactions per minute until it will rip through almost 60,000 transactions per minute.  Between test runs we will review the relevant portions of the Informix manuals and learn what is important to monitor and how to interpret what you are seeing in the data.

The next Advanced Performance Tuning course is scheduled for December 3-6, 2012 in our Annandale, Virginia training facility.  But you don't have to travel all the way from Albuquerque or Mumbai to the Washington, DC area; you can attend virtually over WebEx.  Currently about half of a typical class is comprised of remote attendees.  Class size is small; we will only serve up to seven students at a time.  In addition, if you have three or more people from your company that you want to train, we can bring the course to you, on your schedule, customized for your environment.

A final bit of plugging: here is the URL to the ADTC web site page that discusses our training offerings:
http://www.advancedatatools.com/Training/InformixTraining.html

So, do you want to become an Informix Genius?  Apple Geniuses learn from each other once they get behind that counter and go hands-on.  If you are already an Informix DBA or a Developer/Jack-of-All-Informix, you are already behind the counter; you just don't have the Genius badge yet.  Learn from me.  Learn from Lester.  Give us a call or send us an email and we will schedule you into a course.



Wednesday, June 27, 2012

What filesystem type should I use for my Informix database on Linux?


The post below is a translation and repost of an article by Eric Vercelletto on his Le Village Informix blog last August.  Here is the link to the original for my French-speaking friends:

http://levillageinformix.blogspot.ca/2011/08/choix-du-type-de-file-system-sur-linux.html

This is not a literal translation, but I hope it is faithful to Eric's meaning and intent.  Thanks to Sonny Bernier for showing the original to me, and many thanks to Eric for permitting me to reprint it here and for all that he does for the Informix community in France and around the world.

Choosing the type of file system on Linux: if you have any doubt ...


Hi all,    It is under a threatening sky that I take up my pen to dig into a test I have wanted to redo for some time:

    That is: the impact of the choice of file system type on which to implement your chunks for Informix Dynamic Server on Linux.

     As you probably know, raw devices have for some time been considered obsolete technology on most Linux distributions. It is typical of the conflict of interest between the database vendors, who advocate their use, and the OS vendors and hardware manufacturers, who go in the opposite direction.

    How chunks are implemented in IDS is not without effect, insofar as it directly affects the performance of input-output, and so the overall performance of the Informix server. As a reminder, it is always good to divide the data over the maximum number of disks, and to mirror the physical devices.  That is why it is better to create the chunks on disks configured with some redundancy (e.g. RAID 0/1, RAID 5, a journaled file system, etc.), but there are costs in terms of performance associated with doing so.

     I hold that it is obvious there is no reason to use redundant storage for your data dbspaces unless your databases are not logged or you do not back up your logical logs.   However, in both of those cases you should certainly revise your strategy, unless the loss of a day's work or more would not cause you problems.

    As far as the security of your data is concerned, IDS protects you very well.  With the ability to back up the logical logs continuously to an external device, and of course the ability to perform regular storage space backups with ontape or onbar, you can face a total crash with a high degree of confidence in the survival of your operation.

    The big advantage of this system is that IDS ensures data consistency (through transactions) and, after the physical loss of a server, the restoration of your data and of completed transactions, based on the information kept about those same transactions in the logical logs and their backups. The actual interval between logical log backups to an external device is important, since it determines how much work can be lost if all of the disks in the instance crash and the transaction details since the last logical log backup are lost.

    In short, it is unnecessary to make the file systems hosting your data redundant; it does not help much, except possibly for the rootdbs and/or the dbspace that contains the physical and logical logs, to ensure availability (through RAID 0/1, for example).  (Editor's note: Eric holds that the server archives and logical log backups are sufficient to prevent data loss, removing the need for redundant storage.  Those who know me know that Eric and I disagree on this point, but on the balance of his post we are aligned. -- Art)

     So, to determine the best file system type for Informix dbspace chunks, I performed a test consisting of the dbimport of a database of small size (1.7 GB) but with very complex indexing.

   The test plan was as follows:
  1. Creation of a dbspace on a Linux ext4 file system (journaled), with the onconfig parameter DIRECT_IO set to 0, then dbimport of the data into a non-logged database.
  2. Creation of the dbspace on a Linux ext4 file system (journaled), with the onconfig parameter DIRECT_IO set to 1, then dbimport of the data into a non-logged database.
  3. Creation of the dbspace on a Linux ext2 file system (not journaled), with the onconfig parameter DIRECT_IO set to 0, then dbimport of the data into a non-logged database.
  4. Creation of the dbspace on a Linux ext2 file system (not journaled), with the onconfig parameter DIRECT_IO set to 1, then dbimport of the data into a non-logged database.

    I can hear the question: "What is DIRECT_IO?"  It is an onconfig parameter that allows Informix to override the cache layer of the file system, so that IDS writes and reads using Kernel Asynchronous IO, bypassing the file system's caching mechanism.  This offers virtually the same benefits as using raw devices, namely writing safely directly to the disks and a performance gain from using kernel IO. This parameter is available with the Growth and Enterprise Editions (Ed: and in the new Growth Warehouse and Ultimate Warehouse Editions), however, unfortunately not :-( in the Innovator-C Edition.
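
    For reference, enabling it is a one-line change in the onconfig file (a sketch; note that, as far as I know, a change to DIRECT_IO only takes effect after the instance is restarted):

# Use direct I/O (bypassing file system caching) for cooked-file chunks
# 0 = disabled, 1 = enabled
DIRECT_IO 1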


    To make this short: 
Test 1: execution time =   91m 41s 
Test 2: execution time = 195m 57s 

Test 3: execution time =   54m 49s 
Test 4: execution time =   49m 54s 

    The final winner is the ext2 file system with DIRECT_IO activated.

   The surprise was the test of ext4 with DIRECT_IO activated.

   That turns out to be a combination to be avoided.

   Note that each test was repeated two times, taking care to restart the Informix instance in between. The execution times for each test were very consistent every time.

    You now have the elements needed to make decisions for your future implementations.  See you back here for new challenges.

Posted by Eric Vercelletto, August 2011


    On the subject of the need, or lack thereof, for redundant storage, it's not so much that I think Eric is wrong.  Essentially, he is correct.  Given Informix's famous archive and logical log recovery mechanisms, it is certainly possible to completely recover from a hard crash with minimal data loss.  Where we differ is that I am too lazy to want to go through a full restore like that if there is any way I can avoid it.  Fully redundant disk storage (i.e. RAID1 or RAID10) is one such way, and I minimize the possibility of ANY data loss that way.  Eric, thanks for a great post and thanks for the effort of performing the testing.


    I did some similar testing and reported very similar results during one of my sessions at the IIUG Conference in San Diego.  I would group EXT3 with EXT4 as being slower than EXT2 unless you turn off data journaling, leaving only meta-data journaling active.  In addition, note that EXT4 and EXT3 with journaling set to write-back mode are unsafe (see my presentation slides for details).

Thursday, March 15, 2012

Bitemporal Data - Is this the next big thing?

I know you.  You are all thinking "What the heck is that?  What's Art babbling about now?" The concept of bitemporal data deals with the temporal state of our data from two separate perspectives.  Hang in there, I've got to spew the buzzwords before I can explain them.  Here is what that all means.

Data changes over time.  But there are two separate timescales.  The first is the date/time at which a value was modified and, as part of that, what the value was/is before and after that transition time.  The second timescale or perspective answers the question: for what range of dates/times was each version of the value valid?  Here's an example that I encountered once, in a previous lifetime: sales teams.  I had to build a reporting system for sales teams that could cope with multiple levels of reporting (I won't even deal here with the fact that the number of reporting levels was different from one division to another; that's a separate article).  The composition of the teams, the sales division each team belongs to, the sales region that owns each group of divisions, and who manages each level: all of those elements were changing periodically.

What happens when a salesperson moves from one team to another in the middle of a reporting period?  How do you account for the change if the reports have to show sales earned while a member of each team attributed to each team's manager and to the correct division?  Meanwhile, the individual's sales all need to be attributed to that one salesperson.  To make matters worse, the team change was made on the 15th of the month, retroactive to the 15th of the previous month, and the reporting period is from the 1st of the month to its end.  Compounding it all, last month's reports have to be recalculated.  Lastly, what happens next year when the auditors want to know why the sales figures for the two divisions were recalculated after the fact and want to see an audit trail and the original figures?

Honestly, when I did this, management pooh-poohed the most difficult parts of this whole scenario and produced simplified rules that got me off the hook of doing it right.  But in general, that may not happen for you if you are hit with something similar.  So, what's the solution?  Bitemporal data versioning.  The ISO/IEC (JTC1 SC32 WG3) SQL committee recently published SQL extensions to cope with bitemporal data.  MS SQL Server, Oracle, and DB2 are each partially compliant, since they have had some supporting features for a little while (not long, actually), but us poor Informix zlubs got nothing.  OK, end of complaint, which isn't the purpose of this post.

You can implement BTD (I'm getting tired of typing it all out, so BTD will have to do from now on) yourself without server support, though support makes things SO much easier, especially at query time.  Each table needs two pairs of date columns:
  • First_Effective_Date
  • Last_Effective_Date
  • Date_Added
  • Date_Invalidated
With these columns in place, and with Date_Added and First_Effective_Date made part of the primary key of the table, you can implement my nightmare scenario and survive the exercise.
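
To make that concrete, here is a minimal sketch of such a table.  The table and column names are illustrative only, and I am taking the liberty of using DATETIME for the added/invalidated pair so that the intra-day sequence of corrections can be captured:

create table team_membership (
    salesperson_id       integer not null,
    team_id              integer not null,
    first_effective_date date not null,     -- business validity: from
    last_effective_date  date not null,     -- business validity: to
    date_added           datetime year to second not null,  -- when this version was recorded
    date_invalidated     datetime year to second not null,  -- when superseded; use a far-future
                                                            -- sentinel while still current
    primary key (salesperson_id, first_effective_date, date_added)
);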

"Wait", you say, "I get the effective dates, why the added and invalidated dates?"  Well, that implements the 'B' part of BTD.  What if on the 15th we change that salesman's team to team Victor.  Then on the 1st of next month we generate the sales report for this last month and regenerate the previous month's adjusted sales report.  Senior Management looks at the report at noon on the 2nd but accounting updated their reports at 9AM.  At 12:15 the Sales Manager calls his assistant and says: "Dude, you screwed up!  I said move John K. from Team Jones to Team Hector not Team Victor.  Go fix this and get me new reports!"

So once the fix goes in on the 2nd, there will be three versions of the sales reports for the prior month and two versions of the reports for last month.  They all have to be reproducible for auditing purposes at any time.  That's why we need the four dates.  Yes, we could live with just one effective date and one valid date, but the queries to determine which row is the one we need for a report would get VERY hinky indeed, as you would have to look ahead for future effective and valid dates that might invalidate the version you are looking at.
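
With the four dates, the "which version?" question stays a simple range test.  A sketch of the as-of query against the hypothetical table above, where :report_date and :as_of stand in for host variables (the business date being reported on, and the moment whose view of the world you want to reproduce, e.g. 9AM on the 2nd versus 3PM on the 2nd):

select salesperson_id, team_id
from team_membership
where first_effective_date <= :report_date
  and last_effective_date  >= :report_date   -- version in effect on the report date
  and date_added           <= :as_of
  and date_invalidated     >  :as_of;        -- as we believed it at the as-of moment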

Is this all immensely complex and immensely confusing?  You bet.  Would an implementation of the BTD SQL standard in Informix ease the pain?  Probably.  Do you REALLY care about BTD?  Well, you should, because one day, when you least expect it, BTD is going to jump out and bite you on your... ankle. ;-)

There is a LinkedIn group called Temporal Data (which includes BTD: http://www.linkedin.com/groups?gid=3885228&trk=myg_ugrp_ovr) that you can join if you want to learn more and join in the fun.  Several experts post and lurk there, and if you look through the posting history you will find links to blog posts and other documents explaining the issues and solutions more thoroughly.

Wednesday, February 8, 2012

It seems that my friends at Oninit UK have worked with i2Global to port the excellent open source CRM SugarCRM to use Informix as its repository.  It will be no surprise to us that, with Informix standing behind it, SugarCRM now scales much better than it did with its default repository RDBMS.  Duh!

When is IBM going to get it that in Informix they have a database without peer, and finally get 100% behind it?