Wednesday, June 27, 2012

What filesystem type should I use for my Informix database on Linux?


The post below is a translation and repost of an article that Eric Vercelletto published on his Le Village Informix blog last August.  Here is the link to the original for my French-speaking friends:

http://levillageinformix.blogspot.ca/2011/08/choix-du-type-de-file-system-sur-linux.html

This is not a literal translation, but I hope it is faithful to Eric's meaning and intent.  Thanks to Sonny Bernier for showing the original to me, and many thanks to Eric for permitting me to reprint it here and for all that he does for the Informix community in France and around the world.

Choosing the type of file system on Linux: if you have any doubt ...


Hi all,    It is under a threatening sky that I take up my pen to dig into a test I have wanted to redo for some time:

    Namely, the impact of the type of file system on which you place your chunks for Informix Dynamic Server on Linux.

     As you probably know, raw devices have for some time been considered obsolete technology on most Linux distributions. This is typical of the conflict of interest between database vendors, who advocate their use, and OS vendors and hardware manufacturers, who go in the opposite direction.

    How you implement chunks in IDS is not without effect, insofar as it directly affects input/output performance, and so the overall performance of the Informix server. As a reminder, it is always good to spread the data over as many disks as possible, and to mirror the physical devices.  That is why it is usually considered better to create the chunks on storage configured with some form of protection (e.g. RAID 0/1, RAID 5, a journaled file system, etc.), but doing so has a cost in terms of performance.

     I hold that there is clearly no reason to use redundant storage for your data dbspaces unless your databases are not logged or you do not back up your logical logs.   In either of those cases, however, you should certainly revise your strategy, unless losing a day's work or more would not cause you problems.

    As far as the security of your data is concerned, IDS protects you very well.  With the ability to back up the logical logs continuously to an external device, and of course the ability to perform regular storage space backups with ontape or onbar, you can face a total crash with a high degree of confidence about the sustainability of your employment.

    The big advantage of this system is that IDS ensures data consistency (through transactions) and, after the physical loss of a server, can restore both your data and your completed transactions from the information kept in the logical logs and their backups. The interval between logical log backups to an external device is important, since it determines how much work can be lost if all the disks of the instance crash: the details of any transactions recorded since the last logical log backup are lost with them.

    In short, it is unnecessary to secure the file systems that host your data; it does not help much, except possibly for the rootdbs and/or the dbspaces that contain the physical and logical logs, to ensure availability (through RAID 0/1 for example).  (Editor's note: Eric holds that the server archives and logical log backups are sufficient to prevent data loss, removing the need for redundant storage.  Those who know me know that Eric and I disagree on this point, but on the balance of his post we are aligned. -- Art)

     So, to determine which file system type is best for Informix dbspace chunks, I performed a test that consisted of a dbimport of a small database (1.7 GB) with very complex indexing.

   The test plan was as follows:
  1. Creation of the dbspace on a Linux ext4 file system (journaled), with the onconfig parameter DIRECT_IO set to 0, then dbimport of the data into a non-logged database.
  2. Creation of the dbspace on a Linux ext4 file system (journaled), with the onconfig parameter DIRECT_IO set to 1, then dbimport of the data into a non-logged database.
  3. Creation of the dbspace on a Linux ext2 file system (not journaled), with the onconfig parameter DIRECT_IO set to 0, then dbimport of the data into a non-logged database.
  4. Creation of the dbspace on a Linux ext2 file system (not journaled), with the onconfig parameter DIRECT_IO set to 1, then dbimport of the data into a non-logged database.
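
    The only server setting that changes between these runs is DIRECT_IO. As a minimal sketch of how that toggle could be scripted, assuming the usual onconfig layout of one "NAME value" pair per line with "#" comments, here is a small Python helper. The function name and the sample excerpt are hypothetical and are not part of Eric's test setup:

```python
def set_onconfig_param(text: str, name: str, value: str) -> str:
    """Return a copy of onconfig text with parameter `name` set to `value`.

    Comment lines (starting with '#') are left untouched. If the parameter
    is not present, it is appended at the end.
    """
    out = []
    found = False
    for line in text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0] == name:
            out.append(f"{name} {value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{name} {value}")
    return "\n".join(out) + "\n"


# Hypothetical onconfig excerpt, for illustration only.
SAMPLE = """\
# DIRECT_IO setting under test
ROOTNAME rootdbs
DIRECT_IO 0
"""

if __name__ == "__main__":
    print(set_onconfig_param(SAMPLE, "DIRECT_IO", "1"))
```

    The helper only rewrites text; the edited file would still have to be written back and the instance restarted, as was done between the runs reported here.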

    I can hear the question: "What is DIRECT_IO?" This is an onconfig parameter that allows Informix to bypass the file system's cache layer, so that IDS reads and writes using Kernel Asynchronous I/O rather than going through the file system's caching mechanism.  This offers virtually the same benefits as using raw devices, namely writing safely and directly to the disks, plus a performance gain from using Kernel AIO. This parameter is available with the Growth and Enterprise Editions (Ed: and in the new Growth Warehouse and Ultimate Warehouse Editions); unfortunately, it is not available :-( in Innovator-C Edition.


    To make this short: 
Test 1: execution time =   91m 41s 
Test 2: execution time = 195m 57s 

Test 3: execution time =   54m 49s 
Test 4: execution time =   49m 54s 

    The final winner is the EXT2 file system with direct_io activated.

   The surprise was the test with ext4 with direct_io activated.

   This turns out to be a combination to be avoided. 
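
    To put the four timings on a common scale, the short Python sketch below converts the reported execution times to seconds and expresses each one relative to the fastest run. The timings are copied from the results above; the function names and labels are purely illustrative:

```python
import re

# Execution times exactly as reported in the post.
REPORTED = {
    "Test 1 (ext4, DIRECT_IO 0)": "91m 41s",
    "Test 2 (ext4, DIRECT_IO 1)": "195m 57s",
    "Test 3 (ext2, DIRECT_IO 0)": "54m 49s",
    "Test 4 (ext2, DIRECT_IO 1)": "49m 54s",
}


def to_seconds(duration: str) -> int:
    """Convert a string such as '91m 41s' to a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*(\d+)s\s*", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def relative_to_fastest(times: dict) -> dict:
    """Map each test name to its run time divided by the fastest run time."""
    secs = {name: to_seconds(t) for name, t in times.items()}
    fastest = min(secs.values())
    return {name: s / fastest for name, s in secs.items()}


if __name__ == "__main__":
    for name, ratio in relative_to_fastest(REPORTED).items():
        print(f"{name}: {ratio:.2f}x the fastest run")
```

    On these figures, ext4 with DIRECT_IO set to 1 takes roughly 3.9 times as long as ext2 with DIRECT_IO set to 1, while the two ext2 runs are within about 10% of each other.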

   Note that each test was run twice, taking care to restart the Informix instance in between. The execution times for each test were very consistent from one run to the next.

    You now have the elements needed to make decisions for your future implementations. See you on our station for new challenges.

Posted by Eric at Vercelletto, August 2011


    On the subject of the need, or lack thereof, for redundant storage, it's not so much that I think Eric is wrong.  Essentially, he is correct.  Given Informix's famous archive and logical log recovery mechanisms, it is certainly possible to completely recover from a hard crash with minimal data loss.  Where we differ is that I am too lazy to want to go through a full restore like that if there is any way that I can avoid it.  Fully redundant disk storage (i.e. RAID1 or RAID10) is one such way, and it lets me minimize the possibility of ANY data loss.  Eric, thanks for a great post and thanks for the effort of performing the testing.


    I did some similar testing and reported very similar results during one of my sessions at the IIUG Conference in San Diego.  I would include EXT3 with EXT4 as being slower than EXT2 unless you turn off the data journaling, leaving only metadata journaling active.  In addition, note that EXT4 and EXT3 with journaling set to write-back mode are unsafe (see my presentation slides for details).

5 comments:

  1. Been doing a lot of testing recently and this is what I found

    http://www.oninit.com/bench/index.php?id=directio.html

    Cheers
    Paul
    www.oninit.com

  2. Raid 1 not 10 but only on active / active san. Split into 4 or 8 devices. 1 lun path per device. Small disks, many spindles even with 64gb+ cache.
    Raid 10 on single channel San.
    Stripe width / block size 32k.
    ufs/ext2 agree.
    Ra pages 128 and up.

  3. Peter: I missed your comment when you made it in June, sorry. I'm not sure that I understand all of your points, but I agree where I understand. When using RAID10 it is important to limit the stripe block size; 32K is ideal. Many SAN manufacturers and "SAN experts" push huge stripe blocks like 1MB or larger, but while that works very well for filesystems, it performs very poorly for databases (whether Informix, Oracle, or anything else).

    More spindles is always better than fewer spindles!

    I disagree about the RA_PAGES though. I have always believed that with modern systems with intelligent drives, intelligent controllers, intelligent SANs all with local cache and their own levels of read ahead, read ahead in Informix is mostly overkill and tends to thrash the cache unnecessarily.

    Anyway, please feel free to expand on your intent here and we can have a discussion.

  4. have you tested ext4 without journal ?

    1. kk: I'm no Linux or filesystem expert, but my understanding was that you cannot disable the journaling in EXT4. If you can, then, as with disabling the journal in EXT3, the result is essentially an EXT2 filesystem that you can expand dynamically without unmounting it. So the performance SHOULD be roughly the same as what Eric's test reported for EXT2.
