
Tuesday, October 26, 2010

My next new feature request

I've thought of my next feature request.  Speaking with other users, especially those in high-volume, near-24x7 environments, I often hear the lament that after a server restart it can take many minutes, and often hours, before the server reaches its normal steady state with active rows in the cache and all data dictionary entries, data distributions, stored procedures, common statements, and other cached objects loaded up.  During this ramp-up period, performance is significantly reduced since most queries require physical I/O to read in the pages they need and possibly to fill in other cache entries.

Based on this lament, my feature request is that, in the background after every checkpoint, the engine dumps out the list of cached data and index page addresses so that during a restart the engine can - optionally - read the list back in and repopulate the cache.  If this goes through the normal page processing algorithms, I expect that the data dictionary and distribution caches can be filled in as well.  The statement cache and stored procedure cache may have to be handled separately, though the procedure cache should be simple enough if the procnames are saved along with the page addresses at checkpoint time.  The statement cache may be the costly one, since the entire statement texts will have to be saved and restored to memory, but this is also doable - at least from an outsider's point of view.
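
To make the mechanics a little more concrete, here is a rough sketch of the idea in Python. It is purely illustrative: the function names, the (chunk, page offset) addressing, and the fixed-width address file format are all invented for the sketch; nothing below is actual Informix code.

    # Conceptual sketch only: persist the *addresses* of cached pages at
    # checkpoint time, then reread those pages in the background after a
    # restart. The buffer pool is modeled as a dict keyed by (chunk, offset).
    import struct

    ADDR_FMT = "<IQ"   # invented record: 4-byte chunk number + 8-byte page offset

    def dump_cache_addresses(buffer_pool, path):
        """Called at each checkpoint: write one small record per cached page."""
        with open(path, "wb") as f:
            for chunk, offset in buffer_pool.keys():
                f.write(struct.pack(ADDR_FMT, chunk, offset))

    def prime_cache(path, read_page, buffer_pool, max_buffers):
        """Called after fast recovery: reread the listed pages until the pool
        is full, then stop (normal activity will have claimed the rest)."""
        recsize = struct.calcsize(ADDR_FMT)
        with open(path, "rb") as f:
            while len(buffer_pool) < max_buffers:
                rec = f.read(recsize)
                if not rec:
                    break
                chunk, offset = struct.unpack(ADDR_FMT, rec)
                buffer_pool[(chunk, offset)] = read_page(chunk, offset)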

I ran this by some of the Informix development team last night here at IOD and no one thought that this one would be difficult to pull off.  The only thing we lack is a strong use case.  What do YOU think?  Is this a useful feature?

8 comments:

  1. Yes, this is a very useful feature. However, writing out a large memory dump to a file is a costly task. My buffer cache is 60GB in size. We have noticed that even a 12GB SHMDUMP puts a huge load on the system during an assert failure. Having the option to write to raw disks might help.

    If the buffer cache could be copied to physical memory, it might be faster. That still does not help if the physical server crashes, but if Informix had to go down for a release or other maintenance, it would be great not to suffer while the buffer cache is being repopulated.

    What about some type of ER/HDR option? Your standby server would also have the buffer cache populated. If you fail over to a secondary or an ER replicate, the buffer cache would already be well populated. Provided you are using all the new Grid features, you are going to fail over to an ER or HDR/RSS type server. Get it set up for rolling upgrades and your application could fail over during the upgrade and still be using an up-to-date buffer cache.

  2. You are missing the solution, Kernoal. You don't have to dump the actual buffer cache pages; indeed, you may not want to, certainly if you expect the cache to be repopulated after a crash as well as after a normal shutdown, since some of the pages at that point would be dirty. The solution is to NOT dump the actual cache pages but only the page addresses. So, if you have 2 million buffers with big chunks enabled, you would have to dump 16MB of page addresses (8 bytes each; see the quick arithmetic below), which is not much at all, certainly not compared to the 4GB or more (for larger pagesize caches) of cached data. At restart the engine would reread the actual data pages from disk in the background after processing fast recovery. You could allow sessions to start using the engine immediately after fast recovery, and the reload would abort once the cache was filled (since other pages pulled in by new sessions will have filled some of the available buffers).

    Hmm, I like your idea about secondary servers. When you bring up a secondary (and a recovering primary would initially be a secondary), it could pull back a buffer page address list from the primary to prime its cache. Same idea as mine, but taking advantage of the current primary's existing cache to prime for the current workload. The fast recovery of the newly started/restarted secondary would only partially populate the cache while processing the logical log pages passed from the primary, so this would complete the process of initializing the cache. Madison? Comments?
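
    A quick back-of-the-envelope check of those numbers, assuming 8 bytes per page address and the 2K base page size (both assumptions, for the sake of the arithmetic):

        # Rough check of the 16MB-of-addresses vs. 4GB-of-data comparison.
        # Assumptions: 8 bytes per page address, 2K base page size.
        buffers = 2_000_000

        address_list = buffers * 8        # bytes of page addresses to dump
        cached_data  = buffers * 2_048    # bytes of page data they describe

        print(address_list / 10**6, "MB of addresses")     # 16.0 MB
        print(cached_data / 10**9, "GB of cached data")     # ~4.1 GB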

  3. Great blog, Art.

    My feature request is simple: cascade null on foreign key constraints.

  4. Perhaps I should have said "set null".

    Like cascade delete, but instead of deleting the dependent row, nulling out the referring columns of the dependent row (assuming that they're nullable).

    http://www.sqlite.org/foreignkeys.html#fk_actions
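
    For anyone who hasn't followed the link, here is a minimal sketch of the ON DELETE SET NULL action it describes, using SQLite only because that is the documentation referenced; the table names are invented:

        # Minimal ON DELETE SET NULL illustration (SQLite, invented tables).
        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("PRAGMA foreign_keys = ON")   # FK enforcement is off by default in SQLite
        con.execute("CREATE TABLE code (code_id INTEGER PRIMARY KEY, descr TEXT)")
        con.execute("""CREATE TABLE detail (
                           detail_id INTEGER PRIMARY KEY,
                           code_id   INTEGER REFERENCES code(code_id) ON DELETE SET NULL)""")
        con.execute("INSERT INTO code VALUES (1, 'obsolete code')")
        con.execute("INSERT INTO detail VALUES (10, 1)")

        # Deleting the parent keeps the dependent row but nulls its key.
        con.execute("DELETE FROM code WHERE code_id = 1")
        print(con.execute("SELECT detail_id, code_id FROM detail").fetchall())   # [(10, None)]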

  5. TP: OK, I've followed the link you posted to the SQLite site and read the description of the use of the SET NULL rule for cascading updates. First, Informix, as you know, doesn't support cascading updates. I personally believe this is a good thing and that cascading updates are a bad thing. It should be rare to change a code that is both a primary key and a foreign key, but I know that in the real world this happens. When it happens, I want control over how it happens, and SET CONSTRAINTS ... DEFERRED; is sufficient to get the job done; it is explicit and intentional and cannot happen by accident. An improper cascaded update can be the result of an accident.

    Of course, I don't like cascading deletes either and have only once coded a foreign key relationship that included cascading deletes. The resulting chaos quickly convinced me that this had been a very bad idea indeed.

    I mention this mostly so you will understand where my head is at when I say, without any animosity towards or reflection on you, that cascading updates in general, and set null in particular, are probably the dumbest idea in the history of relational databases.

    Cascading deletes are bad enough. When you set that up, you are saying: "If I decide I don't need a code any longer, any data that was connected to that code is garbage and you can throw it away without my having to review it first." Now with cascading set null, you are saying instead: "I don't need the code any longer, but one day I may regret deleting the child rows, so I will just wipe out their keys; that way, if I ever find that the data there is useful I will still have it, but I won't have any idea what code used to describe it!"

    No. The correct solution for situations like this is to add a valid/invalid flag to the independent/parent table and simply mark the record invalid when it is no longer used. That will leave the child records with the correct link if it is ever needed, and still "remove" the unused code value. If we really want the foreign key in the dependent table to be changed to NULL, to disassociate it from any code value, that is just as easy to do with an update as it is to delete the parent row or mark it as invalid. (There is a small sketch of the flag approach at the end of this comment.)

    I hear your suggestion, and you are certainly free to make it to IBM, or even to submit it to the IIUG for inclusion in the next Informix Features survey - which will be coming out at about the time of the IIUG Conference in May, BTW. If it gets enough support, IBM will consider it; they always consider ideas that have a strong base of user support. But personally, I won't vote for it.

    Again, no reflection on you. You just want a feature that seems to be useful. But to me, it's like a farmer thinking that the description of a backhoe sounds like a good tool for removing groundhogs from his field: the wrong solution to the problem, and one which will do more damage than the problem it cures.
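
    To make the valid/invalid flag alternative above concrete, here is a minimal sketch, again in SQLite purely for convenience; the table and column names are invented:

        # Minimal sketch of the valid/invalid flag approach (SQLite, invented names).
        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("""CREATE TABLE code (
                           code_id INTEGER PRIMARY KEY,
                           descr   TEXT,
                           valid   CHAR(1) DEFAULT 'Y')""")
        con.execute("INSERT INTO code VALUES (1, 'obsolete code', 'Y')")

        # Instead of deleting the parent row (and cascading anything to its
        # children), just retire it. Dependent rows keep their foreign key, and
        # the description is still there if anyone needs to know what it meant.
        con.execute("UPDATE code SET valid = 'N' WHERE code_id = 1")
        print(con.execute("SELECT * FROM code").fetchall())   # [(1, 'obsolete code', 'N')]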

  6. No offence taken. Our db uses cascading deletes extensively and has been in production for the last 7 years, and there has been no chaos; but I'm about as likely to persuade you to change your position as you are to persuade me to change mine.

    Our db does use "valid/invalid" flags as you suggest; they're on every table, and, when appropriate, deletes are logical (setting the flag) rather than physical (deleting the row).

    I can understand why cascading deletes might be feared by some; they do need to be treated with respect. For me, they are a useful tool for managing data integrity.

  7. My chaos was a 'data manager' trying to delete an old parent record that had no children, from a table whose referencing tables had cascading deletes defined, and forgetting to put in the WHERE clause. Of course, there was no BEGIN WORK either, so no ROLLBACK WORK. Whoosh, no more data. Had to restore from archive.
