I've thought of my next feature request. Speaking with other users, especially those in high volume near 24x7 environments, I often hear the lament that after a server restart it can take many minutes and often hours before the server reaches its normal steady state with active rows in the cache and all data dictionary entries, data distributions, stored procedures, common statements, and other cached objects loaded up. During this ramp-up period performance is significantly reduced since most queries require a physical IO to read in the pages that are needed and possibly fill in other cache entries.
Based on this lament, my feature request is that in the background, after every checkpoint, the engine dumps out the list of cached data and index page addresses so that during a restart the engine can - optionally - read in the list and repopulate the cache. If this goes through the normal page processing algorithms I expect that the data dictionary and distribution caches can be filled in as well. The statements cache and stored procedure caches may have to be handled separately, though the procedure cache should be simple enough if the procnames are saved along with the page addresses at checkpoint time. The statement cache may be the costly one since the entire statement texts will have to be saved and restored to memory, but this is also doable. At least from an outsider's point of view.
I ran this by some of the Informix development team last night here at IOD and no one thought that this one was difficult to pull off. The only thing we lack is a strong user case. What do YOU think? Is this a useful feature?
Tuesday, October 26, 2010
Monday, October 18, 2010
Fear the Panther
So, after much waiting and absolutely no pre-event hoopla, Panther is finally here. You will see much in the press and in the IBM official announcements about the features of Panther (Informix Dynamic Server version 11.70) that IBM thinks are important to the market place. Jerry Keesee is calling Panther "The Last of the Big Cats" hinting that IBM will be finding another theme for the next release.
I don't want to repeat all of what you will hear from IBM, but I do want to provide my own announcement of those features that I feel will be the most important in the market place over the next 12 months. This is a tremendous release as far as features are concerned. Without further ado, here's my list, categorized as IBM has done theirs:
OLTP Data Security
What do I mean by OLTP Data Security? Well, what is the greatest fear in an OLTP environment? To me it is loss of transactional data. If a user has completed a transaction and thinks it is secured in your database, you cannot afford to lose it. If that transaction was a $20 ATM withdrawal, your company is on the hook for the money. If it was a multi-million dollar electronic funds transfer, even more so.
If a user is in the midst of a transaction and your primary server crashes, do you tell the users "Sorry. System problem. Please try again later." You will lose a customer, or several hundred customers.
"OK.", you ask, "How can Panther help?" Panther has a new feature that IBM is calling "transaction survival". I call it "uninterruptible transactions". If your primary MACH11 server goes down any transactions begun by a user connected to any secondary server (HDR, SDS, or RSS) will not be lost but will be picked up by the surviving SDS secondary server which is promoted to be the new primary server and the transaction will continue unharmed. There is no application change or ONCONFIG parameter needed to enable this feature and make it work. Your existing applications don't have to be modified to reconnect to the new primary server. This is all automatic and effort-free. No other RDBMS server in the industry has this feature. Oracle does have a similar feature but it's not real and it's not automatic. You have to enable it in your application because it is actually implemented in the front-end libraries which fake survival by recording the transaction locally on the client in a flat file and replay the transaction from scratch after reconnecting to the new server. The feature is so lame that Oracle doesn't even talk about it.
To me transaction survival is a biggie! The biggest feature in Panther.
OLTP Performance
First, ADTC's testing shows that Panther is about 11% faster than 11.50xC7 without taking advantage of any of the performance enhancing features mentioned below. However, Panther has several features to further improve performance of OLTP systems.
One is the removal of the requirement for a foreign key constraint to have an index on the foreign key. The supporting index can now be optionally disabled at constraint create time (or later) and its storage will be released. The selectivity of these indexes on the foreign keys pointing to lookup tables with few rows is low and the utility of requiring the index has always been questionable. The engine does not need these indexes to enforce the constraints unless cascading deletes are enabled. You likely don't need them for searching because most systems have composite keys containing those foreign key columns that are used for searching and filtering. It is rare in an OLTP style query for the lookup tables to be selected as the primary query tables requiring an index on the dependent table to support the join. Removing this requirement will reduce the overhead of inserting and modifying data in tables with many code columns without having any impact on query performance.
Another one is the new Forest of Trees indexes. This is a new indexing method that combines many of the advantages of a Hash index with those of a traditional Informix B+tree index. For indexes on keys who's first column(s) have low selectivity but are still important to the correct processing of many OLTP queries (think geo, country, region of an index supporting a company's geographic reporting structure as an example) the b+tree indexes can become rather deep with relatively few unique elements on a level. What Forest of Trees indexes do is to let the Database Architect specify one or more of the leading columns to be used to create a hash key. Then a separate b+tree is created for each hash value containing only the remaining key parts following the hash columns. This results in several flatter b+tree indexes. You can't perform range scans on the hash columns (for that you will also need a pure b+tree index) but you can do so on the remaining columns in the index key.
Multiple index scans in the optimizer is another big win for OLTP. Often you have a complex join between three or four tables using the filtered rows from the independent tables to act as filters on the dependent table and several filters on multiple columns in the dependent table. The optimizer in earlier releases of IDS had to select the "best index" often resulting the engine having to read many rows of data and perform the final filters using the actual row data even though indexes on the filter columns are available. The old XPS optimizer could use more than one index on each table in a query. Now Panther can as well. This can reduce the number of long key compound indexes which can also improve insert, delete, and update performance. For databases that are delivered as part of a third party application where you can't control the schema, this one may be a HUGE win since the composite indexes that could make such queries reasonably efficient may never have existed in the first place.
Data Warehouse Performance
New Star Schema support in the optimizer. What does this one mean? This is another offshoot of XPS and the multi-index support above. In Panther, when you have many dimension tables, you can create a single compound index on all of the dimension table keys in the fact table. The optimizer will realize this and use the filters on the dimension tables to generate a list of valid combinations of the dimension tables' keys into a temp table and join that temp table to the fact table using the compound index to quickly locate the rows that satisfy the criteria on the dimension tables. Whoosh! At ADTC we've tested this one quite a bit and the results are impressive.
Administration
I put the new Heterogeneous Server Grid feature under this heading because I can't think of a better one. IBM is actually making quite a bit of noise about this one and rightfully so. If you have lots of old hardware around, don't throw it out. Don't sell it to a junker or to a used equipment reseller. Configure it as an IDS grid node and add to your company server farm's net computing power at near zero cost (since the boxes are already paid for and amortized). I'm waiting for confirmation about whether this means you can have an HDR or other MACH11 secondary that's build from different hardware or different OS. Is suspect not, but who knows what the future may bring. Heterogeneous grid was a big request from a number of very large Informix customers, so if Heterogeneous MACH11 members are what you need, speak up.
Zero downtime rolling upgrades. Speaks for itself. Jarrod down in New Zealand is jumping for joy over this one I'm sure. He and his one assistant have to upgrade a very large number of servers during the few weeks a year that his company's servers can afford to be offline even for a few hours. Being able to configure their server grid to be able to do this anytime to roll in a fixpack that removes a critical bug will be a god send for him and many of us out there.
The IIUG will soon be posting its new Features Survey which we will then hand to IBM to help them design the next release of Informix Dynamic Server. If you don't think that you have a voice in how IBM determines features for Informix, think again. Every one of the features mentioned here was requested by users like you/me/us either directly or through an expressed need that had to be filled. Several are features that were asked for and received high marks on the previous Features Surveys. So, when you see the Insider or email announcement of the Survey, fill it out.
In the next release I'd like to see:
What would you like to see? Comments welcome.
I don't want to repeat all of what you will hear from IBM, but I do want to provide my own announcement of those features that I feel will be the most important in the market place over the next 12 months. This is a tremendous release as far as features are concerned. Without further ado, here's my list, categorized as IBM has done theirs:
OLTP Data Security
What do I mean by OLTP Data Security? Well, what is the greatest fear in an OLTP environment? To me it is loss of transactional data. If a user has completed a transaction and thinks it is secured in your database, you cannot afford to lose it. If that transaction was a $20 ATM withdrawal, your company is on the hook for the money. If it was a multi-million dollar electronic funds transfer, even more so.
If a user is in the midst of a transaction and your primary server crashes, do you tell the users "Sorry. System problem. Please try again later." You will lose a customer, or several hundred customers.
"OK.", you ask, "How can Panther help?" Panther has a new feature that IBM is calling "transaction survival". I call it "uninterruptible transactions". If your primary MACH11 server goes down any transactions begun by a user connected to any secondary server (HDR, SDS, or RSS) will not be lost but will be picked up by the surviving SDS secondary server which is promoted to be the new primary server and the transaction will continue unharmed. There is no application change or ONCONFIG parameter needed to enable this feature and make it work. Your existing applications don't have to be modified to reconnect to the new primary server. This is all automatic and effort-free. No other RDBMS server in the industry has this feature. Oracle does have a similar feature but it's not real and it's not automatic. You have to enable it in your application because it is actually implemented in the front-end libraries which fake survival by recording the transaction locally on the client in a flat file and replay the transaction from scratch after reconnecting to the new server. The feature is so lame that Oracle doesn't even talk about it.
To me transaction survival is a biggie! The biggest feature in Panther.
OLTP Performance
First, ADTC's testing shows that Panther is about 11% faster than 11.50xC7 without taking advantage of any of the performance enhancing features mentioned below. However, Panther has several features to further improve performance of OLTP systems.
One is the removal of the requirement for a foreign key constraint to have an index on the foreign key. The supporting index can now be optionally disabled at constraint create time (or later) and its storage will be released. The selectivity of these indexes on the foreign keys pointing to lookup tables with few rows is low and the utility of requiring the index has always been questionable. The engine does not need these indexes to enforce the constraints unless cascading deletes are enabled. You likely don't need them for searching because most systems have composite keys containing those foreign key columns that are used for searching and filtering. It is rare in an OLTP style query for the lookup tables to be selected as the primary query tables requiring an index on the dependent table to support the join. Removing this requirement will reduce the overhead of inserting and modifying data in tables with many code columns without having any impact on query performance.
Another one is the new Forest of Trees indexes. This is a new indexing method that combines many of the advantages of a Hash index with those of a traditional Informix B+tree index. For indexes on keys who's first column(s) have low selectivity but are still important to the correct processing of many OLTP queries (think geo, country, region of an index supporting a company's geographic reporting structure as an example) the b+tree indexes can become rather deep with relatively few unique elements on a level. What Forest of Trees indexes do is to let the Database Architect specify one or more of the leading columns to be used to create a hash key. Then a separate b+tree is created for each hash value containing only the remaining key parts following the hash columns. This results in several flatter b+tree indexes. You can't perform range scans on the hash columns (for that you will also need a pure b+tree index) but you can do so on the remaining columns in the index key.
Multiple index scans in the optimizer is another big win for OLTP. Often you have a complex join between three or four tables using the filtered rows from the independent tables to act as filters on the dependent table and several filters on multiple columns in the dependent table. The optimizer in earlier releases of IDS had to select the "best index" often resulting the engine having to read many rows of data and perform the final filters using the actual row data even though indexes on the filter columns are available. The old XPS optimizer could use more than one index on each table in a query. Now Panther can as well. This can reduce the number of long key compound indexes which can also improve insert, delete, and update performance. For databases that are delivered as part of a third party application where you can't control the schema, this one may be a HUGE win since the composite indexes that could make such queries reasonably efficient may never have existed in the first place.
Data Warehouse Performance
New Star Schema support in the optimizer. What does this one mean? This is another offshoot of XPS and the multi-index support above. In Panther, when you have many dimension tables, you can create a single compound index on all of the dimension table keys in the fact table. The optimizer will realize this and use the filters on the dimension tables to generate a list of valid combinations of the dimension tables' keys into a temp table and join that temp table to the fact table using the compound index to quickly locate the rows that satisfy the criteria on the dimension tables. Whoosh! At ADTC we've tested this one quite a bit and the results are impressive.
Administration
I put the new Heterogeneous Server Grid feature under this heading because I can't think of a better one. IBM is actually making quite a bit of noise about this one and rightfully so. If you have lots of old hardware around, don't throw it out. Don't sell it to a junker or to a used equipment reseller. Configure it as an IDS grid node and add to your company server farm's net computing power at near zero cost (since the boxes are already paid for and amortized). I'm waiting for confirmation about whether this means you can have an HDR or other MACH11 secondary that's build from different hardware or different OS. Is suspect not, but who knows what the future may bring. Heterogeneous grid was a big request from a number of very large Informix customers, so if Heterogeneous MACH11 members are what you need, speak up.
Zero downtime rolling upgrades. Speaks for itself. Jarrod down in New Zealand is jumping for joy over this one I'm sure. He and his one assistant have to upgrade a very large number of servers during the few weeks a year that his company's servers can afford to be offline even for a few hours. Being able to configure their server grid to be able to do this anytime to roll in a fixpack that removes a critical bug will be a god send for him and many of us out there.
The IIUG will soon be posting its new Features Survey which we will then hand to IBM to help them design the next release of Informix Dynamic Server. If you don't think that you have a voice in how IBM determines features for Informix, think again. Every one of the features mentioned here was requested by users like you/me/us either directly or through an expressed need that had to be filled. Several are features that were asked for and received high marks on the previous Features Surveys. So, when you see the Insider or email announcement of the Survey, fill it out.
In the next release I'd like to see:
- Transaction survival when the server your session is connected to crashes!
- Bitmap indexes with multiple bitmaps used to generate a net bitmap of satisfying rows.
What would you like to see? Comments welcome.
Subscribe to:
Posts (Atom)