It’s my understanding that nulls are not indexable in DB2, so assuming we have a huge table (Sales) with a date column (sold_on) which is normally a date, but is occasionally (10% of the time) null.
Furthermore, let’s assume that it’s a legacy application that we can’t change, so those nulls are staying there and mean something (let’s say sales that were returned).
We can make the following query fast by putting an index on the sold_on and total columns
Select * from Sales where Sales.sold_on between date1 and date2 and Sales.total = 9.99
But an index won’t make this query any faster:
Select * from Sales where Sales.sold_on is null and Sales.total = 9.99
Because the indexing is done on the value.
Can I index nulls? Maybe by changing the index type? Indexing the indicator column?
I’m no DB2 expert, but if 10% of your values are null, I don’t think an index on that column alone will ever help your query. 10% is too many to bother using an index for — it’ll just do a table scan. If you were talking about 2-3%, I think it would actually use your index.
Think about how many records are on a page/block — say 20. The reason to use an index is to avoid fetching pages you don’t need. The odds that a given page will contain 0 records that are null is (90%)^20, or 12%. Those aren’t good odds — you’re going to need 88% of your pages to be fetched anyway, using the index isn’t very helpful.
If, however, your select clause only included a few columns (and not *) — say just salesid, you could probably get it to use an index on (sold_on,salesid), as the read of the data page wouldn’t be needed — all the data would be in the index.