
The Archive Base

Editorial Team
Asked: May 18, 2026

What is the range of tactics available for selecting records on low-selectivity columns?


An example might be an orders table where, over many years, you build up a large number of completed orders but often need to select active orders. An order might go through a lifecycle such as placed, stock-allocated, picked from warehouse, despatched to customer, invoiced and paid. An order might additionally be cancelled, held, etc. The majority of records will eventually be in the final state (e.g. paid) but you might often need to select, say, allocated orders. In this case a sequential read would be slow.

Similar questions on indexing:
  • MySQL: low cardinality/selectivity columns = how to index?
  • Do indexes suck in SQL?
  • What are indexes and how can I use them to optimize queries in my database?
  • Defining indexes: Which Columns, and Performance Impact?
Plus numerous others, decreasingly related.

The approaches I have read about (on Stack Overflow and elsewhere) include:

  • Use a bitmap index
  • Use a partial index (create index x on t(c2) where c1='a')
  • Use a clustered index?
  • Don’t index low selectivity columns, use sequential read
  • Partition the data (e.g. into several tables with identical schema)
  • Use a supplementary table (e.g. active_customers(customer_id))
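For a DBMS that does support it, the partial-index option might look like the minimal sketch below. It uses Python's built-in sqlite3 module as a stand-in; the orders table, its row counts, the index name `idx_orders_open` and the helper `open_orders` are all hypothetical.

```python
import sqlite3

# Hypothetical orders table: 950 paid orders and 50 still-active ones.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
rows = [(i, "paid") for i in range(1, 951)]
rows += [(i, "allocated") for i in range(951, 981)]
rows += [(i, "placed") for i in range(981, 1001)]
conn.executemany("INSERT INTO orders (id, status) VALUES (?, ?)", rows)

# Partial index: only rows that are not yet paid get index entries,
# so the index stays small even as paid orders pile up.
conn.execute(
    "CREATE INDEX idx_orders_open ON orders(status) WHERE status <> 'paid'"
)

def open_orders(status):
    """Return ids of orders in a non-final status, in id order."""
    # SQLite only uses a partial index when it can prove the query's WHERE
    # clause implies the index's WHERE clause; repeating the index condition
    # verbatim is the simplest way to make that provable.
    cur = conn.execute(
        "SELECT id FROM orders WHERE status <> 'paid' AND status = ? ORDER BY id",
        (status,),
    )
    return [row[0] for row in cur]

allocated = open_orders("allocated")
print(len(allocated), allocated[0])  # 30 951
```

The index covers only the 50 active rows, so its size tracks the working set rather than the whole history.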

My current DBMS doesn’t support the first three options listed above, and the remainder seem problematic. Are there any other commonly used approaches?

Update: I’ve also seen this suggestion: index your low-selectivity column, but only ever select for high-selectivity values.
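The "only select for high-selectivity values" rule can be sketched as a guard in front of an ordinary index. The 5% cutoff (`max_share`), the table layout and both helper functions are hypothetical, and recomputing the distribution with a GROUP BY on every call is only for illustration; a real system would rely on the optimizer's statistics or a cached summary.

```python
import sqlite3

def status_fractions(conn):
    """Return {status: share of all rows} for the orders table."""
    total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    if total == 0:
        return {}
    cur = conn.execute("SELECT status, COUNT(*) FROM orders GROUP BY status")
    return {status: n / total for status, n in cur}

def fetch_by_status(conn, status, max_share=0.05):
    """Use the status index only for rare (high-selectivity) values."""
    share = status_fractions(conn).get(status, 0.0)
    if share > max_share:
        # Low-selectivity value: an index lookup would touch most of the table.
        raise ValueError(f"{status!r} covers {share:.0%} of rows; use another access path")
    cur = conn.execute("SELECT id FROM orders WHERE status = ? ORDER BY id", (status,))
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_orders_status ON orders(status)")
data = ["paid"] * 950 + ["allocated"] * 30 + ["placed"] * 20
conn.executemany("INSERT INTO orders (status) VALUES (?)", [(s,) for s in data])

print(len(fetch_by_status(conn, "placed")))  # 20
```

Asking for "paid" here raises a ValueError instead of walking 95% of the table through the index.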

1 Answer

    Editorial Team, answered May 18, 2026 at 5:17 am

    I agree with the “However” branch of Unreason’s answer, but there are some things to know about this case.

    This is called skew, and skew kills. It is a perfect use for a partial index, where you’d exclude the 95% of rows that are paid invoices and index only the rarer, more selective statuses. But your DBMS doesn’t support partial indexes. You can horizontally partition all the rows into separate tables or partitions, but then you need to account for row migration (moving from one status to another), and that’s expensive: to change the status, the DBMS has to perform an update, a delete and an insert. If yours is a high-volume system, that will hurt.

    Set aside the question of whether to index based on selectivity: putting an index on a rapidly changing column is usually a bad idea anyway. Your index will have hot blocks: one where all the step 1’s are being removed, another where all the step 2’s are being inserted, and, at the same time, some step 2’s are being removed as they move on to step 3. This won’t scale well.

    I would recommend vertically partitioning your status into one or more separate tables.

    Your invoice table will have a PK and all the columns except status.

    The status table will hold the invoice PK value as an FK back to the invoice table, the status, and a timestamp for when the invoice entered that status. You can handle it in two ways. The best is a table horizontally partitioned on status, with one partition for each possible status. Finding all (or one) invoices in “Placed” status will then prune partitions and read only the partition it needs, which is a very small number of blocks. Because the row is so narrow, you might get 400 invoice statuses on a single block. Looking up the status of any one invoice is easy, since there’s a global index on the PK.
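A minimal sketch of the vertical split, again using Python's sqlite3 as a stand-in. SQLite has no partitioning, so this shows only the wide invoice table plus the narrow status table; in a DBMS with list partitioning, `invoice_status` would be partitioned on `status`. The column names and the helpers `set_status`, `invoices_in` and `status_of` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Wide invoice table: the PK plus every column except status (columns hypothetical).
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
# Narrow status table: FK back to invoice, the status, and when it was entered.
# In a DBMS with list partitioning this table would be partitioned on status;
# SQLite has no partitioning, so here it is an ordinary table.
conn.execute("""
    CREATE TABLE invoice_status (
        invoice_id INTEGER PRIMARY KEY REFERENCES invoice(id),
        status     TEXT NOT NULL,
        entered_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )""")

def set_status(conn, invoice_id, status):
    """Record the invoice's current status; REPLACE also resets entered_at."""
    conn.execute(
        "INSERT OR REPLACE INTO invoice_status (invoice_id, status) VALUES (?, ?)",
        (invoice_id, status),
    )

def invoices_in(conn, status):
    """All invoice ids currently in `status` (a partition scan in a partitioned DBMS)."""
    cur = conn.execute(
        "SELECT invoice_id FROM invoice_status WHERE status = ? ORDER BY invoice_id",
        (status,),
    )
    return [row[0] for row in cur]

def status_of(conn, invoice_id):
    """Single-invoice lookup through the primary key."""
    row = conn.execute(
        "SELECT status FROM invoice_status WHERE invoice_id = ?", (invoice_id,)
    ).fetchone()
    return row[0] if row else None

conn.executemany(
    "INSERT INTO invoice (id, customer, amount) VALUES (?, ?, ?)",
    [(1, "acme", 10.0), (2, "globex", 20.0), (3, "initech", 30.0)],
)
for invoice_id, status in [(1, "placed"), (2, "placed"), (3, "allocated")]:
    set_status(conn, invoice_id, status)
set_status(conn, 2, "allocated")
print(invoices_in(conn, "allocated"))  # [2, 3]
```

Status changes rewrite only the narrow row; the wide invoice row is never touched.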

    If your RDBMS doesn’t support partitioning with row migration, you’ll need to manage these partitions as tables, deleting from one and inserting into another. Encapsulate these movements in a transaction inside a procedure so the data stays clean: every invoice is in one and only one status table. The harder part is querying by invoice ID: you’ll have to check every table to see where it is.
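One way to sketch the table-per-status approach, with a Python function standing in for the stored procedure. The status list, the `status_<name>` table naming and the helpers `move` and `find_status` are hypothetical. The delete and insert run in one explicit transaction, so an invoice never ends up in zero or two status tables.

```python
import sqlite3

# Hypothetical fixed set of statuses, one table per status.
STATUSES = ("placed", "allocated", "picked", "despatched", "invoiced", "paid")

# isolation_level=None: autocommit mode, so transactions are controlled explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
for s in STATUSES:
    conn.execute(
        f"CREATE TABLE status_{s} ("
        "invoice_id INTEGER PRIMARY KEY, "
        "entered_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )

def move(conn, invoice_id, old, new):
    """Atomically move one invoice from status table `old` to status table `new`."""
    # Table names are interpolated into SQL, so accept only known statuses.
    if old not in STATUSES or new not in STATUSES:
        raise ValueError("unknown status")
    conn.execute("BEGIN")
    try:
        cur = conn.execute(f"DELETE FROM status_{old} WHERE invoice_id = ?", (invoice_id,))
        if cur.rowcount != 1:
            raise LookupError(f"invoice {invoice_id} is not in status {old!r}")
        conn.execute(f"INSERT INTO status_{new} (invoice_id) VALUES (?)", (invoice_id,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

def find_status(conn, invoice_id):
    """The awkward query: every status table has to be probed."""
    for s in STATUSES:
        row = conn.execute(
            f"SELECT 1 FROM status_{s} WHERE invoice_id = ?", (invoice_id,)
        ).fetchone()
        if row:
            return s
    return None

conn.execute("INSERT INTO status_placed (invoice_id) VALUES (1)")
move(conn, 1, "placed", "allocated")
print(find_status(conn, 1))  # allocated
```

`find_status` shows the cost the answer mentions: one probe per status table for every lookup by ID.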

    You have another choice: you can either write paid statuses or not. If it’s a partitioned table, you can simply delete the invoice from the invoice status table when it moves to paid (you’ll still write a paid record to the history table described in the bonus material). Then do an outer join from the invoice table to the status table, and nulls mean paid. If you almost never query for paid status, there’s really no reason to make that a fast query.
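The "nulls mean paid" idea can be sketched with a LEFT JOIN. SQLite has no partitioning, so the status table here is a plain table; the table layout and the helpers `mark_paid` and `current_statuses` are hypothetical, and the history-table write that should accompany `mark_paid` is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE invoice_status (
        invoice_id INTEGER PRIMARY KEY REFERENCES invoice(id),
        status     TEXT NOT NULL
    );
    INSERT INTO invoice (id, amount) VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO invoice_status (invoice_id, status)
        VALUES (1, 'placed'), (2, 'invoiced'), (3, 'allocated');
""")

def mark_paid(conn, invoice_id):
    """Paid invoices get no status row; the history-table write is omitted here."""
    conn.execute("DELETE FROM invoice_status WHERE invoice_id = ?", (invoice_id,))

def current_statuses(conn):
    """LEFT JOIN keeps every invoice; a missing status row (NULL) means paid."""
    cur = conn.execute("""
        SELECT i.id, COALESCE(s.status, 'paid')
        FROM invoice AS i
        LEFT JOIN invoice_status AS s ON s.invoice_id = i.id
        ORDER BY i.id""")
    return dict(cur.fetchall())

mark_paid(conn, 2)
print(current_statuses(conn))  # {1: 'placed', 2: 'paid', 3: 'allocated'}
```

The status table then holds only unpaid invoices, so it stays small no matter how much paid history accumulates.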

    Bonus Material

    In either case, you’ll want to keep track of these movements in a reporting table: every time you update a status, write that change to a history table. Eventually you’ll want to analyze what I call transit times. What’s the average time from filled to paid, by month? Is that increasing as a result of the bad economy? What’s the transit time from placed to filled, by month? Do the summer months take longer because staff are away on vacation? You get the point. By updating the status column in place you lose those answers, so you’ll need to embed that history log into your procedures.
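A possible shape for the history table and a monthly transit-time report, sketched with sqlite3. The table, the helpers `log_status` and `avg_transit_days`, and the sample dates are hypothetical. The self-join assumes each invoice enters each status at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE status_history (
        invoice_id INTEGER NOT NULL,
        status     TEXT    NOT NULL,
        entered_at TEXT    NOT NULL  -- ISO-8601 date or timestamp
    )""")

def log_status(conn, invoice_id, status, entered_at):
    """Append one status change (ideally in the same transaction as the move)."""
    conn.execute(
        "INSERT INTO status_history (invoice_id, status, entered_at) VALUES (?, ?, ?)",
        (invoice_id, status, entered_at),
    )

def avg_transit_days(conn, start, end):
    """Average days from `start` to `end`, grouped by the month `end` was reached.

    Assumes each invoice enters each status at most once; repeated entries
    would produce extra pairs in the self-join.
    """
    cur = conn.execute("""
        SELECT strftime('%Y-%m', b.entered_at) AS month,
               AVG(julianday(b.entered_at) - julianday(a.entered_at)) AS days
        FROM status_history AS a
        JOIN status_history AS b ON b.invoice_id = a.invoice_id
        WHERE a.status = ? AND b.status = ?
        GROUP BY month
        ORDER BY month""", (start, end))
    return [(month, round(days, 2)) for month, days in cur]

# Hypothetical history: two invoices paid in June, one in July.
for row in [
    (1, "filled", "2024-06-01"), (1, "paid", "2024-06-11"),
    (2, "filled", "2024-06-05"), (2, "paid", "2024-06-25"),
    (3, "filled", "2024-07-01"), (3, "paid", "2024-07-31"),
]:
    log_status(conn, *row)

print(avg_transit_days(conn, "filled", "paid"))  # [('2024-06', 15.0), ('2024-07', 30.0)]
```

Because the history is append-only, it keeps the answers that an in-place status update would overwrite.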


© 2021 The Archive Base. All Rights Reserved