The Archive Base

Editorial Team
Asked: June 11, 2026

I have a small(ish) aggregated data set in Netezza, about 10m rows, on a TwinFin 6.

To simplify the question a bit, I’ve cut down the number of columns:

CUSTOMER_SALES_AGG

CUSTOMER_ID
NUMBER_TRANS
TOTAL_DOLLARS
TOTAL_ITEMS

This table is distributed on CUSTOMER_ID, with 1 row per customer ID, collecting the total transactions the customer has made, the total dollars they’ve spent, and the # of items that they’ve purchased.

I’m attempting to calculate the decile ranking of each customer across all customers, by number of transactions, total dollars spent, and total items bought. For example, a customer who spent at least as much as 90% of other customers would rank in the 1st decile.
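To make the ranking rule concrete, here is a minimal Python sketch of what NTILE(10) with a descending order and nulls last is expected to produce. It assumes the usual NTILE behaviour in which leftover rows go to the lowest-numbered buckets; the function name `ntile` and its parameters are illustrative, not part of any Netezza API.

```python
def ntile(values, buckets=10):
    """Return a 1-based bucket number for each input value.

    Mimics NTILE(buckets) OVER (ORDER BY value DESC NULLS LAST):
    the largest values land in bucket 1, None values sort last.
    """
    n = len(values)
    # Indices ordered by value descending, with None pushed to the end.
    order = sorted(range(n), key=lambda i: (values[i] is None, -(values[i] or 0)))
    # When n is not divisible by buckets, the first `extra` buckets get one more row.
    base, extra = divmod(n, buckets)
    result = [0] * n
    pos = 0
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        for i in order[pos:pos + size]:
            result[i] = bucket
        pos += size
    return result


spend = list(range(1, 21))   # 20 hypothetical customers spending 1..20
deciles = ntile(spend)       # the top two spenders fall in decile 1
```

With 20 rows and 10 buckets each decile holds two customers, so the customers spending 19 and 20 rank in the 1st decile and those spending 1 and 2 rank in the 10th.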

I’ve built a query:

SELECT
    CUSTOMER_ID, 
    NUMBER_TRANS,
    NTILE(10) OVER(ORDER BY NUMBER_TRANS DESC NULLS LAST) as TRANS_DECILE,
    TOTAL_DOLLARS,
    NTILE(10) OVER(ORDER BY TOTAL_DOLLARS DESC NULLS LAST) as DOLLARS_DECILE,
    TOTAL_ITEMS,
    NTILE(10) OVER(ORDER BY TOTAL_ITEMS DESC NULLS LAST) as ITEMS_DECILE
FROM CUSTOMER_SALES_AGG;

This works, but it’s very slow, taking 10 to 20 minutes to run.

Since a decile computation requires sorting the data and then dividing the sorted data into groups, it seems like Netezza’s MPP architecture should handle this very well. If I were partitioning the deciles, I could redistribute the data and do the ranking on each SPU, which could be even faster.

Any ideas on how to speed this up?

1 Answer

    Editorial Team
    Added an answer on June 11, 2026 at 9:03 am

    The main problem appears to stem from using multiple analytic functions (NTILE) in the same SQL statement; my actual statement ranks the customers in 7 different ways.

    From what I can tell, Netezza does, as @GordonLinoff explains in the comments, a quicksort on each processor and a final quicksort on the controller system (the Netezza host). However, it does this only once and then, as he guessed, pushes all the data to the controller system.

    It then quicksorts the data on the controller system for each remaining analytic function, without using any parallelism. I would have expected it to sort the data each way on each processor, do the final sort on the host, and then push the data back down to the processors for a final hash join of each column.
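The two-stage sort described above (sort on each processor, then combine on the host) can be sketched in Python as follows. This is only an illustration of the idea, not Netezza's actual implementation: the round-robin split stands in for data distribution across SPUs, and the final step uses a k-way merge rather than the quicksort mentioned above. The name `two_stage_sort` and the `partitions` parameter are made up for this example.

```python
import heapq


def two_stage_sort(rows, partitions=4, key=lambda r: r):
    """Sort rows by sorting each slice independently, then merging the slices."""
    # Assumption: a round-robin split models how rows are spread across processors.
    slices = [rows[i::partitions] for i in range(partitions)]
    # Stage 1: each "processor" sorts its own slice (this part can run in parallel).
    sorted_slices = [sorted(s, key=key) for s in slices]
    # Stage 2: the "host" combines the pre-sorted slices in a single pass.
    return list(heapq.merge(*sorted_slices, key=key))


totals = [5, 3, 9, 1, 7, 2, 8]
ordered = two_stage_sort(totals, partitions=3)
```

Repeating both stages once per ranking column is what would keep every sort parallel, instead of only the first one.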

    I ended up creating a query something like this.

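    WITH
    -- One CTE per ranking, so each NTILE gets its own sort.
    -- Ordering matches the original query: decile 1 holds the top customers.
    NT AS (
      select customer_id,
             number_trans,
             ntile(10) over (order by number_trans desc nulls last) as trans_decile
      from customer_sales_agg
    ),
    TD AS (
      select customer_id,
             total_dollars,
             ntile(10) over (order by total_dollars desc nulls last) as dollars_decile
      from customer_sales_agg
    ),
    NI AS (
      select customer_id,
             total_items,
             ntile(10) over (order by total_items desc nulls last) as items_decile
      from customer_sales_agg
    )
    -- Join the rankings back together on the distribution key.
    SELECT
        NT.CUSTOMER_ID, NT.NUMBER_TRANS, NT.TRANS_DECILE,
        TD.TOTAL_DOLLARS, TD.DOLLARS_DECILE,
        NI.TOTAL_ITEMS, NI.ITEMS_DECILE
    FROM NT
    JOIN TD ON (NT.CUSTOMER_ID = TD.CUSTOMER_ID)
    JOIN NI ON (NT.CUSTOMER_ID = NI.CUSTOMER_ID);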
    

    This query’s plan is much more complex, but in my case, with 7 analytic rankings, it cut query time from 12 minutes to a bit less than 5 minutes.
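The shape of the rewritten query (rank each metric in its own pass, then join the results on the customer key) can be modelled in a few lines of Python. This is a sketch of the logic only and says nothing about how Netezza executes the plan; `decile_by`, `rank_all`, and the sample rows are hypothetical.

```python
def decile_by(rows, column, buckets=10):
    """Map customer_id -> bucket for one metric, like one CTE in the query."""
    # Descending order with None last, so bucket 1 holds the top customers.
    ordered = sorted(rows, key=lambda r: (r[column] is None, -(r[column] or 0)))
    base, extra = divmod(len(ordered), buckets)
    out, pos = {}, 0
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        for r in ordered[pos:pos + size]:
            out[r["customer_id"]] = bucket
        pos += size
    return out


def rank_all(rows, columns, buckets=10):
    """Rank each column independently, then join the rankings on customer_id."""
    per_column = {c: decile_by(rows, c, buckets) for c in columns}
    return [
        {**r, **{f"{c}_decile": per_column[c][r["customer_id"]] for c in columns}}
        for r in rows
    ]


# Hypothetical data: transactions rise with the id while dollars fall.
rows = [
    {"customer_id": i, "number_trans": i, "total_dollars": 100 - i}
    for i in range(1, 21)
]
ranked = rank_all(rows, ["number_trans", "total_dollars"])
```

Because every ranking is computed independently, the per-column passes have no ordering dependency on each other, which is the property the CTE version relies on.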


© 2021 The Archive Base. All Rights Reserved