The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: June 17, 2026 (2026-06-17T04:02:34+00:00)


I have two tables:
DATA

DATA_ID  |  SAMPLE_ID  |  ASSAY_ID  |  SIGNAL
101      |  201        |  301       |  2.87964
102      |  201        |  302       |  7.64623
103      |  202        |  301       |  1.98473
...

And SAMPLES:

SAMPLE_ID  |  SAMPLE_NAME  |  CATEGORY
201        |  SAMP0001     |  CAT A  
202        |  SAMP0002     |  CAT B
203        |  SAMP0003     |  CAT A
...

There are about 20,000 rows in SAMPLES. For each sample, there are about 40,000 rows in DATA. Each ASSAY_ID occurs exactly once per sample in DATA. I need to take a subset of the samples in SAMPLES and calculate a standard/z-score value for each signal value in DATA, grouping by ASSAY_ID. I am trying to create a stored procedure that will be called repeatedly. It will accept a single ASSAY_ID value and return SAMPLE_ID and ZSCORE pairs for all of the samples in the predefined sample subset.

Given a set of sample signal values (X = [3.21, 4.56, 1.12, ..]) for a given assay, the standard/z-score in this case is calculated as

(X[i] - median(X))/(K * MAD)

Where K is a scale factor equal to 1.4826 and MAD is the median absolute deviation, equal to:

median(|X[i]-median(X)|)
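
As a cross-check for any SQL version, the formula above can be sketched in a few lines of Python. This is only a reference sketch, not part of the stored procedure: the function name `robust_zscores` and the sample values are made up for illustration, and raising an error when MAD is zero is an assumption, since the question does not say how that case should be handled.

```python
from statistics import median

K = 1.4826  # scale factor given in the question


def robust_zscores(signals):
    """Return (x - median(X)) / (K * MAD) for each x in signals."""
    med = median(signals)
    # MAD: median of the absolute deviations from the median
    mad = median(abs(x - med) for x in signals)
    if mad == 0:
        # Assumption: the question does not define the MAD == 0 case,
        # so this sketch refuses to divide by zero.
        raise ValueError("MAD is zero; z-score is undefined")
    return [(x - med) / (K * mad) for x in signals]


# Illustrative values only (not taken from the DATA table)
scores = robust_zscores([1.0, 2.0, 3.0, 4.0, 100.0])
print(scores)
```

Here the median is 3.0 and the MAD is 1.0, so the middle value scores 0 and the outlier 100.0 gets a large positive score. `statistics.median` averages the two middle values for an even count, which should match how Oracle's MEDIAN interpolates.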

Got that? Good 🙂 Now, what is the most efficient way to perform this calculation using a SQL query? Execution time is key, given that there are close to a billion rows in DATA and a z-score needs to be calculated for almost every SIGNAL value.

Here is the best query I have been able to come up with so far:

WITH BASE AS (
    SELECT 
        S.SAMPLE_ID,
        D.SIGNAL
    FROM
        DATA D
        JOIN SAMPLES S
            ON D.SAMPLE_ID = S.SAMPLE_ID
    WHERE 
        S.CATEGORY IN ('CAT A', 'CAT B')
        AND D.ASSAY_ID = 12345
        AND S.SAMPLE_NAME NOT IN ('SAMP0003', 'SAMP0005', 'SAMP0008')          
)
SELECT  
    A.SAMPLE_ID,
    (A.SIGNAL-B.MED)/(1.4826*C.MAD) AS ZSCORE
FROM 
    BASE A,
    (
        SELECT MEDIAN(X.SIGNAL) AS MED 
        FROM BASE X
    ) B,
    (
        SELECT MEDIAN(ABS(Y.SIGNAL-YY.MED)) AS MAD 
        FROM BASE Y, 
        (SELECT MEDIAN(SIGNAL) AS MED FROM BASE) YY
    ) C 

Is there a more efficient way to perform this query?

Bonus Question: Can I write a single SQL query that would perform this calculation for EVERY ASSAY_ID in a single execution?


1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 4:02 am (2026-06-17T04:02:35+00:00)

    Can you have a look at:

    SELECT ASSAY_ID, SAMPLE_ID, 
           (SIGNAL - MED)/(1.4826F * MAD) AS ZSCORE
      FROM (
            SELECT ASSAY_ID, SAMPLE_ID, SIGNAL, MED,
                   MEDIAN(ABS(SIGNAL - MED)) OVER (PARTITION BY ASSAY_ID) AS MAD
              FROM (
                    SELECT ASSAY_ID, SAMPLE_ID, SIGNAL,
                           MEDIAN(SIGNAL) OVER (PARTITION BY ASSAY_ID) AS MED
                      FROM DATA    D
                      JOIN SAMPLES S USING (SAMPLE_ID)
                     WHERE S.CATEGORY IN ('CAT A', 'CAT B')
                       AND S.SAMPLE_NAME NOT IN ('SAMP0003', 'SAMP0005', 'SAMP0008')  
                       AND D.ASSAY_ID = 301
                   )
           );
    

    Is it correct? Is it faster? If it is, just remove the AND D.ASSAY_ID = 301 clause for the bonus question 🙂

    On the physical side, I would check the data type of SIGNAL (BINARY_FLOAT or BINARY_DOUBLE are supposedly faster than NUMBER). And, if this is an option, I'd try to physically co-locate the rows of each assay, for example by partitioning DATA on ASSAY_ID.
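
To check the all-assays variant on a small extract, the PARTITION BY ASSAY_ID logic of the query above can be mirrored in Python. This is a sketch under assumptions: the function name `zscores_by_assay`, the tuple layout, and the sample rows are hypothetical, and returning None when an assay's MAD is zero is a choice made here, not something the question or the query specifies.

```python
from collections import defaultdict
from statistics import median

K = 1.4826  # scale factor given in the question


def zscores_by_assay(rows):
    """rows: iterable of (assay_id, sample_id, signal) tuples.

    Returns {(assay_id, sample_id): zscore}, computed per assay.
    """
    # Group the signals by assay, like PARTITION BY ASSAY_ID
    by_assay = defaultdict(list)
    for assay_id, sample_id, signal in rows:
        by_assay[assay_id].append((sample_id, signal))

    result = {}
    for assay_id, pairs in by_assay.items():
        signals = [s for _, s in pairs]
        med = median(signals)
        mad = median(abs(s - med) for s in signals)
        for sample_id, signal in pairs:
            # Assumption: report None when MAD is zero instead of dividing
            result[(assay_id, sample_id)] = (
                (signal - med) / (K * mad) if mad else None
            )
    return result


# Illustrative rows only (not taken from the DATA table)
rows = [
    (301, 201, 1.0), (301, 202, 2.0), (301, 203, 4.0),
    (302, 201, 10.0), (302, 202, 10.0), (302, 203, 10.0),
]
z = zscores_by_assay(rows)
print(z)
```

Assay 302 has identical signals, so its MAD is zero. In the SQL version that case would mean dividing by zero, so it is worth deciding up front how such assays should be reported.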

