The Archive Base

Editorial Team
Asked: May 18, 2026


This is a long question that I do not know how to summarize…

I have a table that I need to read from that holds financial data, with almost a billion records of detailed data in it. I cannot change the structure of this table; I am merely a consumer of it. The table has a transaction date, a bunch of attribute columns with Int data describing the transaction (not actually named Attribute1-20, just named that way for simplicity below), and an Amount column.

TABLE: FinancialData
COLUMNS:
  Id (BigInt IDENTITY)
  TransactionId (Int FK)
  TransactionDate (DateTime)
  Attribute1 (Int)
  Attribute2 (Int)
  .
  .
  Attribute20 (Int)
  Amount (Decimal)

I have a process that needs to summarize this FinancialData table into 2 database tables (one a header table and another a detail table with aggregated amounts) for a user-defined time-line so that a snapshot of the data can be used by other processes. The header table contains one record per user-defined time-line (snapshot) and the detail table contains aggregated amount records across all attributes of the FinancialData table.

TABLE: FinancialHeader 
COLUMNS:
  Id (Int IDENTITY)
  BeginTransactionDate (DateTime)
  EndTransactionDate (DateTime)


TABLE: FinancialDetail
COLUMNS:
 Id (Int IDENTITY)
 FinancialHeaderId (Int FK)
 Attribute1 (Int)
 Attribute2 (Int)
 .
 .
 Attribute20 (Int)
 Amount (Decimal)

To give an example of the process, say there are 20 million records in the FinancialData table with a TransactionDate between 1/1/2010 and 6/30/2010, many of them with redundant attributes (although they have different TransactionId values). If I were to summarize this data in the FinancialHeader and FinancialDetail tables above, I would create one FinancialHeader record with a BeginTransactionDate of 1/1/2010 and an EndTransactionDate of 6/30/2010, and then multiple FinancialDetail records that are child records of the header. The FinancialDetail table aggregates the 20 million records from FinancialData: it basically contains a grouping of the unique values of Attribute1 – Attribute20 along with a SUM(Amount) to track the total amount for those attributes. Typically 20 million records in the FinancialData table would contain about 10,000 unique combinations of attributes, which would then yield 10,000 records in the FinancialDetail table with an aggregated amount. So in my example there would be 1 FinancialHeader record and roughly 10,000 FinancialDetail records created during the process.
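The process described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` rather than the real database, three attribute columns stand in for Attribute1 – Attribute20, dates are stored as ISO text, and the function names `build_demo_db` and `take_snapshot` are made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a cut-down version of the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE FinancialData (
            Id INTEGER PRIMARY KEY,
            TransactionId INTEGER,
            TransactionDate TEXT,
            Attribute1 INTEGER, Attribute2 INTEGER, Attribute3 INTEGER,
            Amount NUMERIC
        );
        CREATE TABLE FinancialHeader (
            Id INTEGER PRIMARY KEY,
            BeginTransactionDate TEXT,
            EndTransactionDate TEXT
        );
        CREATE TABLE FinancialDetail (
            Id INTEGER PRIMARY KEY,
            FinancialHeaderId INTEGER REFERENCES FinancialHeader(Id),
            Attribute1 INTEGER, Attribute2 INTEGER, Attribute3 INTEGER,
            Amount NUMERIC
        );
    """)
    rows = [
        (1, "2010-01-15", 1, 2, 3, 100),
        (2, "2010-02-20", 1, 2, 3, 50),   # same attributes, different transaction
        (3, "2010-03-05", 4, 5, 6, 25),
        (4, "2010-09-01", 1, 2, 3, 999),  # outside the snapshot range used below
    ]
    conn.executemany(
        "INSERT INTO FinancialData (TransactionId, TransactionDate,"
        " Attribute1, Attribute2, Attribute3, Amount) VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def take_snapshot(conn, begin_date, end_date):
    """Insert one header row, then one detail row per unique attribute combination."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO FinancialHeader (BeginTransactionDate, EndTransactionDate)"
        " VALUES (?, ?)",
        (begin_date, end_date),
    )
    header_id = cur.lastrowid
    cur.execute(
        """
        INSERT INTO FinancialDetail
            (FinancialHeaderId, Attribute1, Attribute2, Attribute3, Amount)
        SELECT ?, Attribute1, Attribute2, Attribute3, SUM(Amount)
        FROM FinancialData
        WHERE TransactionDate BETWEEN ? AND ?
        GROUP BY Attribute1, Attribute2, Attribute3
        """,
        (header_id, begin_date, end_date),
    )
    conn.commit()
    return header_id


if __name__ == "__main__":
    conn = build_demo_db()
    header_id = take_snapshot(conn, "2010-01-01", "2010-06-30")
    for row in conn.execute(
        "SELECT * FROM FinancialDetail WHERE FinancialHeaderId = ?", (header_id,)
    ):
        print(row)
```

With the four sample rows, the snapshot for the first half of 2010 collapses three in-range transactions into two detail rows, which is the same shape as 20 million records collapsing into roughly 10,000.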

The question I have relates to storing 20 columns' worth of unique combinations of attribute data. The "snapshot" process I am talking about can be run over and over again by the user, any number of times and for various date ranges, to essentially store amounts for that period in time. So what happens is that the FinancialDetail table tends to have a lot of data in it even though it is aggregate data. What I don't like is that there are 20 columns in the FinancialDetail table I created that I feel may be wasting space. What I was thinking may be a better approach is to store each unique combination of attributes as a row in yet another table, say called FinancialAttribute, with an Id column that can then be used as a lookup mechanism for the FinancialDetail table. So the FinancialAttribute table would look like this:

TABLE: FinancialAttribute
COLUMNS: 
  Id (Int IDENTITY)
  Attribute1 (Int)
  Attribute2 (Int)
  .
  .
  Attribute20 (Int)

And the FinancialDetail table would be modified into this:

TABLE: FinancialDetail
COLUMNS:
 Id (Int IDENTITY)
 FinancialHeaderId (Int FK)
 FinancialAttributeId (Int FK)
 Amount (Decimal)   
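A minimal sketch of how a snapshot might be written into the factored tables, again using Python's `sqlite3` with three stand-in attribute columns. The UNIQUE constraint on the attribute combination, the `INSERT OR IGNORE` statement (SQLite-specific; another database would need a NOT EXISTS check or similar), the assumption that attribute values are never NULL, and the function names are all assumptions of this example, not part of the original design.

```python
import sqlite3

ATTRS = ("Attribute1", "Attribute2", "Attribute3")  # stand-in for Attribute1..Attribute20


def build_demo_db():
    """Create an in-memory database with the factored table layout."""
    cols = ", ".join(f"{c} INTEGER NOT NULL" for c in ATTRS)
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
        CREATE TABLE FinancialData (
            Id INTEGER PRIMARY KEY, TransactionId INTEGER,
            TransactionDate TEXT, {cols}, Amount NUMERIC
        );
        CREATE TABLE FinancialHeader (
            Id INTEGER PRIMARY KEY,
            BeginTransactionDate TEXT, EndTransactionDate TEXT
        );
        CREATE TABLE FinancialAttribute (
            Id INTEGER PRIMARY KEY, {cols},
            UNIQUE ({", ".join(ATTRS)})  -- one row per attribute combination
        );
        CREATE TABLE FinancialDetail (
            Id INTEGER PRIMARY KEY,
            FinancialHeaderId INTEGER REFERENCES FinancialHeader(Id),
            FinancialAttributeId INTEGER REFERENCES FinancialAttribute(Id),
            Amount NUMERIC
        );
    """)
    conn.executemany(
        f"INSERT INTO FinancialData (TransactionId, TransactionDate,"
        f" {', '.join(ATTRS)}, Amount) VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "2010-01-15", 1, 2, 3, 100),
            (2, "2010-02-20", 1, 2, 3, 50),
            (3, "2010-03-05", 4, 5, 6, 25),
            (4, "2010-09-01", 1, 2, 3, 999),
        ],
    )
    return conn


def take_factored_snapshot(conn, begin_date, end_date):
    """Write one header row, register unseen combinations, then write detail rows."""
    cols = ", ".join(ATTRS)
    join_on = " AND ".join(f"a.{c} = d.{c}" for c in ATTRS)
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO FinancialHeader (BeginTransactionDate, EndTransactionDate)"
        " VALUES (?, ?)",
        (begin_date, end_date),
    )
    header_id = cur.lastrowid
    # Step 1: add only the combinations that are not in the lookup table yet.
    cur.execute(
        f"INSERT OR IGNORE INTO FinancialAttribute ({cols})"
        f" SELECT DISTINCT {cols} FROM FinancialData"
        f" WHERE TransactionDate BETWEEN ? AND ?",
        (begin_date, end_date),
    )
    # Step 2: aggregate amounts per combination, keyed by the lookup Id.
    cur.execute(
        f"INSERT INTO FinancialDetail (FinancialHeaderId, FinancialAttributeId, Amount)"
        f" SELECT ?, a.Id, SUM(d.Amount)"
        f" FROM FinancialData AS d JOIN FinancialAttribute AS a ON {join_on}"
        f" WHERE d.TransactionDate BETWEEN ? AND ?"
        f" GROUP BY a.Id",
        (header_id, begin_date, end_date),
    )
    conn.commit()
    return header_id
```

Running two overlapping snapshots against the sample data leaves two rows in FinancialAttribute and four in FinancialDetail, so the attribute combinations are stored once no matter how many snapshots reuse them. The price is the extra join when writing (and later reading) the detail rows.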

Is this a pretty common pattern to deal with aggregation across multiple columns/attributes? Or am I thinking about this in completely the wrong way? I need to store the data from the FinancialData table into my own local copy somehow though because there are several downstream processes that need to process or report on these user-defined time-line snapshots of this financial information.

1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 8:06 am

    Wordy question, wordy answer! 😉

    I’m no data warehousing expert, so I’m not familiar with the patterns (and anti-patterns) in that area. I’m speaking as just a DB dev who has maybe done something similar.

    In my case, we take snapshots from large source tables of prescription drug info. The snapshots are used for downstream analysis and reporting. The users specify snapshot criteria, like date and drug type, which usually affects 2m (vs your 20m) records. This typically compiles down to 120k (vs. your 10k) records. Snapshots are kept indefinitely, as source tables change over time and are NOT historical. I share your concerns about snapshots pulling and storing redundant information.

    Your question – are you doing something dumb? Is there a better way?

    Conceptually speaking, it’s pretty apparent that your factoring is “safe”. By this, I mean it’s a straightforward transformation that’s obviously correct, and it’s pretty obvious how to map the factored version back into the original with little pain. From that perspective (conceptual ease), I think it has merits.

    As for impact, I’d consider the expected table sizes. My assumptions are:

    • Avg snapshot records = Total unique Attribute combos per snapshot = 10k
    • Total num snapshots over time = 10k
    • Total unique Attribute combos over time = 100k
    • Amount column precision is < 20 digits

    So:

    FinancialDetail (orig)
    Column   | Type     | Avg Size
    -------------------------------
    ID       | int      | 4
    HeaderID | int      | 4
    Amount   | decimal  | 9
    A1 - A20 | int x 20 | 80
    -------------------------------
    Total:                97
    Expected num rows:    100m
    Total expected size:  9GB
    
    
    FinancialDetail (new)
    Column   | Type     | Avg Size
    -------------------------------
    ID       | int      | 4
    HeaderID | int      | 4
    AttribID | int      | 4
    Amount   | decimal  | 9
    -------------------------------
    Total:                21
    Expected num rows:    100m
    Total expected size:  2GB
    
    FinancialAttribute (new)
    Column   | Type     | Avg Size
    -------------------------------
    ID       | int      | 4
    A1 - A20 | int x 20 | 80
    -------------------------------
    Total:                84
    Expected num rows:    100k
    Total expected size:  8MB
    

    If my assumptions are in the ball park (and my math right), you could be saving 78% on space. This doesn’t include space for indexes or table fill slack, so the actual table sizes will be higher.
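The arithmetic behind the tables above can be reproduced with a short Python sketch. The byte sizes are the ones assumed in the answer (4 bytes per int, 9 bytes for a decimal under 20 digits of precision); the function name `estimate_sizes` and its parameter names are made up for the example, and row overhead, indexes and fill slack are ignored just as in the tables.

```python
INT_BYTES = 4       # size of an int column
DECIMAL_BYTES = 9   # decimal with precision under 20 digits, as assumed above
N_ATTRIBUTES = 20


def estimate_sizes(rows_per_snapshot=10_000, snapshots=10_000, unique_combos=100_000):
    """Return raw data sizes in bytes for the original and the factored layout."""
    detail_rows = rows_per_snapshot * snapshots

    # ID + HeaderID + Amount + A1..A20
    orig_row = 2 * INT_BYTES + DECIMAL_BYTES + N_ATTRIBUTES * INT_BYTES
    # ID + HeaderID + AttribID + Amount
    new_row = 3 * INT_BYTES + DECIMAL_BYTES
    # ID + A1..A20
    attr_row = INT_BYTES + N_ATTRIBUTES * INT_BYTES

    orig_total = orig_row * detail_rows
    new_total = new_row * detail_rows + attr_row * unique_combos
    return {
        "detail_rows": detail_rows,
        "orig_row_bytes": orig_row,
        "new_row_bytes": new_row,
        "attr_row_bytes": attr_row,
        "orig_total_bytes": orig_total,
        "new_total_bytes": new_total,
        "saving_fraction": 1 - new_total / orig_total,
    }


if __name__ == "__main__":
    sizes = estimate_sizes()
    print(f"original: {sizes['orig_total_bytes'] / 1e9:.1f} GB")
    print(f"factored: {sizes['new_total_bytes'] / 1e9:.1f} GB")
    print(f"saving:   {sizes['saving_fraction']:.0%}")
```

Changing the three parameters is a quick way to see how sensitive the saving is to the assumptions, for instance if the number of unique combinations over time turns out to be much larger than 100k.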

    Does saving 7GB matter?

    • 9GB is easily managed by a modern disk, and is nothing compared to your master table.
    • The data is (probably) put back together in memory on a query anyway, so maybe no savings over the wire or in caching.
    • Query IO should be a little better, if searching by Attributes
    • Backup / restore time should be better
    • Time spent reindexing, updating statistics, etc. should be better
    • (Covering) indexes based on Attributes are much smaller
    • The query time to insert into your factored tables will go up

    You can make your own call on this but it seems to me that your factoring could be worthwhile if space is the #1 concern, even if it’s not technically the mostest optimalest solution.

    Speaking of efficiency…

    If you somehow managed to optimize your Attribute space down to 0 bytes, you’d only save another 0.09% off the original. So I wouldn’t further optimize for space there.

    On the other hand, simply dropping FinancialDetail.ID and using PK on (HeaderID, AttributeID) would save you 4.1% off the original. (Assumption: You don’t have FKs pointing to this table.)
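A rough sketch of the composite-key variant, together with the two percentages quoted above. It uses Python's `sqlite3` for illustration only: `WITHOUT ROWID` is SQLite's way of storing the table by its primary key with no hidden row id, the column names follow the question's factored FinancialDetail table, and the function names are made up for the example.

```python
import sqlite3


def build_detail_table(conn):
    """FinancialDetail with no surrogate Id, keyed on (header, attribute)."""
    conn.execute("""
        CREATE TABLE FinancialDetail (
            FinancialHeaderId INTEGER NOT NULL,
            FinancialAttributeId INTEGER NOT NULL,
            Amount NUMERIC NOT NULL,
            PRIMARY KEY (FinancialHeaderId, FinancialAttributeId)
        ) WITHOUT ROWID
    """)


def try_insert(conn, header_id, attribute_id, amount):
    """Return True if the row was inserted, False if the key already exists."""
    try:
        conn.execute(
            "INSERT INTO FinancialDetail VALUES (?, ?, ?)",
            (header_id, attribute_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def marginal_savings(detail_rows=100_000_000, unique_combos=100_000):
    """Fractions of the original 97-byte-per-row table saved by each idea."""
    orig_total = 97 * detail_rows
    attribute_table = 84 * unique_combos   # shrinking FinancialAttribute to nothing
    id_column = 4 * detail_rows            # dropping the 4-byte FinancialDetail.ID
    return attribute_table / orig_total, id_column / orig_total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_detail_table(conn)
    print(try_insert(conn, 1, 1, 150))   # first row for this key
    print(try_insert(conn, 1, 1, 999))   # same key again is rejected
    attr_saving, id_saving = marginal_savings()
    print(f"{attr_saving:.2%} vs {id_saving:.1%}")
```

The composite key also guarantees that a snapshot cannot hold two rows for the same attribute combination, which the surrogate Id alone does not enforce.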

    As far as if there’s a better way – I don’t know. It would depend on how many snapshots you get, how your snapshots are used, and how fast it needs to be.
