The Archive Base

Editorial Team
Asked: May 19, 2026

Imagine a table with the following structure on PostgreSQL 9.0:

create table raw_fact_table (text varchar(1000));

For the sake of simplicity I only mention one text column; in reality there are a dozen. The table has 10 billion rows, and each column contains lots of duplicates. It is created from a flat file (CSV) using COPY FROM.

To increase performance I want to convert to the following star schema structure:

create table dimension_table (id int, text varchar(1000));

The fact table would then be replaced with a fact table like the following:

create table fact_table (dimension_table_id int);

My current method is to essentially run the following query to create the dimension table:

create table dimension_table (id serial primary key, text varchar(1000));

then to fill the dimension table I use:

insert into dimension_table (text) select text from raw_fact_table group by text;

Afterwards I need to run the following query:

select d.id as dimension_table_id into fact_table from dimension_table d inner join raw_fact_table r on (d.text = r.text);

Just imagine the horrible performance I get from comparing all these long strings against each other several times.

On MySQL I could run a stored procedure during the bulk load. It could compute a hash of each string, so that all subsequent string comparisons are done on the hash instead of the long raw string. This does not seem to be possible on PostgreSQL, so what should I do instead?
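For what it's worth, PostgreSQL can hash strings too: md5() is a built-in function, and COPY FROM does fire row triggers on the target table, so a hash could even be computed during the load. The minimal sketch below takes the simpler route of storing md5(text) in the dimension table and joining on the fixed-width hash, rechecking the full text to rule out collisions. It assumes the single-column tables from the question and no NULL texts; the column name text_hash is hypothetical, and temporary tables keep the demo away from real data.

```sql
-- Temporary demo tables mirroring the question's single-column case.
create temp table raw_fact_table (text varchar(1000));
insert into raw_fact_table values
  ('lots and lots of text'), ('sometext'), ('somemoretext'), ('sometext');

-- Dimension table with a surrogate id and the hash stored alongside the text.
create temp table dimension_table (
  id        serial primary key,
  text      varchar(1000),
  text_hash varchar(32) unique  -- md5() returns 32 hex characters; unique adds an index
);

-- Building the dimension still groups on the full text.
insert into dimension_table (text, text_hash)
select text, md5(text)
from raw_fact_table
group by text;

-- Join on the short hash first, then recheck the text itself.
create temp table fact_table as
select d.id as dimension_table_id
from raw_fact_table r
join dimension_table d
  on d.text_hash = md5(r.text)  -- cheap fixed-width comparison
 and d.text = r.text;           -- guards against md5 collisions
```

Whether this beats joining on the text directly is not guaranteed: the planner may already choose a hash join or hash aggregate, which hashes the keys internally, so compare both versions with EXPLAIN ANALYZE on a sample first.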

Sample data would be a semicolon-separated CSV file like the following (integers and doubles are quoted as well):

"lots and lots of text";"3";"1";"2.4";"lots of text";"blabla"
"sometext";"30";"10";"1.0";"lots of text";"blabla"
"somemoretext";"30";"10";"1.0";"lots of text";"fooooooo"
1 Answer

  1. Editorial Team
    Added an answer on May 19, 2026 at 12:58 pm

    Just two questions:
    – Is it necessary to convert your data in one or two steps?
    – May we modify the table while converting?

    Running several simpler queries may improve your performance (and reduce the server load while doing it).

    One approach would be:

    1. Generate dimension_table (if I understand correctly, this step is not your performance problem), possibly with an additional temporary boolean field.
    2. Repeat: choose one not-yet-processed entry from dimension_table, select every row from raw_fact_table containing it, and insert those rows into fact_table. Mark the dimension_table record as done, and move on to the next one. You can write this as a stored procedure, and it can convert your data in the background while using minimal resources.
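The first approach can be sketched as a PL/pgSQL function that converts a limited number of dimension values per call. This is a minimal sketch, assuming the single-column tables from the question; the done column and the function name convert_batch are hypothetical. A function runs inside the caller's transaction, so the work is spread out by calling it repeatedly from separate transactions rather than in one long call.

```sql
-- Temporary demo tables mirroring the question's single-column case.
create temp table raw_fact_table (text varchar(1000));
insert into raw_fact_table values
  ('lots and lots of text'), ('sometext'), ('somemoretext'), ('sometext');

create temp table dimension_table (
  id   serial primary key,
  text varchar(1000),
  done boolean not null default false  -- hypothetical temporary progress marker
);
insert into dimension_table (text)
select text from raw_fact_table group by text;

create temp table fact_table (dimension_table_id int);

-- Converts up to max_values unprocessed dimension values and returns
-- how many it handled, so a caller can loop until it returns 0.
create or replace function convert_batch(max_values int) returns int as $$
declare
  dim     record;
  handled int := 0;
begin
  for dim in
    select id, text
    from dimension_table
    where not done
    order by id
    limit max_values
  loop
    -- Copy every raw row carrying this value into the fact table.
    insert into fact_table (dimension_table_id)
    select dim.id
    from raw_fact_table r
    where r.text = dim.text;

    -- Mark the value as processed so the next call skips it.
    update dimension_table set done = true where id = dim.id;
    handled := handled + 1;
  end loop;
  return handled;
end;
$$ language plpgsql;

select convert_batch(2);  -- returns 2
select convert_batch(2);  -- returns 1; all three values are now done
```

As written, each iteration looks rows up in raw_fact_table by text; on 10 billion rows that is only practical with an index supporting that lookup, otherwise every dimension value costs a full table scan.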

    Or another (probably better) approach:

    1. Create fact_table with every column of raw_fact_table plus one dimension_id column (so it contains both dimension_text and dimension_id).
    2. Create dimension_table.
    3. Create an after insert trigger on fact_table which:
      • searches for dimension_text in dimension_table
      • if not found, creates a new record in dimension_table
      • updates dimension_id to this id
    4. In a simple loop, insert every record from raw_fact_table into fact_table.
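The second approach might look like the sketch below, which assumes the single-column tables from the question. It deliberately deviates in two places: it uses a BEFORE INSERT trigger instead of the AFTER INSERT trigger described above, because a BEFORE trigger can set NEW.dimension_id directly rather than updating the row after it is written, and a single INSERT ... SELECT stands in for the loop in step 4 (the trigger still fires once per row). The names fill_dimension_id and fact_table_fill_dimension are hypothetical, and it assumes one loader at a time: with concurrent loaders, two sessions could try to add the same text and the unique constraint would raise an error.

```sql
-- Temporary demo tables mirroring the question's single-column case.
create temp table raw_fact_table (text varchar(1000));
insert into raw_fact_table values
  ('lots and lots of text'), ('sometext'), ('somemoretext'), ('sometext');

create temp table dimension_table (
  id   serial primary key,
  text varchar(1000) unique  -- unique index keeps the per-row lookup cheap
);

-- Step 1: same columns as raw_fact_table plus the dimension id.
create temp table fact_table (
  text         varchar(1000),
  dimension_id int references dimension_table (id)
);

-- Step 3: find or create the dimension row for each inserted fact row.
create or replace function fill_dimension_id() returns trigger as $$
begin
  select id into new.dimension_id
  from dimension_table
  where text = new.text;

  if not found then
    insert into dimension_table (text)
    values (new.text)
    returning id into new.dimension_id;
  end if;

  return new;  -- the row is stored with dimension_id already set
end;
$$ language plpgsql;

create trigger fact_table_fill_dimension
  before insert on fact_table
  for each row execute procedure fill_dimension_id();

-- Step 4: copy the raw rows; the trigger fires once per row.
insert into fact_table (text)
select text from raw_fact_table;
```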
Related Questions

Imagine the following tables: create table boxes( id int, name text, ...); create table
Consider the following structure : alt text http://aeon-dev.org/pap/pap_db.png Ignore the table user_token. Now, imagine
I have a MySQL 5.1 InnoDB table ( customers ) with the following structure:
I've got a table, called faq_questions with the following structure: id int not_null auto_increment,
Simplified Table structure: CREATE TABLE IF NOT EXISTS `hpa` ( `id` bigint(15) NOT NULL
Imagine I have table like this: id:Product:shop_id 1:Basketball:41 2:Football:41 3:Rocket:45 4:Car:86 5:Plane:86 Now, this
What is the best way to track changes in a database table? Imagine you
I have a problem. Imagine this data model: [Person] table has: PersonId, Name1 [Tag]
Imagine I have an function which goes through one million/billion strings and checks smth
hopefully somebody can help The table structure is as follows: tblCompany: compID compName tblOffice:

© 2021 The Archive Base. All Rights Reserved