The Archive Base


Editorial Team
Asked: May 31, 2026 (2026-05-31T17:00:17+00:00)

The problem is with respect to the writing speed of the computer (10 *


The problem concerns the write speed of the computers (10 x 32-bit machines) and PostgreSQL query performance. I will explain the scenario in detail.

I have about 80 GB of data (with appropriate database indexes in place). I am reading it from a PostgreSQL database and writing it into HDF5 using PyTables. I have 1 table and 5 variable arrays in one HDF5 file. The HDF5 implementation is not multithreaded or enabled for symmetric multiprocessing. I have rented about 10 computers for a day and am writing from all of them in order to speed up my data handling.

As far as the PostgreSQL table is concerned, the overall record count is 140 million, and I have 5 tables referenced through primary/foreign keys. I am not using joins, as that is not scalable.

So for a single record I do 6 lookups without joins and write the results into HDF5 format.
For each record I do 6 inserts, one into the table and one into each of its corresponding arrays.

The queries are really simple

select * from x.train where tr_id=1 (primary key & indexed)
select q_t from x.qt where q_id=2 (non-primary key but indexed) 

(similarly five queries)
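To make the access pattern concrete, here is a minimal sketch of the two example lookups for one record. It is only an illustration: `fetch_one_record` is a hypothetical helper, `cur` is assumed to be any DB-API cursor (for example one from psycopg2), and passing `q_id` in separately is an assumption, since the post does not say where it comes from. The last lines count the database round trips implied by 6 lookups per record.

```python
# Hypothetical helper: `cur` is any DB-API cursor (e.g. from psycopg2).
# The q_id argument and the return shape are assumptions for illustration.
def fetch_one_record(cur, tr_id, q_id):
    """Run the two example lookups for a single record, one round trip each."""
    cur.execute("SELECT * FROM x.train WHERE tr_id = %s", (tr_id,))
    train_row = cur.fetchone()
    cur.execute("SELECT q_t FROM x.qt WHERE q_id = %s", (q_id,))
    q_t_values = [row[0] for row in cur.fetchall()]
    return train_row, q_t_values


# Figures taken from the post: 6 lookups per record, 143,700,000 records.
LOOKUPS_PER_RECORD = 6
TOTAL_RECORDS = 143_700_000
total_round_trips = LOOKUPS_PER_RECORD * TOTAL_RECORDS
print(total_round_trips)  # 862200000
```

With one query per lookup, the whole job issues roughly 862 million separate queries, so per-query latency is likely to matter at least as much as disk write speed.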

Each computer writes two HDF5 files, so the total comes to about 20 files.

Some calculations and statistics:

Total number of records: 14,37,00,000
Total number of records per file: 14,37,00,000 / 20 = 71,85,000
Total number of rows per file across the five arrays: 71,85,000 * 5 = 3,59,25,000

Current PostgreSQL database config:

My current machine: 8 GB RAM with a 2nd-generation i7 processor.

I made the following changes to the PostgreSQL configuration file:
shared_buffers: 2 GB
effective_cache_size: 4 GB

Note on current performance:

I have run it for about ten hours, and the performance is as follows:
The total number of records written to each file is about 6,21,000 * 5 = 31,05,000

The bottleneck is that I can only rent the machines for 10 hours per day (overnight), and at this speed it will take about 11 days, which is too long for my experiments.
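The estimate above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in this post; the constant names are made up for illustration.

```python
import math

# Figures quoted in the post.
TOTAL_RECORDS = 143_700_000
FILES = 20
WRITTEN_PER_FILE_PER_RUN = 621_000  # records written to each file in one run
HOURS_PER_RUN = 10

records_per_file = TOTAL_RECORDS // FILES
runs_needed = records_per_file / WRITTEN_PER_FILE_PER_RUN
records_per_second = WRITTEN_PER_FILE_PER_RUN / (HOURS_PER_RUN * 3600)

print(records_per_file)               # 7185000
print(round(runs_needed, 2))          # 11.57
print(math.ceil(runs_needed))         # 12
print(round(records_per_second, 2))   # 17.25
```

So each file is receiving roughly 17 records per second, and finishing takes between 11 and 12 overnight runs at the current rate.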

Please suggest how I can improve this.
Questions:
1. Should I use symmetric multiprocessing on those desktops (each has 2 cores and about 2 GB of RAM)? If so, what is suggested or preferable?
2. If I change my PostgreSQL configuration file and increase the RAM, will it speed up my process?
3. Should I use multithreading? If so, any links or pointers would be of great help.

Thanks
Sree aurovindh V


1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 5:00 pm (2026-05-31T17:00:19+00:00)

    Please refer to the following link

    http://sourceforge.net/mailarchive/forum.php?thread_name=CAC4BLaLCMuA6%3DDated_MsPKp5-F_EyKbrUkMWS4g_D7grwpVXQ%40mail.gmail.com&forum_name=pytables-users

    This might be helpful in understanding query efficiency.

    Thanks

