The Archive Base

Editorial Team
Asked: May 25, 2026 (2026-05-25T13:52:26+00:00)

I have written a few simple Rails applications, accessing the database via the ActiveRecord

I have written a few simple Rails applications, accessing the database via the ActiveRecord abstraction, so I am afraid I don’t know much about the inner workings of the PostgreSQL engine.
However, I am now writing a Rails app that needs to support 100,000+ rows with dynamically updated content, and I am wondering whether I am using the random functions efficiently:

Database migration schema setting:
t.float :attribute1
t.integer :ticks

add_index :mytable, :attribute1
add_index :mytable, :ticks

Basically, I want to have the following random function distribution:

a) rows in the top 10% by attribute1 = 30% chance of being selected

b) rows in the middle 60% (by attribute1) = 50% chance of being selected

c) rows in the lowest 30% (by attribute1) with fewer than 100 ticks = 15% chance of being selected

d) rows in the lowest 30% (by attribute1) with X or more ticks (use X = 100 in this question) = 5% chance of being selected

At the moment I have the following code:

@rand = rand()
if @rand > 0.7
    # a) top 10%. Kernel#rand(0.1) returns a float in [0, 1), so scale explicitly.
    @randRowNum = Integer(rand() * 0.1 * Mytable.count)
    @row = Mytable.find(:first, :offset => @randRowNum, :order => '"attribute1" DESC')
else
    if @rand > 0.2
        # b) middle 60%: offsets between 10% and 70% of the table
        @randRowNum = Integer((rand() * 0.6 + 0.1) * Mytable.count)
        @row = Mytable.find(:first, :offset => @randRowNum, :order => '"attribute1" DESC')
    else
        # Row at the boundary of the lowest 30%
        @row = Mytable.find(:first, :offset => Integer(0.7 * Mytable.count), :order => '"attribute1" DESC')
        if !(@row.nil?)
            if (@rand > 0.05)
                # c) both filters go into one SQL fragment; the boundary value is bound via ?
                @row = Mytable.find(:first, :order => 'RANDOM()', :conditions => ['"attribute1" <= ? AND "ticks" < 100', @row.attribute1])
            else
                # d) same boundary, rows with 100 or more ticks
                @row = Mytable.find(:first, :order => 'RANDOM()', :conditions => ['"attribute1" <= ? AND "ticks" >= 100', @row.attribute1])
            end
        end
    end
end

1) One thing I would like to do is avoid the use of :order => 'RANDOM()' because, according to my research, each call makes the SQL engine scan through all the rows and assign each of them a random value. Hence the computed @randRowNum and the :offset for a) and b). Am I actually improving the efficiency? Is there a better way?

2) Should I be doing the same as 1) for c) and d)? And by filtering with :conditions on both "attribute1" and "ticks", am I forcing the SQL engine to scan through all the rows anyway? Is there anything apart from indexing that would improve the efficiency of this call (with as little space/storage overhead as possible too)?

3) There is a chance that the total number of entries in Mytable changes between the Mytable.count and Mytable.find calls. I could put the two calls within a transaction, but it seems excessive to lock the entire table just for a read operation (at the moment, I have extra code to fall back to a simple random row selection if @row is nil after the above code). Is it possible to move that .count call into a single atomic SQL query in Rails? Or would that have the same efficiency as locking via a transaction in Rails?

4) I have also been reading up on stored procedures in PostgreSQL, but for this particular case, is there any gain in efficiency that makes it worth moving the code from the ActiveRecord abstraction to a stored procedure?

Many thanks!

P.S. development/deployment environment:
Ruby 1.8.7
Rails 2.3.14
PostgreSQL 8 (on Heroku)


1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 1:52 pm

    Your question seems a bit vague, so correct me if my interpretation is wrong.

    If you didn’t split (c) and (d), I would just convert the uniform random variable over [0,1) to a biased random variable over [0,1) and use that to select your row offset.
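
As a rough illustration of that conversion, here is a minimal Ruby sketch for the un-split case, where (c) and (d) are treated as one 20% bucket. The names biased_fraction and biased_offset are made up for this example, and the result is meant to be used as the :offset of a query ordered by attribute1 DESC.

```ruby
# Hypothetical helper: maps a uniform u in [0, 1) to a fraction in [0, 1)
# of the way through the rows ordered by attribute1 DESC.
def biased_fraction(u)
  # Each entry: [probability of the segment, start fraction, width fraction]
  segments = [
    [0.3, 0.0, 0.1], # top 10% of rows receive 30% of the picks
    [0.5, 0.1, 0.6], # middle 60% of rows receive 50% of the picks
    [0.2, 0.7, 0.3]  # lowest 30% of rows receive 20% of the picks
  ]
  segments.each do |prob, start, width|
    # Stretch or squeeze the part of u that falls inside this segment.
    return start + (u / prob) * width if u < prob
    u -= prob
  end
  # Only reached through floating-point round-off when u is very close to 1.
  1.0 - Float::EPSILON
end

# Hypothetical helper: turns the fraction into a row offset.
# Assumes row_count > 0; the clamp guards against round-off at the top end.
def biased_offset(u, row_count)
  [(biased_fraction(u) * row_count).floor, row_count - 1].min
end
```

With something like this, a single call such as biased_offset(rand, Mytable.count) would replace the nested if/else for the un-split case.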

    Unfortunately, (c) and (d) are split based on the value of “ticks”, a different column. That’s the hard part, and it also makes the query much less efficient.

    After you fetch the value of attribute1 at the boundary of the lowest 30% (the row at the 70% offset in descending order), also fetch the number of (c) rows with something like SELECT COUNT(*) FROM foo WHERE attribute1 <= partition_30 AND ticks < 100. Then use that count to pick an offset into either the ticks < 100 or the ticks >= 100 case. (You probably want an index on (attribute1, ticks) or similar; which column order is best depends on your data.)
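
The two-step lookup can be sketched in plain Ruby against an in-memory array standing in for the table. This is only an illustration under assumptions: pick_from_lowest_30 and its parameters are hypothetical names, the array operations stand in for the SQL noted in the comments, and u and v are two independent uniform numbers in [0, 1).

```ruby
# rows: array of hashes such as { :attribute1 => 0.4, :ticks => 12 }
# u chooses between the (c) and (d) groups, v chooses the row inside the group.
def pick_from_lowest_30(rows, u, v, tick_limit = 100)
  # ORDER BY attribute1 DESC
  sorted = rows.sort_by { |r| -r[:attribute1] }

  # OFFSET 0.7 * count LIMIT 1 (integer arithmetic avoids float round-off here)
  boundary = sorted[sorted.size * 7 / 10]
  return nil if boundary.nil?
  partition_30 = boundary[:attribute1]

  # WHERE attribute1 <= partition_30
  low = sorted.select { |r| r[:attribute1] <= partition_30 }

  # c_rows.size plays the role of
  # SELECT COUNT(*) ... WHERE attribute1 <= partition_30 AND ticks < tick_limit
  c_rows, d_rows = low.partition { |r| r[:ticks] < tick_limit }

  # Inside the 20% branch, (c) gets 15 of the 20 points, i.e. 75%.
  group = u < 0.75 ? c_rows : d_rows
  return nil if group.empty?

  # OFFSET into the chosen group instead of ORDER BY RANDOM()
  group[(v * group.size).floor]
end
```

In the real query the group sizes would come from COUNT(*) and the final lookup from an :offset, so no rows need to be loaded just to be counted.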

    If the “ticks” threshold is known in advance and doesn’t need to change often, you can cache the result of the comparison in a column (BOOL ticks_above_threshold or whatever), which makes the query much more efficient if you have an index on (ticks_above_threshold, attribute1) (note the reversed column order). Of course, every time you change the threshold you need to write to every row.

    (I think you can also use a “materialized view” to avoid cluttering the main table with an extra column, but I’m not sure what the difference in efficiency is.)

    There are obviously some efficiency gains possible by using stored procedures. I wouldn’t worry about it too much, unless latency to the server is particularly high.

    EDIT:

    To answer your additional questions:

    1. Indexing (BOOL ticks_above_threshold, attribute1) should work better than (ticks, attribute1) (I may have the order wrong, though) because it lets you compare the first index column for equality. This is important.

      Indexes generally use some sort of balanced tree to do a lookup on what is effectively a list. For example, take A4 B2 C3 D1 (ordered letter,number) and look up “number of rows with letter greater than B and number greater than 2”. The best you can do is start after the Bs and iterate over the whole table. If you order by number,letter, you get 1D 2B 3C 4A. Again, start after the 2s.

      If instead your index is on is_greater_than_2,letter, it looks like false,B false,D true,A true,C. You can ask the index for the position of (true,B), which falls between true,A and true,C, and count the number of entries from there to the end. Counting the number of items between two index positions is fast.

      Google App Engine’s Restrictions on Queries goes one step further:

      Inequality Filters Are Allowed on One Property Only

      A query may only use inequality filters (<, <=, >=, >, !=) on one property across all of its filters.

      …

      The query mechanism relies on all results for a query to be adjacent to one another in the index table, to avoid having to scan the entire table for results. A single index table cannot represent multiple inequality filters on multiple properties while maintaining that all results are consecutive in the table.

    2. If none of your other queries benefit from an index on ticks, then yes.

      In some cases, it might be faster to index a instead of a,b (or b,a) if the clause including b is almost always true and you’re fetching row data (not just getting a COUNT()). For example, if 1000 <= a AND a <= 1010 matches a million rows and b > 100 only fails for two rows, then it might end up being faster to do two extra row lookups than to work with the bigger index.
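
The index walk-through in point 1 can be imitated with a sorted array and a binary search. This is a toy model rather than how PostgreSQL stores its indexes; build_index, position_after and count_flagged_after are names invented here, and Array#bsearch_index needs Ruby 2.3 or later.

```ruby
# Builds a toy index on (is_greater_than_2, letter) from [letter, number] rows.
# The flag is stored as 0 or 1 so that the keys can be compared with Array#<=>.
def build_index(rows)
  rows.map { |letter, number| [number > 2 ? 1 : 0, letter] }.sort
end

# Position of the first index entry strictly greater than key (binary search).
def position_after(index, key)
  index.bsearch_index { |entry| (entry <=> key) > 0 } || index.size
end

# Number of rows with number > 2 and a letter greater than the given one:
# one equality on the first column, one range on the second column.
def count_flagged_after(index, letter)
  index.size - position_after(index, [1, letter])
end

index = build_index([['A', 4], ['B', 2], ['C', 3], ['D', 1]])
# index is [[0, "B"], [0, "D"], [1, "A"], [1, "C"]]
puts count_flagged_after(index, 'B')
```

For the A4 B2 C3 D1 data only C3 has a letter after B and a number above 2, so the count is 1, and it is found from two index positions without visiting the other entries.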

© 2021 The Archive Base. All Rights Reserved