In a Ruby project that I have been spending some time on lately, I


Asked: May 28, 2026 (2026-05-28T03:29:38+00:00)


In a Ruby project that I have been spending some time on lately, I have been counting the intersection of two large sets of strings.

From what I thought I understood, it would make a lot of sense to compare integers instead of strings (all of these strings are held in a database, and I could easily swap them out for ids).

When I actually did the benchmarking, I found the complete opposite.

First I generated sets of 850 strings, and sets of ~850 large integers:

require 'set'
r = Random.new
w1 = (1..850).collect{|i| w="";(0..3).collect{|j| (rand*26 + 10).to_i.to_s(35)}.each{|l| w+=(l.to_s)};w}.to_set
w2 = (1..850).collect{|i| w="";(0..3).collect{|j| (rand*26 + 10).to_i.to_s(35)}.each{|l| w+=(l.to_s)};w}.to_set

i1 = (1..2000).collect{|i| (r.rand*1000).to_i**2}.to_set;
i2 = (1..2000).collect{|i| (r.rand*1000).to_i**2}.to_set;

And then I timed the comparisons:

t=Time.now;(0..1000).each {|i| w1 & w2};Time.now-t
=> 0.301727
t=Time.now;(0..1000).each {|i| i1 & i2};Time.now-t
=> 0.70151

Which I thought was crazy! I always thought integer comparison was much faster.

So I was wondering if anybody in the world of stacks knows why the string comparison is so much faster in Ruby. I would really appreciate hearing your thoughts.


1 Answer

Editorial Team · Answer 1 · 2026-05-28T03:29:39+00:00

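The speed of the set intersection operation appears to depend on the number of intersecting elements.

Your integer generation code produces a substantially larger number of intersecting elements, because it draws 2000 values from a pool of only 1000 (the squares of 0 through 999). The four-token strings come from a pool of roughly 26^4 (about 457,000) combinations, so two sets of 850 rarely share more than a few.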

In one test, for example, 755 of the 857 entries in i1 were duplicated in i2, but only 2 of the 849 entries in w1 were duplicated in w2.
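The overlap counts above can be estimated without running the benchmark. The following is a minimal sketch, assuming uniform draws from a fixed pool; the helper names `expected_distinct` and `expected_overlap` are made up for illustration and are not part of any library.

```ruby
# Expected number of distinct values after `draws` uniform picks
# from a pool of `pool` possible values (hypothetical helper).
def expected_distinct(pool, draws)
  pool * (1 - (1 - 1.0 / pool)**draws)
end

# Expected size of the intersection of two sets built independently
# that way: each value lands in both sets with probability p * p.
def expected_overlap(pool, draws)
  p_present = 1 - (1 - 1.0 / pool)**draws
  pool * p_present**2
end

int_pool = 1000   # (r.rand*1000).to_i gives 0..999; squaring keeps them distinct
str_pool = 26**4  # assumption: 26 equally likely tokens in each of 4 positions

puts "integers: ~#{expected_distinct(int_pool, 2000).round} distinct, " \
     "~#{expected_overlap(int_pool, 2000).round} shared"
puts "strings:  ~#{expected_distinct(str_pool, 850).round} distinct, " \
     "~#{expected_overlap(str_pool, 850).round(1)} shared"
```

Under these assumptions the estimate lands in the same range as the measured figures: several hundred shared integers against a handful of shared strings.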

When I ran a simple alteration:

755.times {|x| w2 << w1.to_a[x]}

(adding 755 items to w2 that are known to be in w1), the string set operation on my system came out much closer to the equivalent integer operation.

My original results were:

1.9.2p180 :006 > t=Time.now;(0..1000).each {|i| w1 & w2};Time.now-t
 => 1.020355
1.9.2p180 :007 > t=Time.now;(0..1000).each {|i| i1 & i2};Time.now-t
 => 2.057535

My results after making the string sets and the integer sets more alike in the number of intersecting elements, via:

1.9.2p180 :051 > 755.times {|x| w2 << w1.to_a[x]}
1.9.2p180 :052 > w2 = w2.to_a[-849..-1].to_set

were:

1.9.2p180 :053 > t=Time.now;(0..1000).each {|i| w1 & w2};Time.now-t
 => 2.014967 
1.9.2p180 :054 > t=Time.now;(0..1000).each {|i| i1 & i2};Time.now-t
 => 2.037542
1.9.2p180 :055 > [i1.length, i2.length, w1.length, w2.length, (i1 & i2).length, (w1 & w2).length]
 => [857, 884, 849, 849, 755, 754]

I hope that helps. The two timings are now within what I would consider the margin of error; other activity on the system could easily account for the remaining difference. For strings of this length, the two operations are essentially equal in speed.
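To check the effect of overlap in isolation, the element type can be held fixed while only the number of shared elements changes. This is a rough sketch rather than the exact procedure used above: `string_sets_with_overlap` is a hypothetical helper, the strings are plain lowercase letters, and absolute timings will vary by machine and Ruby version.

```ruby
require 'set'
require 'benchmark'

# Build two sets of `size` four-letter strings that share exactly
# `shared` elements (hypothetical helper for this comparison).
def string_sets_with_overlap(size, shared, rng)
  words = Set.new
  # Collect enough distinct words to fill both sets.
  until words.size == 2 * size - shared
    words << Array.new(4) { (97 + rng.rand(26)).chr }.join
  end
  all = words.to_a
  a = all[0, size].to_set              # first `size` words
  b = all[size - shared, size].to_set  # window shifted so `shared` words overlap
  [a, b]
end

rng = Random.new(42)
low_a, low_b   = string_sets_with_overlap(850, 2, rng)
high_a, high_b = string_sets_with_overlap(850, 755, rng)

low_time  = Benchmark.realtime { 1000.times { low_a & low_b } }
high_time = Benchmark.realtime { 1000.times { high_a & high_b } }

puts "shared=#{(low_a & low_b).size}: #{low_time.round(3)}s"
puts "shared=#{(high_a & high_b).size}: #{high_time.round(3)}s"
```

If the explanation above holds, the high-overlap pair should take noticeably longer even though both pairs contain only strings.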
