The Archive Base

Editorial Team
Asked: May 13, 2026 (2026-05-13T14:04:25+00:00)

I am a newbie working in a simple Rails app that translates a document


I am a newbie working on a simple Rails app that translates a document (a long string) from one language to another. The dictionary is a table of terms: each record holds a regexp string to find and substitute, and a block that outputs the substituting string. The table is 1 million records long.

Each request is a document to be translated. In a first brute-force approach, I need to run the whole dictionary against each request/document.

Since the whole dictionary runs every time (from the first record to the last), I think it would be best to keep the whole dictionary in memory as an array, instead of loading the table of records for each document.

I know it is not the most efficient, but the dictionary has to run whole at this point.

1.- If no efficiency can be gained by restructuring the document and dictionary (meaning it is not possible to create smaller subsets of the dictionary), what is the best design approach?

2.- Do you know of similar projects that I can learn from?

3.- Where should I look to learn how to load such a big table into memory (cache?) at Rails startup?

Any answer to any of the posed questions will be greatly appreciated. Thank you very much!

1 Answer

  1. Editorial Team added an answer on May 13, 2026 at 2:04 pm (2026-05-13T14:04:26+00:00)

    I don’t think your web host will be happy with a solution like this. This script

    # Build the whole dictionary in memory: one compiled Regexp per entry.
    dict = {}
    (1..1_000_000).each do |num|
      dict[/#{num}/] = "#{num}_subst"
    end
    

    consumes a gigabyte of RAM on my MBP just to store the hash table. Another approach would be to store your substitutions marshaled in memcached, so that you could (at least) share them across machines.

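    require 'rubygems'
    require 'memcached'  # the "memcached" gem; assumes a server on localhost:11211
    @table = Memcached.new("localhost:11211")

    # Store each [pattern, replacement] pair as a marshaled blob under its own key.
    # (Range#each returns the range itself, so there is nothing useful to assign.)
    (1..1_000_000).each do |num|
      stored_blob = Marshal.dump([/#{num}/, "#{num}_subst"])
      @table.set("p#{num}", stored_blob)
    end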
    

    You will have to worry about keeping the keys “hot”, since memcached evicts the least recently used entries when it runs out of memory and drops entries whose expiry time has passed.

    The best approach for your case, however, would be very simple: write your substitutions to a file (one line per substitution) and make a stream filter that reads the file line by line and applies each replacement. You can also parallelize this by splitting the work, say, per letter of substitution and replacing markers.

    But this should get you started:

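      require "base64"

      # Write one Base64-encoded, marshaled [pattern, replacement] pair per line.
      # strict_encode64 emits no embedded newlines, so every blob stays on one
      # line (encode64 wraps its output every 60 characters).
      File.open("./dict.marshal", "wb") do |file|
        (1..1_000_000).each do |num|
          stored_blob = Base64.strict_encode64(Marshal.dump([/#{num}/, "#{num}_subst"]))
          file.puts(stored_blob)
        end
      end

      puts "Table populated (should be a 35 meg file), now let's run substitutions"

      # Stream the file back; only one pair is held in memory at a time.
      File.foreach("./dict.marshal", chomp: true) do |line|
        pattern, replacement = Marshal.load(Base64.strict_decode64(line))
        # Apply the substitution to the document here.
      end

      puts "All replacements out"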
    

    To populate the file AND load each substitution, this takes me:

     real    0m21.262s
     user    0m19.100s
     sys     0m0.502s
    

    To just load the regexps and strings from the file (all million of them, one by one):

     real    0m7.855s
     user    0m7.645s
     sys     0m0.105s
    

    So this is about 7 seconds of loading overhead per pass, but you don’t lose any memory (and there is huge room for improvement): the RSIZE is about 3 megs. Note that nearly all of the measured time is user CPU (7.6 s) rather than system time, so decoding appears to cost more than the disk reads themselves. You should be able to make it go faster if you do IO in bulk, or make one file per 10-50 substitutions and load each file as a whole. Put the files on an SSD or a RAID and you have a winner, and you get to keep your RAM.
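
The loop in the script above only loads each pair; it never touches a document. As a minimal sketch of the stream filter described earlier, the following applies every pair to a string as it is read. The helper name `translate` and the sample dictionary entries are hypothetical, and the sketch assumes the file holds one strict Base64 blob per line (written with `Base64.strict_encode64`) and comes from a trusted source, since `Marshal.load` should never be fed untrusted data.

```ruby
require "base64"
require "tempfile"

# Hypothetical helper: applies every [pattern, replacement] pair stored in
# dict_path (one strict-Base64-encoded Marshal blob per line) to the document.
# Pairs are applied in file order, and only one pair is in memory at a time.
def translate(document, dict_path)
  result = document.dup # leave the caller's string untouched
  File.foreach(dict_path, chomp: true) do |line|
    pattern, replacement = Marshal.load(Base64.strict_decode64(line))
    result.gsub!(pattern, replacement)
  end
  result
end

# Usage with a tiny made-up dictionary written to a temporary file.
Tempfile.create("dict") do |file|
  [[/cat/, "gato"], [/dog/, "perro"]].each do |pair|
    file.puts(Base64.strict_encode64(Marshal.dump(pair)))
  end
  file.flush
  puts translate("the cat chased the dog", file.path)
end
```

Because the pairs run in file order, the output of an early substitution can be matched again by a later pattern, so the order of lines in the dictionary file matters.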
