The Archive Base


Editorial Team
Asked: May 14, 2026 (2026-05-14T18:22:56+00:00)

I have two tables, both having more than 20 million records; table1 is a list of terms, and table2 is a list of keywords that may or may not appear in those terms. I need to identify the terms that contain a keyword.
The ‘term’ field is a VARCHAR(320) and the ‘keyword’ field is a VARCHAR(64).

My current strategy is:

SELECT table1.term, table2.keyword
FROM table1
INNER JOIN table2
  ON table1.term LIKE CONCAT('%', table2.keyword, '%');

This is not working; it takes forever.
It’s not the server, afaict (see notes).

How might I rewrite this so that it runs in under a day?

I have considered in-memory tables, or switching to InnoDB and making the buffer pool big enough to hold both tables. Unfortunately, each MySQL thread is bound to one CPU, but I have 4 cores (well, “8” with hyperthreading); if I could distribute the workload, that would be fantastic.

Notes:

  1. Regarding server optimization: both tables are MyISAM and have unique indexes on the matching fields; the MyISAM key buffer is larger than the sum of both index file sizes, and it is not even being fully taxed (key_blocks_unused is … large); the server is a 2x dual-core Xeon 2U beast with fast SAS drives and 8 GB of RAM, tuned for the MySQL workload.

  2. I just remembered that I only index the first 80 characters of the ‘term’ field (to save disk space); not sure if this is hurting or helping.

  3. MySQL 5.0.32, Debian Lenny x86_64


1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 6:22 pm (2026-05-14T18:22:57+00:00)

    You want to set up a full-text index, then do a search against that. Right now, your unique index probably isn’t helping the search at all: because the pattern starts with ‘%’, the index’s sort order can’t be used to narrow down which rows to look at.

    That means it’s almost certainly running a full scan of table1 for each item in table2, i.e. every term is compared against every keyword. With more than 20 million rows on each side, that is on the order of 4 × 10^14 LIKE evaluations. Calling that grossly inefficient is putting it nicely. Building a full-text index is somewhat slow (though probably faster than what you’re doing right now), but once that’s done, the searching should go a lot faster.
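
To illustrate why the leading ‘%’ matters, here is a rough Python sketch (not MySQL internals). A sorted list stands in for an ordinary index; the sample terms and both function names are made up for the illustration. A prefix search can jump straight to the matching run with a binary search, while a substring search gets no help from the ordering and has to look at every entry.

```python
import bisect

# Hypothetical sample data standing in for table1.term, kept sorted
# the way an ordinary (B-tree style) index keeps its keys.
terms = sorted(["apple pie", "apple tart", "banana bread", "pineapple cake"])


def prefix_matches(sorted_terms, prefix):
    """Rough analogue of `term LIKE 'prefix%'`.

    Binary-search to the first possible match, then walk the contiguous
    run of entries that share the prefix and stop at the first miss.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def substring_matches(sorted_terms, needle):
    """Rough analogue of `term LIKE '%needle%'`.

    Matches can sit anywhere in the sort order, so every entry is checked.
    """
    return [term for term in sorted_terms if needle in term]


print(prefix_matches(terms, "apple"))
print(substring_matches(terms, "apple"))
```

The join in the question is the second case, repeated once per keyword.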

    As for what to use for the full-text indexing: MySQL has a built-in full-text indexing capability, but I doubt it’ll help you a lot; with 20 million rows, its performance is pretty poor (at least in my experience). Sphinx is a bit more work to set up, but is a lot more likely to give you adequate performance.
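
For a feel of what a full-text index buys you, here is a toy word index in Python. It is only a sketch of the general idea, not how MySQL or Sphinx actually implement it; the sample data and the names `build_word_index` and `match_keywords` are hypothetical. Each term is split into words once, and after that every keyword costs a single dictionary lookup instead of a pass over all terms.

```python
from collections import defaultdict

# Hypothetical stand-ins for table1.term and table2.keyword.
terms = ["cheap flights to paris", "paris hotel deals", "pineapple cake recipe"]
keywords = ["paris", "apple", "hotel"]


def build_word_index(terms):
    """Map each whitespace-separated word to the set of terms containing it."""
    index = defaultdict(set)
    for term in terms:
        for word in term.lower().split():
            index[word].add(term)
    return index


def match_keywords(index, keywords):
    """Look each keyword up once; no scan over the terms is needed."""
    return {kw: sorted(index.get(kw.lower(), ())) for kw in keywords}


index = build_word_index(terms)
result = match_keywords(index, keywords)
for keyword, matched in result.items():
    print(keyword, matched)
```

One thing to watch: in this sketch "apple" does not match "pineapple cake recipe", whereas `LIKE '%apple%'` would. Word-based full-text search generally matches whole words rather than arbitrary substrings, so it is worth checking that this is acceptable for your keywords before switching.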


© 2021 The Archive Base. All Rights Reserved