The Archive Base

Asked by Editorial Team on May 27, 2026 (2026-05-27T21:39:42+00:00)

I have a strange performance problem with a query used to create a filter

I have a strange performance problem with a query used to create a “filter by tags” widget for a Delicious-like bookmarking web app. This specific, relatively complex query runs much faster (1,000 to 10,000 times) when split into a few separate queries.

I’ve tested it on the following environments:

  • Windows XP / MySQL 5.1.37 (server & client)
  • Ubuntu 11.10 / MySQL 5.1.58 (server & client)

The problem didn’t show up in the small development database. I caught it in production, after a large increase in the number of records (currently about 100K rows in the link_tags table and 11K unique tags).

I use the following DB schema:

CREATE TABLE IF NOT EXISTS `link_tags` (
  `link_id` int(11) NOT NULL,
  `tag_id` int(11) NOT NULL,
  UNIQUE KEY `link_tag_id` (`link_id`,`tag_id`),
  KEY `tag_id` (`tag_id`),
  KEY `link_id` (`link_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

CREATE TABLE IF NOT EXISTS `tags` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `tag` varchar(255) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `tag` (`tag`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

The schema is straightforward (see also http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html), so it shouldn’t require further explanation.

Technically speaking, the problematic query (below) retrieves the tags related to a given set of tags (specifically, all tags attached to links that carry every tag in the specified set) and, for each tag found, counts the links that carry both that tag and the whole specified set.

[ORIGINAL QUERY]

SELECT COUNT(*) AS link_count, tag FROM (
    SELECT
        t.tag AS tag,
        CONCAT(lt.tag_id,':',lt.link_id) AS tag_link_hash
    FROM
        link_tags lt, tags t
    WHERE
        t.id = lt.tag_id
        AND lt.link_id IN (
            SELECT
                link_id
            FROM
                link_tags lt2, links l2
            WHERE
                l2.id = lt2.link_id
                AND l2.created_by = ?  -- user to filter tags for
                AND lt2.tag_id IN (
                   SELECT id FROM tags t2 WHERE tag IN (?)  -- tags set to filter by
                )
            GROUP BY
                link_id
            HAVING
                COUNT(*) = ?)  -- number of tags in filter
    GROUP BY
        tag_link_hash) tmp
GROUP BY
    tag
ORDER BY
    link_count DESC,
    tag ASC
[Results in X minutes - up to 4 hours]

In the production database (as mentioned, about 100K link_tags rows and 11K tags) the query takes minutes to hours, depending on how frequently the specified tags occur.
Strangely, everything runs smoothly if I split it into a few queries:

1) Find ids for given tag names.

[REPLACEMENT QUERY 1]

SELECT id FROM tags t2 WHERE tag IN (?)

[Results in 0,0011 seconds]

2) Find all link_ids for given set of tags (intersection!).

[REPLACEMENT QUERY 2]

SELECT
    link_id
FROM
    link_tags lt2, links l2
WHERE
    l2.id = lt2.link_id
    AND l2.created_by = 1
    AND lt2.tag_id IN ( ? )  -- here goes imploded result of query 1
GROUP BY
    link_id
HAVING
    COUNT(*) = ?  -- number of tags

[Results in 0,0996 seconds]

3) Find all tags for given set of link_ids and group tags by count of links.

[REPLACEMENT QUERY 3]

SELECT COUNT(*) AS link_count, tag FROM (
    SELECT
        t.tag AS tag,
        CONCAT(lt.tag_id,':',lt.link_id) AS tag_link_hash
    FROM
        link_tags lt, tags t
    WHERE
        t.id = lt.tag_id
        AND lt.link_id IN ( ? )  -- here goes imploded result of query 2
    GROUP BY
        tag_link_hash) tmp
GROUP BY
    tag
ORDER BY
    link_count DESC,
    tag ASC

[Results in 0,0543 seconds]
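
The three steps above can be sketched as application code. This is a minimal illustration only, not the asker's actual implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL, assumes a hypothetical minimal `links(id, created_by)` table (the question does not show that schema), and the helper name `related_tags` and the sample data are made up. Step 3 also drops the CONCAT hash, on the assumption that the UNIQUE KEY on (link_id, tag_id) already guarantees one row per pair. It shows how the "imploded" results can be passed as bound parameters rather than pasted into the SQL text.

```python
import sqlite3


def related_tags(conn, user_id, tag_names):
    """Run the three replacement queries in sequence; return (link_count, tag) rows."""
    tag_names = list(dict.fromkeys(tag_names))  # drop duplicate names, keep order
    if not tag_names:
        return []
    cur = conn.cursor()

    # Step 1: resolve tag names to ids (one placeholder per name).
    marks = ",".join("?" * len(tag_names))
    tag_ids = [row[0] for row in cur.execute(
        f"SELECT id FROM tags WHERE tag IN ({marks})", tag_names)]
    if len(tag_ids) < len(tag_names):
        return []  # an unknown tag means no link can carry the whole set

    # Step 2: links of this user that carry every requested tag (intersection).
    marks = ",".join("?" * len(tag_ids))
    link_ids = [row[0] for row in cur.execute(
        f"""SELECT lt2.link_id
            FROM link_tags lt2 JOIN links l2 ON l2.id = lt2.link_id
            WHERE l2.created_by = ? AND lt2.tag_id IN ({marks})
            GROUP BY lt2.link_id
            HAVING COUNT(*) = ?""",
        [user_id, *tag_ids, len(tag_ids)])]
    if not link_ids:
        return []

    # Step 3: count links per tag over the eligible links.
    # Note: very long id lists can hit the driver's bound-parameter limit.
    marks = ",".join("?" * len(link_ids))
    return cur.execute(
        f"""SELECT COUNT(*) AS link_count, t.tag
            FROM link_tags lt JOIN tags t ON t.id = lt.tag_id
            WHERE lt.link_id IN ({marks})
            GROUP BY t.tag
            ORDER BY link_count DESC, t.tag ASC""",
        link_ids).fetchall()


# Made-up sample data: user 1 owns links 10 and 11, user 2 owns link 12.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
    CREATE TABLE links (id INTEGER PRIMARY KEY, created_by INTEGER NOT NULL);
    CREATE TABLE link_tags (link_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
                            UNIQUE (link_id, tag_id));
    INSERT INTO tags VALUES (1, 'mysql'), (2, 'sql'), (3, 'perf'), (4, 'php');
    INSERT INTO links VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO link_tags VALUES (10, 1), (10, 2), (10, 3),
                                 (11, 1), (11, 2),
                                 (12, 1), (12, 2), (12, 4);
""")
result = related_tags(conn, 1, ["mysql", "sql"])
print(result)
```

With this sample data, filtering user 1's links by "mysql" and "sql" matches links 10 and 11, so "mysql" and "sql" each count two links and "perf" counts one. The sketch says nothing about MySQL timings; it only makes the data flow between the three queries explicit.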

Do you have any idea what is going on? EXPLAIN shows roughly the same plan for the large query as for the separate ones combined. The difference is in the number of rows processed in each step (which is also strange).

Could you help me rewrite the original query, hint the MySQL optimizer into running it efficiently, or point me to the MySQL bug that causes this behavior?

EXPLAIN results for original query:

id  select_type         table       type    possible_keys               key          key_len  ref                      rows   Extra
1   PRIMARY             <derived2>  ALL     NULL                        NULL         NULL     NULL                     32     Using temporary; Using filesort
2   DERIVED             lt          index   tag_id                      link_tag_id  8        NULL                     78162  Using where; Using index; Using temporary; Using filesort
2   DERIVED             t           eq_ref  PRIMARY                     PRIMARY      4        lstack_prod.lt.tag_id    1
3   DEPENDENT SUBQUERY  t2          range   PRIMARY,tag                 tag          767      NULL                     2      Using where; Using temporary; Using filesort
3   DEPENDENT SUBQUERY  lt2         ref     link_tag_id,tag_id,link_id  tag_id       4        lstack_prod.t2.id        7
3   DEPENDENT SUBQUERY  l2          eq_ref  PRIMARY,created_by          PRIMARY      4        lstack_prod.lt2.link_id  1      Using where
1 Answer

    Editorial Team added an answer on May 27, 2026 at 9:39 pm (2026-05-27T21:39:43+00:00)

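    In MySQL versions of this era (5.1, as in the question), WHERE ... IN (SELECT ...) is often extremely inefficient: the optimizer tends to execute the subquery as a dependent subquery, re-running it for every row of the outer query, which results in full table scans and filesorts. Generally, you should replace such subqueries with an INNER JOIN.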

    I THINK this should help, but I haven’t tried to re-create your DB, and haven’t run this query, so there might be typos.

    SELECT COUNT(*) AS link_count, tag FROM (
        SELECT
            t.tag AS tag,
            CONCAT(lt.tag_id,':',lt.link_id) AS tag_link_hash
        FROM
            link_tags lt
        JOIN tags t on t.id = lt.tag_id
        JOIN (SELECT
                    link_id
                FROM
                    link_tags lt2
                JOIN links l2 on l2.id = lt2.link_id
                JOIN tags t2 on t2.id = lt2.tag_id                
                WHERE
                    l2.created_by = ?  -- user to filter tags for
                    AND t2.tag IN (?)  -- tags set to filter by
                GROUP BY
                    link_id
                HAVING
                    COUNT(*) = ?) as eligible_links on eligible_links.link_id = lt.link_id
        GROUP BY
            tag_link_hash) tmp
    GROUP BY
        tag
    ORDER BY
        link_count DESC,
        tag ASC
    

    However, an explain plan would be very helpful.
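
Before swapping the rewrite into production, it may be worth checking on a small data set that both forms return the same rows. The following is a rough sketch of such a check, again using Python's sqlite3 as a stand-in. It does not reproduce MySQL 5.1 optimizer behavior or timings. The `links(id, created_by)` table, the sample rows, and the literal tag names are assumptions made for illustration, and the CONCAT hash layer is omitted because the unique (link_id, tag_id) key already keeps pairs distinct.

```python
import sqlite3

# Subquery form, following the shape of the original query.
IN_FORM = """
SELECT COUNT(*) AS link_count, t.tag
FROM link_tags lt JOIN tags t ON t.id = lt.tag_id
WHERE lt.link_id IN (
    SELECT lt2.link_id
    FROM link_tags lt2 JOIN links l2 ON l2.id = lt2.link_id
    WHERE l2.created_by = :user
      AND lt2.tag_id IN (SELECT id FROM tags WHERE tag IN ('mysql', 'sql'))
    GROUP BY lt2.link_id
    HAVING COUNT(*) = :n)
GROUP BY t.tag
ORDER BY link_count DESC, t.tag ASC
"""

# Join form, following the shape of the rewrite in this answer.
JOIN_FORM = """
SELECT COUNT(*) AS link_count, t.tag
FROM link_tags lt
JOIN tags t ON t.id = lt.tag_id
JOIN (SELECT lt2.link_id
      FROM link_tags lt2
      JOIN links l2 ON l2.id = lt2.link_id
      JOIN tags t2 ON t2.id = lt2.tag_id
      WHERE l2.created_by = :user
        AND t2.tag IN ('mysql', 'sql')
      GROUP BY lt2.link_id
      HAVING COUNT(*) = :n) AS eligible_links
  ON eligible_links.link_id = lt.link_id
GROUP BY t.tag
ORDER BY link_count DESC, t.tag ASC
"""

# Made-up sample data: user 1 owns links 10 and 11, user 2 owns link 12.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
    CREATE TABLE links (id INTEGER PRIMARY KEY, created_by INTEGER NOT NULL);
    CREATE TABLE link_tags (link_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
                            UNIQUE (link_id, tag_id));
    INSERT INTO tags VALUES (1, 'mysql'), (2, 'sql'), (3, 'perf'), (4, 'php');
    INSERT INTO links VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO link_tags VALUES (10, 1), (10, 2), (10, 3),
                                 (11, 1), (11, 2),
                                 (12, 1), (12, 2), (12, 4);
""")

params = {"user": 1, "n": 2}  # n = number of tags in the filter
in_rows = conn.execute(IN_FORM, params).fetchall()
join_rows = conn.execute(JOIN_FORM, params).fetchall()
print(in_rows == join_rows, join_rows)
```

Matching rows on a toy data set only show that the two forms are logically equivalent here; whether the join form is actually faster still has to be measured on the real MySQL server.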
