Data is fairly large and takes few minutes to run it every time, so


Asked: June 15, 2026 (2026-06-15T23:29:06+00:00)


The data is fairly large and the query takes a few minutes every time I run it, so debugging this problem is taking a lot of time. When I run like concat('%',T.item,'%') on smaller data, it seems to identify items properly. However, when I run it on the main DB (the code shown), it still shows many (maybe even all) of the exceptions.

EDIT:
It seems that when I add NOT, it stops identifying items.

select distinct T.comment
from (select comment, source, item from data, non_informative where ticker != "O" and source != 7 and source != 6) as T
where T.comment not like concat('%',T.item,'%')
order by T.comment;

comment and source are in data, item is in non_informative

Some items from T.item:

'Stock Analysis -', '#InsideTrades', 'IIROC Trade'

Example comment which should be removed:

'#InsideTrades #4 | MACNAB CRAIG (Director,Officer,Chief Executive Officer): Filed Form 4 for $NNN (NATIONAL RETA'

I can’t seem to figure out why it shows all the items.


1 Answer

Editorial Team · Answer 1 · 2026-06-15T23:29:08+00:00

You’ve got a Cartesian product between the non_informative and data tables. (It’s not at all clear which table the column ticker is from.)

Understand that for a “comment” to be returned, all that is required (to satisfy the predicates in your query) is for one row to be found in non_informative which does not “match” the comment. There may be rows in non_informative that do match, but your query doesn’t care about those. Your query is only looking for the existence of a row that does NOT match. The query is effectively saying that a “comment” will be excluded ONLY if it matches every single row in non_informative.
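To make that concrete, here is a minimal sketch that mimics the Cartesian product with two tiny inline derived tables instead of the real ones. The sample comments are made up for illustration (the first is a shortened form of the example in the question), and the column alias passes_filter is hypothetical.

```sql
-- Two comments crossed with two items gives 2 x 2 = 4 rows.
-- passes_filter is 1 where the original WHERE clause would keep the pair.
SELECT d.comment,
       n.item,
       d.comment NOT LIKE CONCAT('%', n.item, '%') AS passes_filter
  FROM ( SELECT '#InsideTrades #4 | Filed Form 4' AS comment
         UNION ALL
         SELECT 'Earnings beat expectations'
       ) d
 CROSS
  JOIN ( SELECT '#InsideTrades' AS item
         UNION ALL
         SELECT 'IIROC Trade'
       ) n;
```

The '#InsideTrades' comment fails the filter only when paired with the '#InsideTrades' item. Paired with 'IIROC Trade' it passes, so SELECT DISTINCT still returns it. With more rows in non_informative, almost every comment finds at least one item it does not contain.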

If what you want to return is the values of “comment” for which there is NO matching row in non_informative, you need a different query. (I’m going to assume that the ticker column is from the data table.)

I’m also going to exclude the corner cases of an empty string value for item, since that will essentially “match” every non-null value for comment.
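A quick way to see why the empty string is a problem is to evaluate the pattern directly. This is just a sketch with literal values; the alias names are made up.

```sql
-- An empty item turns the pattern into '%%', which matches any non-NULL string.
SELECT 'any comment' LIKE CONCAT('%', '', '%') AS matches_nonempty;   -- 1
SELECT ''            LIKE CONCAT('%', '', '%') AS matches_empty;      -- 1

-- For comparison: a NULL item makes CONCAT return NULL, so the LIKE
-- evaluates to NULL (not true) and that row never counts as a match.
SELECT 'any comment' LIKE CONCAT('%', NULL, '%') AS matches_null;     -- NULL
```

So a single empty-string row in non_informative would "match" every comment and remove all of them from the result, which is why the queries below filter on n.item <> ''.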

SQL Fiddle here

-- using a NOT EXISTS predicate:

 SELECT d.comment
   FROM `data` d
  WHERE d.ticker != 'O'
    AND d.source != 7
    AND d.source != 6
    AND NOT EXISTS
        ( SELECT 1
            FROM `non_informative` n
           WHERE n.item <> ''
             AND d.comment LIKE CONCAT('%',n.item,'%')
        )
  GROUP BY d.comment
  ORDER BY d.comment

-- or, using an anti-join:

 SELECT d.comment
   FROM `data` d
   LEFT
   JOIN ( SELECT n.item
            FROM `non_informative` n
           WHERE n.item <> ''
           GROUP BY n.item
        ) m
     ON d.comment LIKE CONCAT('%',m.item,'%')
  WHERE d.ticker != 'O'
    AND d.source != 7
    AND d.source != 6
    AND m.item IS NULL
  GROUP BY d.comment
  ORDER BY d.comment

These two statements should return an equivalent result set (but different from the result set of your original query). They will also likely exhibit different performance characteristics, depending on the version of MySQL and on whether the MySQL engine can transform the NOT EXISTS predicate into an anti-join operation. Performance is really going to depend on what indexes are available and on the generated execution plan.
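One way to compare the two statements on your own server is to prefix each with EXPLAIN and look at the plan MySQL generates. The sketch below does this for the NOT EXISTS form; the output columns and the strategy chosen vary by MySQL version, so treat it as a starting point rather than a prediction.

```sql
-- Show the execution plan instead of running the query.
-- Note: a LIKE pattern that starts with '%' cannot use an ordinary
-- B-tree index on comment, so expect a scan for that predicate.
EXPLAIN
 SELECT d.comment
   FROM `data` d
  WHERE d.ticker != 'O'
    AND d.source != 7
    AND d.source != 6
    AND NOT EXISTS
        ( SELECT 1
            FROM `non_informative` n
           WHERE n.item <> ''
             AND d.comment LIKE CONCAT('%',n.item,'%')
        )
  GROUP BY d.comment
  ORDER BY d.comment;
```

Running the same EXPLAIN on the anti-join form and comparing the two plans shows whether the server treats them differently.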

If we don’t bother with the empty string corner-case, we can simplify the second statement a bit…

 SELECT d.comment
   FROM `data` d
   LEFT
   JOIN `non_informative` n
     ON d.comment LIKE CONCAT('%',n.item,'%')
  WHERE d.ticker != 'O'
    AND d.source != 7
    AND d.source != 6
    AND n.item IS NULL
  GROUP BY d.comment
  ORDER BY d.comment

Basically, for every row in the data table, we’re checking for a “match” in the non_informative table. For any row where we find a “match”, that row will be excluded by the “n.item IS NULL” predicate. For any row from data where it doesn’t find a matching row in non_informative, the LEFT JOIN operation will generate a NULL value for the “item” column, so the row will be included in the resultset.
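To see the NULL that the LEFT JOIN generates, it can help to run the join without the IS NULL filter. This is a sketch on inline sample values (made up for illustration), not on the real tables.

```sql
-- LEFT JOIN without the "IS NULL" filter, so both kinds of rows are visible.
SELECT d.comment,
       n.item
  FROM ( SELECT '#InsideTrades #4 | Filed Form 4' AS comment
         UNION ALL
         SELECT 'Earnings beat expectations'
       ) d
  LEFT
  JOIN ( SELECT '#InsideTrades' AS item
         UNION ALL
         SELECT 'IIROC Trade'
       ) n
    ON d.comment LIKE CONCAT('%', n.item, '%');
```

The '#InsideTrades' comment comes back with item = '#InsideTrades', while 'Earnings beat expectations' comes back with item = NULL. Adding "WHERE n.item IS NULL" keeps only the second row, which is the anti-join.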

PERFORMANCE:

Your original query includes an inline view (aliased as T). MySQL is going to materialize that as an intermediate MyISAM table before the outer query runs. And that kind of thing can be a real performance killer with large tables.

But before we “tune” that statement, we really need a statement that returns a correct resultset. (There’s no sense in re-writing that statement if it doesn’t return the desired resultset, except as an exercise.)
