So, I have a database with a table called articles and another table called article_tags. When a user views an article, I want to query up to five other articles whose tags are similar to those of the article being viewed. Here are my two tables:
CREATE TABLE `articles` (
  `article_id` int(15) NOT NULL AUTO_INCREMENT,
  `parent_id` int(15) NOT NULL,
  `author_id` int(15) NOT NULL,
  `title` text NOT NULL,
  `content` text NOT NULL,
  `date_posted` text NOT NULL,
  `views` int(15) NOT NULL,
  `preview` text NOT NULL,
  `status` tinyint(1) NOT NULL,
  `modified_date` text NOT NULL,
  PRIMARY KEY (`article_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `article_tags` (
  `tag_id` int(15) NOT NULL AUTO_INCREMENT,
  `article_id` int(15) NOT NULL,
  `keyword` varchar(250) NOT NULL,
  PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I’ve tried writing my own queries, but they never seem to work. I would like to use joins in the query instead of resorting to comma-separated tag lists and LIKE. Here’s the query that I have so far:
SELECT A2.article_id, count(A2.article_id) AS matches
FROM article_tags AS A1 JOIN article_tags ON (A1.keyword = A2.keyword AND 1.article_id != A2.article_id)
JOIN articles ON (A2.article_id = A.article_id) AS A
WHERE A1.article_id = 1
GROUP BY A2.article_id
ORDER BY matches DESC
LIMIT 5"
This is my updated query:
$query = "
SELECT t2.article_id, count(t2.keyword) AS matches
FROM article_tags t1
JOIN article_tags t2 ON (t1.keyword = t2.keyword AND t1.article_id != t2.article_id)
WHERE t1.article_id = ".$article_id."
GROUP BY t2.article_id
ORDER BY matches DESC
LIMIT 5";
This is the result of dumping the array with var_dump:
array
  0 =>
    array
      'article_id' => string '2' (length=1)
      'matches' => string '1' (length=1)
$query = "
SELECT t2.article_id, count(t2.keyword) AS matches
FROM article_tags t1
JOIN article_tags t2 ON (t1.keyword = t2.keyword AND t1.article_id != t2.article_id)
WHERE t1.article_id = ".$article_id."
GROUP BY t2.article_id
ORDER BY matches DESC
LIMIT 5";
if($query = $this->db->query($query)){
    if($query->num_rows() > 0){
        foreach($query->result_array() as $id => $article){
            $articles[$id] = $this->fetch_article($article['article_id']);
        }
    } else {
        $articles = array();
    }
} else {
    $articles = array();
}
return $articles;
}
Basically your thinking is correct: make a self JOIN on the article_tags table. There are a few things that you should improve:
- COUNT tag_id instead of article_id, since you want to sort articles by relevance, and the count of matched tags indicates the relevance.
- JOIN on tag_id instead of keyword. A join on a non-indexed column will be a performance issue. (In the schema as posted, tag_id is a per-row auto-increment key, so this only works if tags are normalized so that a shared tag has a single tag_id; otherwise, add an index on keyword.)
- Don't use != in the JOIN condition, for performance reasons. Just get all related articles and remove the most related one, which should be the current article itself.
- A JOIN on articles is not necessary, again for performance reasons. You don't need the articles themselves; just do a simple SELECT on articles after you get the ids of the 5 related articles.
So the answer could be something like this:
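The original query did not survive here, so the following is only a minimal sketch of the idea, not the answerer's exact code. It assumes an in-memory SQLite database in place of MySQL, made-up sample tags, and a hypothetical helper name related_article_ids. It joins on keyword (with an index added) because tag_id in the posted schema is unique per row, and it drops the current article by id rather than by position, which is safer when two articles tie on match count.

```python
import sqlite3


def related_article_ids(conn, article_id, limit=5):
    """Return up to `limit` (article_id, matches) pairs, most shared tags first."""
    rows = conn.execute(
        """
        SELECT t2.article_id, COUNT(*) AS matches
        FROM article_tags AS t1
        JOIN article_tags AS t2 ON t1.keyword = t2.keyword
        WHERE t1.article_id = ?
        GROUP BY t2.article_id
        ORDER BY matches DESC, t2.article_id
        LIMIT ?
        """,
        (article_id, limit + 1),  # one extra row: the current article matches itself
    ).fetchall()
    # No != in the JOIN; the current article is filtered out here instead.
    return [(aid, n) for aid, n in rows if aid != article_id][:limit]


# Hypothetical sample data, only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_tags ("
    "tag_id INTEGER PRIMARY KEY, article_id INTEGER NOT NULL, keyword TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_article_tags_keyword ON article_tags (keyword)")
conn.executemany(
    "INSERT INTO article_tags (article_id, keyword) VALUES (?, ?)",
    [(1, "php"), (1, "mysql"), (1, "joins"),
     (2, "php"), (2, "mysql"),
     (3, "joins"),
     (4, "css")],
)

related = related_article_ids(conn, 1)
print(related)
```

The article id is passed as a bound parameter rather than concatenated into the SQL string, which also avoids injection problems.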
You should get an array with up to 6 ids. Remove the first one (the current article), then do a SELECT on articles for the remaining ids (e.g. in Python):
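The Python snippet that belonged here was lost, so this is a hedged reconstruction of the step rather than the original code. It assumes SQLite in place of MySQL, a cut-down articles table with only article_id and title, and a hypothetical helper name fetch_related_articles; ranked_ids stands for the id list returned by the tag query, most relevant first. It removes the current article by id instead of blindly dropping the first element.

```python
import sqlite3


def fetch_related_articles(conn, current_id, ranked_ids, limit=5):
    """Fetch the related articles in relevance order, skipping the current one."""
    ids = [i for i in ranked_ids if i != current_id][:limit]
    if not ids:
        return []
    # One placeholder per id keeps the IN (...) list parameterized.
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT article_id, title FROM articles WHERE article_id IN ({placeholders})",
        ids,
    ).fetchall()
    # IN (...) does not guarantee order, so restore the relevance ranking.
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]


# Hypothetical sample data, only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO articles (article_id, title) VALUES (?, ?)",
    [(1, "Self joins"), (2, "PHP and MySQL"), (3, "Join basics")],
)

articles = fetch_related_articles(conn, 1, [1, 3, 2])
print(articles)
```

This is a single round trip for all related articles, instead of one fetch_article call per id.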