I’ve been taught to keep my data in MySQL as raw as possible. Here’s an example of how I store content in my MySQL database:
title (VARCHAR, 255) => Références
content (TEXT) => <p>A paragraph about r&eacute;f&eacute;rences...</p>
When I output it to a page, I use htmlentities() on title, but of course not on content. I feel this is the correct way of storing it, since title stores only text and content stores HTML.
However, I now see a limitation to this: when I do a fulltext search to match a specific keyword (such as références), I need to search for both références AND r&eacute;f&eacute;rences in order to retrieve all results.
And now I’m thinking… What is the correct way to solve this problem?
- Review the database and store everything with htmlentities? (Don’t want!)
- Do two searches, one for the keyword without htmlentities and one for the one with? (Doesn’t seem optimal to me…)
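To make the second option concrete, here is a minimal sketch of how the two keyword variants could be produced before querying. The helper name keyword_variants() is hypothetical, and the sketch assumes PHP 7+ and that the submitted keyword is UTF-8; it only builds the variants and does not run any query.

```php
<?php
// Hypothetical helper: return the raw keyword plus its htmlentities()
// form, without duplicates when both forms are identical.
function keyword_variants(string $keyword): array
{
    // Assumption: the keyword arrives as UTF-8.
    $encoded = htmlentities($keyword, ENT_QUOTES, 'UTF-8');

    return array_values(array_unique([$keyword, $encoded]));
}

print_r(keyword_variants('références')); // raw form and entity-encoded form
print_r(keyword_variants('page'));       // a single entry, nothing to encode
```

Each variant would then need its own MATCH ... AGAINST condition, which is why this option roughly doubles the search work.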
Just for the record, here’s my enormous MySQL query, that searches in page, page_content, article, download, member and event, so you have a bit of a picture of what I’m dealing with.
Thanks in advance for your efforts.
$keyword = utf8_decode(mysql_real_escape_string($_POST['keyword']));
SELECT
*,
sum(score) AS total_score
FROM
(
SELECT
"page" as db_table,
lid,
sid as page_sid,
sid,
hook,
title,
meta_keywords,
meta_description,
NULL as content,
NULL as location,
NULL as company,
MATCH(title, meta_keywords, meta_description) AGAINST("'.$keyword.'*" IN BOOLEAN MODE) AS score
FROM page
WHERE MATCH(title, meta_keywords, meta_description) AGAINST("'.$keyword.'*" IN BOOLEAN MODE)
UNION
SELECT
"page_content" as db_table,
p.lid as lid,
pc.page_sid as page_sid,
NULL as sid,
NULL as hook,
p.title as title,
NULL as meta_keywords,
NULL as meta_description,
pc.content as content,
NULL as location,
NULL as company,
MATCH(content) AGAINST("'.$keyword.'*" IN BOOLEAN MODE) AS score
FROM page_content pc, page p
WHERE MATCH(content) AGAINST("'.$keyword.'*" IN BOOLEAN MODE)
AND p.sid = pc.page_sid
UNION
SELECT
"article" as db_table,
lid,
NULL as page_sid,
sid,
NULL as hook,
title,
meta_keywords,
meta_description,
content,
NULL as location,
NULL as company,
MATCH(meta_keywords, meta_description, title, content) AGAINST("'.$keyword.'*" IN BOOLEAN MODE) AS score
FROM article
WHERE MATCH(meta_keywords, meta_description, title, content) AGAINST("'.$keyword.'*" IN BOOLEAN MODE)
UNION
SELECT
"download" as db_table,
lid,
NULL as page_sid,
NULL as sid,
NULL as hook,
title,
NULL as meta_keywords,
NULL as meta_description,
content,
NULL as location,
NULL as company,
MATCH(title, content) AGAINST("'.$keyword.'*" IN BOOLEAN MODE) AS score
FROM download
WHERE MATCH(title, content) AGAINST("'.$keyword.'*" IN BOOLEAN MODE)
UNION
SELECT
"event" as db_table,
lid,
NULL as page_sid,
NULL as sid,
NULL as hook,
title,
NULL as meta_keywords,
NULL as meta_description,
content,
location,
NULL as company,
MATCH(title, content, location) AGAINST("'.$keyword.'*" IN BOOLEAN MODE) AS score
FROM event
WHERE MATCH(title, content, location) AGAINST("'.$keyword.'*" IN BOOLEAN MODE)
UNION
SELECT
"member" as db_table,
NULL as lid,
NULL as page_sid,
NULL as sid,
NULL as hook,
NULL as title,
NULL as meta_keywords,
NULL as meta_description,
NULL as content,
NULL as location,
company,
MATCH(company) AGAINST("'.$keyword.'*" IN BOOLEAN MODE) AS score
FROM member
WHERE MATCH(company) AGAINST("'.$keyword.'*" IN BOOLEAN MODE)
) AS sub_query
WHERE 1=1
GROUP BY page_sid
ORDER BY total_score DESC
You could leave "référence" unescaped in content as well, as it remains correct HTML. (As long as the HTML header specifies the encoding to be the same as in the database, supposedly UTF-8.)
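As a rough sketch of what that could look like for content that is already stored with entities: the function below turns named entities such as &eacute; into UTF-8 characters while leaving the entities that the markup itself relies on (&amp;, &lt;, &gt;, &quot;, &apos;) untouched. The name decode_text_entities() is hypothetical, the sketch assumes PHP 7+ and UTF-8 throughout, and it deliberately ignores numeric entities.

```php
<?php
// Hypothetical helper: decode named entities into UTF-8 characters,
// except the ones needed to keep the HTML markup valid.
function decode_text_entities(string $html): string
{
    $keep = ['&amp;', '&lt;', '&gt;', '&quot;', '&apos;'];

    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        function (array $m) use ($keep): string {
            if (in_array($m[0], $keep, true)) {
                return $m[0]; // markup-significant entity, leave as is
            }
            // Unknown entity names are returned unchanged by html_entity_decode().
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $html
    );
}

$stored = '<p>A paragraph about r&eacute;f&eacute;rences &amp; more...</p>';
echo decode_text_entities($stored), "\n";
```

Running something like this once over the existing content rows, and again on save, would let a single fulltext search for références cover both title and content.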