In my database table (whitelist_domain_data) I have the fields id, url and data.
The url column contains many different URLs, such as:
http://www.dailystrength.org/c/Hidradenitis_Suppurativa/forum/8870995-solodyn-135-mg-works
http://au.answers.yahoo.com/question/index?qid=20090325215905AA6UVOa
http://navaspot.wordpress.com
I want to find the rows that share the same domain.
Table: whitelist_domain_data
Columns: id, url, data
What I have tried so far:
select regexp_matches(url,'http\:\/\/([a-z0-9\.]+)\.org') as domain,
count(*)
from whitelist_domain_data
group by domain;
It should return each domain with its row count, for example:
dailystrength.org 200
Question:
How should I write the query to fetch all the rows (url and data) whose url has the domain "dailystrength.org"?
You can do this with substring(), and you'll probably also want an expression index. Here's an example (I tweaked the regex to match what I think you want):
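A minimal sketch of that approach, assuming the "domain" is the host name after the scheme with any leading "www." removed, so that it yields dailystrength.org as in the question. The regex, the column types and the index name whitelist_domain_idx are illustrative assumptions:

```sql
-- Assumed schema: the question only names id, url and data, so the types are a guess.
CREATE TABLE IF NOT EXISTS whitelist_domain_data (
    id   serial PRIMARY KEY,
    url  text NOT NULL,
    data text
);

-- Sample rows using the URLs from the question.
INSERT INTO whitelist_domain_data (url) VALUES
    ('http://www.dailystrength.org/c/Hidradenitis_Suppurativa/forum/8870995-solodyn-135-mg-works'),
    ('http://au.answers.yahoo.com/question/index?qid=20090325215905AA6UVOa'),
    ('http://navaspot.wordpress.com');

-- With a POSIX regex, substring(... from ...) returns the part matched by the first
-- capturing group. (?:www\.)? is non-capturing, so the group is the host name up to
-- the first '/', ':', '?' or '#', with a leading "www." dropped.
CREATE INDEX whitelist_domain_idx
    ON whitelist_domain_data ((substring(url from '^[a-z]+://(?:www\.)?([^/:?#]+)')));

-- Rows per domain.
SELECT substring(url from '^[a-z]+://(?:www\.)?([^/:?#]+)') AS domain, count(*)
FROM whitelist_domain_data
GROUP BY domain;

-- All rows for one domain. The WHERE expression is identical to the indexed
-- expression, which is what lets the planner consider whitelist_domain_idx.
SELECT *
FROM whitelist_domain_data
WHERE substring(url from '^[a-z]+://(?:www\.)?([^/:?#]+)') = 'dailystrength.org';
```

On a table this small the planner will still choose a sequential scan; the index only pays off once the table is large.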
Now this query can use the index. If this is something you plan on using a lot, you might consider also creating a specific function for it:
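A sketch of such a function, under the hypothetical name url_domain(), wrapping the same kind of substring() expression (the regex is an assumption: host name after the scheme, leading "www." dropped):

```sql
-- Hypothetical helper: returns the host name of a URL minus a leading "www.",
-- or NULL when the string does not look like scheme://host...
-- IMMUTABLE is required for use in an index expression; it holds here because the
-- result depends only on the argument. $1 is used because SQL functions could not
-- refer to parameters by name before PostgreSQL 9.2.
CREATE OR REPLACE FUNCTION url_domain(url text) RETURNS text AS $$
    SELECT substring($1 from '^[a-z]+://(?:www\.)?([^/:?#]+)');
$$ LANGUAGE sql IMMUTABLE;
```

For example, url_domain('http://au.answers.yahoo.com/question/index?qid=20090325215905AA6UVOa') should return au.answers.yahoo.com.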
Then the above becomes:
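Built on a helper function, the index and the queries might look like the following. The hypothetical url_domain() is defined inline so the snippet runs by itself; the function name, its regex, the index name whitelist_url_domain_idx and the column types are all assumptions:

```sql
-- Assumed schema (column types are a guess).
CREATE TABLE IF NOT EXISTS whitelist_domain_data (
    id   serial PRIMARY KEY,
    url  text NOT NULL,
    data text
);

-- Hypothetical helper: host name after the scheme, minus a leading "www.".
CREATE OR REPLACE FUNCTION url_domain(url text) RETURNS text AS $$
    SELECT substring($1 from '^[a-z]+://(?:www\.)?([^/:?#]+)');
$$ LANGUAGE sql IMMUTABLE;

-- Index the function result; queries must call url_domain(url) to use it.
CREATE INDEX whitelist_url_domain_idx ON whitelist_domain_data (url_domain(url));

-- Rows per domain.
SELECT url_domain(url) AS domain, count(*)
FROM whitelist_domain_data
GROUP BY domain;

-- All rows for one domain.
SELECT *
FROM whitelist_domain_data
WHERE url_domain(url) = 'dailystrength.org';
```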
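Then, if you ever want to change what counts as a domain (to ignore subdomains, for example), you only need to change the function, and your queries will keep working unchanged. You will have to reindex at that point, though: an existing index still holds values computed with the old definition, and PostgreSQL does not rebuild it automatically when the function is replaced.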
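As an illustration, here is a sketch that redefines the hypothetical url_domain() to ignore subdomains by keeping only the last two labels of the host name, then rebuilds the table's indexes. It is deliberately naive (it would turn bbc.co.uk into co.uk; handling that properly needs a public-suffix list), and the schema and regex are assumptions:

```sql
-- Assumed schema (column types are a guess).
CREATE TABLE IF NOT EXISTS whitelist_domain_data (
    id   serial PRIMARY KEY,
    url  text NOT NULL,
    data text
);

-- Hypothetical redefinition: the greedy, earlier (?:[^/:?#]*\.)? swallows every
-- label except the last two, so 'au.answers.yahoo.com' -> 'yahoo.com'.
-- The signature must stay the same for CREATE OR REPLACE to succeed.
CREATE OR REPLACE FUNCTION url_domain(url text) RETURNS text AS $$
    SELECT substring($1 from '^[a-z]+://(?:[^/:?#]*\.)?([^./:?#]+\.[^./:?#]+)(?:[/:?#]|$)');
$$ LANGUAGE sql IMMUTABLE;

-- Index entries computed with the old definition are now stale; rebuild them.
-- REINDEX TABLE rebuilds every index on the table, including one on url_domain(url).
REINDEX TABLE whitelist_domain_data;
```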
I checked that this all works on Postgres 9.1, but it should be compatible with any recent version. Expression indexes and substring() both go back to the 7.x days.