I’ve been thinking about this for a while now, so I thought I would


Asked: May 17, 2026 (2026-05-17T16:24:14+00:00)


I’ve been thinking about this for a while now, so I thought I would ask for suggestions:

I have a crawler that enters the root of a site (it could be anything from http://www.StackOverFlow.com or http://www.SomeDudesPersonalSite.se to http://www.Facebook.com). I then need to determine what “kind of homepage” I’m visiting. Different types could, for instance, be:

Forum
Blog
Link catalog
Social media site
News site
“One man site”

I’ve been brainstorming for a while, and the best solution seems to be some heuristic with a point system. By this I mean that different trends give points to the different types, and the program then makes a guess based on the totals.

But this is where I get stuck. How do you detect trends?

Catalogs could be easy: if the number of outgoing links relative to the pages indexed is very high, catalogs should get several points.
News sites/blogs could be easy: if a high share of the indexed pages carry a datetime, those types should get several points.
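To make the point system concrete, here is a minimal sketch of the two rules above. The names `CrawlStats`, `score_site` and `guess_types`, the thresholds (20 outgoing links per page, half of the pages dated) and the point values are all assumptions chosen for illustration, and reading the first rule as "outgoing links per indexed page" is one interpretation of it.

```python
from dataclasses import dataclass

# The site types listed in the question.
SITE_TYPES = ("forum", "blog", "link catalog",
              "social media site", "news site", "one man site")


@dataclass
class CrawlStats:
    """Hypothetical counters a crawler might collect for one site."""
    pages_indexed: int
    outgoing_links: int
    pages_with_datetime: int


def score_site(stats):
    """Return a points-per-type dict; thresholds are illustrative guesses."""
    scores = {site_type: 0 for site_type in SITE_TYPES}
    pages = max(stats.pages_indexed, 1)  # avoid division by zero

    # Rule 1: many outgoing links per indexed page suggests a link catalog.
    if stats.outgoing_links / pages > 20:
        scores["link catalog"] += 3

    # Rule 2: a high share of dated pages suggests a news site or a blog.
    if stats.pages_with_datetime / pages > 0.5:
        scores["news site"] += 2
        scores["blog"] += 2

    return scores


def guess_types(scores):
    """Return every type tied for the highest score, or [] if no rule fired."""
    best = max(scores.values())
    if best == 0:
        return []
    return [site_type for site_type, points in scores.items() if points == best]
```

With only these two rules a dated site ties between "blog" and "news site", which is exactly the gap the question describes: every additional trend becomes one more rule that adds points to `scores`.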

But I can’t really find many other trends.

So my question is: any ideas on how to do this?

Thanks so much.


1 Answer

Editorial Team · Answer 1 · 2026-05-17T16:24:15+00:00


You could train a neural network to recognise them. Give it the number and types of links, and maybe the types of HTML tags as well.

I think otherwise you’re just going to be second-guessing what makes a site what it is.
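As a rough sketch of what “number/types of links” and “types of HTML tags” could look like as classifier input, the snippet below turns one page into a fixed-length list of counts using only the Python standard library. The class `FeatureCounter`, the function `extract_features` and the particular tags counted are assumptions made for illustration; choosing, training and evaluating the network itself is not shown.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class FeatureCounter(HTMLParser):
    """Counts start tags and splits <a href> links into internal/external."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host.lower()
        self.tag_counts = Counter()
        self.internal_links = 0
        self.external_links = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        host = urlparse(href).netloc.lower()
        # Hrefs without a host (relative paths, mailto:, ...) count as internal.
        if host and host != self.site_host:
            self.external_links += 1
        else:
            self.internal_links += 1


def extract_features(html, site_host,
                     tags=("a", "form", "img", "article", "time")):
    """Return [internal links, external links, count of each tag in `tags`]."""
    parser = FeatureCounter(site_host)
    parser.feed(html)
    parser.close()
    return ([parser.internal_links, parser.external_links]
            + [parser.tag_counts[tag] for tag in tags])
```

Each crawled page, or the sum over all pages of a site, then yields one vector. Paired with a hand-labelled site type, those vectors form the training set for whichever classifier is used.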
