I am writing a classifier for categorizing whether a special deal is for a

Asked: May 26, 2026

I am writing a classifier that categorizes whether a special deal is for a restaurant/hotel/etc… This is part of a web crawler for analyzing external sites.
To start, I wrote a meal?() method, which accepts a piece of text and returns true if it thinks the text is about a meal deal. It can’t be 100% accurate, since it only uses simple keyword matching.

def meal?(text)
  !text.match(/restaurant|meal|wine|.../i).nil?
end

Now I am writing a test for it, and I have two questions. First, it seems a bit redundant to list all of these keywords again in the unit test. What do you think?

The second question:
I have an .html file in source control that is used to test the crawler’s parsing functionality. In theory, all of its items should pass, so I am thinking of reusing that .html in this categorizing test: parse it and feed the description of each deal into this method.

One drawback is that the .html is taken from an external site. When that site changes its layout I will update this .html file, and then I will have to change this categorizing test too. But I think this is okay.

Is this recommended? I thought of this approach because I feel uneasy about extracting information from that .html and placing it in the test script itself (it is not DRY, and it makes the test script quite big). Would feeding in the parsed descriptions violate any fundamental testing principles, like ‘this hides the necessary details from developers’ or ‘this is bad for generating reports’?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T12:19:26+00:00

OK, I obviously misunderstood the question, so I will revise this answer completely.

I personally think it is simpler and preferable to copy the actual text from the .html file and paste it into the test, rather than adding the indirection of loading an .html file. I can think of two reasons:

1. When I write or read unit tests, I prefer to have all the information right in front of me rather than in an ‘external source’, like a resource file, that I have to dig for. This is personal preference, though.
2. The method can be used for other things as well, not just for classifying text read from an .html file, so tying its test to that file is a bit confusing. To keep the test generic, I would just use raw text in the test itself.
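
As a rough sketch of what a raw-text test could look like, assuming Minitest as the test framework (the question does not say which one is used): the keyword list in the question is elided, so the stand-in meal? below uses only the three keywords it names, and the sample deal texts are made up for illustration.

```ruby
require "minitest/autorun"

# Stand-in for the method from the question. The real keyword list is elided
# there, so this sketch assumes only the three keywords the question names.
def meal?(text)
  text.match?(/restaurant|meal|wine/i)
end

class MealClassifierTest < Minitest::Test
  # The sample texts are hypothetical; in practice they would be pasted
  # from the real deal descriptions.
  def test_detects_meal_deals
    assert meal?("3-course meal for two")
    assert meal?("Free WINE tasting this weekend")
  end

  def test_ignores_other_deals
    refute meal?("Two nights in a seaside hotel")
  end
end
```

A handful of representative positive and negative samples like these may be enough, which also avoids repeating the whole keyword list in the test.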

That said, I cannot find a reason why what you are trying to do is really bad; I think it boils down to personal preference.
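
If you do go with the .html file, one possible way to keep the reports readable is to generate one test per description, so that a failure names the text that failed. This is only a sketch, again assuming Minitest: fixture_descriptions is a hypothetical helper standing in for "parse the checked-in .html and return each deal's description", and it returns hard-coded, made-up strings here so that the example runs on its own.

```ruby
require "minitest/autorun"

# Stand-in for the method from the question (keyword list assumed, see above).
def meal?(text)
  text.match?(/restaurant|meal|wine/i)
end

# Hypothetical helper. In the real suite this would parse the .html fixture
# with the crawler's own parser; here it returns made-up sample data.
def fixture_descriptions
  ["Dinner at an Italian restaurant", "Wine and cheese evening"]
end

class MealFixtureTest < Minitest::Test
  # Define one test method per description, so each deal is reported separately.
  fixture_descriptions.each_with_index do |description, index|
    define_method("test_fixture_deal_#{index}_is_a_meal") do
      assert meal?(description), "expected a meal deal: #{description.inspect}"
    end
  end
end
```

The failure message includes the offending description, which offsets some of the "hidden details" concern, although a reader still has to open the fixture to see the full input.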
