Okay – I have a dilemma. So far my script converts page titles into

Question

Asked: June 5, 2026 (2026-06-05T17:14:13+00:00)

Okay – I have a dilemma. So far my script converts page titles into categories. This is based on keywords: when there is a match, a certain score is added, i.e. some words hold a value of 10, some only 1. These scores are accumulated into a total score for each category.

[{15: [32, 'massages']}, {45: [12, 'hair-salon']}, {23: [3, 'automotive service']}]

The key is the category id, the first value is the score, and the second value is the category name.

In some instances this spans more than 10 category matches.

How can I filter this down to only the top 60-75%?

I.e. massages and hair salon are clearly the top matches, as they are well above automotive service. But how can this kind of judgment be programmed?

I was thinking stddev could help?

Edit

I am trying to filter out low-scoring items, e.g.

data = [{15: [32, 'massages']}, {45: [1, 'hair-salon']}, {23: [1, 'automotive service']}]

Massages is the only high scoring item in this instance

data = [{15: [4, 'massages']}, {45: [2, 'hair-salon']}, {23: [1, 'automotive service']}]

Still massages

data = [{15: [10, 'massages']}, {45: [50, 'hair-salon']}, {23: [5, 'automotive service']}]

Now hair-salon (as it is well above others)

So I don't want to take the first N objects; rather, I want the first objects that are x higher than the other numbers, as a percentage or some form of standard deviation.

So 50 is much higher than 10 and 5

10 is much higher than 3 or 2

However, 9, 8 and 6 are much the same

1 Answer

Editorial Team · Answer 1 · 2026-06-05T17:14:17+00:00

yourdata = [{15: [32, 'massages']}, {45: [12, 'hair-salon']}, {23:[3, 'automotive service']}]

# transfer your data into a more usable format
data = [(score, cat, name) for dat in yourdata for cat, (score, name) in dat.items()]

# sort on descending score
data.sort(reverse=True)

# throw away the low-scoring items
data = data[:int(len(data)*0.6 + 1)]

returns

[(32, 15, 'massages'), (12, 45, 'hair-salon')]

(the two highest-scoring items)
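
Note that a fixed 60% cut keeps the same number of items no matter how the scores are spread, so it would not single out massages in the [32, 1, 1] example from the edit. One possible alternative, offered only as a rough sketch and not as the approach given above, is to sort the scores and cut at the largest relative drop between neighbours. The function name top_by_gap and the min_ratio parameter (default 1.5) are made up for illustration, and the sketch assumes all scores are positive.

```python
def top_by_gap(records, min_ratio=1.5):
    """Keep the leading items that sit above the largest relative score gap.

    Hypothetical helper: `min_ratio` is an assumed tuning knob, not something
    from the question. If no neighbouring pair differs by more than this
    factor, the scores count as "much the same" and everything is kept.
    """
    # Flatten [{cat_id: [score, name]}, ...] into (score, cat_id, name) tuples.
    rows = [(score, cat, name)
            for rec in records
            for cat, (score, name) in rec.items()]
    rows.sort(reverse=True)  # highest score first

    best_ratio, cut = min_ratio, len(rows)
    for i in range(len(rows) - 1):
        upper, lower = rows[i][0], rows[i + 1][0]
        # Assumes positive scores; a zero neighbour is treated as an infinite drop.
        ratio = float(upper) / lower if lower > 0 else float("inf")
        # Strict '>' means that on ties the earliest (highest) gap wins.
        if ratio > best_ratio:
            best_ratio, cut = ratio, i + 1
    return rows[:cut]


print(top_by_gap([{15: [32, 'massages']}, {45: [1, 'hair-salon']}, {23: [1, 'automotive service']}]))
# [(32, 15, 'massages')]
print(top_by_gap([{15: [10, 'massages']}, {45: [50, 'hair-salon']}, {23: [5, 'automotive service']}]))
# [(50, 45, 'hair-salon')]
```

With the original data (32, 12, 3) the biggest drop is between 12 and 3, so massages and hair-salon are both kept; with scores like 9, 8 and 6 no drop exceeds the 1.5 factor, so all three survive. The threshold would need tuning against real data.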
