I can’t seem to find a question on SO about my particular problem, so


Editorial Team

Asked: May 25, 2026


I can’t seem to find a question on SO about my particular problem, so forgive me if this has been asked before!

Anyway, I’m writing a script to loop through a set of URLs and give me a list of unique URLs with unique parameters.

The trouble I’m having is actually comparing the parameters to eliminate multiple duplicates. It’s a bit hard to explain, so some examples are probably in order:

Say I have a list of URLs like this:

hxxp://www.somesite.com/page.php?id=3&title=derp
hxxp://www.somesite.com/page.php?id=4&title=blah
hxxp://www.somesite.com/page.php?id=3&c=32&title=thing
hxxp://www.somesite.com/page.php?b=33&id=3

I have it parsing each URL into a list of lists, so eventually I have a list like this:

sort = [['id', 'title'], ['id', 'c', 'title'], ['b', 'id']]
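The asker's parsing code isn't shown, so here is a minimal sketch of one way that step might look. The helper name `param_names` and the choice of `urllib.parse` are assumptions for illustration, not the original script:

```python
from urllib.parse import urlsplit, parse_qsl

def param_names(url):
    """Return the query parameter names of `url`, in the order they appear."""
    query = urlsplit(url).query
    # keep_blank_values=True so a parameter like "id=" is not silently dropped
    return [name for name, _ in parse_qsl(query, keep_blank_values=True)]

urls = [
    "hxxp://www.somesite.com/page.php?id=3&title=derp",
    "hxxp://www.somesite.com/page.php?id=4&title=blah",
    "hxxp://www.somesite.com/page.php?id=3&c=32&title=thing",
    "hxxp://www.somesite.com/page.php?b=33&id=3",
]

# Collect each distinct list of parameter names once, in first-seen order
sort = []
for url in urls:
    names = param_names(url)
    if names not in sort:
        sort.append(names)

print(sort)  # [['id', 'title'], ['id', 'c', 'title'], ['b', 'id']]
```

Note that this sketch ignores the host and path, which is only reasonable here because every sample URL points at the same page.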

I need to figure out a way to give me just 2 lists in my list at that point:

new = [['id', 'c', 'title'], ['b', 'id']]

Right now I’ve got a bit of code that sorts it out a little. I know I’m close, and I’ve been banging my head against this for a couple of days now :(. Any ideas?

Thanks in advance! 🙂

EDIT: Sorry for not being clear! This script is aimed at finding unique entry points for web applications post-spidering. Basically if a URL has 3 unique entry points

['id', 'c', 'title']

I’d prefer that to the same link with 2 unique entry points, such as:

['id', 'title']

So I need my new list of lists to eliminate the one with 2 and prefer the one with 3 ONLY if the smaller variables are in the larger set. If it’s still unclear let me know, and thank you for the quick responses! 🙂
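The rule described in the edit boils down to a subset test on parameter names. A small sketch, assuming parameter order doesn't matter (the helper name `is_covered` is made up for illustration):

```python
def is_covered(smaller, larger):
    """True if every parameter name in `smaller` also appears in `larger`."""
    return set(smaller) <= set(larger)

print(is_covered(['id', 'title'], ['id', 'c', 'title']))  # True: drop the smaller one
print(is_covered(['b', 'id'], ['id', 'c', 'title']))      # False: 'b' is new, keep it
```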


1 Answer

Editorial Team · Answer 1 · 2026-05-25T21:50:01+00:00

I’ll assume that subsets are considered “duplicates” (non-commutatively, of course)…

Start by converting each query into a set and ordering them all from largest to smallest. Then add each query to a new list if it isn’t a subset of an already-added query. Since any set is a subset of itself, this logic covers exact duplicates:

a = []
for q in sorted((set(q) for q in sort), key=len, reverse=True):
    if not any(q.issubset(Q) for Q in a):
        a.append(q)
a = [list(q) for q in a] # Back to lists, if you want
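For anyone who wants to run this against the sample data from the question, here is a self-contained sketch of the same idea wrapped in a function. The name `drop_subsets` is hypothetical, and unlike the snippet above it keeps the original lists rather than converting them to sets and back, so the parameter order inside each list is preserved:

```python
def drop_subsets(param_lists):
    """Keep only lists whose names are not all contained in a kept list."""
    kept = []
    # Largest first, so a superset is always seen before its subsets.
    # sorted() is stable, so equal-sized lists keep their input order.
    for names in sorted(param_lists, key=lambda p: len(set(p)), reverse=True):
        if not any(set(names) <= set(k) for k in kept):
            kept.append(names)
    return kept

sort = [['id', 'title'], ['id', 'c', 'title'], ['b', 'id']]
new = drop_subsets(sort)
print(new)  # [['id', 'c', 'title'], ['b', 'id']]
```

Each candidate is compared against every kept list, so this is quadratic in the number of distinct parameter sets, which should be fine for the output of a typical spidering run.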
