So I have a table with two columns title and url. The rows go

Asked: May 23, 2026 (2026-05-23T16:03:03+00:00)

So I have a table with two columns “title” and “url”. The rows go as such:

    Title                           url

    Galago - Wikipedia              http://en.wikipedia.org/wiki/Galago
    Characteristics - Wikipedia     http://en.wikipedia.org/wiki/Galago
    Classification - Wikipedia      http://en.wikipedia.org/wiki/Galago
    Myst- Gamestop                  http://www.gamestop.com/ds/games/myst/69424
    Plot- Gamestop                  http://www.gamestop.com/ds/games/myst/69424

My question is: how would I remove the characters that are common to all rows for a given url (remove “- Wikipedia” from the first three, and “- Gamestop” from the other two)? This is just a minor example. I have many other rows that follow the same pattern: common characters or words that recur in every row for a given url. I should add that I store these values in a JavaScript array.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T16:03:03+00:00

I think that most automated solutions to this risk removing data that you want to keep. A word or phrase that occurs on more than one row is not necessarily redundant. A couple of potential, but still unreliable, methods come to mind. These would work only if you are looking for whole words.

Read all the titles into an array, and create a wordlist array by splitting each title into words. You can then determine the frequency of each word, and use that information to remove the unwanted words from the titles. If you have a lot of data, this method could use a lot of memory…
Parse each URL, extract the hostname, split it using a period (.) as the delimiter, and then search for and remove occurrences of those strings from the title. You might choose to create a list of strings to ignore, like www, com, co, uk, net, org, and so on. This method may work if the unwanted words are found in the domain name (as in your examples).
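
Both ideas could be sketched roughly like this in JavaScript, since that is where your values live. This is only an illustration under a few assumptions: the rows are objects shaped like `{ title, url }`, the word counting is done per url rather than across the whole table, and the names `trimSeparators`, `stripSharedWords`, `stripHostnameWords` and `IGNORED_LABELS` are made up for the example. The same caveat applies as above: a title made up entirely of shared words would end up empty.

```javascript
// Sample rows from the question; the { title, url } shape is an assumption.
const rows = [
  { title: "Galago - Wikipedia", url: "http://en.wikipedia.org/wiki/Galago" },
  { title: "Characteristics - Wikipedia", url: "http://en.wikipedia.org/wiki/Galago" },
  { title: "Classification - Wikipedia", url: "http://en.wikipedia.org/wiki/Galago" },
  { title: "Myst- Gamestop", url: "http://www.gamestop.com/ds/games/myst/69424" },
  { title: "Plot- Gamestop", url: "http://www.gamestop.com/ds/games/myst/69424" },
];

// Strip leftover separators (whitespace, hyphens, en dashes) from both ends.
function trimSeparators(text) {
  return text.replace(/^[\s\-–]+|[\s\-–]+$/g, "");
}

// Method 1: drop every word that appears in all titles sharing a url.
function stripSharedWords(rows) {
  // Group each title's word list by url.
  const groups = new Map();
  for (const { title, url } of rows) {
    if (!groups.has(url)) groups.set(url, []);
    groups.get(url).push(title.split(/\s+/).filter(Boolean));
  }
  // For each url, collect the words present in every title of that group.
  const shared = new Map();
  for (const [url, wordLists] of groups) {
    const common = wordLists.length < 2
      ? new Set() // a single row gives no evidence of redundancy
      : new Set(wordLists[0].filter((w) => wordLists.every((list) => list.includes(w))));
    shared.set(url, common);
  }
  return rows.map(({ title, url }) => {
    const kept = title.split(/\s+/).filter((w) => w && !shared.get(url).has(w));
    return { title: trimSeparators(kept.join(" ")), url };
  });
}

// Method 2: remove hostname parts (minus an ignore list) from the title.
const IGNORED_LABELS = new Set(["www", "com", "co", "uk", "net", "org"]);

function stripHostnameWords(rows) {
  return rows.map(({ title, url }) => {
    const labels = new URL(url).hostname
      .split(".")
      .filter((label) => !IGNORED_LABELS.has(label));
    let cleaned = title;
    for (const label of labels) {
      // Escape regex metacharacters, then match whole words, ignoring case.
      const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      cleaned = cleaned.replace(new RegExp(`\\b${escaped}\\b`, "gi"), "");
    }
    return { title: trimSeparators(cleaned.replace(/\s+/g, " ")), url };
  });
}

console.log(stripSharedWords(rows).map((r) => r.title));
console.log(stripHostnameWords(rows).map((r) => r.title));
// Both print: [ 'Galago', 'Characteristics', 'Classification', 'Myst', 'Plot' ]
```

Note that the second function treats every hostname part not on the ignore list as removable (here that includes "en"), so matching whole words only is what keeps it from eating into longer words.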
