I have a list of words, some of which are compound words, for example:
- palanca
- plato
- platopalanca
I need to remove “plato” and “palanca” and leave only “platopalanca”.
I used array_unique to remove duplicates, but those compound words are tricky…
Should I sort the list by word length and compare one by one?
Is a regular expression the answer?
update: The list of words is much bigger and mixed, not only related words
update 2: I can safely implode the array into a string.
update 3: I’m trying to avoid doing this as if it were a bubble sort. There must be a more efficient way of doing this.
Well, I think that a bubble-sort-like approach is the only possible one 🙁
I don’t like it, but it’s what I have…
Any better approach?
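// Sort ascending by length so each word is only compared against longer ones.
function sortByLengthAsc($a, $b) {
    return strlen($a) - strlen($b);
}
usort($words, 'sortByLengthAsc');
$count = count($words);
$delete = array();
for ($i = 0; $i < $count; $i++) {
    for ($j = $i + 1; $j < $count; $j++) {
        // strpos() !== false is a safer containment test than strstr()
        if (strpos($words[$j], $words[$i]) !== false) {
            $delete[] = $i;
            break; // one containing word is enough to mark $i for removal
        }
    }
}
foreach ($delete as $i) {
    unset($words[$i]);
}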
update 5: Sorry all. I’m a moron. Jonathan Swift made me realize I was asking the wrong question.
Given x words which START the same, I need to remove the shortest ones.
- “hot, dog, stand, hotdogstand” should become “dog, stand, hotdogstand”
- “car, pet, carpet” should become “pet, carpet”
- “palanca, plato, platopalanca” should become “palanca, platopalanca”
- “platoother, other” should be untouched; they start differently
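One way to read the rule above is "drop every word that another word in the list starts with". A minimal sketch of that reading follows, assuming byte-wise string comparison is acceptable and that exact duplicates should be collapsed first. The function name removePrefixWords is made up for illustration. It relies on the fact that, after sorting, any word that starts with a given word is grouped directly after it, so checking each word's immediate neighbour is enough and no nested loop is needed.

```php
<?php
// Hypothetical helper (not from the original post): removes every word
// that is the start of another word in the list, keeping original order.
function removePrefixWords(array $words): array
{
    // Work on a de-duplicated, byte-wise sorted copy.
    $sorted = array_values(array_unique($words));
    sort($sorted, SORT_STRING);

    // In sorted order, if any word starts with $sorted[$i], then the
    // immediate successor $sorted[$i + 1] does too, so one check per word
    // is sufficient.
    $isPrefix = array();
    for ($i = 0, $n = count($sorted); $i + 1 < $n; $i++) {
        $len = strlen($sorted[$i]);
        if (strncmp($sorted[$i + 1], $sorted[$i], $len) === 0) {
            $isPrefix[$sorted[$i]] = true;
        }
    }

    // Rebuild the list in its original order, skipping the marked words.
    $kept = array();
    foreach (array_unique($words) as $word) {
        if (!isset($isPrefix[$word])) {
            $kept[] = $word;
        }
    }
    return $kept;
}

print_r(removePrefixWords(array('hot', 'dog', 'stand', 'hotdogstand')));
```

The sort dominates the cost, so this is roughly O(n log n) comparisons instead of comparing every pair of words.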
I think you need to define the problem a little more, so that we can give a solid answer. Here are some pathological lists. Which items should get removed?:
SOME CODE
This code should be more efficient than the one you have:
You could optimise this by storing word lengths in an array before the loops.
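The answer's code is not reproduced here, so the following is only a guess at what caching the lengths might look like, applied to the "starts the same" version of the problem. The function name removeShorterStarters and the overall loop shape are assumptions for illustration.

```php
<?php
// Hypothetical helper (not the answerer's code): nested-loop version that
// computes every word length once, before the loops.
function removeShorterStarters(array $words): array
{
    $words = array_values(array_unique($words));
    // Cache the lengths so strlen() is not called inside the inner loop.
    $lengths = array_map('strlen', $words);
    $count = count($words);

    $kept = array();
    for ($i = 0; $i < $count; $i++) {
        $isStarter = false;
        for ($j = 0; $j < $count; $j++) {
            // Only a strictly longer word can start with $words[$i].
            if ($lengths[$j] > $lengths[$i]
                && strncmp($words[$j], $words[$i], $lengths[$i]) === 0) {
                $isStarter = true;
                break;
            }
        }
        if (!$isStarter) {
            $kept[] = $words[$i];
        }
    }
    return $kept;
}

print_r(removeShorterStarters(array('car', 'pet', 'carpet')));
```

This is still quadratic in the number of words; the cached lengths only trim the constant factor.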