I have many numbers in a DB. For example:
448-48-00 #(from 00 to 99, 100 numbers)
336-87-00 #(same as above)
449-20-00 #(from 000 to 999, 1000 numbers)
I need to get the base of these numbers. For this example, I need to get 44848, 33687 and 4492.
I have this code, but I don’t know how to finish it 🙂
#!/usr/bin/perl
use v5.10;
use warnings;
my @p = 4484800..4484899;
push @p, $_ for 3368700..3368799;
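my $data;
# Count how often each prefix (2 digits up to the full number) occurs.
for my $n (@p) {
    $data->{substr $n, 0, $_}++ for 2 .. length $n;
}
# Print every prefix shared by at least 100 numbers, smallest count first.
foreach my $key (sort { $data->{$a} <=> $data->{$b} } keys %$data) {
    say $key if $data->{$key} > 99;
}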
I need to keep the longest prefixes and remove the shorter prefixes that are contained in them.
I tried to understand what your code does and to improve it so that it does what you want. Disclaimer: it’s not that simple. For example, there’s no way for an algorithm to see that you don’t want to group 44848.. and 4492... into 44....., but that you do want to group 4492... instead of 44924.. and so on. But maybe this already helps.
I think the important part is the “smart filter”, which for example looks at 336 and 3368 and deletes the count of 336 if it isn’t higher than that of 3368 (336 marks a trivial superset of 3368). Important here is the string sort together with the state variable $last:
The output is absolutely right but not yet what you want:
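A minimal sketch of such a filter, reconstructed from the description above rather than copied from any existing snippet. The helper names `prefix_counts`, `drop_trivial_supersets` and `groups_at_least` and the `$min` threshold are made up for illustration, and a plain lexical `$last` inside a sub stands in for the `state` variable:

```perl
#!/usr/bin/perl
use v5.10;
use strict;
use warnings;

# Count how many numbers share each prefix (lengths 2 .. full length).
sub prefix_counts {
    my %count;
    for my $n (@_) {
        $count{ substr $n, 0, $_ }++ for 2 .. length $n;
    }
    return \%count;
}

# "Smart filter": in string order every prefix is directly followed by its
# first extension, so comparing each key with the previous one is enough.
sub drop_trivial_supersets {
    my ($count) = @_;
    my $last;    # previous key in string order
    for my $key (sort keys %$count) {
        # $last is a trivial superset if $key extends it and $last
        # does not cover more numbers than $key does.
        if (defined $last
            && index($key, $last) == 0
            && $count->{$last} <= $count->{$key}) {
            delete $count->{$last};
        }
        $last = $key;
    }
    return $count;
}

# Remaining prefixes (string-sorted) that cover at least $min numbers.
sub groups_at_least {
    my ($count, $min) = @_;
    return grep { $count->{$_} >= $min } sort keys %$count;
}

# Sample data taken from the question.
my @numbers = (4484800 .. 4484899, 3368700 .. 3368799, 4492000 .. 4492999);
my $count   = drop_trivial_supersets(prefix_counts(@numbers));
say "$_ ($count->{$_})" for groups_at_least($count, 10);
```

With a `$min` of 10 the listing still contains groups of ten such as 448480. Raising it to 100 leaves only 33687, 44, 44848, 4492 and 44920 through 44929.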
So if you want to get rid of the 10-groups, you could change
to
Output:
This looks very good. Now it’s up to you what to do with the 4492-100-groups and the 44-1100-group. If you want to delete the 100-groups depending on their length, that could also delete the 4492 group in favor of the large 44 group.
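One possible way to settle this, as a sketch under assumptions that may not hold for your real data: all numbers are distinct, all have the same number of digits, and every base covers a complete block (all 10, 100, 1000, ... completions are present). This is a different technique from the filter above. A prefix is accepted only if its count equals the number of possible completions, and only the shortest such prefix is kept. The helper name `complete_bases` is hypothetical.

```perl
#!/usr/bin/perl
use v5.10;
use strict;
use warnings;

# Return the shortest prefixes whose block is complete, i.e. every possible
# completion of the prefix occurs in the input.
# Assumption: inputs are distinct and all have the same number of digits.
sub complete_bases {
    my @numbers = @_;
    return unless @numbers;
    my $digits = length $numbers[0];

    # Count how many numbers share each prefix (lengths 1 .. $digits).
    my %count;
    for my $n (@numbers) {
        $count{ substr $n, 0, $_ }++ for 1 .. $digits;
    }

    my @bases;
    # String order puts a prefix right before all of its extensions.
    for my $prefix (sort keys %count) {
        # Skip extensions of the base accepted most recently.
        next if @bases && index($prefix, $bases[-1]) == 0;
        # A prefix of length L has 10 ** ($digits - L) possible completions.
        push @bases, $prefix
            if $count{$prefix} == 10 ** ($digits - length $prefix);
    }
    return @bases;
}

# Sample data taken from the question.
my @numbers = (4484800 .. 4484899, 3368700 .. 3368799, 4492000 .. 4492999);
say for complete_bases(@numbers);    # prints 33687, 44848 and 4492, one per line
```

The 44 group is rejected because 1100 numbers do not fill the 100000 possible completions of a two-digit prefix, and 44920 is skipped because 4492 is already complete. Note that a single number with no complete block around it would be reported as its own base.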