The basic idea is to compute a signature for each string, where the signature is the string's characters sorted alphabetically, and then compare the signatures.
What would be an efficient algorithm to do so?
If you are sorting UTF-8 characters “alphabetically” (that is, by code point), you can decode them to 32-bit integers (a UTF-8 character occupies 1 to 4 bytes) and then radix sort them. Because the keys have a fixed width, this runs in O(N) time for a string of N characters. If you were using just ASCII, I would suggest counting sort.
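A minimal Python sketch of both signature routines may help here. The function names `ascii_signature` and `radix_signature` are made up for illustration. A Python `str` is already a sequence of decoded code points, so no explicit UTF-8 decoding step appears, and since code points never exceed 0x10FFFF, three byte-wide passes are assumed to be enough rather than four.

```python
def ascii_signature(s: str) -> str:
    """Counting sort: tally each of the 128 ASCII codes, then emit them in order."""
    counts = [0] * 128
    for ch in s:
        code = ord(ch)
        if code >= 128:
            raise ValueError("ascii_signature expects ASCII input only")
        counts[code] += 1
    return "".join(chr(code) * n for code, n in enumerate(counts))


def radix_signature(s: str) -> str:
    """LSD radix sort on code points, one byte per pass (least significant first)."""
    codes = [ord(ch) for ch in s]
    # Assumption: code points fit in 21 bits, so three 8-bit passes cover them.
    for shift in (0, 8, 16):
        buckets = [[] for _ in range(256)]
        for c in codes:
            buckets[(c >> shift) & 0xFF].append(c)
        # Concatenating buckets in order keeps each pass stable.
        codes = [c for bucket in buckets for c in bucket]
    return "".join(map(chr, codes))


print(ascii_signature("listen"))  # eilnst
print(radix_signature("silent"))  # eilnst
```

Each routine makes a constant number of linear passes over the string, which is where the O(N) bound comes from.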
There are many ways to match the signatures, but I would use a hash table (O(1) per lookup on average) or an O(log N) structure such as a red-black tree or a skip list.
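As a rough illustration of the hash-table approach, the sketch below groups strings by signature using a Python `dict`. The name `group_by_signature` is hypothetical, and the built-in `sorted` stands in for whichever sorting routine actually produces the signature.

```python
from collections import defaultdict


def group_by_signature(words):
    """Map each signature to the list of input strings that share it."""
    groups = defaultdict(list)
    for word in words:
        # Stand-in signature: characters sorted by code point.
        signature = "".join(sorted(word))
        groups[signature].append(word)  # average O(1) hash-table insert
    return dict(groups)


groups = group_by_signature(["listen", "silent", "enlist", "google", "gogole"])
print(groups["eilnst"])  # ['listen', 'silent', 'enlist']
print(groups["eggloo"])  # ['google', 'gogole']
```

Swapping the `dict` for a balanced tree or skip list would give ordered iteration over signatures at the cost of O(log N) lookups.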
To further speed up signature matching, you can compress the signatures by run-length encoding the UTF-8 characters (since they’re sorted, identical characters are adjacent, so the signature is a sequence of runs and gaps). You could go further and pack them with bit tags that distinguish 7-bit chars (the most common case), RLE runs, and longer literals (8-bit through 32-bit chars). Comparing the compressed signatures should be faster because there is less data to compare.
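The run-length part of this idea can be sketched in a few lines of Python. This covers only the runs, not the bit-tag packing, whose exact layout is not specified here. The name `rle_signature` is made up, and `sorted` again stands in for the real sorting step.

```python
from itertools import groupby


def rle_signature(s: str):
    """Sort the characters, then collapse each run into a (char, count) pair."""
    return tuple((ch, len(list(run))) for ch, run in groupby(sorted(s)))


print(rle_signature("banana"))  # (('a', 3), ('b', 1), ('n', 2))
print(rle_signature("listen") == rle_signature("silent"))  # True
```

Because the result is a tuple of pairs, it is hashable and can be used directly as a hash-table key in place of the full sorted string.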