Is there a way to drop a letter in a string if it repeats?
For example, let's say that I have the string 'aaardvark' and I want to drop one of the leading a's. How would I do this?
If I understood your question correctly, you can do this using regular expressions:
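A minimal sketch of that in Python, assuming the standard re module. The helper name collapse_repeats is only illustrative:

```python
import re


def collapse_repeats(word):
    """Collapse every run of one repeated character to a single character."""
    # (.) captures any character; \1+ matches one or more further copies of it.
    # The whole run is replaced by the single captured character.
    return re.sub(r'(.)\1+', r'\1', word)


print(collapse_repeats('aaardvark'))
```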
This collapses all sequences of identical characters into one, giving you 'ardvark'.

As for the implementation of your spell checker, I suggest “collapsing” all words in your dictionary that have repeating characters in sequence and keeping the result in a dictionary (the data structure), where the key is the collapsed word and the value is the original word (or possibly a set of original words).

Now when you analyze your input, for each word:

1. Check if it exists in your list of correct words. If it does, ignore it (e.g. the input is 'person'; it's in the list of words, so there is nothing to do).
2. If it doesn't, “collapse” it and see whether:
   - the collapsed form is in your list of words (e.g. 'computerr' becomes 'computer'; now you just replace it with the original word in your list);
   - the collapsed form is a key in your dictionary (e.g. 'aaapppleee' becomes 'aple'; you look up 'aple' in your word list and it's not there, so you look in your dictionary for the key 'aple', and if it is there you replace the input with its value, 'apple').

The only problem I see with this approach is two valid words possibly “collapsing” into the same “word.” This means you'll have to use a set as your value.

Say 'hallo' and 'halo' are both valid words and the user enters 'halloo'. Now you'll have to decide which one to replace it with. This can be done by calculating the Levenshtein distance between the input and each of the possible replacements and picking the closest one.
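The steps above could be sketched roughly as follows. This is only one possible arrangement, not a definitive implementation: the names build_collapsed_index, correct and levenshtein are hypothetical, the word list is a made-up example, and the edit distance is a plain dynamic-programming version written out here instead of taken from a library.

```python
import re
from collections import defaultdict


def collapse_repeats(word):
    """Collapse every run of one repeated character to a single character."""
    return re.sub(r'(.)\1+', r'\1', word)


def levenshtein(a, b):
    """Edit distance between a and b, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def build_collapsed_index(words):
    """Map each collapsed form to the set of valid words that produce it."""
    index = defaultdict(set)
    for word in words:
        index[collapse_repeats(word)].add(word)
    return index


def correct(word, words, index):
    """Return word if it is valid, else the closest valid word that shares
    its collapsed form, else the word unchanged."""
    if word in words:
        return word
    candidates = index.get(collapse_repeats(word))
    if not candidates:
        return word  # no known word collapses to the same form
    # sorted() only makes tie-breaking deterministic (assumption: any
    # tie-break rule is acceptable).
    return min(sorted(candidates), key=lambda c: levenshtein(word, c))


# Hypothetical word list, for illustration only.
words = {'person', 'computer', 'apple', 'hallo', 'halo'}
index = build_collapsed_index(words)

for typed in ['person', 'computerr', 'aaapppleee', 'halloo']:
    print(typed, '->', correct(typed, words, index))
```

Because every valid word is indexed under its collapsed form, the two sub-cases of step 2 merge into a single lookup here: 'computer' is stored under the key 'computer', so 'computerr' is resolved through the same dictionary as 'aaapppleee'. For 'halloo' the key 'halo' holds both 'hallo' and 'halo', and the distance comparison picks 'hallo' (distance 1 versus 2).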