What is the quickest way to find the first character which only appears once in a string?
I see that people have posted some delightful answers below, so I’d like to offer something more in-depth.
An idiomatic solution in Ruby
We can find the first un-repeated character in a string like so:
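A minimal sketch of such a solution, built from the each_char, tally, and find calls discussed below. The method name first_unique_char is a hypothetical one chosen for illustration, and tally assumes Ruby 2.7 or later:

```ruby
# Returns the first character that occurs exactly once in str,
# or nil when every character repeats (or the string is empty).
# first_unique_char is a hypothetical name, not part of Ruby.
def first_unique_char(str)
  # each_char yields characters lazily; tally counts them into a Hash.
  # Ruby hashes preserve insertion order, so find visits characters
  # in order of first appearance.
  pair = str.each_char.tally.find { |_char, count| count == 1 }
  pair&.first
end

first_unique_char("swiss")  # => "w"
first_unique_char("aabb")   # => nil
```

Because the hash keeps keys in the order they were first inserted, the first entry with a count of 1 is also the first un-repeated character in the string.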
How does Ruby accomplish this?
Reading Ruby’s source
Let’s break down the solution and consider what algorithms Ruby uses for each step.
First we call each_char on the string. This creates an enumerator that lets us visit the string one character at a time. This is complicated by the fact that Ruby handles Unicode characters, so each value we get from the enumerator can span a variable number of bytes. If we know our input is ASCII or similar, we could use each_byte instead.
The each_char method is implemented like so:
In turn, rb_str_enumerate_chars is implemented as:
From this we can see that it calls rb_enc_mbclen (or its fast version) to get the length (in bytes) of the next character in the string so that it can advance to the following one. By lazily iterating over the string, reading just one character at a time, we end up doing just one full pass over the input as tally consumes the iterator.
The tally method is then implemented like so:
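In Ruby terms, the behavior of tally can be approximated by the sketch below. It is not the C implementation, and manual_tally is a hypothetical name used only for illustration:

```ruby
# Hypothetical helper that mimics what Enumerable#tally does.
# The real implementation is C code inside Ruby, not this method.
def manual_tally(enum)
  counts = {}
  enum.each do |item|
    # Per-element update: bump the count stored under this key.
    counts[item] = counts.fetch(item, 0) + 1
  end
  counts
end

chars = "h\u00e9llo".each_char  # an Enumerator; no array of characters is built
manual_tally(chars)             # => {"h"=>1, "é"=>1, "l"=>2, "o"=>1}
"\u00e9".bytesize               # => 2 in UTF-8, so characters vary in byte length
```

The enumerator is consumed exactly once, which matches the single pass over the input described above.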
In Ruby's C source, tally_i is the block callback (its parameters are declared with RB_BLOCK_CALL_FUNC_ARGLIST) that runs once per element and calls tally_up, which updates the tally hash on every iteration.
Rough time & memory analysis
The each_char method doesn't allocate an array to eagerly hold the characters of the string, so it has a small constant memory overhead. When we tally the characters, we allocate a hash and put our tally data into it, which in the worst case (every character distinct) can take up as much memory as the input string times some constant factor.
Time-wise, tally does a full scan of the string, and calling find to locate the first non-repeated character then scans the hash. Each of these carries O(n) worst-case complexity.
However, tally also updates a hash on every iteration. In the worst case a single hash update can be as slow as O(n), so the worst-case complexity of this Ruby solution is perhaps O(n^2).
Under reasonable assumptions, though, updating a hash is O(1) amortized, so we can expect the average case to look like O(n).
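To make the memory bound concrete, the tally hash grows with the number of distinct characters rather than with the length of the string. A small sketch, where tally_size is a hypothetical helper and tally assumes Ruby 2.7 or later:

```ruby
# Hypothetical helper: number of entries in the tally hash for str.
def tally_size(str)
  str.each_char.tally.size
end

all_same     = "a" * 1_000           # 1,000 characters, 1 distinct
all_distinct = ("a".."z").to_a.join  # 26 characters, all distinct

tally_size(all_same)      # => 1
tally_size(all_distinct)  # => 26, the same as all_distinct.length
```

The second case is the worst case for memory: every character is distinct, so the hash ends up with one entry per character of input.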
My old accepted answer in Python
You can’t know that the character is un-repeated until you’ve processed the whole string, so my suggestion would be this:
Edit: originally posted code was bad, but this latest snippet is Certified To Work On Ryan’s Computer™.