Ok, let’s suppose we have a members table with a field called, let’s say, about_member. Every member has a string like 1-1-2-1-2 in it. Let’s suppose member_1 has the string 1-1-2-2-1 and searches for members whose strings are the same, or as similar as possible. For example, if member_2 has the string 1-1-2-2-1, it is a 100% match, but if member_3 has the string 2-1-1-2-1, it is a 60% match (3 of the 5 positions agree). The results have to be ordered by match percentage. What is the best way to do this with MySQL and PHP? It’s really hard to explain what I mean, but maybe you get it; if not, ask me. Thanks.
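To pin down the metric being asked for: the percentage is simply the share of positions whose digits agree. Here is a minimal Python sketch of that definition. It assumes every string has the same number of dash-separated parts, and match_percent is a hypothetical helper name.

```python
def match_percent(a: str, b: str) -> float:
    """Percentage of positions at which two dash-separated strings agree."""
    xs, ys = a.split("-"), b.split("-")
    if len(xs) != len(ys):
        # Assumption: every member has the same number of answers.
        raise ValueError("strings must have the same number of parts")
    same = sum(1 for x, y in zip(xs, ys) if x == y)
    return 100.0 * same / len(xs)


print(match_percent("1-1-2-2-1", "1-1-2-2-1"))  # 100.0
print(match_percent("1-1-2-2-1", "2-1-1-2-1"))  # 60.0
```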
Edit: Please give me ideas that don’t use the Levenshtein method. That answer will get the bounty. Thanks. (The bounty will be announced as soon as I’m able to offer it.)
Jawa posted this idea originally; here is my attempt.
^ is the XOR operator. It compares 2 binary numbers bit by bit, producing 0 at each position where both bits are the same and 1 where they differ.

How this applies to your problem:
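Concretely, if an answer of 1 is stored as bit 0 and an answer of 2 as bit 1, member_1’s 1-1-2-2-1 becomes 00110 and member_3’s 2-1-1-2-1 becomes 10010. A quick Python check of what XOR does with them (Python’s ^ behaves like MySQL’s on non-negative integers):

```python
mine = 0b00110    # 1-1-2-2-1, encoded with 1 -> 0 and 2 -> 1
theirs = 0b10010  # 2-1-1-2-1, encoded the same way
diff = mine ^ theirs  # a 1 bit wherever the two members answered differently
print(format(diff, "05b"))  # 10100: they differ in the 1st and 3rd answers
```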
How we can count these bits in MySQL:
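MySQL’s BIT_COUNT(N) returns the number of bits set to 1 in N, which here is the number of answers that differ. For checking the idea outside the database, a Python equivalent looks like this (bit_count is a hypothetical helper name):

```python
def bit_count(n: int) -> int:
    """Count the 1 bits in a non-negative integer, like MySQL's BIT_COUNT(N)."""
    return bin(n).count("1")


# 00110 ^ 10010 = 10100, so the two members differ in 2 positions.
print(bit_count(0b00110 ^ 0b10010))  # 2
```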
So to get the similarity…
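Putting the two together, the similarity is the share of positions that did not differ: (YOUR_TOTAL_BITS - BIT_COUNT(a ^ b)) / YOUR_TOTAL_BITS * 100. A Python sketch of that formula, assuming 5 answers per member (TOTAL_BITS stands in for YOUR_TOTAL_BITS):

```python
TOTAL_BITS = 5  # stands in for YOUR_TOTAL_BITS; assumed 5 answers per member


def similarity(a: int, b: int, total_bits: int = TOTAL_BITS) -> float:
    """Percentage of bit positions at which a and b agree."""
    differing = bin(a ^ b).count("1")  # same count as BIT_COUNT(a ^ b) in MySQL
    return 100.0 * (total_bits - differing) / total_bits


print(similarity(0b00110, 0b00110))  # 100.0
print(similarity(0b00110, 0b10010))  # 60.0
```

This reproduces the 100% and 60% figures from the question.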
If we could just make your about_member field store its data as bits (represented by an integer), we could do all of this easily! Instead of 1-2-1-1-1, use 0-1-0-0-0 (each 1 becomes 0 and each 2 becomes 1), but without the dashes.

Here’s how PHP can help us:
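Sketched in Python for illustration, the conversions mirror PHP’s bindec (binary string to integer) and decbin (integer to binary string). Note that decbin drops leading zeros, so in PHP you would pad with str_pad(..., STR_PAD_LEFT) to get all the switches back. The helper names are hypothetical, and the sketch assumes every answer is 1 or 2.

```python
def dashes_to_bitstring(s: str) -> str:
    """'1-2-1-1-1' -> '01000': drop the dashes, map 1 -> 0 and 2 -> 1.

    Assumes every answer is 1 or 2; anything else raises KeyError.
    """
    return "".join({"1": "0", "2": "1"}[part] for part in s.split("-"))


def to_int(bits: str) -> int:
    """Binary string -> integer, like PHP's bindec()."""
    return int(bits, 2)


def to_bitstring(n: int, width: int) -> str:
    """Integer -> zero-padded binary string, like str_pad(decbin($n), $width, '0', STR_PAD_LEFT)."""
    return format(n, f"0{width}b")


bits = dashes_to_bitstring("1-2-1-1-1")
print(bits, to_int(bits), to_bitstring(to_int(bits), 5))  # 01000 8 01000
```

Padding only matters for display: 01000 and 1000 are the same integer, so XOR comparisons are unaffected as long as every member stores the same number of answers in the same order.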
And finally, here’s the implementation:
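The ranking step can be sketched the same way. This Python version scores every other member with the XOR/bit-count similarity and keeps the best ones with heapq.nlargest; in MySQL the equivalent idea is to order by the similarity expression descending with LIMIT 10. The member names and bit values below are made-up sample data, and TOTAL_BITS = 5 is assumed.

```python
import heapq

TOTAL_BITS = 5  # assumed number of answers per member


def similarity(a: int, b: int) -> float:
    """Percentage of matching bit positions (XOR, then count the 1 bits)."""
    return 100.0 * (TOTAL_BITS - bin(a ^ b).count("1")) / TOTAL_BITS


def most_similar(me: int, members: dict, limit: int = 10) -> list:
    """Return up to `limit` (name, similarity %) pairs, best match first."""
    scored = ((name, similarity(me, bits)) for name, bits in members.items())
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


# Hypothetical sample data: member name -> encoded about_member value.
members = {"member_2": 0b00110, "member_3": 0b10010, "member_4": 0b11001}
print(most_similar(0b00110, members))
# [('member_2', 100.0), ('member_3', 60.0), ('member_4', 0.0)]
```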
This last query selects the 10 members most similar to me!
Now, to recap, in layman’s terms,
We use binary because it makes things easier; the binary number is like a long line of light switches. We want to save our “light switch configuration” as well as find members that have the most similar configurations.
The ^ operator, given 2 light switch configurations, does a comparison for us. The result is again a series of switches; a switch will be ON if the 2 original switches were in different positions, and OFF if they were in the same position. BIT_COUNT tells us how many switches are ON, giving us a count of how many switches were different. YOUR_TOTAL_BITS is the total number of switches.

But binary numbers are still just numbers… and so a string of 1’s and 0’s really just represents a number like 133 or 94. It’s a lot harder to visualize our “light switch configuration” if we use decimal numbers, though. That’s where PHP’s decbin (integer to binary string) and bindec (binary string to integer) come in.

Learn more about the binary numeral system.
Hope this helps!