I’m trying to calculate how closely an item matches a user’s defined preferences. Below is how I thought of doing it, but I’m not overly experienced and wanted to know if there might be a better way to go about it.
I’ll use cars as a simple example and narrow it down to the color and style of the car (car, van, etc.).
Part One
The user selects in an HTML form the following:
Color: ( )White, (*)Black, ( )Red
Style: ( )Car, ( )Van, (*)Suv, (*)Truck
Now I convert the above into a binary string, where the first digit corresponds to the first attribute (white), the second digit to the second attribute, and so on.
Attribute Code = 0100011 (black, suv, truck)
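As a rough sketch of that conversion, assuming the attributes are kept in a fixed order and the form delivers the checked values as a list of names (the $attributes list and the build_attribute_code helper are made up for illustration):

```php
<?php
// Assumed fixed attribute order, matching the example form above.
$attributes = ['white', 'black', 'red', 'car', 'van', 'suv', 'truck'];

// Hypothetical helper: one digit per attribute, '1' if selected, '0' otherwise.
function build_attribute_code(array $attributes, array $selected): string
{
    $code = '';
    foreach ($attributes as $name) {
        $code .= in_array($name, $selected, true) ? '1' : '0';
    }
    return $code;
}

$user_code = build_attribute_code($attributes, ['black', 'suv', 'truck']);
echo $user_code . PHP_EOL; // 0100011
```

Keeping the order in one array means the form and the stored item codes can both be generated from the same list.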
Part Two
Now with MySQL
SELECT item_id, attribute_code FROM items
items table = [item_id][attribute_code]
Next, use PHP to calculate how closely each item’s attribute code matches the user’s preferences.
// Set user's attribute code to a variable
$user_pref = $_POST['user_att_code'];
// Length of attribute code
$length = 7;
// Match counts keyed by item_id
$item_search = array();
// $result is assumed to be a mysqli result set (e.g. from mysqli_query())
while ($row = mysqli_fetch_assoc($result))
{
    // Pull attribute_code from the database row
    $item_code = $row['attribute_code'];
    // Set counters
    $count_digit = 0;
    $count_match = 0;
    // Start calculating match over digits 0 .. $length - 1
    while ($count_digit < $length)
    {
        // Is this digit 1 in both the user's code and the item's code?
        if ($user_pref[$count_digit] === '1' && $item_code[$count_digit] === '1')
        {
            // Add a positive match point to counter
            $count_match++;
        }
        // Next digit in code
        $count_digit++;
    }
    if ($count_match > 0)
    {
        // Store the match amount under the item's id
        $item_search[$row['item_id']] = $count_match;
    }
}
// Sort array by most similar
arsort($item_search);
Then a little more code is used to calculate a percentage.
The above did the following: it took the user’s desired attribute code and compared it to every item’s attribute code in the database. It went digit by digit through each code and added to a tally every time both codes had a 1 in the same position. At the end it put the tally for that item into an array and went on to the next item’s attribute code.
user: 0100011
it_1: 0100011 = 100% match
it_2: 0100100 = 50% match
it_3: 0011000 = 0% match
// If you notice the 50% does not make sense ignore it.
// I left something out for simplification.
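For what it’s worth, the per-item tally can probably be done without the inner loop. This is only a sketch, assuming both codes are strings made up of '0' and '1' characters with the same length; count_matches is a made-up helper name. It relies on PHP applying & byte by byte when both operands are strings.

```php
<?php
// Hypothetical helper: counts positions where both codes have a '1'.
function count_matches(string $user_pref, string $item_code): int
{
    // '1' & '1' gives '1'; any pair containing '0' gives '0'.
    $both = $user_pref & $item_code;
    return substr_count($both, '1');
}

$user = '0100011';
echo count_matches($user, '0100011') . PHP_EOL; // 3
echo count_matches($user, '0100100') . PHP_EOL; // 1
echo count_matches($user, '0011000') . PHP_EOL; // 0
```

This gives the raw tallies only; the percentage step is left out here as well.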
Now I know this works. However, it does not seem like a good way to go about it, mostly in terms of performance. Assuming over 150,000 items and an attribute code length of about 200, that’s at least 30,000,000 comparisons for one search (150,000 × 200, based on the loop above).
Is there another way perhaps? Is this a big deal?
To achieve better performance you should reorganise your system.
Use separate tables for Colors and Styles, plus junction tables for the relations between the data (Items – Colors, Items – Styles).
You will then be able to select from the database ONLY the items that have the parameters the user selected, instead of iterating over all Items on each request.
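A minimal sketch of what such a lookup could look like, assuming junction tables named item_colors(item_id, color_id) and item_styles(item_id, style_id). These table and column names, and the build_match_query helper, are only illustrative. The function just builds the SQL text and the parameter list for a prepared statement; it does not touch the database.

```php
<?php
// Hypothetical helper: builds a query that counts, per item, how many of the
// selected colors and styles it has. Returns [sql, params], or null if the
// user selected nothing.
function build_match_query(array $colorIds, array $styleIds): ?array
{
    // Assumed junction tables: item_colors(item_id, color_id), item_styles(item_id, style_id)
    $sources = [
        ['item_colors', 'color_id', $colorIds],
        ['item_styles', 'style_id', $styleIds],
    ];
    $parts = [];
    $params = [];
    foreach ($sources as [$table, $column, $ids]) {
        if (count($ids) === 0) {
            continue; // nothing selected in this group
        }
        // One "?" placeholder per selected id
        $marks = implode(', ', array_fill(0, count($ids), '?'));
        $parts[] = "SELECT item_id FROM $table WHERE $column IN ($marks)";
        foreach ($ids as $id) {
            $params[] = (int) $id;
        }
    }
    if (count($parts) === 0) {
        return null;
    }
    $sql = 'SELECT item_id, COUNT(*) AS matches FROM ('
         . implode(' UNION ALL ', $parts)
         . ') AS hits GROUP BY item_id ORDER BY matches DESC';
    return [$sql, $params];
}

// Example: color id 2 plus style ids 3 and 4 (ids are made up)
[$sql, $params] = build_match_query([2], [3, 4]);
echo $sql . PHP_EOL;
```

The SQL and parameters can then be passed to a prepared statement. With indexes on color_id and style_id, the database only reads rows for the selected attributes, and items with no matching attribute never appear in the result.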