Are there pattern recognition algorithms where I can specify the weight of matching or missing certain parameters? For example, suppose I have 3 strings:
str1 = Samsung 11.6" 64GB Slate PC Tablet with Wi-Fi - Black
str2 = Samsung Series 7 XE700T1A-A05US 11.6-Inch Slate (64 GB, Win 7 Pro)
str3 = Samsung Series 7 XE700T1A-A03US 11.6-Inch Slate (128 GB SSD, Win 7 HP)
I would like to match str2 to str1 since they have the same storage capacity (64 GB), even though a conventional string distance would say str2 is closer to str3. In practice, I am hoping for something that can handle a large number of parameters with different weights.
Any pointers in the right direction would be appreciated.
{Number}{Space}?"GB" for gigabytes or {TradeMark}{Space}"Series"{Space}{Number} for trademark and series.

Note: to make working with dictionaries and rules easier, consider using the GATE framework. To measure the distance between two vectors, you may use cosine distance.
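
As a rough illustration of the rule-plus-cosine idea, here is a minimal Python sketch rather than a GATE pipeline. The regular expressions, the feature names (storage_gb, series, screen_in), and the weights are all assumptions made up for this example; only the three product strings come from the question. Each rule that fires contributes one dimension (feature name plus extracted value) to a sparse vector, with the feature's weight as its magnitude, and the vectors are then compared with cosine similarity.

```python
import math
import re

# Hypothetical extraction rules, loosely modeled on patterns such as
# {Number}{Space}?"GB". These are illustrative, not GATE/JAPE rules.
RULES = {
    "storage_gb": re.compile(r"(\d+)\s?GB", re.IGNORECASE),
    "series": re.compile(r"Series\s(\d+)", re.IGNORECASE),
    "screen_in": re.compile(r"(\d+(?:\.\d+)?)\s?(?:\"|-?Inch)", re.IGNORECASE),
}

# Assumed weights: agreeing on storage matters more than the other features.
WEIGHTS = {"storage_gb": 3.0, "series": 1.0, "screen_in": 1.0}


def extract(text):
    """Return a sparse vector {(feature, value): weight} for one string."""
    features = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            features[(name, match.group(1))] = WEIGHTS[name]
    return features


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[key] for key, w in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no rule fired for at least one string
    return dot / (norm_a * norm_b)


str1 = 'Samsung 11.6" 64GB Slate PC Tablet with Wi-Fi - Black'
str2 = "Samsung Series 7 XE700T1A-A05US 11.6-Inch Slate (64 GB, Win 7 Pro)"
str3 = "Samsung Series 7 XE700T1A-A03US 11.6-Inch Slate (128 GB SSD, Win 7 HP)"

sim12 = cosine_similarity(extract(str1), extract(str2))
sim23 = cosine_similarity(extract(str2), extract(str3))
print(f"str1 vs str2: {sim12:.3f}")
print(f"str2 vs str3: {sim23:.3f}")
```

With these assumed weights, str2 scores as more similar to str1 than to str3, because the shared 64 GB feature dominates the vectors. Cosine distance is simply 1 minus this similarity. Changing the entries in WEIGHTS is how you would express that matching or missing a given parameter should count for more or less.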