I have a generic list of filenames (LIST1) and another, bigger generic list with the full set of names (LIST2).
I need to match names from LIST1 to similar ones in LIST2. For example:
LIST1
- **MAIZE_SLIP_QUANTITY_3_9.1.aif**
LIST2
1- TUTORIAL_FAILURE_CLINCH_4.1.aif
2- **MAIZE_SLIP_QUANTITY_3_5.1.aif**
3- **MAIZE_SLIP_QUANTITY_3_9.2.aif**
4- TUTORIAL_FAILURE_CLINCH_5.1.aif
5- TUTORIAL_FAILURE_CLINCH_6.1.aif
6- TUTORIAL_FAILURE_CLINCH_7.1.aif
7- TUTORIAL_FAILURE_CLINCH_8.1.aif
8- TUTORIAL_FAILURE_CLINCH_9.1.aif
9- TUTORIAL_FAILURE_PUSH_4.1.aif
I’ve read about Levenshtein distance and used an implementation of it from a framework (SignumFramework Utilities).
It returns a distance of 1 for both line 2 and line 3, but in my case line 3 is a better match than line 2.
Is there a better method for comparing similar strings? Something more flexible?
When comparing as strings, “9.2” is not a better match than “5.1” for “9.1”. If you want the version numbers to be evaluated numerically, you have to parse the strings so that you can compare the string parts and the numerical parts separately.
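A minimal sketch of that idea in Python, not a definitive implementation. It assumes every run of digits in a filename is a number, and that numbers appearing earlier in the name matter more than later ones. The helper names (`levenshtein`, `split_name`, `score`) are made up for illustration. The text part is compared by edit distance and the numeric parts by absolute difference:

```python
import re

def levenshtein(a, b):
    """Plain edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def split_name(name):
    """Separate a name into a text skeleton and its numbers.

    Every digit run is replaced by '#' in the skeleton and returned
    as an int in the numbers tuple (assumption: digits are numbers).
    """
    numbers = tuple(int(n) for n in re.findall(r"\d+", name))
    skeleton = re.sub(r"\d+", "#", name)
    return skeleton, numbers

def score(query, candidate):
    """Lower is better. Tuples compare left to right, so the text
    distance dominates, then the count of numbers, then each numeric
    difference in order of appearance."""
    q_text, q_nums = split_name(query)
    c_text, c_nums = split_name(candidate)
    diffs = tuple(abs(x - y) for x, y in zip(q_nums, c_nums))
    return (levenshtein(q_text, c_text), abs(len(q_nums) - len(c_nums)), diffs)

list2 = [
    "TUTORIAL_FAILURE_CLINCH_4.1.aif",
    "MAIZE_SLIP_QUANTITY_3_5.1.aif",
    "MAIZE_SLIP_QUANTITY_3_9.2.aif",
    "TUTORIAL_FAILURE_CLINCH_5.1.aif",
]
query = "MAIZE_SLIP_QUANTITY_3_9.1.aif"

best = min(list2, key=lambda name: score(query, name))
print(best)  # MAIZE_SLIP_QUANTITY_3_9.2.aif
```

With plain Levenshtein both MAIZE names are at distance 1 from the query. Here their scores are `(0, 0, (0, 4, 0))` for `3_5.1` and `(0, 0, (0, 0, 1))` for `3_9.2`, so the second one wins. If a later number should weigh as much as an earlier one, you could replace the tuple of differences with a weighted sum.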