I have a table that stores a tree-like structure of file names. There are currently 8 million records in this table. I am working on a way to quickly find a list of files that have a specific serial number embedded in the name.
FS_NODES
-----------------------------------
NODE_ID bigint PK
ROOT_ID bigint
PARENT_ID bigint
NODE_TYPE tinyint
NODE_NAME nvarchar(250)
REC_MODIFIED_UTC datetime
REC_DELETION_BIT bit
Example file name (as stored in the node_name):
scriptname_SomeSerialNumber_201205240730.xml
As expected, the LIKE statement to find the files takes several minutes because it scans the entire table, and I would like to improve this. There are no consistent patterns for the names, as each developer likes to create their own naming convention.
I tried using Full Text Search and really love the idea, but I am not able to get it to find files based on keywords in the name. I believe the problem is due to the underscores.
Any suggestions on how I can get this to work? I am using a neutral language for the catalog.
@@VERSION
Microsoft SQL Server 2005 - 9.00.4035.00 (Intel X86)
Nov 24 2008 13:01:59
Copyright (c) 1988-2005 Microsoft Corporation
Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 2)
Is there a way to alter the catalog and split the keywords out manually?
Thank you!
Full-text search is not the answer. It is used for words, not partial string matching. What you should do is, when inserting or updating data in this table, extract the parts of the filename that are relevant for future searching into their own column(s) which you can index. After all, they are separate pieces of data the way you are using them. You could also consider enforcing a more predictable naming convention instead of just letting the developers do whatever they want.
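A minimal sketch of the separate-column idea, assuming the serial number is the part you search on. The column name SERIAL_NUMBER, its length, and the index name are hypothetical and not part of the original schema. How the value gets extracted is left to your insert/update code, since the naming conventions differ per developer.

```sql
-- Assumption: SERIAL_NUMBER, nvarchar(50), and the index name are hypothetical.
ALTER TABLE FS_NODES ADD SERIAL_NUMBER nvarchar(50) NULL;
GO

-- A plain nonclustered index lets equality lookups seek instead of scanning.
CREATE NONCLUSTERED INDEX IX_FS_NODES_SERIAL_NUMBER
    ON FS_NODES (SERIAL_NUMBER);
GO

-- The insert/update procedure is assumed to populate SERIAL_NUMBER
-- when it writes NODE_NAME.
SELECT NODE_ID, NODE_NAME
FROM FS_NODES
WHERE SERIAL_NUMBER = N'SomeSerialNumber';
```

An equality predicate on an indexed column avoids the full scan that a LIKE with a leading wildcard forces.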
EDIT per user request:
Add a computed column defined as REPLACE(NODE_NAME, '_', ' '), so that the underscore-separated parts become separate words. Alternatively, instead of a computed column, use an ordinary column that you populate manually for existing data, and change your insert procedure to maintain it going forward. You could even break the parts out into separate rows in a related table.
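Roughly, the manually populated variant could look like the sketch below. SEARCH_NAME, FS_NODES_CATALOG, and PK_FS_NODES are placeholder names (the real catalog and primary key index names were not posted), and LANGUAGE 0 is assumed to match the neutral language mentioned in the question.

```sql
-- Assumption: SEARCH_NAME is a hypothetical column, not in the original schema.
ALTER TABLE FS_NODES ADD SEARCH_NAME nvarchar(250) NULL;
GO

-- Backfill existing rows: underscores become spaces so each part is its own word.
-- On 8 million rows, consider running this in batches.
UPDATE FS_NODES
SET SEARCH_NAME = REPLACE(NODE_NAME, N'_', N' ');
GO

-- Assumption: FS_NODES_CATALOG and PK_FS_NODES are placeholders for the
-- existing full-text catalog and the table's unique key index.
CREATE FULLTEXT INDEX ON FS_NODES (SEARCH_NAME LANGUAGE 0)
    KEY INDEX PK_FS_NODES
    ON FS_NODES_CATALOG;
GO

-- Search the word-separated copy, but return the original name.
SELECT NODE_ID, NODE_NAME
FROM FS_NODES
WHERE CONTAINS(SEARCH_NAME, N'"SomeSerialNumber"');
```

The insert procedure would need to set SEARCH_NAME the same way for new rows. Whether a full-text index can be built directly on a computed column in your version is worth checking before choosing that variant over a regular column.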