I’m researching ways to speed up and scale up a long-running matching job that currently runs as a stored procedure in MSSQL 2005. The matching involves multiple fields with many inexact cases. While I’d ultimately like to scale it up to large data sets outside of the database, I also need to consider some shorter-term solutions.
I don’t know much about how stored procedures are executed internally, so I’m wondering whether it would be possible to split the process into parallel procedures: a master procedure would divide the data set and then kick off sub-procedures that each work on a smaller piece.
Would this yield any performance gains with a clustered database? Will MSSQL distribute the subprocs across the cluster nodes automatically and sensibly?
Perhaps it would be better to have the master process in Java and call the worker procedures through JDBC, which would presumably use cluster load balancing effectively? Aside from any arguments about maintainability, could this be faster?
You have a fundamental misunderstanding of what clustering means for SQL Server. Clustering does not allow a single instance of SQL Server to share the resources of multiple boxes. Clustering is a high availability solution that allows the functionality of one box to shift over to another standby box in case of a failure.
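For what it’s worth, the Java master idea from the question can still be tried against a single instance, since separate connections run as separate sessions. Below is a minimal sketch, not a tested solution: it assumes the data can be divided by a numeric key range, and the procedure name `dbo.MatchRange` and the JDBC URL are hypothetical placeholders, not anything from your system. Whether it is actually faster depends on how many cores the one box has and on how much the workers block each other on locks and I/O.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedMatcher {

    /** Inclusive key range handed to one worker. */
    public static final class Range {
        public final long lo;
        public final long hi;
        public Range(long lo, long hi) { this.lo = lo; this.hi = hi; }
    }

    /** One unit of work: process a range and report a count. */
    public interface Worker {
        int process(Range r) throws Exception;
    }

    /** Split [min, max] into at most `parts` contiguous, non-overlapping ranges. */
    public static List<Range> split(long min, long max, int parts) {
        if (parts < 1) throw new IllegalArgumentException("parts must be >= 1");
        List<Range> out = new ArrayList<>();
        long total = max - min + 1;
        long size = (total + parts - 1) / parts; // ceiling division
        for (long lo = min; lo <= max; lo += size) {
            out.add(new Range(lo, Math.min(lo + size - 1, max)));
        }
        return out;
    }

    /** Run every range on a fixed thread pool and sum the reported counts. */
    public static int runAll(List<Range> ranges, Worker worker, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Range r : ranges) {
                Callable<Integer> task = () -> worker.process(r);
                futures.add(pool.submit(task));
            }
            int sum = 0;
            for (Future<Integer> f : futures) sum += f.get(); // rethrows worker failures
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    /**
     * Worker that opens its own connection per range and calls a
     * HYPOTHETICAL procedure dbo.MatchRange(@lo, @hi); substitute your own.
     */
    public static Worker jdbcWorker(String jdbcUrl) {
        return r -> {
            try (Connection c = DriverManager.getConnection(jdbcUrl);
                 CallableStatement cs = c.prepareCall("{call dbo.MatchRange(?, ?)}")) {
                cs.setLong(1, r.lo);
                cs.setLong(2, r.hi);
                cs.execute();
                // getUpdateCount() is -1 when the first result is not an update count
                return Math.max(cs.getUpdateCount(), 0);
            }
        };
    }
}
```

You would call it along the lines of `PartitionedMatcher.runAll(PartitionedMatcher.split(minId, maxId, 8), PartitionedMatcher.jdbcWorker(url), 4)`, where `minId`, `maxId` and `url` are yours to supply. Each worker takes its own connection because a JDBC connection should not be shared between threads doing concurrent work.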