We went into production with our new SharePoint solution this week. After nearly a year of development and testing on a staging environment, this is the first time I’ve experienced the following error.
The error: Sometimes it works, sometimes it crashes with a 404
We are using Nintex workflow in our SharePoint solution, but I don’t think Nintex is the culprit here. From this workflow (and other workflows) we call a custom asmx webservice that is hosted in the SharePoint farm’s _vti_bin. Everything inside the webservice runs with elevated privileges. When the workflow calls the webservice, the webservice randomly returns a 404 error for some users: The resource cannot be found. The missing resource is the asmx path to the webservice.
The interesting thing is that this error occurs mainly when one of our customer’s users initiates the workflow, and much less often when one of our test logins is the initiator. For the customer’s users it occurs in about 2 out of 5 test runs. With my test login I see it too, but only in 3 out of 20 tries, and then at a different webservice call from the workflow, not the call that fails for the customer’s users.
The environment
The SharePoint environment consists of two web front ends and an application server, with a load balancer in front. So my guess is that one of the machines in the environment isn’t configured properly. When I test with our test users, I work on workstations that are in the same network as the farm’s servers, connected via remote desktop, and my requests also have to pass the load balancer. The customer’s users test the workflow under similar circumstances, but I believe they are routed differently to the SharePoint farm, so they end up on a different front end than I do when testing remotely.
What I already tried
- Testing the webservice directly without a workflow with my test users
- Testing the webservice directly without a workflow on each front end
- Checking if the asmx file is really in all _vti_bin folders on all machines
Conclusion: The asmx exists everywhere, and I’ve never seen the ASP.NET-specific 404 when calling the webservice manually.
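The per-server checks above can be repeated in a small script instead of by hand. This is only a rough sketch in Python: the server names and the service path are placeholders I made up, and a plain request like this carries no Windows credentials, so on a farm that requires authentication it will probably come back with a 401 rather than a 200. That is still useful, because the point is to see which server answers 404 while the others answer something else.

```python
from collections import Counter
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Hypothetical values: replace with the real server names and asmx path.
SERVERS = ["wfe1", "wfe2", "app1"]
SERVICE_PATH = "/_vti_bin/MyService.asmx"


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or a short error label."""
    try:
        with urlopen(Request(url), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # e.g. 404 or 401
    except URLError:
        return "unreachable"  # DNS failure, refused connection, timeout


def probe(servers, path, attempts=5, fetch=http_status):
    """Request the service on each server directly and tally the results."""
    results = {}
    for server in servers:
        url = "http://" + server + path
        results[server] = Counter(fetch(url) for _ in range(attempts))
    return results
```

Calling `probe(SERVERS, SERVICE_PATH)` returns one tally of status codes per server. The list should contain every machine in the farm that can run the workflow, not only the web front ends, because the requests deliberately bypass the load balancer.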
My suspicion
It sometimes works, so I guess at least one front end is doing its job. When the workflow runs on the other (faulty) front end, the problem arises. That would explain why workflows initiated by my test users behave differently from those initiated by the customer’s users.
Or could it be a permission problem? I have tested calling the webservice manually from the _vti_bin with a test user that has absolutely no rights on the SharePoint farm and was able to call the webservice successfully. Or should I try to start the workflow with the System Account?
Is there anything I could try to narrow down the problem? The staging system still works fine: same version, same users, no problems.
Thanks in advance and happy holidays!
Cheers
Solved the problem today.
It was a configuration issue: my customer’s IT added another application server to the farm and forgot to configure the bindings in the IIS console. Every time the workflow ran on the new application server, it couldn’t connect to the webservice because of the missing binding.
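To catch this kind of drift earlier, the bindings of every server can be compared against the ones the web application is supposed to have. Below is a minimal sketch in Python, assuming the bindings have been noted down by hand from the IIS console on each machine. The server names, the host name and the `protocol/ip:port:host` strings are made-up examples, not our real configuration.

```python
def missing_bindings(expected, actual_by_server):
    """Return {server: sorted expected bindings that the server lacks}."""
    report = {}
    for server, actual in actual_by_server.items():
        missing = sorted(set(expected) - set(actual))
        if missing:
            report[server] = missing
    return report


# Hypothetical data, written down manually per server.
EXPECTED = ["http/*:80:portal.example.com"]
ACTUAL = {
    "wfe1": ["http/*:80:portal.example.com"],
    "wfe2": ["http/*:80:portal.example.com"],
    "app1": ["http/*:80:portal.example.com"],
    "app2": [],  # newly added application server, binding never configured
}

print(missing_bindings(EXPECTED, ACTUAL))
# prints {'app2': ['http/*:80:portal.example.com']}
```

Servers that have all expected bindings are left out of the report, so an empty result means the farm is consistent with respect to the bindings listed in `EXPECTED`.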