Ok, this is a tricky one. I’m trying to set up a Selenium Grid 2 with some Windows 7 VMs to run WebDriver tests. To automate the whole process I use an Ant script that connects to the VMs through ssh to start, stop, and reconfigure the nodes.
Everything works great: the nodes register with the hub host and execute the tests. The only problem is that I don’t see any browser window during the test run. I can see the process, and I can see the test log being written, but there is no graphical interface.
On the other hand, if I start the node manually through Windows, everything is normal.
I suppose the problem is that processes executed under cygwin cannot open Windows displays, but in that case, shouldn’t it throw an error? The other possibility I can think of is that WebDriver is using HtmlUnit as a fallback, but then why do I see the firefox process consuming CPU and memory for as long as the test lasts?
Through ssh, you only exchange the stdin, stdout and stderr streams with Windows. The ssh connection tunnels those streams and nothing else. You don’t see the Windows Desktop interface, but a Desktop object exists on the Windows machine, the programs (here the browsers) are connected to it, and all GUI interactions happen there.
Case 1: if the GUI doesn’t require any user interaction, everything works fine that way. The windows and dialog boxes are created, the program runs, and once it finishes, the application destroys them and exits. Nothing in the GUI blocks the application.
Case 2: if your program requires a user action in the created yet invisible dialog boxes, it will sit there waiting for that interaction before moving forward. You will see the process in the task manager, doing nothing but waiting. As you don’t have access to the Windows Desktop where the dialog boxes are created and virtually ‘displayed’, the program seems to hang.
A typical example of case 2 is remotely running a program that waits for the user to do something, say Notepad. You can launch Notepad; it will be spawned, and then it will wait for you to type some text or close it.
With your Selenium tests, you are in case 1: all the browser interactions needed to drive the GUI are performed by the Selenium server, which does the navigation, the clicks, and the program exit for you. The browser windows really are alive and browsing your test web servers; you just don’t see them.
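If it helps, the two cases can be imitated with a small Python sketch. This is only an analogy, not how Selenium or Windows desktops work internally: a child process waiting on stdin stands in for a program waiting on an invisible dialog box, and the helper name `run_without_desktop` and the timeout values are made up for illustration.

```python
import subprocess
import sys


def run_without_desktop(child_code, timeout=5.0):
    """Start a child Python process that nobody can interact with.

    The child's stdin is a pipe we never write to, which stands in for
    a dialog box on a desktop that no user can reach (an analogy only).
    Returns "finished" if the child exits on its own within `timeout`
    seconds, or "still waiting" if it had to be killed.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdin=subprocess.PIPE,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout)
        return "finished"
    except subprocess.TimeoutExpired:
        # The process is alive (it would show up in the task manager)
        # but it is blocked waiting for input that never comes.
        proc.kill()
        proc.wait()
        return "still waiting"
    finally:
        proc.stdin.close()


if __name__ == "__main__":
    # Case 1: the program needs no interaction and exits by itself.
    print("case 1:", run_without_desktop("print('work done')"))
    # Case 2: the program waits for a user who cannot reach it.
    print("case 2:", run_without_desktop("input()", timeout=2.0))
```

The Selenium nodes behave like the first call: something other than a human (the Selenium server) supplies every interaction, so the process completes even though nobody is looking at it.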
For further reading, see the Microsoft website on Desktops and Desktop Creation.