I want to write a program that can play Solitaire on Windows. The user would run the program, open Solitaire, and watch as the cards move. It's easy to write the AI that would play Solitaire, but I obviously don't have the source code for Windows Solitaire. One way I can think of doing this is to take an image of the Solitaire window and analyze it to determine the current state of the cards, then pass that state to my program, which would determine the next move. But how would I execute the click?
More generally, I want to write a program that can interact with another program like a user would.
I have experience with C, C++, Java, and Ruby, but I don't know how to even get started on this, or whether it can even be done.
Java is not the best language for this. To get functionality like this you need to rely heavily on the Windows API, especially the functions that let you inject input. As a starting point, I suggest getting acquainted with how Windows applications actually process input. This is a good article on the subject.
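To make "inject input" a bit more concrete, here is a minimal sketch in C of a synthetic left click using the Win32 `SendInput` function. The helper names `to_absolute` and `click_at` are my own, not part of any API, and the scaling formula is the commonly used one for `MOUSEEVENTF_ABSOLUTE`, which expects coordinates in the range 0..65535 rather than pixels. I am assuming a single primary monitor; treat this as a starting point, not a finished solution.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: map a pixel coordinate to the 0..65535 range used by
   SendInput when MOUSEEVENTF_ABSOLUTE is set. `extent` is the screen width
   (for x) or height (for y) in pixels. Out-of-range pixels are clamped. */
static long to_absolute(long pixel, long extent)
{
    long long scaled;
    if (extent <= 1) return 0;                 /* avoid division by zero */
    if (pixel < 0) pixel = 0;
    if (pixel > extent - 1) pixel = extent - 1;
    scaled = ((long long)pixel * 65535) / (extent - 1);
    assert(scaled >= 0 && scaled <= 65535);
    return (long)scaled;
}

#ifdef _WIN32
/* Hypothetical helper: move the cursor to pixel (x, y) on the primary
   monitor and send a left button press followed by a release.
   Returns nonzero if both events were inserted into the input stream. */
static int click_at(int x, int y)
{
    INPUT in[2];
    ZeroMemory(in, sizeof(in));

    in[0].type = INPUT_MOUSE;
    in[0].mi.dx = to_absolute(x, GetSystemMetrics(SM_CXSCREEN));
    in[0].mi.dy = to_absolute(y, GetSystemMetrics(SM_CYSCREEN));
    in[0].mi.dwFlags = MOUSEEVENTF_MOVE | MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_LEFTDOWN;

    in[1] = in[0];                             /* same position, button up */
    in[1].mi.dwFlags = MOUSEEVENTF_MOVE | MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_LEFTUP;

    return SendInput(2, in, sizeof(INPUT)) == 2;
}
#endif
```

On Windows you would call something like `click_at(400, 300)` once your image analysis has worked out where a card is on screen. Dragging a card would be the same idea with the button-down and button-up events sent at different positions.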