I apologize if this question is too generic; if it is, please feel free to edit it. I am designing an AI system that monitors human interaction with a desktop environment and learns from it.
I could use screen captures and computer vision, but that adds a layer of complexity in identifying which on-screen elements the user interacted with.
I was wondering if there is a way to get the actual DOM or HTML elements a user interacts with (mouse click, focus, keyboard input, etc.) directly from the browser.
On Windows, I may be able to hook a DLL into the browser, but on Linux I have no idea how to do something similar. The idea is that when the user clicks on a "LOG IN" button, instead of capturing image pixels and running CV on them, I get the data structure of the element the user interacted with. How might I do something like this? The engine will be a service developed in C/C++.
If you are monitoring a desktop environment on Linux, I have the following suggestion.
A starting point for an X event watcher is given here. Hope this helps.