On the topic of screen understanding and performing actions: there is a whole industry doing that already, although not with LLMs per se. RPA (robotic process automation) vendors have primarily built their products around understanding the screen and executing actions on its surface. Microsoft released OmniParser, and the open-source Open Interpreter provides some of those capabilities.