1 Comment
VJAnand

On the topic of screen understanding and performing actions: there is a whole industry that already does this, although not with LLMs per se. RPA (robotic process automation) vendors have built their products primarily around understanding the screen and executing actions on its surface. Microsoft has released OmniParser, and the open-source Open Interpreter provides some of these capabilities as well.