Abstract:
We present Rataplan, a robust and resilient pixel-based approach for linking multi-modal proxies to automated sequences of actions in graphical user interfaces (GUIs). With Rataplan, users demonstrate a sequence of actions and answer human-readable follow-up questions to clarify their desire for automation. After demonstrating a sequence, the user can link a proxy input control to the action which can then be used as a shortcut for automating a sequence. Alternatively, output proxies use a notification model in which content is pushed when it becomes available. As an example use case, Rataplan uses keyboard shortcuts and tangible user interfaces (TUIs) as input proxies, and TUIs as output proxies. Instead of relying on available APIs, Rataplan automates GUIs using pixel-based reverse engineering. This ensures our approach can be used with all applications that offer a GUI, including web applications. We implemented a set of important strategies to support robust automation of modern interfaces that have a flat and minimal style, have frequent data and state changes, and have dynamic viewports.