TYPE Action Execution Logic Optimization #41

Hua-Wen · 2025-02-19T01:32:33Z

Feature request

At present, in many scenarios, entering text into an input box is split into two steps: a CLICK first, then a TYPE.
This mirrors how humans enter text.
A computer does not need the separate step, though: the coordinates output by CLICK and TYPE are the same. In principle, moving the click logic into the TYPE action should work without problems, and it would save the time and cost of one model call.
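A minimal sketch of the proposal, assuming a hypothetical executor in which each model output becomes an action record with screen coordinates. The names `Action`, `RecordingDevice`, and `execute` are illustrative only and are not CogAgent's actual API. `RecordingDevice` logs calls instead of driving a real screen.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Action:
    # Hypothetical action record; field names are illustrative only.
    kind: str                   # "CLICK" or "TYPE"
    point: Tuple[int, int]      # screen coordinates predicted by the model
    text: Optional[str] = None  # text to enter (TYPE only)


@dataclass
class RecordingDevice:
    # Stand-in for a real GUI backend: records calls instead of driving the screen.
    log: List[str] = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def execute(action: Action, device: RecordingDevice) -> None:
    x, y = action.point
    if action.kind == "CLICK":
        device.click(x, y)
    elif action.kind == "TYPE":
        # Proposed behavior: focus the field at the same coordinates, then type,
        # so no separate CLICK step (and the model call that produced it) is needed.
        device.click(x, y)
        device.type_text(action.text or "")
    else:
        raise ValueError(f"unsupported action kind: {action.kind}")


if __name__ == "__main__":
    dev = RecordingDevice()
    execute(Action("TYPE", (200, 220), "hello"), dev)
    print(dev.log)  # ['click(200,220)', "type('hello')"]
```

Under this sketch, one TYPE step produces both the click and the keystrokes, so a text-entry step needs one model call instead of two.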

Motivation

Improve Action Execution Efficiency

Your contribution

If needed, I can prepare a training dataset with some basic guidance.

sixsixcoder · 2025-02-19T08:45:15Z

Because CogAgent simulates human interface operations, it must first locate the input box before calling TYPE to enter text, so the two steps are executed separately. That said, if you have a better solution, we'd like to hear it.

sixsixcoder self-assigned this Feb 19, 2025
