francedot/acu

X Community

ACU - Awesome Agents for Computer Use

An AI Agent for Computer Use is an autonomous program that can reason about tasks, plan sequences of actions, and act on a computer or mobile device through clicks, keystrokes, other input events, command-line operations, and internal or external API calls. These agents combine perception, decision-making, and control to interact with digital interfaces and accomplish user-specified goals independently.
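As a rough illustration of the perception, decision-making, and control loop described above, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: `FakeScreen` stands in for a real GUI, the element names `search_box` and `submit` are invented, and `decide` is a hand-written rule standing in for a model call. It is not taken from any project listed here.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One low-level event the agent can emit (hypothetical schema)."""
    kind: str          # "click", "type", or "done"
    target: str = ""   # invented element name, e.g. "search_box"
    text: str = ""     # payload for "type" actions


@dataclass
class FakeScreen:
    """Toy stand-in for a real GUI: one text field and one submit button."""
    field_text: str = ""
    submitted: bool = False
    log: list = field(default_factory=list)

    def observe(self) -> dict:
        # Perception: a real agent would read a screenshot or accessibility tree.
        return {"field_text": self.field_text, "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Control: a real agent would send OS-level mouse/keyboard events here.
        self.log.append(action.kind)
        if action.kind == "type" and action.target == "search_box":
            self.field_text += action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def decide(goal: str, observation: dict) -> Action:
    """Decision-making: a hand-written policy standing in for a model call."""
    if observation["submitted"]:
        return Action("done")
    if observation["field_text"] != goal:
        return Action("type", target="search_box", text=goal)
    return Action("click", target="submit")


def run_agent(goal: str, screen: FakeScreen, max_steps: int = 10) -> bool:
    """Repeat perceive -> decide -> act until done or the step budget runs out."""
    for _ in range(max_steps):
        action = decide(goal, screen.observe())
        if action.kind == "done":
            return True
        screen.apply(action)
    return False


screen = FakeScreen()
succeeded = run_agent("awesome computer use", screen)
```

The step budget (`max_steps`) is a design choice in this sketch: it keeps the loop from running forever if the policy never reaches its goal.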

A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools.

Table of Contents

Articles

Papers

Surveys

Frameworks & Models

UI Grounding

Dataset

Benchmark

Safety

Projects

Open Source

Frameworks & Models

  • AutoGen

    • Framework for building AI agent systems.
    • It simplifies the creation of event-driven, distributed, scalable, and resilient agentic applications.
  • Auto-GPT

    • Autonomous GPT-4 agent
    • Task automation focus
  • Browser Use

    • Make websites accessible for AI agents with vision + HTML extraction
    • Supports multi-tab management and custom actions with LangChain integration
  • Claude Computer Use Demo

    • macOS implementation
    • Claude integration
  • Claude Minecraft Use

    • Game automation
    • Specialized use case
  • Computer Use OOTB

    • Ready-to-use implementation
    • Comprehensive toolset
  • Cybergod

    • Advanced computer control
  • Grunty

    • Computer control agent
    • Task automation focus
  • Inferable

    • Distributed agent builder platform
    • Build tools with existing code
  • LaVague

    • AI web agent framework
    • Modular architecture
  • Mac Computer Use

    • macOS-specific tools
    • Anthropic integration
  • NatBot

    • Browser automation
    • GPT-4 Vision integration
  • OpenAdapt

    • AI-First Process Automation
    • Multimodal model integration
  • OpenInterface

    • Open-source UI interaction framework
    • Cross-platform support
  • OpenInterpreter

    • General-purpose computer control framework
    • Python-based, extensible architecture
  • Open Source Computer Use by E2B

    • Open-source implementation of computer control capabilities
    • Secure sandboxed environment for AI agents
  • Self-Operating Computer

    • Computer control framework
    • Vision-based automation
  • Skyvern

    • AI web agent framework
    • Automate browser-based workflows with LLMs using vision and HTML extraction
  • Surfkit

    • Device operation toolkit
    • Extensible agent framework
  • WebMarker

    • Web page annotation tool
    • Vision-language model support

UI Grounding

  • AskUI/PTA-1

    • A small vision-language model for computer and phone automation, based on Florence-2.
    • With only 270M parameters, it outperforms much larger models in GUI text and element localization.
  • Microsoft/OmniParser

    • A general screen-parsing tool that converts UI screenshots into a structured format to improve existing LLM-based UI agents

Environment & Sandbox

Automation

  • nut.js

    • Native UI automation
    • JavaScript/TypeScript implementation
  • PyAutoGUI

    • Cross-platform GUI automation
    • Python-based control library

Commercial

Frameworks & Models

  • Anthropic Claude Computer Use

    • Commercial computer control capability
    • Integrated with Claude 3.5 models
  • MultiOn

    • AI agents that can fully complete tasks in any web environment.
  • Runner H

    • Advanced AI agent for real-world applications.
    • Scores 67% on WebVoyager

Contributing

We welcome and encourage contributions from the community! Here's how you can help:

  • Add new resources: Found a relevant paper, project, or tool? Submit a PR to add it.
  • Fix errors: Help us correct any mistakes in existing entries.
  • Improve organization: Suggest better ways to structure the information.
  • Update content: Keep entries up to date with the latest developments.

To contribute:

  1. Fork the repository
  2. Create a new branch for your changes
  3. Submit a pull request with a clear description of your additions/changes
  4. Post in the X Community to let everyone know about the new resource

For an example of how to format your contribution, please refer to this PR.


Thank you for helping spread knowledge about AI agents for computer use!