Hey HN,

I made Browser-Use, an open-source tool that lets LLMs (any model supported by LangChain) execute tasks directly in the browser using only function calling.

It allows you to build agents that interact with web elements using natural language prompts. We created a layer that simplifies website interaction for LLMs by extracting XPaths and interactive elements like buttons and input fields (and other fancy things). This enables you to design custom web automation and scraping functions without manual inspection through DevTools.

Hasn't this been done a lot of times?

Good question. As a general SaaS tool, yes, but I think a lot of people are going to try to build their own web automation agents from scratch, so the idea is to provide the groundwork as a library for the hard parts, so that not everyone has to repeat these steps:

- parse HTML in an LLM-friendly way (clickable items + screenshots)
- provide nice function calls for everything inside the browser
- create reusable agent classes

What is this NOT? An all-knowing AI agent that can solve all your problems.

The vision: create repeatable tasks on the web just by prompting your agent, without caring about the hows.

To better showcase the power of text extraction, we made a few demos, such as:

- Applying for multiple software engineering jobs in San Francisco
- Opening new tabs to search for images of Albert Einstein, Oprah Winfrey, and Steve Jobs
- Finding the cheapest one-way flight from London to Kyrgyzstan for December 25th

I’d be interested in feedback on how this tool fits into your automation workflows. Try it out and let me know how it performs on your end.

We are Gregor & Magnus and we built this in 5 days.
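To make the "extract XPaths and interactive elements" idea a bit more concrete, here is a minimal sketch in Python using only the standard library. It is not Browser-Use's actual implementation: the names `InteractiveElementParser` and `extract_interactive`, the set of tags treated as interactive, and the positional XPath format are all assumptions chosen for illustration, and it assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Assumption: these tags are what we treat as "interactive" in this sketch.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}
# HTML void elements never get a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class InteractiveElementParser(HTMLParser):
    """Hypothetical helper: collects interactive elements with positional XPaths."""

    def __init__(self):
        super().__init__()
        self.path = []      # XPath steps of the currently open elements
        self.counts = [{}]  # per-parent counters of child tags seen so far
        self.elements = []  # collected interactive elements

    def handle_starttag(self, tag, attrs):
        siblings = self.counts[-1]
        siblings[tag] = siblings.get(tag, 0) + 1
        step = f"{tag}[{siblings[tag]}]"
        if tag in INTERACTIVE:
            self.elements.append({
                "index": len(self.elements),  # short id an LLM could refer to
                "tag": tag,
                "xpath": "/" + "/".join(self.path + [step]),
                "attrs": dict(attrs),
            })
        if tag not in VOID:
            self.path.append(step)
            self.counts.append({})

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open element.
        if self.path and self.path[-1].split("[")[0] == tag:
            self.path.pop()
            self.counts.pop()


def extract_interactive(html: str) -> list:
    """Return a list of dicts describing the interactive elements in `html`."""
    parser = InteractiveElementParser()
    parser.feed(html)
    parser.close()
    return parser.elements


if __name__ == "__main__":
    sample = '<html><body><form><input name="q"><button>Go</button></form></body></html>'
    for el in extract_interactive(sample):
        print(el["index"], el["tag"], el["xpath"])
```

The numbered list this produces is the kind of compact, text-only view of a page that could be handed to a model, which then picks an index and an action through a function call. A real tool would also need to handle rendered state (visibility, iframes, dynamically inserted nodes), which static parsing like this cannot see.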
Users discussed the efficiency of CLI over GUI automation, with suggestions to use tools like OpenInterpreter and Agent-E for CLI automation. There was a preference for giving LLMs HTML rather than screenshots, because screenshots can hide interactive elements and are larger in size. Some users expressed interest in browser extensions that give LLMs access to a page's HTML/CSS. Concerns were raised about context length, API costs, and the reliability of APIs. There was also general interest in web automation, with suggestions for a test suite and community prompt recipes. Users shared experiences with building Chrome extensions and Docker containers, and some wanted automation for Windows environments. The project received positive feedback, along with requests for a visible license file and for features like database integration and cron-style scheduling of automated tasks.
Users criticized the product for lacking integration with Claude's APIs, for an LLM design with slow responses and low success rates, and for relying on GUI automation, which is seen as unreliable and inefficient. The use of screenshots was also a major point of contention due to their size, lack of interactivity, and poor representation of context. Additionally, there were concerns about high costs, inefficiency with large files, and high token usage. The development experience was described as sad, with a lack of transparency, no browser extension, and no support for Selenium or Playwright. Users also mentioned the absence of a test suite, community prompt recipes, and a clear task evaluation method. Using the product with Chrome extensions, and moving from Playwright to browser-based systems, were both deemed challenging. There were also requests for running on real machines, database comparisons, cron features, and headless browser capabilities.