Google Crashes Copilot Vision, Computer Use Party with Mariner


SOURCE: ANALYTICSINDIAMAG.COM
DEC 11, 2024

  • by Supreeth Koundinya

Google has announced an early-stage research prototype, Project Mariner, which can understand and reason over information visible in the browser as a user navigates the web. The prototype is built on top of Google’s latest Gemini 2.0.

Google also says that the agent uses information it sees on the screen, via a Google Chrome extension, to complete related tasks. The agent can read text, code, images and forms, and can even take voice-based instructions.

The agent is also capable of navigating and interacting with websites on the user’s behalf and automating certain tasks.

The company showcased Project Mariner’s capabilities in a demo video. The agent was prompted to find a painting by ‘the most famous post-impressionist’ on Google Arts and Culture, combined with an unrelated task: adding ‘colourful paints’ to an Etsy cart.

Project Mariner fed the instructions to Gemini to identify the artist and the painting, fetched the details, and then automatically redirected the user to Google Arts and Culture, where it searched for the painting. For the next task, it navigated to Etsy and added a set of watercolours to the shopping cart.

Throughout the process, Project Mariner interpreted the instructions and broke them down into step-by-step actionable tasks. The tool performed all actions in the active tab rather than through any background activity.
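The loop described above — decompose a natural-language instruction into discrete browser actions, then execute them one at a time in the active tab — can be sketched roughly as follows. This is a toy illustration, not Mariner’s actual implementation: the `plan` stub stands in for the Gemini call, and the action names and targets are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BrowserAction:
    kind: str      # e.g. "navigate", "type", "click" (hypothetical action vocabulary)
    target: str    # a URL or a description of an on-page element

def plan(instruction: str) -> list[BrowserAction]:
    # Stand-in for the model call: a real agent would ask an LLM to
    # decompose the instruction into steps. Here we hard-code a plan
    # resembling the first demo task.
    return [
        BrowserAction("navigate", "https://artsandculture.google.com"),
        BrowserAction("type", "search box: <painting identified by the model>"),
        BrowserAction("click", "first search result"),
    ]

def run(instruction: str) -> list[str]:
    # Execute each planned step sequentially in the active tab
    # (here we only log what would be performed).
    log = []
    for step in plan(instruction):
        log.append(f"{step.kind}: {step.target}")
    return log

for line in run("Find a painting by the most famous post-impressionist"):
    print(line)
```

The key property this sketch captures is that actions are serialized through a single visible tab, so the user can watch (and interrupt) every step, rather than the agent acting in hidden background sessions.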

Project Mariner is available through a ‘Trusted Tester Waitlist’. Along with this announcement, Google also officially unveiled the Gemini 2.0 family of models, starting with Gemini 2.0 Flash.

Google also announced updates to Project Astra, such as better dialogue and memory capabilities and the ability to use external tools. Along with Project Mariner, Google also unveiled Jules, an AI code agent that can be directly integrated into a GitHub workflow.

Notably, Google’s agent arrived just days after Microsoft announced Copilot Vision as an experimental feature.

Copilot Vision can read and analyse web pages and can provide relevant summaries and information to the user. However, unlike Project Mariner, Copilot Vision cannot take actions on behalf of the user.

Therefore, Google’s only real competitor is Anthropic’s Computer Use, which not only performs autonomous actions but is also not restricted to a browser environment. Many developers are already experimenting with Computer Use, and most recently, Hume AI explored a capability that lets you control your desktop using just your voice.

It will be interesting to see what OpenAI’s rumoured ‘Project Operator’ will look like. A few days ago, OpenAI demonstrated an agent based on GPT-4o at the GenerationAI Conference in Paris, where it assisted with customer support issues.

It is possible that OpenAI will officially announce features along these lines at the ongoing 12 Days of OpenAI events.

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.