Frequently Asked Questions
Everything you need to know about Browser Operator
General Questions
How is Browser Operator different from ChatGPT or Claude?
While ChatGPT and Claude are AI assistants that answer questions, Browser Operator is a complete browser platform where you and AI work together as partners. Key differences:
- Direct web interaction: Browser Operator can actually browse, click, and interact with websites
- Real-time collaboration: You and AI work together simultaneously, not in a chat format
- Local processing: Your data stays on your machine, not sent to cloud servers
- Agent creation: Build custom AI agents for your specific workflows
- Open source: Full transparency with auditable code
How do I get started with Browser Operator?
Getting started is simple:
- Download: Get the latest release from our GitHub repository
- Install: Follow the simple installation instructions
- Explore: Try the included demo agents to see collaboration in action
- Create: Build your first custom agent using the AI Agent Studio
- Join: Connect with our community for support and inspiration
I'm happy with my current browser. Why should I switch?
We get it - switching browsers feels like moving houses. You’ve got your bookmarks organized, passwords saved, and extensions set up just right. Here’s why Browser Operator is worth the move:
- Import everything: Your bookmarks, passwords, and settings transfer seamlessly from Chrome, Firefox, or other browsers
- It’s still Chromium: Browser Operator is built on Chromium, so your favorite websites work exactly as expected
- Keep your extensions: Most Chrome extensions work perfectly in Browser Operator
- Gradual adoption: You don’t have to switch completely - use Browser Operator for AI-powered tasks while keeping your regular browser
- Immediate value: Within minutes, you’ll automate tasks that would take hours manually
Think of it this way: When smartphones arrived, we didn’t stop making calls - we just gained superpowers. Browser Operator doesn’t replace web browsing; it transforms it into something far more powerful.
Many users tell us they intended to “just try it” but found themselves using Browser Operator as their primary browser within a week because the AI capabilities become indispensable.
Why do I need a new browser instead of an extension?
Browser extensions are limited by security restrictions and can’t provide true human-AI collaboration:
- Full browser control: Native integration enables capabilities impossible with extensions
- Cross-site workflows: Work seamlessly across multiple websites without security limitations
- Performance: Built into the browser engine for faster, more reliable operation
- Deep integration: AI collaboration is part of the browser’s core architecture
- Privacy: No data sent to extension developers or cloud services
How does Browser Operator add AI agents to any website?
Browser Operator transforms every website into an AI-enabled platform by injecting intelligent capabilities directly into the browser:
- Universal AI Integration: Every website automatically gains AI agent capabilities without needing website-specific modifications
- Contextual Understanding: AI agents can read, understand, and interact with any webpage content in real-time
- Cross-site Workflows: Agents can work across multiple websites simultaneously, carrying context and data between them
- Dynamic Adaptation: AI adapts to different website layouts, forms, and interfaces automatically
- No Permissions Needed: Works immediately on any website without requiring special access or API integrations
For example, an AI agent can help you compare products across shopping sites, research topics across academic databases, or automate form filling across various platforms - all working seamlessly regardless of the website’s original design.
Can I create my own AI agents?
Yes! Browser Operator includes a built-in AI Agent Studio:
- No coding required: Visual tools make agent creation accessible to everyone
- Custom workflows: Build agents for your specific tasks and needs
- Community sharing: Share your agents and discover ones made by others
- Instant deployment: Test and use your agents immediately
- Continuous improvement: Agents can learn and improve from your feedback
Is my AI agent interaction private?
Yes, Browser Operator’s AI agent layer is designed with privacy at its core:
What We Control (AI Agent Layer):
- Local agent processing available: Run AI agents entirely on your machine with local LLMs
- No agent telemetry: We don’t track or collect your AI agent usage
- Your prompts, your control: Choose where AI processing happens - locally or cloud
- Open-source agent code: Audit exactly how agents handle your data
- No data retention: We never store your agent interactions or automations
Your Choices:
- Maximum Privacy: Use local LLM models - all agent processing stays on your device
- Cloud LLMs: If you choose OpenAI, Claude, etc., your prompts follow their privacy policies
- Hybrid Usage: Use local models for sensitive automation, cloud for general tasks
Note: Browser Operator is built on Chromium, so standard web browsing follows Chromium’s privacy model. Our privacy guarantees apply specifically to the AI agent layer we’ve built.
Can I use OpenAI GPT-4 or Claude API for more powerful models?
Yes! Browser Operator supports multiple LLM providers for enhanced AI capabilities:
- OpenAI Integration: Connect your OpenAI API key to use GPT-4, GPT-4 Turbo, and other OpenAI models
- Claude API Support: Use Anthropic’s Claude models through API integration
- LiteLLM Proxy: Connect to any LLM provider supported by LiteLLM, including Google Gemini, Mistral AI, local models via Ollama, Azure OpenAI, and many more
- Model Switching: Easily switch between different models based on your task requirements
- Cost Control: Set usage limits and monitor API costs directly in the browser
To configure external APIs, go to the AI settings panel in Browser Operator and add your API keys. The browser will securely store them locally and never send them to our servers.
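As a rough illustration of what a unified provider layer buys you, here is a minimal Python sketch that calls different providers through the LiteLLM library directly. It assumes your API keys are already set as environment variables, and the model names are examples only; inside Browser Operator this wiring is handled for you by the settings panel.

```python
# Minimal sketch: one call shape, many providers, via LiteLLM.
# Assumes OPENAI_API_KEY / ANTHROPIC_API_KEY are set; model names are examples.
import litellm

def ask(model: str, prompt: str) -> str:
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("gpt-4-turbo", "Summarize this page in one sentence."))    # OpenAI
print(ask("claude-3-5-sonnet-20240620", "Extract the key dates."))   # Anthropic
print(ask("ollama/llama3", "Classify this form field."))             # local model via Ollama
```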
Why make Browser Operator open source?
We are developers who value freedom and privacy. Building in the open lets us shine a light on agent behavior that other platforms keep opaque. Here’s why open source matters to us:
- Transparency: You can inspect exactly how AI agents work and what they do with your data
- Trust through verification: No hidden behaviors or secret data collection - everything is auditable
- Community-driven innovation: Anyone can contribute improvements and new features
- Ethical AI development: Open source ensures AI tools remain accountable to users, not corporations
- Educational value: Developers can learn from and build upon our work
- Long-term sustainability: The project can’t be shut down or locked behind paywalls by a single company
We believe the future of AI should be open, transparent, and controlled by users - not hidden behind proprietary walls.
Technical Deep Dive
Why make an AI Agent platform?
We believe people need the freedom to customize how their agents work, based on their own approach. Many AI agents available today use pre-defined prompts that are generic rather than specific to how you handle tasks. By building an easy-to-customize platform, we let you not only see and understand how the AI agents work but also tailor them to work well for you.
- Personal workflows: Everyone has unique ways of working - your agents should adapt to you, not the other way around
- Domain expertise: Create specialized agents that understand your industry’s specific terminology and processes
- Iterative improvement: See exactly how agents make decisions and refine them based on real results
- No black boxes: Unlike closed platforms, you can see and modify every prompt and behavior
- Collaborative development: Share your custom agents with teams or the community to help others
- Future-proof flexibility: As your needs change, your agents can evolve with you
The power of AI should be in your hands, shaped by your expertise and tailored to your specific needs.
What tools do the AI Agents have access to?
The AI agents have access to a core toolset covering navigation, data extraction, visual understanding, page interaction, and memory backed by a VectorDB:
- URL Navigation: Navigate to any website, go back/forward, refresh pages
- Schema Extraction: Convert unstructured web content into structured JSON/CSV data
- Screenshot Capture: Visual understanding for complex layouts or when DOM parsing isn’t sufficient
- Action Tools: Click buttons, fill forms, type text, scroll, hover, and interact with any element
- HTML to Markdown: Clean conversion of web pages to readable markdown format
- Document Store: Save important information to VectorDB for later retrieval
- Semantic Search: Intelligently search through stored documents using natural language
These tools work together to enable complex workflows. For example, an agent could navigate to multiple product pages, extract pricing data, store it in the VectorDB, and later search for the best deals using semantic search.
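To make that composition concrete, here is a hypothetical Python-style sketch of the price-comparison workflow. The function names (navigate, extract_schema, store_document, semantic_search) are illustrative stand-ins for the tools listed above, not Browser Operator’s actual API.

```python
# Hypothetical sketch only: the tool names below are illustrative stand-ins,
# not Browser Operator's real tool interface.
PRODUCT_PAGES = [
    "https://shop-a.example/widget",
    "https://shop-b.example/widget",
]

def compare_prices(agent):
    for url in PRODUCT_PAGES:
        agent.navigate(url)                                   # URL Navigation
        product = agent.extract_schema(                       # Schema Extraction
            fields=["name", "price", "shipping", "rating"],
        )
        agent.store_document(product, collection="widgets")   # Document Store (VectorDB)

    # Semantic Search over everything stored during the run
    return agent.semantic_search(
        "cheapest widget with free shipping", collection="widgets",
    )
```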
Why build a multi-agent framework?
We found that smaller, focused agents solve tasks better than a single generic agent. In our experience, the most effective pattern composes an orchestrator agent with execution agents: you can use a powerful reasoning model as the orchestrator while small open-source models handle execution. This gives you the speed of the smaller agents while relying on the orchestrator’s reasoning to handle unfamiliar tasks.
- Specialized expertise: Each agent can be optimized for specific tasks (research, data extraction, form filling, etc.)
- Parallel processing: Multiple agents can work simultaneously on different parts of a complex task
- Cost optimization: Use expensive models only for orchestration, cheaper models for execution
- Faster response times: Small, focused agents respond much quicker than large general-purpose models
- Better error handling: If one agent fails, others can continue or retry with different approaches
- Modular architecture: Easy to add, remove, or update individual agents without affecting the entire system
- Scalable complexity: Handle both simple and complex tasks by dynamically composing agent teams
This architecture mirrors how human teams work - a manager coordinates while specialists execute, resulting in better outcomes than any individual could achieve alone.
This multi-agent approach is quickly becoming the industry standard.
I want to know more about your Multi-Agent framework
Every user message goes to the Orchestrator agent, which invokes Tool Agents the same way it would invoke ordinary tool calls. Each tool-calling agent has its own memory context and system prompt, and can in turn invoke further tool agents or regular tools. You can build layer upon layer until the final step the LLM performs is a single composable action. The Orchestrator only ever sees a tool-calling agent’s final response, so its own context is not depleted quickly.
- Hierarchical structure: Orchestrator → Tool Agents → Sub-agents → Final tools
- Isolated contexts: Each agent maintains its own memory and conversation history
- Clean interfaces: Agents communicate through well-defined inputs and outputs
- Context preservation: Orchestrator’s context stays clean by only seeing final results
- Composable architecture: Complex tasks broken down into single, manageable steps
- Scalable depth: Add layers of agents as needed for task complexity
Will we still need a multi-agent architecture if context windows become much larger? Maybe not, but within the current limits of LLM models, we find that a multi-agent architecture is the most effective way to work around them. Think of it as microservices for a scalable system.
Just like microservices allow you to scale different parts of your application independently, our multi-agent framework lets you optimize each agent for its specific task - using different models, prompts, and tools as needed.
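Here is a minimal sketch of the pattern, using LiteLLM as a generic chat-completion backend. The Agent class, model names, and hard-coded routing are illustrative simplifications for this FAQ, not our internal implementation.

```python
# Sketch of the orchestrator / tool-agent pattern: each agent keeps its own
# isolated context, and the orchestrator only ever sees final responses.
import litellm

class Agent:
    def __init__(self, model: str, system_prompt: str):
        self.model = model
        self.history = [{"role": "system", "content": system_prompt}]

    def run(self, task: str) -> str:
        self.history.append({"role": "user", "content": task})
        reply = litellm.completion(model=self.model, messages=self.history)
        answer = reply.choices[0].message.content
        self.history.append({"role": "assistant", "content": answer})
        return answer  # only this final answer travels back up the hierarchy

# A strong reasoning model orchestrates; small local models execute sub-tasks.
orchestrator = Agent("gpt-4-turbo", "Break the user's goal into sub-tasks and combine results.")
extractor = Agent("ollama/llama3", "Extract structured data from the given page text.")
summarizer = Agent("ollama/llama3", "Summarize the given data in two sentences.")

def handle(user_goal: str, page_text: str) -> str:
    plan = orchestrator.run(f"Plan sub-tasks for: {user_goal}")
    data = extractor.run(page_text)   # raw page text never enters the orchestrator's context
    summary = summarizer.run(data)
    return orchestrator.run(f"Plan: {plan}\nResult: {summary}\nWrite the final answer.")
```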
Why have AI agents in the browser instead of the cloud?
We could have AI agents run in the cloud and access your browser remotely - this would be like cloud agents calling remote tools on your machine. But this approach creates fundamental problems:
- Security nightmare: You’d need to expose your browser to the internet for cloud agents to control it
- Complex setup: Users would need to run local servers, configure ports, manage authentication
- Privacy concerns: All your browsing data would flow through cloud servers
- Dependency hell: Can’t run agents without cloud access AND local framework running
- Latency issues: Every click, scroll, or action requires a round trip to the cloud
- Session management: Cloud agents can’t easily maintain your logged-in sessions
By building agents directly into the browser, we eliminate these problems. The browser becomes the runtime environment for agents - no external dependencies, no exposed ports, no privacy concerns. It’s like having a powerful computer that can also browse the web, rather than trying to remote control a browser from far away.
This is similar to why we run JavaScript in browsers rather than on remote servers - direct access, better performance, and simpler architecture.
Can I use custom trained LLM models?
Yes, you can use the LiteLLM proxy to route requests to a model you serve yourself with vLLM or Llama.cpp. This is often the better option: training an open-source model on your own data and then serving it can give you a much better price-to-performance ratio for your specific tasks.
- Cost efficiency: Custom models can be much cheaper to run than commercial APIs for specific tasks
- Specialized performance: Fine-tuned models often outperform general models on domain-specific tasks
- Data privacy: Keep sensitive training data and inference completely on your infrastructure
- Local inference: Run models on your own hardware using vLLM or Llama.cpp
- Complete control: Customize model behavior, response format, and performance characteristics
Here’s an example workflow you can follow to train your model and then use it with the agents running on Browser Operator: ART·E: How We Built an Email Research Agent That Beats o3
This approach lets you create highly specialized agents that are both cost-effective and performant for your specific use cases.
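As a rough sketch of the last step, assuming you have already exposed a fine-tuned model through vLLM’s OpenAI-compatible server, a LiteLLM-style call can point at it like this. The model name, port, and prompt are placeholders.

```python
# Sketch: calling a locally served fine-tuned model through its
# OpenAI-compatible endpoint. All names and ports are placeholders.
# Assumes a vLLM server is already running, e.g.:
#   vllm serve ./my-finetuned-model --port 8000
import litellm

response = litellm.completion(
    model="openai/my-finetuned-model",    # "openai/" prefix = any OpenAI-compatible endpoint
    api_base="http://localhost:8000/v1",  # local vLLM server
    api_key="not-needed-locally",         # vLLM does not require a real key by default
    messages=[{"role": "user", "content": "Extract the sender and deadline from this email."}],
)
print(response.choices[0].message.content)
```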
Do you support MCP and Agent protocols?
Yes, we plan to add support for bringing your own tools. For example, agents can already navigate to Gmail and Google Calendar to understand context and perform actions, but we can also support MCP connectors to Gmail or Google Calendar so that tool calling is simpler for agents. Agent protocols also fit well into the multi-agent architecture: we can hand off some tasks to agents running in the cloud, so whichever ones you connect and configure can handle async or scheduled tasks.
- Bring your own tools: Integrate custom MCP connectors and tools into the agent ecosystem
- Simplified tool calling: Use MCP connectors for direct API access instead of web scraping
- Cloud agent handoffs: Delegate tasks to agents running on remote servers
- Async task support: Handle long-running operations in the background
- Scheduled workflows: Configure agents to run tasks at specific times or intervals
- Hybrid approach: Combine web automation with API-based tools for maximum flexibility
For example, an agent could use web automation to research products on e-commerce sites, then use MCP connectors to add events to your calendar or send summaries via email APIs.
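On the protocol side, here is a minimal sketch using the MCP Python SDK (the "mcp" package) to list the tools an MCP server exposes. The server command is a placeholder, and this is not Browser Operator integration code; it simply shows why connectors make tool calling simpler than driving a web UI.

```python
# Sketch: discovering the tools exposed by an MCP server over stdio.
# "my-calendar-mcp-server" is a placeholder command, not a real connector.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="my-calendar-mcp-server", args=[])

async def list_mcp_tools():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, ":", tool.description)

asyncio.run(list_mcp_tools())
```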
If you are interested in adding MCP support or any agent protocol, feel free to open a PR on our GitHub repository. We are always looking for people interested in contributing to the community.
Still have questions?
Can't find what you're looking for? Join our community or reach out directly.