Frequently Asked Questions
Everything you need to know about Browser Operator
General Questions
How is Browser Operator different from ChatGPT or Claude?
While ChatGPT and Claude are AI assistants that answer questions, Browser Operator is a complete browser platform where you and AI work together as partners. Key differences:
- Direct web interaction: Browser Operator can actually browse, click, and interact with websites
- Real-time collaboration: You and AI work together simultaneously, not in a chat format
- Local processing: Your data stays on your machine, not sent to cloud servers
- Agent creation: Build custom AI agents for your specific workflows
- Open source: Full transparency with auditable code
How do I get started with Browser Operator?
Getting started is simple:
- Download: Get the latest release from our GitHub repository
- Install: Follow the simple installation instructions
- Explore: Try the included demo agents to see collaboration in action
- Create: Build your first custom agent using the AI Agent Studio
- Join: Connect with our community for support and inspiration
I'm happy with my current browser. Why should I switch?
We get it - switching browsers feels like moving houses. You’ve got your bookmarks organized, passwords saved, and extensions set up just right. Here’s why Browser Operator is worth the move:
- Import everything: Your bookmarks, passwords, and settings transfer seamlessly from Chrome, Firefox, or other browsers
- It’s still Chromium: Browser Operator is built on Chromium, so your favorite websites work exactly as expected
- Keep your extensions: Most Chrome extensions work perfectly in Browser Operator
- Gradual adoption: You don’t have to switch completely - use Browser Operator for AI-powered tasks while keeping your regular browser
- Immediate value: Within minutes, you’ll automate tasks that would take hours manually
Think of it this way: When smartphones arrived, we didn’t stop making calls - we just gained superpowers. Browser Operator doesn’t replace web browsing; it transforms it into something far more powerful.
Many users tell us they intended to “just try it” but found themselves using Browser Operator as their primary browser within a week because the AI capabilities become indispensable.
Why do I need a new browser instead of an extension?
Browser extensions are limited by security restrictions and can’t provide true human-AI collaboration:
- Full browser control: Native integration enables capabilities impossible with extensions
- Cross-site workflows: Work seamlessly across multiple websites without security limitations
- Performance: Built into the browser engine for faster, more reliable operation
- Deep integration: AI collaboration is part of the browser’s core architecture
- Privacy: No data sent to extension developers or cloud services
How does Browser Operator add AI agents to any website?
Browser Operator transforms every website into an AI-enabled platform by injecting intelligent capabilities directly into the browser:
- Universal AI Integration: Every website automatically gains AI agent capabilities without needing website-specific modifications
- Contextual Understanding: AI agents can read, understand, and interact with any webpage content in real-time
- Cross-site Workflows: Agents can work across multiple websites simultaneously, carrying context and data between them
- Dynamic Adaptation: AI adapts to different website layouts, forms, and interfaces automatically
- No Permissions Needed: Works immediately on any website without requiring special access or API integrations
For example, an AI agent can help you compare products across shopping sites, research topics across academic databases, or automate form filling across various platforms - all working seamlessly regardless of the website’s original design.
Can I create my own AI agents?
Yes! Browser Operator includes a built-in AI Agent Studio:
- No coding required: Visual tools make agent creation accessible to everyone
- Custom workflows: Build agents for your specific tasks and needs
- Community sharing: Share your agents and discover ones made by others
- Instant deployment: Test and use your agents immediately
- Continuous improvement: Agents can learn and improve from your feedback
Is my AI agent interaction private?
Yes, Browser Operator’s AI agent layer is designed with privacy at its core:
What We Control (AI Agent Layer):
- Local agent processing available: Run AI agents entirely on your machine with local LLMs
- No agent telemetry: We don’t track or collect your AI agent usage
- Your prompts, your control: Choose where AI processing happens - locally or cloud
- Open-source agent code: Audit exactly how agents handle your data
- No data retention: We never store your agent interactions or automations
Your Choices:
- Maximum Privacy: Use local LLM models - all agent processing stays on your device
- Cloud LLMs: If you choose OpenAI, Claude, etc., your prompts follow their privacy policies
- Hybrid Usage: Use local models for sensitive automation, cloud for general tasks
Note: Browser Operator is built on Chromium, so standard web browsing follows Chromium’s privacy model. Our privacy guarantees apply specifically to the AI agent layer we’ve built.
Can I use OpenAI GPT-4 or Claude API for more powerful models?
Yes! Browser Operator supports multiple LLM providers for enhanced AI capabilities:
- OpenAI Integration: Connect your OpenAI API key to use GPT-4, GPT-4 Turbo, and other OpenAI models
- Claude API Support: Use Anthropic’s Claude models through API integration
- LiteLLM Proxy: Connect to any LLM provider supported by LiteLLM, including Google Gemini, Mistral AI, local models via Ollama, Azure OpenAI, and many more
- Model Switching: Easily switch between different models based on your task requirements
- Cost Control: Set usage limits and monitor API costs directly in the browser
To configure external APIs, go to the AI settings panel in Browser Operator and add your API keys. The browser will securely store them locally and never send them to our servers.
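As a rough illustration of what a unified provider layer buys you, here is a minimal Python sketch that calls different providers through the LiteLLM library directly. It assumes your API keys are already set as environment variables, and the model names are examples only; inside Browser Operator this wiring is handled for you by the settings panel.

```python
# Minimal sketch: one call shape, many providers, via LiteLLM.
# Assumes OPENAI_API_KEY / ANTHROPIC_API_KEY are set; model names are examples.
import litellm

def ask(model: str, prompt: str) -> str:
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("gpt-4-turbo", "Summarize this page in one sentence."))    # OpenAI
print(ask("claude-3-5-sonnet-20240620", "Extract the key dates."))   # Anthropic
print(ask("ollama/llama3", "Classify this form field."))             # local model via Ollama
```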
Why make Browser Operator open source?
We are developers who value freedom and privacy. Building in the open lets us shine a light on agent behavior that other platforms keep opaque. Here’s why open source matters to us:
- Transparency: You can inspect exactly how AI agents work and what they do with your data
- Trust through verification: No hidden behaviors or secret data collection - everything is auditable
- Community-driven innovation: Anyone can contribute improvements and new features
- Ethical AI development: Open source ensures AI tools remain accountable to users, not corporations
- Educational value: Developers can learn from and build upon our work
- Long-term sustainability: The project can’t be shut down or locked behind paywalls by a single company
We believe the future of AI should be open, transparent, and controlled by users - not hidden behind proprietary walls.
Technical Deep Dive
Why make an AI Agent platform?
We believe people need the freedom to customize how their agents work, based on their own approach. Many AI agents available today use pre-defined prompts that are generic rather than specific to how you handle tasks. By building an easy-to-customize platform, we let you not only see and understand how the AI agents work but also tailor them to work well for you.
- Personal workflows: Everyone has unique ways of working - your agents should adapt to you, not the other way around
- Domain expertise: Create specialized agents that understand your industry’s specific terminology and processes
- Iterative improvement: See exactly how agents make decisions and refine them based on real results
- No black boxes: Unlike closed platforms, you can see and modify every prompt and behavior
- Collaborative development: Share your custom agents with teams or the community to help others
- Future-proof flexibility: As your needs change, your agents can evolve with you
The power of AI should be in your hands, shaped by your expertise and tailored to your specific needs.
What tools do the AI Agents have access to?
The AI agents have access to a core toolset covering navigation, data extraction, visual understanding, page interaction, and memory backed by a VectorDB:
- URL Navigation: Navigate to any website, go back/forward, refresh pages
- Schema Extraction: Convert unstructured web content into structured JSON/CSV data
- Screenshot Capture: Visual understanding for complex layouts or when DOM parsing isn’t sufficient
- Action Tools: Click buttons, fill forms, type text, scroll, hover, and interact with any element
- HTML to Markdown: Clean conversion of web pages to readable markdown format
- Document Store: Save important information to VectorDB for later retrieval
- Semantic Search: Intelligently search through stored documents using natural language
These tools work together to enable complex workflows. For example, an agent could navigate to multiple product pages, extract pricing data, store it in the VectorDB, and later search for the best deals using semantic search.
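To make that composition concrete, here is a hypothetical Python-style sketch of the price-comparison workflow. The function names (navigate, extract_schema, store_document, semantic_search) are illustrative stand-ins for the tools listed above, not Browser Operator’s actual API.

```python
# Hypothetical sketch only: the tool names below are illustrative stand-ins,
# not Browser Operator's real tool interface.
PRODUCT_PAGES = [
    "https://shop-a.example/widget",
    "https://shop-b.example/widget",
]

def compare_prices(agent):
    for url in PRODUCT_PAGES:
        agent.navigate(url)                                   # URL Navigation
        product = agent.extract_schema(                       # Schema Extraction
            fields=["name", "price", "shipping", "rating"],
        )
        agent.store_document(product, collection="widgets")   # Document Store (VectorDB)

    # Semantic Search over everything stored during the run
    return agent.semantic_search(
        "cheapest widget with free shipping", collection="widgets",
    )
```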
Why build a multi-agent framework?
We found that smaller, focused agents solve tasks better than a single generic agent. In our experience, the most effective pattern composes an orchestrator agent with execution agents: you can use a powerful reasoning model as the orchestrator while small open-source models handle execution. This gives you the speed of the smaller agents while relying on the orchestrator’s reasoning to handle unfamiliar tasks.
- Specialized expertise: Each agent can be optimized for specific tasks (research, data extraction, form filling, etc.)
- Parallel processing: Multiple agents can work simultaneously on different parts of a complex task
- Cost optimization: Use expensive models only for orchestration, cheaper models for execution
- Faster response times: Small, focused agents respond much quicker than large general-purpose models
- Better error handling: If one agent fails, others can continue or retry with different approaches
- Modular architecture: Easy to add, remove, or update individual agents without affecting the entire system
- Scalable complexity: Handle both simple and complex tasks by dynamically composing agent teams
This architecture mirrors how human teams work - a manager coordinates while specialists execute, resulting in better outcomes than any individual could achieve alone.
This multi-agent approach is quickly becoming the industry standard.
I want to know more about your Multi-Agent framework
Every user message goes to the Orchestrator agent, which invokes Tool Agents the same way it would invoke ordinary tool calls. Each tool-calling agent has its own memory context and system prompt, and can in turn invoke further tool agents or regular tools. You can build layer upon layer until the final step the LLM performs is a single composable action. The Orchestrator only ever sees a tool-calling agent’s final response, so its own context is not depleted quickly.
- Hierarchical structure: Orchestrator → Tool Agents → Sub-agents → Final tools
- Isolated contexts: Each agent maintains its own memory and conversation history
- Clean interfaces: Agents communicate through well-defined inputs and outputs
- Context preservation: Orchestrator’s context stays clean by only seeing final results
- Composable architecture: Complex tasks broken down into single, manageable steps
- Scalable depth: Add layers of agents as needed for task complexity
Will we still need a multi-agent architecture if context windows become much larger? Maybe not, but within the current limits of LLM models, we find that a multi-agent architecture is the most effective way to work around them. Think of it as microservices for a scalable system.
Just like microservices allow you to scale different parts of your application independently, our multi-agent framework lets you optimize each agent for its specific task - using different models, prompts, and tools as needed.
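Here is a minimal sketch of the pattern, using LiteLLM as a generic chat-completion backend. The Agent class, model names, and hard-coded routing are illustrative simplifications for this FAQ, not our internal implementation.

```python
# Sketch of the orchestrator / tool-agent pattern: each agent keeps its own
# isolated context, and the orchestrator only ever sees final responses.
import litellm

class Agent:
    def __init__(self, model: str, system_prompt: str):
        self.model = model
        self.history = [{"role": "system", "content": system_prompt}]

    def run(self, task: str) -> str:
        self.history.append({"role": "user", "content": task})
        reply = litellm.completion(model=self.model, messages=self.history)
        answer = reply.choices[0].message.content
        self.history.append({"role": "assistant", "content": answer})
        return answer  # only this final answer travels back up the hierarchy

# A strong reasoning model orchestrates; small local models execute sub-tasks.
orchestrator = Agent("gpt-4-turbo", "Break the user's goal into sub-tasks and combine results.")
extractor = Agent("ollama/llama3", "Extract structured data from the given page text.")
summarizer = Agent("ollama/llama3", "Summarize the given data in two sentences.")

def handle(user_goal: str, page_text: str) -> str:
    plan = orchestrator.run(f"Plan sub-tasks for: {user_goal}")
    data = extractor.run(page_text)   # raw page text never enters the orchestrator's context
    summary = summarizer.run(data)
    return orchestrator.run(f"Plan: {plan}\nResult: {summary}\nWrite the final answer.")
```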
Why have AI agents in the browser instead of the cloud?
We could have AI agents run in the cloud and access your browser remotely - this would be like cloud agents calling remote tools on your machine. But this approach creates fundamental problems:
- Security nightmare: You’d need to expose your browser to the internet for cloud agents to control it
- Complex setup: Users would need to run local servers, configure ports, manage authentication
- Privacy concerns: All your browsing data would flow through cloud servers
- Dependency hell: Can’t run agents without cloud access AND local framework running
- Latency issues: Every click, scroll, or action requires a round trip to the cloud
- Session management: Cloud agents can’t easily maintain your logged-in sessions
By building agents directly into the browser, we eliminate these problems. The browser becomes the runtime environment for agents - no external dependencies, no exposed ports, no privacy concerns. It’s like having a powerful computer that can also browse the web, rather than trying to remote control a browser from far away.
This is similar to why we run JavaScript in browsers rather than on remote servers - direct access, better performance, and simpler architecture.
Can I use custom trained LLM models?
Yes, you can use the LiteLLM proxy to route requests to a model you serve yourself with vLLM or Llama.cpp. This is often the better option: training an open-source model on your own data and then serving it can give you a much better price-to-performance ratio for your specific tasks.
- Cost efficiency: Custom models can be much cheaper to run than commercial APIs for specific tasks
- Specialized performance: Fine-tuned models often outperform general models on domain-specific tasks
- Data privacy: Keep sensitive training data and inference completely on your infrastructure
- Local inference: Run models on your own hardware using vLLM or Llama.cpp
- Complete control: Customize model behavior, response format, and performance characteristics
Here’s an example workflow you can follow to train your model and then use it with the agents running on Browser Operator: ART·E: How We Built an Email Research Agent That Beats o3
This approach lets you create highly specialized agents that are both cost-effective and performant for your specific use cases.
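As a rough sketch of the last step, assuming you have already exposed a fine-tuned model through vLLM’s OpenAI-compatible server, a LiteLLM-style call can point at it like this. The model name, port, and prompt are placeholders.

```python
# Sketch: calling a locally served fine-tuned model through its
# OpenAI-compatible endpoint. All names and ports are placeholders.
# Assumes a vLLM server is already running, e.g.:
#   vllm serve ./my-finetuned-model --port 8000
import litellm

response = litellm.completion(
    model="openai/my-finetuned-model",    # "openai/" prefix = any OpenAI-compatible endpoint
    api_base="http://localhost:8000/v1",  # local vLLM server
    api_key="not-needed-locally",         # vLLM does not require a real key by default
    messages=[{"role": "user", "content": "Extract the sender and deadline from this email."}],
)
print(response.choices[0].message.content)
```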
Do you support MCP and Agent protocols?
Yes, we plan to add support for bringing your own tools. For example, agents can already navigate to Gmail and Google Calendar to understand context and perform actions, but we can also support MCP connectors to Gmail or Google Calendar so that tool calling is simpler for agents. Agent protocols also fit well into the multi-agent architecture: we can hand off some tasks to agents running in the cloud, so whichever ones you connect and configure can handle async or scheduled tasks.
- Bring your own tools: Integrate custom MCP connectors and tools into the agent ecosystem
- Simplified tool calling: Use MCP connectors for direct API access instead of web scraping
- Cloud agent handoffs: Delegate tasks to agents running on remote servers
- Async task support: Handle long-running operations in the background
- Scheduled workflows: Configure agents to run tasks at specific times or intervals
- Hybrid approach: Combine web automation with API-based tools for maximum flexibility
For example, an agent could use web automation to research products on e-commerce sites, then use MCP connectors to add events to your calendar or send summaries via email APIs.
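On the protocol side, here is a minimal sketch using the MCP Python SDK (the "mcp" package) to list the tools an MCP server exposes. The server command is a placeholder, and this is not Browser Operator integration code; it simply shows why connectors make tool calling simpler than driving a web UI.

```python
# Sketch: discovering the tools exposed by an MCP server over stdio.
# "my-calendar-mcp-server" is a placeholder command, not a real connector.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="my-calendar-mcp-server", args=[])

async def list_mcp_tools():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, ":", tool.description)

asyncio.run(list_mcp_tools())
```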
If you are interested in adding MCP support or any agent protocol, feel free to open a PR on our GitHub repository. We are always looking for people interested in contributing to the community.
Still have questions?
Can't find what you're looking for? Join our community or reach out directly.