Local control: in-SDK sidecar + desktop/browser drivers#161

Open
abonneth wants to merge 11 commits into main from antoine/local-control
abonneth (Collaborator) commented Jun 29, 2026

Made with Cursor


Note

High Risk
Enables remote agents to control the local browser/desktop and, optionally, to run shell commands, run page scripts, access cookies/storage, and enter secrets. This is a large security and abuse surface despite the opt-in policy flags.

Overview
Adds local computer-use so agents can drive the user’s machine via an in-process sidecar that long-polls the API for commands and executes them on desktop (pyautogui/pynput) or browser (Selenium attached to Chrome’s debug port) drivers.

Packaging: new optional extras desktop, browser, and expanded all; browser JS assets are force-included in the wheel.

SDK: Client / AsyncClient now expose agents and sessions subclasses that auto-wire user_device environments—injecting a deterministic session_id from env id, API key, and capability—before create/update agent or create session calls.
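
The description does not show how the deterministic session_id is computed. As a rough sketch only, one way to get a stable id from the three inputs is to hash them. The helper name `derive_session_id`, the use of SHA-256, the length prefixing, and the 32-character truncation are all assumptions for illustration, not the PR's actual scheme.

```python
import hashlib


def derive_session_id(env_id: str, api_key: str, capability: str) -> str:
    """Hypothetical helper: identical inputs always yield the same id."""
    digest = hashlib.sha256()
    for part in (env_id, api_key, capability):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    # Truncation length is an arbitrary choice for this sketch.
    return digest.hexdigest()[:32]
```

Because the id depends only on its inputs, a sidecar and a client that share the same env id, API key, and capability would arrive at the same value without exchanging it.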

Browser driver (highlighted in diff): SeleniumWebDriver implements navigation (with blocked URL schemes), CDP mouse input, tabs, screenshots/observation bundles, viewport markdown (Defuddle + markdownify), cookies/storage when policy allows, and host checks on secret entry.
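
To illustrate the blocked URL schemes mentioned above, here is a minimal sketch of a pre-navigation check. The function name `check_navigation_url` and the exact blocked set are assumptions based on the commit note about rejecting file/chrome/js/data URLs; the driver's real check may differ.

```python
from urllib.parse import urlparse

# Assumed blocked set, taken from the commit note (file/chrome/js/data).
BLOCKED_SCHEMES = frozenset({"file", "chrome", "javascript", "data"})


def check_navigation_url(url: str) -> str:
    """Hypothetical guard: raise before navigating to a blocked scheme."""
    # urlparse lowercases the scheme; .lower() is kept as a belt-and-braces step.
    scheme = urlparse(url.strip()).scheme.lower()
    if scheme in BLOCKED_SCHEMES:
        raise ValueError(f"navigation to {scheme}: URLs is blocked")
    return url
```

A deny-list like this only covers the schemes it names; an allow-list of http and https would be the stricter alternative.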

CLI: hai local browser / hai local desktop run the sidecar with opt-in flags for shell, scripts, cookies, and secrets. CapabilityPolicy is deny-by-default: dangerous driver methods stay blocked unless the matching flag is passed.

Reviewed by Cursor Bugbot for commit 8a1810f. Bugbot is set up for automated code reviews on this repo.

abonneth and others added 7 commits June 29, 2026 17:31
Add a deny-by-default CapabilityPolicy that gates which command names a local
browser/desktop driver will execute (shell, arbitrary scripts, cookies/storage,
and secrets are opt-in), a name-keyed driver registry so one package can host
many drivers, and the command-name contract mirroring the hai_drivers interfaces.

Co-authored-by: Cursor <cursoragent@cursor.com>
Long-polling sidecar (single-owner lease, connect-time drain, command_uid
replay cache + echo), capability policy (deny-by-default with opt-ins),
driver registry, pyautogui desktop driver and Selenium browser driver.

Co-authored-by: Cursor <cursoragent@cursor.com>
…e open

Co-authored-by: Cursor <cursoragent@cursor.com>
…+ config knobs

Policy now derives allowed commands from the driver's public methods minus the
danger sets (shell/scripts/cookies/secrets), removing the hand-maintained method
lists that duplicated the drivers. Replace the driver registry with a direct lazy
factory and trim SidecarConfig to essentials.
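
A minimal sketch of the "public methods minus danger sets" idea described in this commit. `FakeDriver`, `allowed_commands`, and the contents of each danger set are illustrative assumptions; only the names `_COOKIES` and `_SECRETS` and the method names `run_command`, `execute_script`, and `enter_secret` appear elsewhere in this PR.

```python
import inspect

# Danger sets: membership here is assumed for illustration.
_SHELL = frozenset({"run_command"})
_SCRIPTS = frozenset({"execute_script"})
_COOKIES = frozenset({"get_cookies", "set_cookie"})
_SECRETS = frozenset({"enter_secret"})


class FakeDriver:
    """Stand-in driver; the real drivers expose many more methods."""

    def navigate(self, url): ...
    def execute_script(self, script): ...
    def get_cookies(self): ...
    def enter_secret(self, name, x, y): ...
    def _helper(self): ...


def allowed_commands(driver_cls, *, allow_shell=False, allow_scripts=False,
                     allow_cookies=False, allow_secrets=False):
    # Start from every public method defined on the driver class.
    allowed = {
        name
        for name, _ in inspect.getmembers(driver_cls, inspect.isfunction)
        if not name.startswith("_")
    }
    # Deny by default: subtract each danger set unless explicitly opted in.
    if not allow_shell:
        allowed -= _SHELL
    if not allow_scripts:
        allowed -= _SCRIPTS
    if not allow_cookies:
        allowed -= _COOKIES
    if not allow_secrets:
        allowed -= _SECRETS
    return frozenset(allowed)
```

With this shape, adding a harmless method to a driver makes it available automatically, while a new dangerous method must also be added to a danger set or it will be allowed by default.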

Co-authored-by: Cursor <cursoragent@cursor.com>
- serialize_result recurses into dicts (fixes get_observation_snapshot crash)
- browser: reject file/chrome/js/data URLs; real markdown via markdownify; guard get_logs on CDP attach
- desktop: run_command merges os.environ instead of replacing it
- sidecar: interrupt long-poll on stop, reconnect on 404, back off on 429, tear down driver on shutdown
- drop dead dedup cache + racy drain-on-connect (server delivers one cmd at a time, fresh uid, no replay)
- split drivers into desktop/ and browser/ subpackages
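
The sidecar behaviors listed above (stop, reconnect on 404, back off on 429) could be sketched roughly as follows. This is not the PR's loop: `run_sidecar` and its callable parameters are hypothetical, the HTTP layer is replaced by an injected `poll` function, and how the real sidecar interrupts an in-flight long-poll request is not shown here.

```python
import threading


def run_sidecar(poll, execute, reconnect, stop: threading.Event,
                backoff_s: float = 1.0, max_backoff_s: float = 30.0) -> None:
    """Hypothetical loop: poll() returns (status_code, command_or_None)."""
    delay = backoff_s
    while not stop.is_set():
        status, command = poll()
        if status == 429:
            # Rate limited: sleep, but wake immediately if stop is set.
            stop.wait(delay)
            delay = min(delay * 2, max_backoff_s)
            continue
        if status == 404:
            # Session unknown to the server: register again, then keep polling.
            reconnect()
        elif status == 200 and command is not None:
            execute(command)
        # Any non-429 response resets the backoff.
        delay = backoff_s
```

Using `Event.wait` instead of `time.sleep` is what keeps the backoff interruptible, so a shutdown request does not have to wait out the full delay.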

Co-authored-by: Cursor <cursoragent@cursor.com>
…constants

Co-authored-by: Cursor <cursoragent@cursor.com>
…down

- vendor h.js + defuddle.full.js; execute_script auto-injects hjs with iframe guard
- extract_markdown -> Defuddle (main-content, in-browser)
- get_viewport_html -> hjs_0x2a.collectViewportHTML() (screen-bounds pruned DOM)
- viewport_markdown -> collectViewportHTML then CustomMarkdownify (markdownify), full-page fallback
- ship js assets via wheel force-include

Co-authored-by: Cursor <cursoragent@cursor.com>
abonneth marked this pull request as ready for review June 29, 2026 18:15
abonneth requested a review from adeprezh as a code owner June 29, 2026 18:15
Comment thread src/hai_agents/local/browser/driver.py
Comment thread src/hai_agents/local/browser/driver.py Outdated
Comment thread src/hai_agents/local/browser/driver.py Outdated
…l` CLI

Client now injects the local session_id for any source:"local" environment on
create_agent/update_agent/patch_agent and on inline-agent create_session, so
callers only pass source:"local" and the env id. Adds `hai local browser` and
`hai local desktop` to run the sidecar from the CLI.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/hai_agents/local/wiring.py
abonneth and others added 2 commits June 29, 2026 23:10
…e, typed envs)

- enter_secret clicks (x, y) to focus the target before typing, so the secret
  lands in the field the agent pointed at instead of stale focus.
- get_tab_title honors tab_id by switching, reading, and restoring the tab.
- close_active_tab guards against an empty handle list after closing the last tab.
- localize_environments/localize_agent now wire source:"local" envs whether they
  arrive as dicts or typed Pydantic models.
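
A rough sketch of what accepting both shapes can look like. The frozen dataclass `EnvModel` is a stand-in for the typed Pydantic model (used here only to keep the example dependency-free), and `wire_local_env` is a hypothetical name, not the PR's `localize_environments`.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass(frozen=True)
class EnvModel:
    """Stand-in for a typed environment model."""
    id: str
    source: str
    session_id: Optional[str] = None


def wire_local_env(env: Union[dict, EnvModel], session_id: str):
    """Return a copy with session_id set if the env is source:"local"."""
    if isinstance(env, dict):
        if env.get("source") != "local":
            return env
        # Copy rather than mutate the caller's dict.
        return {**env, "session_id": session_id}
    if env.source != "local":
        return env
    return replace(env, session_id=session_id)
```

Non-local environments pass through untouched in both branches, so callers can run every environment through the same function.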

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
if not allow_cookies:
    allowed -= _COOKIES
if not allow_secrets:
    allowed -= _SECRETS

Script policy bypass via helpers

High Severity

With allow_scripts disabled, CapabilityPolicy only removes execute_script, but other allowed browser driver commands such as get_viewport_html, extract_markdown, scroll_page, and observation_bundle still execute page JavaScript internally, so the CLI --allow-scripts gate does not actually block script execution.
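
One possible way to close the gap, sketched under assumptions: track the commands that run page JavaScript internally and remove them together with `execute_script` when scripts are not allowed. `_RUNS_PAGE_JS`, `apply_script_gate`, and the `strict` switch are hypothetical names. Whether vetted internal scripts should count as "script execution" is a design decision this sketch does not settle.

```python
_SCRIPTS = frozenset({"execute_script"})
# Commands the review comment lists as running page JavaScript internally.
_RUNS_PAGE_JS = frozenset({
    "get_viewport_html",
    "extract_markdown",
    "scroll_page",
    "observation_bundle",
})


def apply_script_gate(allowed, allow_scripts: bool, strict: bool = True) -> set:
    """Hypothetical gate: strict=False reproduces the behavior under review."""
    if allow_scripts:
        return set(allowed)
    blocked = (_SCRIPTS | _RUNS_PAGE_JS) if strict else _SCRIPTS
    return set(allowed) - blocked
```

The strict mode trades functionality for a literal reading of the flag; the alternative is to rename or document the flag as gating only arbitrary, agent-supplied scripts.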

Reviewed by Cursor Bugbot for commit 0d1c561.

Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 8a1810f.


selenium_key = self._key_map.get(key, key)
if selenium_key in self._modifiers:
    self.modifiers_mask ^= self._modifiers_bitmap[selenium_key]

Modifier mask XOR desync

Medium Severity

In release_key, modifier bits in modifiers_mask are cleared with XOR. Releasing a modifier that was never pressed, or releasing the same modifier twice, flips the bit on instead of off. CDP mouse events then use a wrong modifiers value until the mask is corrected.
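
The desync can be reproduced in isolation. The sketch below uses standalone functions rather than the driver's methods (the function names are made up for illustration); the bit values follow the CDP Input domain's modifiers field (Alt=1, Ctrl=2, Meta=4, Shift=8).

```python
# CDP modifier bits for Input events.
ALT, CTRL, META, SHIFT = 1, 2, 4, 8


def press(mask: int, bit: int) -> int:
    # OR is idempotent: pressing twice leaves the bit set.
    return mask | bit


def release_xor(mask: int, bit: int) -> int:
    # Toggle: only correct if the bit is currently set.
    return mask ^ bit


def release_clear(mask: int, bit: int) -> int:
    # AND-NOT is idempotent: releasing twice leaves the bit cleared.
    return mask & ~bit


# Releasing Shift when it was never pressed:
assert release_xor(0, SHIFT) == SHIFT   # bit flips on (the reported bug)
assert release_clear(0, SHIFT) == 0     # bit stays off
```

Clearing with `mask & ~bit` makes release safe to call any number of times, which matters when key-up events can arrive without a matching key-down.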

Reviewed by Cursor Bugbot for commit 8a1810f.
