Local control: in-SDK sidecar + desktop/browser drivers #161
Add a deny-by-default CapabilityPolicy that gates which command names a local browser/desktop driver will execute (shell, arbitrary scripts, cookies/storage, and secrets are opt-in), a name-keyed driver registry so one package can host many drivers, and the command-name contract mirroring the hai_drivers interfaces. Co-authored-by: Cursor <cursoragent@cursor.com>
Long-polling sidecar (single-owner lease, connect-time drain, command_uid replay cache + echo), capability policy (deny-by-default with opt-ins), driver registry, pyautogui desktop driver and Selenium browser driver. Co-authored-by: Cursor <cursoragent@cursor.com>
…e open Co-authored-by: Cursor <cursoragent@cursor.com>
…+ config knobs Policy now derives allowed commands from the driver's public methods minus the danger sets (shell/scripts/cookies/secrets), removing the hand-maintained method lists that duplicated the drivers. Replace the driver registry with a direct lazy factory and trim SidecarConfig to essentials. Co-authored-by: Cursor <cursoragent@cursor.com>
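A minimal sketch of the "public methods minus danger sets" derivation this commit describes. The set contents, the `allowed_commands` helper, and `DemoDriver` are assumptions for illustration; only `run_command`, `execute_script`, and `enter_secret` are names that appear elsewhere in this PR, and the cookie method names are made up.

```python
import inspect

# Hypothetical danger sets; the real sets in the PR may contain other names.
_SHELL = frozenset({"run_command"})
_SCRIPTS = frozenset({"execute_script"})
_COOKIES = frozenset({"get_cookies", "set_cookie"})  # assumed method names
_SECRETS = frozenset({"enter_secret"})


def public_methods(driver_cls: type) -> set:
    """Return the names of callables on the class that have no leading underscore."""
    return {
        name
        for name, _member in inspect.getmembers(driver_cls, callable)
        if not name.startswith("_")
    }


def allowed_commands(driver_cls, *, allow_shell=False, allow_scripts=False,
                     allow_cookies=False, allow_secrets=False) -> frozenset:
    """Deny by default: start from the public surface and subtract each danger set
    unless its opt-in flag is set."""
    allowed = public_methods(driver_cls)
    if not allow_shell:
        allowed -= _SHELL
    if not allow_scripts:
        allowed -= _SCRIPTS
    if not allow_cookies:
        allowed -= _COOKIES
    if not allow_secrets:
        allowed -= _SECRETS
    return frozenset(allowed)


class DemoDriver:
    """Stand-in driver used only to exercise the sketch."""
    def click(self, x, y): ...
    def run_command(self, cmd): ...
    def execute_script(self, js): ...
    def enter_secret(self, name): ...
    def _internal(self): ...
```

With no flags set, `allowed_commands(DemoDriver)` contains only `click`. Deriving the list from the class keeps the policy in step with the driver, at the cost that any new public method is allowed unless it is added to a danger set.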
- serialize_result recurses into dicts (fixes get_observation_snapshot crash)
- browser: reject file/chrome/js/data URLs; real markdown via markdownify; guard get_logs on CDP attach
- desktop: run_command merges os.environ instead of replacing it
- sidecar: interrupt long-poll on stop, reconnect on 404, back off on 429, tear down driver on shutdown
- drop dead dedup cache + racy drain-on-connect (server delivers one cmd at a time, fresh uid, no replay)
- split drivers into desktop/ and browser/ subpackages

Co-authored-by: Cursor <cursoragent@cursor.com>
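The URL rejection mentioned in this commit could look roughly like the following. This is a sketch under assumptions: `BLOCKED_SCHEMES` and `check_navigation_url` are hypothetical names, and "js" is read here as the `javascript:` scheme.

```python
from urllib.parse import urlsplit

# Schemes the commit message says are rejected (file/chrome/js/data).
BLOCKED_SCHEMES = frozenset({"file", "chrome", "javascript", "data"})


def check_navigation_url(url: str) -> str:
    """Return the URL unchanged, or raise ValueError if its scheme is blocked."""
    # urlsplit lowercases the scheme; .lower() is kept as a defensive no-op.
    scheme = urlsplit(url.strip()).scheme.lower()
    if scheme in BLOCKED_SCHEMES:
        raise ValueError(f"blocked URL scheme: {scheme!r}")
    return url
```

A blocklist like this matches the commit's wording; an allowlist of `http` and `https` would be stricter, since it also rejects schemes nobody thought to list.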
…constants Co-authored-by: Cursor <cursoragent@cursor.com>
…down
- vendor h.js + defuddle.full.js; execute_script auto-injects hjs with iframe guard
- extract_markdown -> Defuddle (main-content, in-browser)
- get_viewport_html -> hjs_0x2a.collectViewportHTML() (screen-bounds pruned DOM)
- viewport_markdown -> collectViewportHTML then CustomMarkdownify (markdownify), full-page fallback
- ship js assets via wheel force-include

Co-authored-by: Cursor <cursoragent@cursor.com>
…l` CLI Client now injects the local session_id for any source:"local" environment on create_agent/update_agent/patch_agent and on inline-agent create_session, so callers only pass source:"local" and the env id. Adds `hai local browser` and `hai local desktop` to run the sidecar from the CLI. Co-authored-by: Cursor <cursoragent@cursor.com>
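As an illustration of the injection described here, the sketch below derives a session id and attaches it to every `source:"local"` environment. The PR overview says the id is deterministic from env id, API key, and capability; the SHA-256 construction, the 32-character truncation, the `id` field name, and both function signatures are assumptions, not the PR's code.

```python
import hashlib


def derive_session_id(env_id: str, api_key: str, capability: str) -> str:
    """Assumed scheme: hash the three inputs so the same inputs give the same id."""
    material = "\x00".join((env_id, api_key, capability)).encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:32]


def localize_environments(environments, api_key: str, capability: str):
    """Return copies of the environment dicts with session_id set on local ones."""
    localized = []
    for env in environments:
        env = dict(env)  # copy so the caller's dict is not mutated
        if env.get("source") == "local":
            env["session_id"] = derive_session_id(env["id"], api_key, capability)
        localized.append(env)
    return localized
```

Because the id depends only on its inputs, a sidecar started with the same API key and capability can compute the same value without any coordination with the client.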
…e, typed envs)
- enter_secret clicks (x, y) to focus the target before typing, so the secret lands in the field the agent pointed at instead of stale focus.
- get_tab_title honors tab_id by switching, reading, and restoring the tab.
- close_active_tab guards against an empty handle list after closing the last tab.
- localize_environments/localize_agent now wire source:"local" envs whether they arrive as dicts or typed Pydantic models.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
if not allow_cookies:
    allowed -= _COOKIES
if not allow_secrets:
    allowed -= _SECRETS
Script policy bypass via helpers
High Severity
With allow_scripts disabled, CapabilityPolicy only removes execute_script, but other allowed browser driver commands such as get_viewport_html, extract_markdown, scroll_page, and observation_bundle still execute page JavaScript internally, so the CLI --allow-scripts gate does not actually block script execution.
Reviewed by Cursor Bugbot for commit 0d1c561.
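One way to close the gap described in this comment is to treat the script-backed helpers as part of the script capability. The sketch below is not the PR's fix; `_SCRIPT_BACKED` and `apply_script_gate` are hypothetical names, and the set contents are taken from the commands the comment lists.

```python
_SCRIPTS = frozenset({"execute_script"})

# Commands the review names as running page JavaScript internally.
_SCRIPT_BACKED = frozenset({
    "get_viewport_html",
    "extract_markdown",
    "scroll_page",
    "observation_bundle",
})


def apply_script_gate(allowed, allow_scripts: bool) -> set:
    """Return a new set; without the opt-in, drop every command that executes JS."""
    if allow_scripts:
        return set(allowed)
    return set(allowed) - _SCRIPTS - _SCRIPT_BACKED
```

The alternative is to keep the helpers enabled and document that the flag only gates agent-supplied scripts; which reading is intended is a design decision for the PR author.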
Co-authored-by: Cursor <cursoragent@cursor.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 8a1810f.
selenium_key = self._key_map.get(key, key)
if selenium_key in self._modifiers:
    self.modifiers_mask ^= self._modifiers_bitmap[selenium_key]
Modifier mask XOR desync
Medium Severity
In release_key, modifier bits in modifiers_mask are cleared with XOR. Releasing a modifier that was never pressed, or releasing the same modifier twice, flips the bit on instead of off. CDP mouse events then use a wrong modifiers value until the mask is corrected.
Reviewed by Cursor Bugbot for commit 8a1810f.
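The desync can be reproduced in isolation. In the sketch below, `ModifierState` is a hypothetical stand-in for the driver's mask handling; the bit values follow the CDP `Input.dispatchMouseEvent` convention (Alt=1, Ctrl=2, Meta=4, Shift=8). Clearing with AND-NOT is idempotent, while XOR toggles.

```python
MODIFIER_BITS = {"alt": 1, "ctrl": 2, "meta": 4, "shift": 8}


class ModifierState:
    """Tracks which modifier keys are held, as a bitmask."""

    def __init__(self) -> None:
        self.mask = 0

    def press(self, key: str) -> None:
        self.mask |= MODIFIER_BITS[key]   # set the bit; pressing twice is harmless

    def release_xor(self, key: str) -> None:
        self.mask ^= MODIFIER_BITS[key]   # buggy: toggles, so a stray release sets the bit

    def release(self, key: str) -> None:
        self.mask &= ~MODIFIER_BITS[key]  # clears the bit regardless of prior state


buggy = ModifierState()
buggy.release_xor("shift")   # release without a matching press
fixed = ModifierState()
fixed.release("shift")
print(buggy.mask, fixed.mask)  # 8 0
```

With the AND-NOT form, releasing a key that was never pressed, or releasing it twice, leaves the mask at 0 for that bit.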


Note
High Risk
Enables remote agents to control the local browser/desktop and optionally run shell commands, page scripts, cookies/storage, and secrets—large security and abuse surface despite opt-in policy flags.
Overview
Adds local computer-use so agents can drive the user’s machine via an in-process sidecar that long-polls the API for commands and executes them on desktop (pyautogui/pynput) or browser (Selenium attached to Chrome’s debug port) drivers.
- Packaging: new optional extras `desktop`, `browser`, and expanded `all`; browser JS assets are force-included in the wheel.
- SDK: `Client`/`AsyncClient` now expose `agents` and `sessions` subclasses that auto-wire `user_device` environments, injecting a deterministic `session_id` from env id, API key, and capability, before create/update agent or create session calls.
- Browser driver (highlighted in diff): `SeleniumWebDriver` implements navigation (with blocked URL schemes), CDP mouse input, tabs, screenshots/observation bundles, viewport markdown (Defuddle + markdownify), cookies/storage when policy allows, and host checks on secret entry.
- CLI: `hai local browser`/`hai local desktop` run the sidecar with opt-in flags for shell, scripts, cookies, and secrets. CapabilityPolicy gates dangerous driver methods by default.

Reviewed by Cursor Bugbot for commit 8a1810f. Bugbot is set up for automated code reviews on this repo.