Local control: in-SDK sidecar + desktop/browser drivers by abonneth · Pull Request #161 · hcompai/hai-agents-python

abonneth · 2026-06-29T16:08:55Z

Made with Cursor

Note

High Risk
Enables remote agents to control the local browser/desktop and optionally run shell commands, execute page scripts, access cookies/storage, and enter secrets. This is a large security and abuse surface despite the opt-in policy flags.

Overview
Adds local computer-use so agents can drive the user’s machine via an in-process sidecar that long-polls the API for commands and executes them on desktop (pyautogui/pynput) or browser (Selenium attached to Chrome’s debug port) drivers.
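The polling loop described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `run_sidecar`, `poll`, `execute`, and `reconnect` are hypothetical names, the transport is abstracted behind callables, and the 404/429 handling follows a later commit message in this PR (reconnect on 404, back off on 429, interrupt on stop).

```python
import threading
from typing import Any, Callable, Optional, Tuple

# (HTTP-like status, command payload or None); an assumed shape for this sketch.
PollResult = Tuple[int, Optional[dict]]


def run_sidecar(
    poll: Callable[[], PollResult],
    execute: Callable[[dict], Any],
    reconnect: Callable[[], None],
    stop: threading.Event,
    *,
    backoff_s: float = 1.0,
    max_backoff_s: float = 30.0,
) -> int:
    """Poll for commands until `stop` is set; return how many were executed."""
    delay = backoff_s
    executed = 0
    while not stop.is_set():
        status, command = poll()
        if status == 429:
            # Rate limited: wait (interruptibly) and grow the delay.
            stop.wait(delay)
            delay = min(delay * 2, max_backoff_s)
            continue
        delay = backoff_s  # any other response resets the backoff
        if status == 404:
            # Session unknown to the server: re-register, then poll again.
            reconnect()
        elif status == 200 and command is not None:
            execute(command)
            executed += 1
        # Anything else (e.g. a poll that returned no command) just loops.
    return executed
```

Waiting on the `threading.Event` rather than calling `time.sleep` means a stop request also cuts the backoff short.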

Packaging: new optional extras desktop, browser, and expanded all; browser JS assets are force-included in the wheel.

SDK: Client / AsyncClient now expose agents and sessions subclasses that auto-wire user_device environments—injecting a deterministic session_id from env id, API key, and capability—before create/update agent or create session calls.
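As a rough illustration of what a deterministic session_id could look like, the sketch below hashes the three inputs named above. The function name `derive_session_id`, the use of SHA-256, the length-prefixed encoding, and the 32-character truncation are all assumptions; the PR description does not state the actual derivation.

```python
import hashlib


def derive_session_id(env_id: str, api_key: str, capability: str) -> str:
    """Hypothetical helper: map (env id, API key, capability) to a stable id."""
    parts = [env_id, api_key, capability]
    # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    material = "".join(f"{len(p)}:{p}" for p in parts).encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:32]
```

Because the id depends only on its inputs, the client and the locally running sidecar could each compute it without coordinating, which is presumably the point of making it deterministic.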

Browser driver (highlighted in diff): SeleniumWebDriver implements navigation (with blocked URL schemes), CDP mouse input, tabs, screenshots/observation bundles, viewport markdown (Defuddle + markdownify), cookies/storage when policy allows, and host checks on secret entry.
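A scheme check of the kind mentioned for navigation might look like the sketch below. The blocklist is taken from a commit message in this PR (file/chrome/js/data); the names `BLOCKED_SCHEMES` and `is_navigation_allowed` are hypothetical, and the real driver may check more than the scheme.

```python
from urllib.parse import urlsplit

# Assumed blocklist, based on the schemes named in the commit message.
BLOCKED_SCHEMES = frozenset({"file", "chrome", "javascript", "data"})


def is_navigation_allowed(url: str) -> bool:
    """Return False when the URL uses one of the blocked schemes."""
    scheme = urlsplit(url.strip()).scheme.lower()
    return scheme not in BLOCKED_SCHEMES
```

An allowlist (for example only `http` and `https`) would be the stricter variant of the same idea.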

CLI: hai local browser / hai local desktop run the sidecar with opt-in flags for shell, scripts, cookies, and secrets. CapabilityPolicy gates dangerous driver methods by default.

^{Reviewed by Cursor Bugbot for commit 8a1810f. Bugbot is set up for automated code reviews on this repo.}

Add a deny-by-default CapabilityPolicy that gates which command names a local browser/desktop driver will execute (shell, arbitrary scripts, cookies/storage, and secrets are opt-in), a name-keyed driver registry so one package can host many drivers, and the command-name contract mirroring the hai_drivers interfaces. Co-authored-by: Cursor <cursoragent@cursor.com>

Long-polling sidecar (single-owner lease, connect-time drain, command_uid replay cache + echo), capability policy (deny-by-default with opt-ins), driver registry, pyautogui desktop driver and Selenium browser driver. Co-authored-by: Cursor <cursoragent@cursor.com>

…e open Co-authored-by: Cursor <cursoragent@cursor.com>

…+ config knobs

Policy now derives allowed commands from the driver's public methods minus the danger sets (shell/scripts/cookies/secrets), removing the hand-maintained method lists that duplicated the drivers. Replace the driver registry with a direct lazy factory and trim SidecarConfig to essentials.

Co-authored-by: Cursor <cursoragent@cursor.com>
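The "public methods minus danger sets" idea can be sketched as below. This is an illustration under assumptions: `FakeDriver` is a stand-in class, `allowed_commands` is a hypothetical function, and the danger-set contents are guesses (only run_command, execute_script, and enter_secret are named elsewhere in this PR; the cookie command names are invented for the example).

```python
import inspect

# Assumed danger sets; the real command names in the PR may differ.
SHELL = frozenset({"run_command"})
SCRIPTS = frozenset({"execute_script"})
COOKIES = frozenset({"get_cookies", "set_cookies"})
SECRETS = frozenset({"enter_secret"})


class FakeDriver:
    """Stand-in driver used only for this illustration."""

    def navigate(self, url): ...
    def screenshot(self): ...
    def execute_script(self, script): ...
    def enter_secret(self, name, x, y): ...
    def get_cookies(self): ...
    def _private_helper(self): ...


def allowed_commands(driver_cls, *, allow_shell=False, allow_scripts=False,
                     allow_cookies=False, allow_secrets=False) -> frozenset:
    """Deny-by-default: start from public methods, subtract each danger set."""
    allowed = {
        name
        for name, _ in inspect.getmembers(driver_cls, inspect.isfunction)
        if not name.startswith("_")
    }
    if not allow_shell:
        allowed -= SHELL
    if not allow_scripts:
        allowed -= SCRIPTS
    if not allow_cookies:
        allowed -= COOKIES
    if not allow_secrets:
        allowed -= SECRETS
    return frozenset(allowed)
```

Deriving the set from the class keeps the policy in step with the driver, at the cost that any newly added public method is allowed unless it is also added to a danger set.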

- serialize_result recurses into dicts (fixes get_observation_snapshot crash)
- browser: reject file/chrome/js/data URLs; real markdown via markdownify; guard get_logs on CDP attach
- desktop: run_command merges os.environ instead of replacing it
- sidecar: interrupt long-poll on stop, reconnect on 404, back off on 429, tear down driver on shutdown
- drop dead dedup cache + racy drain-on-connect (server delivers one cmd at a time, fresh uid, no replay)
- split drivers into desktop/ and browser/ subpackages

Co-authored-by: Cursor <cursoragent@cursor.com>
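To illustrate the first item, a recursive serializer along these lines would handle nested dicts. This is only a sketch: the actual `serialize_result` in the PR is not shown here, and the base64 encoding of bytes and the `str()` fallback are assumptions.

```python
import base64
from typing import Any


def serialize_result(value: Any) -> Any:
    """Hypothetical sketch: convert a driver result into JSON-friendly data."""
    if isinstance(value, dict):
        # Recurse into values so nested bytes (e.g. screenshots) are handled too.
        return {str(k): serialize_result(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [serialize_result(v) for v in value]
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    return str(value)  # last-resort fallback for unknown types
```

Without the dict branch, a snapshot that nests raw bytes inside a mapping would reach the JSON encoder unchanged, which is one plausible way the crash mentioned above could arise.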

…constants Co-authored-by: Cursor <cursoragent@cursor.com>

…down

- vendor h.js + defuddle.full.js; execute_script auto-injects hjs with iframe guard
- extract_markdown -> Defuddle (main-content, in-browser)
- get_viewport_html -> hjs_0x2a.collectViewportHTML() (screen-bounds pruned DOM)
- viewport_markdown -> collectViewportHTML then CustomMarkdownify (markdownify), full-page fallback
- ship js assets via wheel force-include

Co-authored-by: Cursor <cursoragent@cursor.com>

…l` CLI

Client now injects the local session_id for any source:"local" environment on create_agent/update_agent/patch_agent and on inline-agent create_session, so callers only pass source:"local" and the env id. Adds `hai local browser` and `hai local desktop` to run the sidecar from the CLI.

Co-authored-by: Cursor <cursoragent@cursor.com>

…e, typed envs)

- enter_secret clicks (x, y) to focus the target before typing, so the secret lands in the field the agent pointed at instead of stale focus.
- get_tab_title honors tab_id by switching, reading, and restoring the tab.
- close_active_tab guards against an empty handle list after closing the last tab.
- localize_environments/localize_agent now wire source:"local" envs whether they arrive as dicts or typed Pydantic models.

Co-authored-by: Cursor <cursoragent@cursor.com>
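The switch, read, restore pattern described for get_tab_title can be sketched as a free function over a Selenium-style driver. It relies only on `current_window_handle`, `switch_to.window(...)`, and `title`, which Selenium's WebDriver exposes; the function itself is a hypothetical stand-in for the driver method in this PR.

```python
from typing import Any, Optional


def get_tab_title(driver: Any, tab_id: Optional[str] = None) -> str:
    """Read a tab's title without leaving the driver focused on that tab."""
    original = driver.current_window_handle
    if tab_id is None or tab_id == original:
        return driver.title
    driver.switch_to.window(tab_id)
    try:
        return driver.title
    finally:
        # Restore the previously active tab even if reading the title fails.
        driver.switch_to.window(original)
```

The `try`/`finally` matters because later commands assume the active tab has not changed underneath them.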

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor · 2026-06-29T21:38:35Z

+        if not allow_cookies:
+            allowed -= _COOKIES
+        if not allow_secrets:
+            allowed -= _SECRETS


Script policy bypass via helpers

High Severity

With allow_scripts disabled, CapabilityPolicy only removes execute_script, but other allowed browser driver commands such as get_viewport_html, extract_markdown, scroll_page, and observation_bundle still execute page JavaScript internally, so the CLI --allow-scripts gate does not actually block script execution.
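One possible way to close this gap is sketched below; it is an assumption for illustration, not necessarily the fix the authors chose. The idea is to track which commands run page JavaScript internally and remove them as well when scripts are disallowed. The set names and `filter_commands` are hypothetical; the command names come from this review comment.

```python
# Commands that accept agent-supplied JavaScript.
ARBITRARY_SCRIPT = frozenset({"execute_script"})
# Commands reported to run page JavaScript internally.
INTERNAL_JS = frozenset(
    {"get_viewport_html", "extract_markdown", "scroll_page", "observation_bundle"}
)


def filter_commands(public: set, *, allow_scripts: bool, strict: bool = True) -> set:
    """Drop script-backed commands when scripts are not allowed."""
    allowed = set(public)
    if not allow_scripts:
        allowed -= ARBITRARY_SCRIPT
        if strict:
            # Strict mode also removes helpers that evaluate JS in the page.
            allowed -= INTERNAL_JS
    return allowed
```

The alternative is to keep the helpers enabled and document that the flag only gates agent-supplied scripts, not vendored ones; which reading is intended is a design decision for the PR.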

^{Reviewed by Cursor Bugbot for commit 0d1c561.}

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

^{Reviewed by Cursor Bugbot for commit 8a1810f.}

cursor · 2026-06-29T22:01:12Z

+
+        selenium_key = self._key_map.get(key, key)
+        if selenium_key in self._modifiers:
+            self.modifiers_mask ^= self._modifiers_bitmap[selenium_key]


Modifier mask XOR desync

Medium Severity

In release_key, modifier bits in modifiers_mask are cleared with XOR. Releasing a modifier that was never pressed, or releasing the same modifier twice, flips the bit on instead of off. CDP mouse events then use a wrong modifiers value until the mask is corrected.
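The difference between toggling and clearing a bit can be shown in a few lines. The helper names below are hypothetical; the bit values follow the CDP modifiers bit field (Alt=1, Ctrl=2, Meta=4, Shift=8).

```python
# CDP modifier bits for Input.dispatch*Event.
ALT, CTRL, META, SHIFT = 1, 2, 4, 8


def press(mask: int, bit: int) -> int:
    """Set the modifier bit; pressing twice is harmless."""
    return mask | bit


def release_xor(mask: int, bit: int) -> int:
    """Buggy variant: XOR toggles, so a stray release turns the bit ON."""
    return mask ^ bit


def release_clear(mask: int, bit: int) -> int:
    """Idempotent variant: AND with the complement always clears the bit."""
    return mask & ~bit
```

With `release_xor`, releasing Shift on an empty mask yields 8 (Shift appears held), whereas `release_clear` leaves the mask at 0 however many times it is called.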

^{Reviewed by Cursor Bugbot for commit 8a1810f.}

abonneth and others added 7 commits June 29, 2026 17:31

fix(local): browser destroy stops chromedriver, leaves attached Chrom…

f6d3585

…e open Co-authored-by: Cursor <cursoragent@cursor.com>

refactor(local): drop leading underscore on new module-level helpers/…

a26e295

…constants Co-authored-by: Cursor <cursoragent@cursor.com>

abonneth marked this pull request as ready for review June 29, 2026 18:15

abonneth requested a review from adeprezh as a code owner June 29, 2026 18:15

cursor Bot reviewed Jun 29, 2026

Comment thread src/hai_agents/local/browser/driver.py

Comment thread src/hai_agents/local/browser/driver.py Outdated

Comment thread src/hai_agents/local/browser/driver.py Outdated

cursor Bot reviewed Jun 29, 2026

Comment thread src/hai_agents/local/wiring.py

abonneth and others added 2 commits June 29, 2026 23:10

refactor(local): source values user_device/cloud (was local/remote)

0d1c561

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jun 29, 2026

refactor(local): source->host in autowiring

8a1810f

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jun 29, 2026

abonneth wants to merge 11 commits into main from antoine/local-control
