Releases: brightdata/sdk-python
v2.4.0 — Sync parity, colorless job verbs, dataset error reporting
Highlights
- Sync client parity: SyncBrightDataClient now mirrors the async surface. Adds client.datasets (fixes the AttributeError), the 5 missing scrapers (scrape.tiktok/youtube/reddit/perplexity/digikey), the 2 missing search verticals (search.tiktok/youtube), Pinterest trigger/status/fetch, and Instagram-search profiles/reels_all.
- Service-level job verbs (colorless pattern): every scraper now exposes generic status/wait/fetch/to_result(snapshot_id) (on BaseWebScraper), and DiscoverService gains status/wait/fetch/to_result(task_id). A triggered job can be driven by its id alone, like the crawler. Purely additive: existing job.fetch() etc. are unchanged.
- Discover sync manual path: SyncBrightDataClient adds discover_status/discover_wait/discover_fetch/discover_to_result(task_id) plus a colorless DiscoverSnapshot handle.
- Better dataset errors: failed snapshots now expose the API failure reason (and the raw status response as a fallback) plus the snapshot_id, instead of DatasetError: Snapshot failed: None. SnapshotStatus retains the full response (.raw) and matches more reason keys (error/error_message/message/failure_reason). The sync path inherits the fix.
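The dataset error change can be pictured as a small lookup over the status payload. The sketch below is not the SDK's implementation: the helper name `failure_reason`, the key order, and the payload shapes are assumptions for illustration. Only the four reason keys and the raw-response fallback come from the notes above.

```python
import json
from typing import Any, Mapping

# Reason keys named in the release notes; trying them in this order is an assumption.
REASON_KEYS = ("error", "error_message", "message", "failure_reason")


def failure_reason(raw: Mapping[str, Any], snapshot_id: str) -> str:
    """Build an error message from a raw snapshot status response (hypothetical helper)."""
    for key in REASON_KEYS:
        value = raw.get(key)
        if value:  # skip missing, None, and empty values
            return f"Snapshot {snapshot_id} failed: {value}"
    # No known key matched: fall back to the raw status response.
    return f"Snapshot {snapshot_id} failed: {json.dumps(dict(raw), sort_keys=True)}"
```

With this shape, a payload that carries no recognized key still yields a message containing the snapshot id and the raw response rather than `None`.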
⚠️ Breaking / Contract change
- Sync discover_trigger() now returns a DiscoverSnapshot (a typed, drivable handle) instead of the async-only DiscoverJob, which couldn't be used from sync.
  Migration: poll via client.discover_status(snap.task_id) / client.discover_fetch(snap.task_id).
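The migration path can be sketched as a loop that drives the task by its id alone. Everything below runs against a hypothetical stub: `FakeSyncClient`, `FakeSnapshot`, the `"running"`/`"ready"` status strings, and the return shapes are assumptions, not the real SyncBrightDataClient contract. Only the method names and `task_id` are taken from the notes.

```python
import time
from dataclasses import dataclass


@dataclass
class FakeSnapshot:
    """Stand-in for DiscoverSnapshot; only task_id is taken from the notes."""
    task_id: str


class FakeSyncClient:
    """Hypothetical stub so the sketch runs without network access."""

    def __init__(self) -> None:
        self._polls = 0

    def discover_trigger(self, query: str) -> FakeSnapshot:
        return FakeSnapshot(task_id="task_123")

    def discover_status(self, task_id: str) -> str:
        self._polls += 1
        # Assumed status strings: "running" twice, then "ready".
        return "ready" if self._polls >= 3 else "running"

    def discover_fetch(self, task_id: str) -> list:
        return [{"task_id": task_id, "url": "https://example.com"}]


def run_discover(client, query: str, interval: float = 0.0, max_polls: int = 10) -> list:
    """Trigger a discover task, then poll and fetch using only its task_id."""
    snap = client.discover_trigger(query)
    for _ in range(max_polls):
        if client.discover_status(snap.task_id) == "ready":
            return client.discover_fetch(snap.task_id)
        time.sleep(interval)
    raise TimeoutError(f"task {snap.task_id} not ready after {max_polls} polls")
```

Since the release also lists discover_wait(task_id), a hand-written loop like this is probably only needed when other work has to happen between polls.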
Full changelog: https://github.com/brightdata/sdk-python/blob/main/CHANGELOG.md
v2.3.1
Adds browser-api to the README, plus various fixes to the discovery API and scrapers.
v2.3.0 — Scraper Studio, Cleanup & Test Suite Rewrite
Bright Data SDK Release Notes (v2.3.0)
We are excited to announce the latest release of the Bright Data SDK! This update brings major new capabilities to our Web Scraper API, introduces new supported targets, and includes a massive under-the-hood cleanup to improve stability, maintainability, and test coverage.
New Features
- Scraper Studio Integration: You can now seamlessly trigger and fetch results directly from your custom scrapers built within Bright Data's IDE.
- New Built-in Scrapers: Added official out-of-the-box support for DigiKey and Reddit scrapers.
Bug Fixes
- Scraping Reliability: Resolved a critical issue that caused a crash when calling ScrapeJob.to_result(), ensuring smoother data extraction workflows.
Maintenance & Code Quality
- Codebase Cleanup: We've performed a major spring cleaning, removing dead code and deprecating legacy modules. This resulted in a significantly leaner SDK with a net reduction of 12,000 lines of code.
- Enhanced Test Coverage: Added 365 new unit tests utilizing shared fixtures. Our key modules now boast robust test coverage ranging from 87% to 98%, ensuring greater reliability for future updates.
v2.2.1 — Datasets API with 100+ Integrations
What's New
Datasets API
Access 100+ ready-made datasets from Bright Data — pre-collected, structured data from popular platforms.
- Callable datasets — trigger snapshots directly: await client.datasets.imdb_movies(filter=..., records_limit=5)
- sample() method — quick data sampling without specifying filters
- get_metadata() — discover available fields and types per dataset
- Export utilities — export_json(), export_csv(), export_jsonl()
Supported Categories
E-commerce (Amazon, Walmart, Shopee, Zalando, Zara, H&M, IKEA, Shein, Sephora), Social media (Instagram, TikTok, Pinterest,
YouTube, Facebook), Business intelligence (ZoomInfo, PitchBook, Owler, Slintel, G2, Trustpilot), Jobs & HR (Glassdoor, Indeed,
Xing), Real estate (Zillow, Airbnb + 8 regional), Luxury brands (Chanel, Dior, Prada, Hermes, YSL), Entertainment (IMDB, NBA,
Goodreads), and more.
Fixes
- LinkedIn search tests updated to match pythonic parameter names (first_name / last_name instead of firstName / lastName)
v2.1.1 - Instagram Scrapers & Version Centralization
What's New
Instagram Scraper Support
| Method | Description |
|---|---|
| client.scrape.instagram.profiles/posts/reels/comments(url) | Extract data from URL |
| client.search.instagram.profiles(user_name) | Find profile by username |
| client.search.instagram.posts/reels/reels_all(url, ...) | Discover content with filters |
Improvements
- Version centralization - Single source of truth in pyproject.toml
- Bug fix - Discovery endpoints now correctly include type=discover_new&discover_by=... query params
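As an illustration of the query-string fix, the sketch below assembles a discovery trigger URL with the two parameters named above. The base URL, the `dataset_id` parameter, and the `"keyword"` value for `discover_by` are placeholders chosen for the example, not documented values.

```python
from urllib.parse import urlencode


def discovery_url(base_url: str, dataset_id: str, discover_by: str) -> str:
    """Append the discovery query params to a trigger URL (hypothetical helper)."""
    params = {
        "dataset_id": dataset_id,    # assumption: how the dataset is selected
        "type": "discover_new",      # from the release note
        "discover_by": discover_by,  # from the release note; the value is caller-supplied
    }
    return f"{base_url}?{urlencode(params)}"


url = discovery_url("https://api.example.test/trigger", "gd_example", "keyword")
```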
Full Changelog: v2.1.0...v2.1.1
v2.1.0 - Async Mode for SERP and Web Unlocker
What's New
Async Mode
Non-blocking async mode for SERP and Web Unlocker APIs using mode="async":
SERP
result = await client.search.google(query="python", mode="async")
Web Unlocker
result = await client.scrape_url(url="https://example.com", mode="async")
How it works: Triggers request → gets response_id → polls until ready
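The trigger, response_id, poll sequence can be sketched with plain asyncio. This is a minimal illustration under assumptions, not the SDK's code: `poll_until_ready`, the `trigger`/`fetch` callables, and the convention that `fetch` returns `None` until the response is ready are all hypothetical. `poll_timeout` borrows its name from the bug-fix note in this release.

```python
import asyncio


async def poll_until_ready(trigger, fetch, interval=0.0, poll_timeout=5.0):
    """Trigger a request, then poll by response_id until a result arrives."""
    response_id = await trigger()
    loop = asyncio.get_running_loop()
    deadline = loop.time() + poll_timeout
    while True:
        result = await fetch(response_id)
        if result is not None:  # assumed convention: None means "not ready yet"
            return result
        if loop.time() >= deadline:
            raise TimeoutError(f"response {response_id} not ready in {poll_timeout}s")
        await asyncio.sleep(interval)


async def demo():
    """Drive the loop with in-memory fakes: ready on the third poll."""
    attempts = {"n": 0}

    async def trigger():
        return "resp_1"

    async def fetch(response_id):
        attempts["n"] += 1
        if attempts["n"] >= 3:
            return {"id": response_id, "body": "<html>"}
        return None

    return await poll_until_ready(trigger, fetch)
```

Because the wait happens in `asyncio.sleep`, other coroutines keep running between polls, which is the point of the non-blocking mode.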
Bug Fixes
- Fix SyncBrightDataClient: remove unused customer_id parameter
- Fix default poll_timeout for Web Unlocker async mode
API Changes
- Remove _async suffix from method names (products() instead of products_async())
- Remove GenericScraper - use client.scrape_url() directly
Documentation
- Added docs/async_mode_guide.md
Full Changelog: https://github.com/brightdata/sdk-python/blob/main/CHANGELOG.md
v2.0.0 - Breaking Changes
🚀 v2.0.0 - Complete Architecture Rewrite
⚠️ Breaking Changes - Migration Required
This is a major breaking release requiring code changes. Python 3.9+ is now required.
Client Initialization
# ❌ Old
from brightdata import bdclient
client = bdclient(api_token="your_token")
# ✅ New
from brightdata import BrightDataClient
client = BrightDataClient(token="your_token")
API Structure - Hierarchical Methods
# ❌ Old - Flat API
client.scrape_linkedin.profiles(url)
client.search_linkedin.jobs()
result = client.scrape(url, zone="my_zone")
# ✅ New - Hierarchical API
client.scrape.linkedin.profiles(url)
client.search.linkedin.jobs()
result = client.scrape_url(url, zone="my_zone")
Platform-Specific Scraping
# ✅ New - Recommended approach
client.scrape.amazon.products(url)
client.scrape.amazon.reviews(url)
client.scrape.amazon.sellers(url)
client.scrape.linkedin.profiles(url)
client.scrape.instagram.profiles(url)
client.scrape.facebook.posts(url)
Search Operations
# ❌ Old
results = client.search(query, search_engine="google")
# ✅ New - Dedicated methods
client.search.google(query)
client.search.bing(query)
client.search.yandex(query)
Async Support (New)
# ✅ Sync (still supported)
client = BrightDataClient(token="...")
result = client.scrape_url(url)
# ✅ Async (recommended for performance)
async with BrightDataClient(token="...") as client:
    result = await client.scrape_url_async(url)
# ✅ Async batch operations
async def scrape_multiple():
    async with BrightDataClient(token="...") as client:
        tasks = [client.scrape_url_async(url) for url in urls]
        results = await asyncio.gather(*tasks)
Manual Job Control (New)
# ✅ Fine-grained control
job = await scraper.trigger(url)
# Do other work...
status = await job.status_async()
if status == "ready":
    data = await job.fetch_async()
Type-Safe Payloads (New)
# ❌ Old - untyped dicts
payload = {"url": "...", "reviews_count": 100}
# ✅ New - structured with validation
from brightdata import AmazonProductPayload
payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123",
    reviews_count=100
)
result = client.scrape.amazon.products(payload)
Return Types
# ✅ New - structured objects with metadata
result = client.scrape.amazon.products(url)
print(result.data) # Actual scraped data
print(result.timing) # Performance metrics
print(result.cost) # Cost tracking
print(result.snapshot_id) # Job identifier
CLI Tool (New)
# ✅ Command-line interface
brightdata scrape amazon products --url https://amazon.com/dp/B123
brightdata search google --query "python sdk"
brightdata search linkedin jobs --location "Paris"
brightdata crawler discover --url https://example.com --depth 3
Configuration Changes
# ❌ Old
client = bdclient(
    api_token="token", # Changed parameter name
    auto_create_zones=True, # Default changed to False
    web_unlocker_zone="sdk_unlocker", # Default changed
    serp_zone="sdk_serp", # Default changed
    browser_zone="sdk_browser" # Default changed
)
# ✅ New
client = BrightDataClient(
    token="token", # Renamed from api_token
    auto_create_zones=False, # New default
    web_unlocker_zone="web_unlocker1", # New default name
    serp_zone="serp_api1", # New default name
    browser_zone="browser_api1", # New default name
    timeout=30, # New parameter
    rate_limit=10, # New parameter (optional)
    rate_period=1.0 # New parameter
)
✨ New Features
Platform Coverage
| Platform | Status | Methods |
|---|---|---|
| Amazon | ✅ NEW | products(), reviews(), sellers() |
| Instagram | ✅ NEW | profiles(), posts(), comments(), reels() |
| Facebook | ✅ NEW | posts(), comments(), groups() |
| LinkedIn | ✅ Enhanced | Full scraping and search |
| ChatGPT | ✅ Enhanced | Improved interaction |
| Google/Bing/Yandex | ✅ Enhanced | Dedicated services |
Performance
- ⚡ 10x better concurrency - Event loop-based architecture
- 🔌 Advanced connection pooling - 100 total, 30 per host
- 🎯 Built-in rate limiting - Configurable request throttling
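To make the throttling parameters concrete, here is one common way a `rate_limit` / `rate_period` pair can be enforced: a sliding-window counter. The SDK's actual algorithm is not described in these notes, so the class below is a hypothetical sketch that only borrows the two parameter names and their example values from the configuration section.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most rate_limit acquisitions per rate_period seconds (illustrative)."""

    def __init__(self, rate_limit=10, rate_period=1.0, clock=time.monotonic):
        self.rate_limit = rate_limit
        self.rate_period = rate_period
        self._clock = clock      # injectable so the limiter can be tested without sleeping
        self._stamps = deque()   # timestamps of recent acquisitions, oldest first

    def try_acquire(self) -> bool:
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.rate_period:
            self._stamps.popleft()
        if len(self._stamps) < self.rate_limit:
            self._stamps.append(now)
            return True
        return False
```

A caller would check `try_acquire()` before each request and back off when it returns `False`. Passing a custom `clock` keeps the behavior deterministic in tests.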
✅ Upgrade Checklist
- Update Python to 3.9+
- Change imports: bdclient → BrightDataClient
- Update parameter: api_token= → token=
- Migrate method calls to hierarchical structure
- Handle new ScrapeResult / SearchResult return types
- Review zone configuration defaults
- Consider async for better performance
- Test in staging environment
📚 Resources
Full Changelog: v1.1.3...v2.0.0
v1.1.3
New Features:
- Added url parameter to extract function for direct URL specification
- Added output_scheme parameter for OpenAI Structured Outputs support
- Enhanced parse_content to auto-detect multiple results from batch operations
Improvements:
- Added user-agent headers to all dataset API requests for better tracking
- Improved schema validation for OpenAI Structured Outputs compatibility
- Updated examples with proper formatting
Bug Fixes:
- Fixed parse_content handling of multiple scraping results
- Fixed OpenAI schema validation requirements
v1.1.2: AI-Powered Extract Function and LinkedIn Sync Improvements
New Features
- AI-Powered Extract Function: New extract() function that combines web scraping with OpenAI's language models to extract targeted information from web pages using natural language queries
- LinkedIn Sync Mode Fix: Fixed LinkedIn scraping sync mode to use the correct API endpoint and request structure for immediate data retrieval
Improvements
- Set sync=True as default for all LinkedIn scraping methods for better user experience
- Improved unit test coverage
- Enhanced error handling for LinkedIn API responses
Examples
- Added extract_example.py demonstrating AI-powered content extraction capabilities
- Updated LinkedIn examples to showcase sync functionality
Technical Changes
- Use correct /scrape endpoint for synchronous LinkedIn requests
- Pass dataset_id as URL parameter with proper flags
- Handle both 200 and 202 status codes appropriately
- Maintain backward compatibility for async operations
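The 200/202 handling above can be pictured as a small dispatch on the HTTP status. This is a sketch under assumptions: the function name, the `"snapshot_id"` body key, and the reading of 202 as "accepted, result pending" (the standard HTTP meaning of that code) are illustrative and not taken from the SDK source.

```python
from typing import Any, Tuple


def interpret_scrape_response(status_code: int, body: Any) -> Tuple[str, Any]:
    """Classify a synchronous scrape response (hypothetical helper)."""
    if status_code == 200:
        # Data was returned immediately.
        return ("data", body)
    if status_code == 202:
        # Request accepted but still processing; assumed to carry an id to poll with.
        return ("pending", body.get("snapshot_id"))
    raise RuntimeError(f"unexpected status {status_code}")
```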
v1.1.1: Documentation Updates & Bug Fixes
Updates
- Enhanced README with examples for crawl(), parse_content(), and connect_browser() functions
- Added complete client parameter documentation
- Fixed browser connection example import issues
- Improved CI workflow for PyPI package testing
Bug Fixes
- Fixed missing Playwright import in browser example
- Corrected example URL typo
- Updated test workflow to prevent PyPI race conditions