17 Jun 13:53

shahar-brd

e55918b

v2.4.0 — Sync parity, colorless job verbs, dataset error reporting

Latest

Highlights

Sync client parity — SyncBrightDataClient now mirrors the async surface. It adds client.datasets (previously missing on the sync client, which raised an AttributeError), the 5 missing scrapers (scrape.tiktok / youtube / reddit / perplexity / digikey), the 2 missing search verticals (search.tiktok / youtube), Pinterest trigger/status/fetch, and Instagram search profiles / reels_all.
Service-level job verbs (colorless pattern) — every scraper now exposes generic status / wait / fetch / to_result(snapshot_id) (on BaseWebScraper), and DiscoverService gains status / wait / fetch / to_result(task_id). A triggered job can be driven by its id alone, like the crawler. The change is purely additive: existing job.fetch() etc. are unchanged.
Discover sync manual path — SyncBrightDataClient adds discover_status / discover_wait / discover_fetch / discover_to_result(task_id), plus a colorless DiscoverSnapshot handle.
Better dataset errors — failed snapshots now report the API failure reason (falling back to the raw status response) together with the snapshot_id, instead of DatasetError: Snapshot failed: None. SnapshotStatus retains the full response (.raw) and matches more reason keys (error / error_message / message / failure_reason). The sync path inherits the fix.
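The reason-key matching described above can be pictured with a small sketch. It assumes the status response is a plain dict and checks the four listed keys in the order given; the helper names, the key precedence and the message format are assumptions for illustration, not the SDK's SnapshotStatus or DatasetError code.

```python
from typing import Any, Mapping

# Keys named in the release notes; the lookup order here is an assumption.
REASON_KEYS = ("error", "error_message", "message", "failure_reason")


def failure_reason(raw: Mapping[str, Any]) -> str:
    """Return a readable failure reason from a raw status response (hypothetical helper)."""
    for key in REASON_KEYS:
        value = raw.get(key)
        if value:  # skip missing, None and empty values
            return str(value)
    # No recognised key: fall back to the raw response so the message is never "None".
    return repr(dict(raw))


def snapshot_error(snapshot_id: str, raw: Mapping[str, Any]) -> str:
    """Build an error message that carries the snapshot id (hypothetical format)."""
    return f"Snapshot {snapshot_id} failed: {failure_reason(raw)}"


print(snapshot_error("s_123", {"status": "failed", "error_message": "Invalid input URL"}))
```

Falling back to the raw response rather than to a fixed string keeps whatever the API did send visible, which is the behaviour the note describes.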

⚠️ Breaking / Contract change

Sync discover_trigger() now returns a DiscoverSnapshot (a typed, drivable handle) instead of the async-only DiscoverJob, which couldn't be used from sync code.
Migration: poll via client.discover_status(snap.task_id) / client.discover_fetch(snap.task_id), where snap is the handle returned by discover_trigger().
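The release also adds discover_wait, which presumably runs this loop for you; the sketch below spells out the manual path by task id. It assumes discover_status returns a plain status string where "ready" means done and "failed" means failure, which these notes do not specify. StubClient is a hypothetical offline stand-in for SyncBrightDataClient so the sketch runs without a token.

```python
import time


def poll_discover(client, task_id: str, interval: float = 2.0, timeout: float = 300.0):
    """Poll a discover task by id until ready, then fetch it (sketch).

    Assumes client.discover_status(task_id) returns "ready" / "failed" / other strings.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = client.discover_status(task_id)
        if status == "ready":
            return client.discover_fetch(task_id)
        if status == "failed":
            raise RuntimeError(f"Discover task {task_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Discover task {task_id} not ready after {timeout}s")
        time.sleep(interval)


class StubClient:
    """Hypothetical offline stand-in for SyncBrightDataClient."""

    def __init__(self, polls_until_ready: int):
        self._remaining = polls_until_ready

    def discover_status(self, task_id):
        self._remaining -= 1
        return "ready" if self._remaining <= 0 else "running"

    def discover_fetch(self, task_id):
        return [{"task_id": task_id, "url": "https://example.com"}]


print(poll_discover(StubClient(polls_until_ready=3), "t_42", interval=0.01))
```

With a real client the call would be poll_discover(client, snap.task_id), since only the id is needed to drive the job.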

Full changelog: https://github.com/brightdata/sdk-python/blob/main/CHANGELOG.md

23 Apr 08:17

shahar-brd

v2.3.1

280143f

v2.3.1

Added the Browser API to the README, plus various fixes to the Discovery API and scrapers.

09 Mar 15:23

shahar-brd

v2.3.0

43504e6

v2.3.0 — Scraper Studio, Cleanup & Test Suite Rewrite

Bright Data SDK Release Notes (v2.3.0)

We are excited to announce the latest release of the Bright Data SDK! This update brings major new capabilities to our Web Scraper API, introduces new supported targets, and includes a massive under-the-hood cleanup to improve stability, maintainability, and test coverage.

New Features

Scraper Studio Integration: You can now seamlessly trigger and fetch results directly from your custom scrapers built within Bright Data's IDE.
New Built-in Scrapers: Added official out-of-the-box support for DigiKey and Reddit scrapers.

Bug Fixes

Scraping Reliability: Resolved a critical issue that caused a crash when calling ScrapeJob.to_result(), ensuring smoother data extraction workflows.

Maintenance & Code Quality

Codebase Cleanup: We've performed a major spring cleaning, removing dead code and deprecating legacy modules. This resulted in a significantly leaner SDK with a net reduction of 12,000 lines of code.
Enhanced Test Coverage: Added 365 new unit tests utilizing shared fixtures. Our key modules now boast robust test coverage ranging from 87% to 98%, ensuring greater reliability for future updates.

23 Feb 09:38

shahar-brd

v2.2.1

6699b56

v2.2.1 — Datasets API with 100+ Integrations

What's New

Datasets API

Access 100+ ready-made datasets from Bright Data — pre-collected, structured data from popular platforms.

Callable datasets — trigger snapshots directly: `await client.datasets.imdb_movies(filter=..., records_limit=5)`
`sample()` method — quick data sampling without specifying filters
`get_metadata()` — discover available fields and types per dataset
Export utilities — `export_json()`, `export_csv()`, `export_jsonl()`
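To make the three export formats concrete, here is a stdlib-only sketch of JSON, CSV and JSONL output for a list of records. It is not the SDK's export_json() / export_csv() / export_jsonl(), whose signatures these notes do not show; the function names and sample rows are illustrative.

```python
import csv
import io
import json


def to_json(rows):
    """One JSON array holding every record."""
    return json.dumps(rows, indent=2)


def to_jsonl(rows):
    """JSON Lines: one compact JSON object per line."""
    return "\n".join(json.dumps(r) for r in rows) + "\n"


def to_csv(rows):
    """CSV with a header built from the union of keys, in first-seen order."""
    fieldnames = []
    for r in rows:
        for key in r:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row become empty cells
    return buf.getvalue()


rows = [{"title": "A", "year": 2001}, {"title": "B", "rating": 7.5}]
print(to_jsonl(rows))
print(to_csv(rows))
```

JSONL suits streaming and appending large snapshots; CSV needs the key union because dataset records do not always share the same fields.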

Supported Categories

E-commerce (Amazon, Walmart, Shopee, Zalando, Zara, H&M, IKEA, Shein, Sephora), Social media (Instagram, TikTok, Pinterest, YouTube, Facebook), Business intelligence (ZoomInfo, PitchBook, Owler, Slintel, G2, Trustpilot), Jobs & HR (Glassdoor, Indeed, Xing), Real estate (Zillow, Airbnb + 8 regional), Luxury brands (Chanel, Dior, Prada, Hermes, YSL), Entertainment (IMDB, NBA, Goodreads), and more.

Fixes

LinkedIn search tests updated to match pythonic parameter names (`first_name`/`last_name` instead of `firstName`/`lastName`)

20 Jan 11:58

shahar-brd

v2.1.1

da52dee

v2.1.1 - Instagram Scrapers & Version Centralization

What's New

Instagram Scraper Support

Method	Description
`client.scrape.instagram.profiles/posts/reels/comments(url)`	Extract data from a URL
`client.search.instagram.profiles(user_name)`	Find a profile by username
`client.search.instagram.posts/reels/reels_all(url, ...)`	Discover content with filters

Improvements

Version centralization - Single source of truth in `pyproject.toml`
Bug fix - Discovery endpoints now correctly include `type=discover_new&discover_by=...` query params
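A minimal sketch of attaching those discovery parameters to a trigger URL with the standard library, keeping any parameters already present. The endpoint, dataset id and discovery method below are placeholders (the notes elide the discover_by value), and the helper name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_discovery_params(url: str, discover_by: str) -> str:
    """Append type=discover_new&discover_by=<method> to a URL, keeping existing params."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query)  # e.g. an existing dataset_id parameter
    query += [("type", "discover_new"), ("discover_by", discover_by)]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder endpoint, dataset id and discovery method, for illustration only.
print(add_discovery_params("https://api.example.com/trigger?dataset_id=gd_example", "example_method"))
```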

Full Changelog: v2.1.0...v2.1.1

07 Jan 11:53

shahar-brd

v2.1.0

37d502a

v2.1.0 - Async Mode for SERP and Web Unlocker

What's New

Async Mode

Non-blocking async mode for SERP and Web Unlocker APIs using mode="async":

SERP

result = await client.search.google(query="python", mode="async")

Web Unlocker

result = await client.scrape_url(url="https://example.com", mode="async")

How it works: Triggers request → gets response_id → polls until ready
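The trigger-then-poll flow can be sketched with asyncio. FakeBackend is a hypothetical in-memory stand-in for the SERP / Web Unlocker endpoints, and its trigger/poll method names are invented for the sketch; only the shape of the loop (trigger once, keep the response_id, poll until ready or a deadline passes) mirrors the description above. The poll_timeout name echoes the parameter mentioned in this release's bug fixes, but its default here is an assumption.

```python
import asyncio


class FakeBackend:
    """Hypothetical stand-in for an async-mode endpoint (not the Bright Data API)."""

    def __init__(self, ready_after: int):
        self._ready_after = ready_after
        self._polls = {}

    async def trigger(self, url: str) -> str:
        response_id = f"resp-{len(self._polls) + 1}"
        self._polls[response_id] = 0
        return response_id

    async def poll(self, response_id: str):
        """Return the result once ready, else None."""
        self._polls[response_id] += 1
        if self._polls[response_id] >= self._ready_after:
            return {"response_id": response_id, "html": "<html></html>"}
        return None


async def fetch_async_mode(backend, url, poll_interval=1.0, poll_timeout=60.0):
    """Trigger a request, keep its response_id, then poll until ready or timed out."""
    response_id = await backend.trigger(url)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + poll_timeout
    while loop.time() < deadline:
        result = await backend.poll(response_id)
        if result is not None:
            return result
        await asyncio.sleep(poll_interval)  # yields to other tasks while waiting
    raise TimeoutError(f"{response_id} not ready within {poll_timeout}s")


print(asyncio.run(fetch_async_mode(FakeBackend(ready_after=3), "https://example.com", poll_interval=0.01)))
```

Because the wait is an await rather than a blocking sleep, many such requests can be in flight on one event loop at once.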

Bug Fixes

Fix SyncBrightDataClient: remove unused customer_id parameter
Fix default poll_timeout for Web Unlocker async mode

API Changes

Remove _async suffix from method names (products() instead of products_async())
Remove GenericScraper - use client.scrape_url() directly

Documentation

Added docs/async_mode_guide.md

Full Changelog: https://github.com/brightdata/sdk-python/blob/main/CHANGELOG.md

01 Dec 17:51

shahar-brd

v2.0.0

4108b23

v2.0.0 - Breaking Changes

🚀 v2.0.0 - Complete Architecture Rewrite

⚠️ Breaking Changes - Migration Required

This is a major breaking release that requires code changes. Python 3.9+ is now required.

Client Initialization

# ❌ Old
from brightdata import bdclient
client = bdclient(api_token="your_token")

# ✅ New
from brightdata import BrightDataClient
client = BrightDataClient(token="your_token")

API Structure - Hierarchical Methods

# ❌ Old - Flat API
client.scrape_linkedin.profiles(url)
client.search_linkedin.jobs()
result = client.scrape(url, zone="my_zone")

# ✅ New - Hierarchical API
client.scrape.linkedin.profiles(url)
client.search.linkedin.jobs()
result = client.scrape_url(url, zone="my_zone")

Platform-Specific Scraping

# ✅ New - Recommended approach
client.scrape.amazon.products(url)
client.scrape.amazon.reviews(url)
client.scrape.amazon.sellers(url)
client.scrape.linkedin.profiles(url)
client.scrape.instagram.profiles(url)
client.scrape.facebook.posts(url)

Search Operations

# ❌ Old
results = client.search(query, search_engine="google")

# ✅ New - Dedicated methods
client.search.google(query)
client.search.bing(query)
client.search.yandex(query)

Async Support (New)

import asyncio
from brightdata import BrightDataClient

# ✅ Sync (still supported)
client = BrightDataClient(token="...")
result = client.scrape_url(url)

# ✅ Async (recommended for performance); must run inside a coroutine
async def scrape_one(url):
    async with BrightDataClient(token="...") as client:
        return await client.scrape_url_async(url)

# ✅ Async batch operations: one client, requests run concurrently
async def scrape_multiple(urls):
    async with BrightDataClient(token="...") as client:
        tasks = [client.scrape_url_async(url) for url in urls]
        return await asyncio.gather(*tasks)

results = asyncio.run(scrape_multiple(urls))

Manual Job Control (New)

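# ✅ Fine-grained control (scraper is a platform scraper object)
async def run_job(scraper, url):
    job = await scraper.trigger(url)
    # Do other work...
    status = await job.status_async()
    if status == "ready":
        return await job.fetch_async()
    return None  # not ready yet: check the status again later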

Type-Safe Payloads (New)

# ❌ Old - untyped dicts
payload = {"url": "...", "reviews_count": 100}

# ✅ New - structured with validation
from brightdata import AmazonProductPayload
payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123",
    reviews_count=100
)
result = client.scrape.amazon.products(payload)
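To show what "structured with validation" buys over a raw dict, here is a dataclass sketch that validates in __post_init__. ProductPayloadSketch and its rules (an http(s) URL, a non-negative reviews_count) are assumptions for illustration; the SDK's own AmazonProductPayload may validate differently.

```python
from dataclasses import asdict, dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class ProductPayloadSketch:
    """Hypothetical typed payload; not the SDK's AmazonProductPayload."""

    url: str
    reviews_count: Optional[int] = None

    def __post_init__(self):
        # Reject malformed input up front instead of at request time.
        if urlparse(self.url).scheme not in ("http", "https"):
            raise ValueError(f"url must be http(s): {self.url!r}")
        if self.reviews_count is not None and self.reviews_count < 0:
            raise ValueError("reviews_count must be >= 0")


payload = ProductPayloadSketch(url="https://amazon.com/dp/B123", reviews_count=100)
print(asdict(payload))  # plain dict, ready to serialize into a request body
```

A typo such as reviewscount=100 now fails as a TypeError at construction, whereas the untyped dict would silently send the wrong key.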

Return Types

# ✅ New - structured objects with metadata
result = client.scrape.amazon.products(url)
print(result.data)        # Actual scraped data
print(result.timing)      # Performance metrics
print(result.cost)        # Cost tracking
print(result.snapshot_id) # Job identifier

CLI Tool (New)

# ✅ Command-line interface
brightdata scrape amazon products --url https://amazon.com/dp/B123
brightdata search google --query "python sdk"
brightdata search linkedin jobs --location "Paris"
brightdata crawler discover --url https://example.com --depth 3

Configuration Changes

# ❌ Old
client = bdclient(
    api_token="token",              # Changed parameter name
    auto_create_zones=True,          # Default changed to False
    web_unlocker_zone="sdk_unlocker", # Default changed
    serp_zone="sdk_serp",            # Default changed
    browser_zone="sdk_browser"       # Default changed
)

# ✅ New
client = BrightDataClient(
    token="token",                   # Renamed from api_token
    auto_create_zones=False,         # New default
    web_unlocker_zone="web_unlocker1", # New default name
    serp_zone="serp_api1",           # New default name
    browser_zone="browser_api1",     # New default name
    timeout=30,                      # New parameter
    rate_limit=10,                   # New parameter (optional)
    rate_period=1.0                  # New parameter
)

✨ New Features

Platform Coverage

Platform	Status	Methods
Amazon	✅ NEW	`products()`, `reviews()`, `sellers()`
Instagram	✅ NEW	`profiles()`, `posts()`, `comments()`, `reels()`
Facebook	✅ NEW	`posts()`, `comments()`, `groups()`
LinkedIn	✅ Enhanced	Full scraping and search
ChatGPT	✅ Enhanced	Improved interaction
Google/Bing/Yandex	✅ Enhanced	Dedicated services

Performance

⚡ 10x better concurrency - Event loop-based architecture
🔌 Advanced connection pooling - 100 connections total, 30 per host
🎯 Built-in rate limiting - Configurable request throttling
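The rate_limit / rate_period pair from the configuration example reads naturally as "at most rate_limit requests per rate_period seconds". A sliding-window limiter with that meaning can be sketched as follows; this interpretation and the class are assumptions, not the SDK's implementation, and the clock is injectable so the sketch runs without real waiting.

```python
import collections
import time


class SlidingWindowLimiter:
    """Allow at most `rate_limit` calls in any `rate_period`-second window (sketch)."""

    def __init__(self, rate_limit: int, rate_period: float, clock=time.monotonic):
        self.rate_limit = rate_limit
        self.rate_period = rate_period
        self._clock = clock
        self._stamps = collections.deque()  # times of recently allowed calls

    def try_acquire(self) -> bool:
        now = self._clock()
        # Drop timestamps that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.rate_period:
            self._stamps.popleft()
        if len(self._stamps) < self.rate_limit:
            self._stamps.append(now)
            return True
        return False  # caller should wait and retry


fake_now = [0.0]
limiter = SlidingWindowLimiter(rate_limit=10, rate_period=1.0, clock=lambda: fake_now[0])
print(sum(limiter.try_acquire() for _ in range(12)))  # 10 allowed, 2 rejected
fake_now[0] = 1.0
print(limiter.try_acquire())  # the window has passed, so this call is allowed
```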

✅ Upgrade Checklist

Update Python to 3.9+
Change imports: bdclient → BrightDataClient
Update parameter: api_token= → token=
Migrate method calls to hierarchical structure
Handle new ScrapeResult/SearchResult return types
Review zone configuration defaults
Consider async for better performance
Test in staging environment

📚 Resources

Full Changelog: v1.1.3...v2.0.0

07 Sep 18:20

Idanvilenski

v1.1.3

5a81b1c

v1.1.3

New Features:

Added url parameter to extract function for direct URL specification
Added output_scheme parameter for OpenAI Structured Outputs support
Enhanced parse_content to auto-detect multiple results from batch operations

Improvements:

Added User-Agent headers to all dataset API requests for better tracking
Improved schema validation for OpenAI Structured Outputs compatibility
Updated examples with proper formatting

Bug Fixes:

Fixed parse_content handling of multiple scraping results
Fixed OpenAI schema validation requirements
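OpenAI Structured Outputs in strict mode require every object in the schema to set additionalProperties to false and to list all of its properties in required. A recursive check for those two rules is sketched below; the function name is hypothetical, it follows only properties and items (not anyOf, $defs or the other strict-mode restrictions), and it does not reflect how the SDK's output_scheme validation is actually written.

```python
def strict_schema_problems(schema, path="$"):
    """Return strict-mode violations (two rules only) found in a JSON Schema (sketch)."""
    problems = []
    if isinstance(schema, dict):
        if schema.get("type") == "object":
            props = schema.get("properties", {})
            if schema.get("additionalProperties") is not False:
                problems.append(f"{path}: additionalProperties must be false")
            missing = sorted(set(props) - set(schema.get("required", [])))
            if missing:
                problems.append(f"{path}: not required: {', '.join(missing)}")
            for name, sub in props.items():
                problems += strict_schema_problems(sub, f"{path}.{name}")
        if "items" in schema:  # descend into array item schemas
            problems += strict_schema_problems(schema["items"], f"{path}[]")
    return problems


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "object", "properties": {"name": {"type": "string"}}}},
    },
    "required": ["title"],
}
for problem in strict_schema_problems(schema):
    print(problem)
```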

04 Sep 14:53

Idanvilenski

v1.1.2

b4aec7e

v1.1.2: AI-Powered Extract Function and LinkedIn Sync Improvements

New Features

AI-Powered Extract Function: New extract() function that combines web scraping with OpenAI's language models to extract targeted information from web pages using natural language queries
LinkedIn Sync Mode Fix: Fixed LinkedIn scraping sync mode to use the correct API endpoint and request structure for immediate data retrieval

Improvements

Set sync=True as default for all LinkedIn scraping methods for better user experience
Improved unit test coverage
Enhanced error handling for LinkedIn API responses

Examples

Added extract_example.py demonstrating AI-powered content extraction capabilities
Updated LinkedIn examples to showcase sync functionality

Technical Changes

Use correct /scrape endpoint for synchronous LinkedIn requests
Pass dataset_id as URL parameter with proper flags
Handle both 200 and 202 status codes appropriately
Maintain backward compatibility for async operations
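The 200/202 handling follows the usual HTTP meaning: 200 carries the finished data, while 202 Accepted means the job is still running and has to be polled later. A small dispatcher in that spirit is sketched below; the ScrapeOutcome type and the assumption that a 202 body contains a snapshot_id field are illustrative, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ScrapeOutcome:
    """Hypothetical result wrapper: either data now, or a snapshot id to poll."""

    data: Optional[Any] = None
    snapshot_id: Optional[str] = None

    @property
    def pending(self) -> bool:
        return self.data is None and self.snapshot_id is not None


def handle_scrape_response(status_code: int, body: Any) -> ScrapeOutcome:
    if status_code == 200:
        return ScrapeOutcome(data=body)  # synchronous request finished in time
    if status_code == 202:
        # Accepted but not finished: keep the id (assumed field name) to poll later.
        return ScrapeOutcome(snapshot_id=body["snapshot_id"])
    raise RuntimeError(f"Unexpected status {status_code}: {body}")


print(handle_scrape_response(202, {"snapshot_id": "s_1"}).pending)  # True
```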

03 Sep 10:22

Idanvilenski

v1.1.1

ef3d827

v1.1.1: Documentation Updates & Bug Fixes

Updates

Enhanced README with examples for crawl(), parse_content(), and connect_browser() functions
Added complete client parameter documentation
Fixed browser connection example import issues
Improved CI workflow for PyPI package testing

Bug Fixes

Fixed missing Playwright import in browser example
Corrected example URL typo
Updated test workflow to prevent PyPI race conditions
