This n8n workflow automates the process of scraping comprehensive business information from Yelp using individual business URLs. It integrates with Scrape.do for professional web scraping with anti-bot bypass capabilities and Google Sheets for centralized data storage, providing detailed business intelligence for market research, competitor analysis, and lead generation.
| Property | Value |
|---|---|
| Type | Form Trigger |
| Purpose | Initiates the workflow with user-submitted Yelp business URL |
| Input Fields | Yelp Business URL |
| Function | Captures target business URL to start the scraping process |
| Property | Value |
|---|---|
| Type | HTTP Request (POST) |
| Purpose | Creates an async scraping job via Scrape.do API |
| Endpoint | https://q.scrape.do/api/v1/jobs |
| Authentication | X-Token header |
Request Parameters:
- `Super`: `true` (uses residential/mobile proxies for better success rate)
- `GeoCode`: `us` (targets US-based content)
- `Device`: `desktop`
- `Render`: enabled, with `networkidle2` wait condition

Function: Initiates comprehensive business data extraction from Yelp with headless browser rendering to handle dynamic content.
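Putting the parameters above together, the job-creation request body can be sketched as follows. This is a minimal illustration assuming the same field names (`Targets`, `Super`, `GeoCode`, `Render`) used in the batch example later in this document; the actual HTTP call is shown commented out since it requires a live API token.

```javascript
// Sketch of the body the "Create Scrape.do Job" node posts to
// https://q.scrape.do/api/v1/jobs (field names assumed from the batch
// example later in this document).
function buildJobPayload(yelpUrl) {
  return {
    Targets: [yelpUrl],          // single business URL from the form trigger
    Super: true,                 // residential/mobile proxy pool
    GeoCode: "us",               // US-based content
    Render: {
      WaitUntil: "networkidle2", // wait until the page's network goes idle
      CustomWait: 3000           // extra 3 s for late-loading widgets
    }
  };
}

const payload = buildJobPayload("https://www.yelp.com/biz/example-business-city");
console.log(payload.Targets.length); // → 1

// The node sends this with the X-Token header, e.g.:
// fetch("https://q.scrape.do/api/v1/jobs", {
//   method: "POST",
//   headers: { "X-Token": "YOUR_SCRAPEDO_TOKEN", "Content-Type": "application/json" },
//   body: JSON.stringify(payload)
// });
```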
| Property | Value |
|---|---|
| Type | Code Node (JavaScript) |
| Purpose | Extracts structured business data from raw HTML |
| Mode | Run once for each item |
Function: Parses the scraped HTML content using regex patterns and JSON-LD extraction to retrieve:
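The JSON-LD part of that parsing step can be sketched like this. Yelp embeds structured business data in `<script type="application/ld+json">` blocks; the sample HTML below is illustrative, not real Yelp markup, and the helper name `extractJsonLd` is hypothetical.

```javascript
// Minimal sketch of JSON-LD extraction as done in the "Parse Yelp HTML"
// Code node: grab the ld+json script block with a regex, then JSON.parse it.
function extractJsonLd(html) {
  const match = html.match(
    /<script type="application\/ld\+json">([\s\S]*?)<\/script>/
  );
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (e) {
    return null; // malformed JSON-LD: fall back to regex-only field parsing
  }
}

// Illustrative sample, not actual Yelp page markup.
const sampleHtml = `<html><head>
<script type="application/ld+json">
{"@type":"Restaurant","name":"Joe's Pizza Restaurant",
 "aggregateRating":{"ratingValue":4.5,"reviewCount":247},
 "telephone":"(555) 123-4567"}
</script></head></html>`;

const biz = extractJsonLd(sampleHtml);
console.log(biz.name, biz.aggregateRating.ratingValue);
// → Joe's Pizza Restaurant 4.5
```

Fields not present in JSON-LD (hours, media URLs) are what the regex patterns cover.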
| Property | Value |
|---|---|
| Type | Google Sheets Node |
| Purpose | Appends scraped business data for later analysis and reporting |
| Operation | Append rows |
| Target | "Yelp Scraper Data - Scrape.do" sheet |
Data Mapping:
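A hedged sketch of that mapping: the node appends one row per scraped business, using the twelve columns listed in the output field table of this document. The helper name `toSheetRow` is hypothetical; in n8n the mapping is configured in the node UI rather than in code.

```javascript
// Illustrative row object for the Google Sheets "Append rows" operation.
// Keys are the spreadsheet columns; values come from the parsed item.
function toSheetRow(item) {
  return {
    name: item.name,
    overall_rating: item.overall_rating,
    reviews_count: item.reviews_count,
    url: item.url,
    phone: item.phone,
    address: item.address,
    price_range: item.price_range,
    categories: item.categories,
    website: item.website,
    hours: item.hours,
    images_videos_urls: item.images_videos_urls,
    scraped_at: item.scraped_at || new Date().toISOString()
  };
}

const row = toSheetRow({ name: "Joe's Pizza Restaurant", overall_rating: "4.5" });
console.log(Object.keys(row).length); // → 12
```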
Form Input → Create Scrape.do Job → Parse Yelp HTML → Store to Google Sheet
│ │ │ │
▼ ▼ ▼ ▼
User submits API creates job JavaScript code Data appended
Yelp URL with JS rendering extracts fields to spreadsheet
| Credential | Purpose |
|---|---|
| Scrape.do API Token | Required for Yelp business scraping with anti-bot bypass |
| Google Sheets OAuth2 | For data storage and export access |
| n8n Form Webhook | For user input collection |
| Parameter | Description |
|---|---|
| `YOUR_SCRAPEDO_TOKEN` | Your Scrape.do API token (appears in 3 places) |
| `YOUR_GOOGLE_SHEET_ID` | Target spreadsheet identifier |
| `YOUR_GOOGLE_SHEETS_CREDENTIAL_ID` | OAuth2 authentication reference |
The `networkidle2` wait condition ensures the page is fully loaded before extraction.

| Field | Description | Example |
|---|---|---|
| `name` | Business name | "Joe's Pizza Restaurant" |
| `overall_rating` | Average customer rating | "4.5" |
| `reviews_count` | Total number of reviews | "247" |
| `url` | Original Yelp business URL | "https://www.yelp.com/biz/..." |
| `phone` | Business phone number | "(555) 123-4567" |
| `address` | Full street address | "123 Main St, New York, NY 10001" |
| `price_range` | Price indicator | "$$" |
| `categories` | Business categories | "Pizza, Italian, Delivery" |
| `website` | Business website URL | "https://joespizza.com" |
| `hours` | Operating hours | "Mon-Fri 11:00-22:00" |
| `images_videos_urls` | Media content links | "https://s3-media1.fl.yelpcdn.com/..." |
| `scraped_at` | Extraction timestamp | "2025-01-15T10:30:00Z" |
| Specification | Value |
|---|---|
| Processing Time | 15-45 seconds per business URL |
| Data Accuracy | 95%+ for publicly available business information |
| Success Rate | 99.98% (Scrape.do guarantee) |
| Proxy Pool | 110M+ residential, mobile, and datacenter IPs |
| JS Rendering | Full headless browser with networkidle2 wait |
| Data Format | JSON with structured field mapping |
| Storage Format | Structured Google Sheets with 12 predefined columns |
Get your API token:
Update workflow references (3 places):
- 🔍 Create Scrape.do Job node → Headers → X-Token
- 📡 Check Job Status node → Headers → X-Token
- 📥 Fetch Task Results node → Headers → X-Token

Replace `YOUR_SCRAPEDO_TOKEN` with your actual API token.
Create target spreadsheet:

- Add a header row with the columns: `name | overall_rating | reviews_count | url | phone | address | price_range | categories | website | hours | images_videos_urls | scraped_at`
- Copy the Sheet ID from the spreadsheet URL (the segment between `/d/` and `/edit`)

Set up OAuth2 credentials:

Update workflow references:

- Replace `YOUR_GOOGLE_SHEET_ID` with your actual Sheet ID
- Replace `YOUR_GOOGLE_SHEETS_CREDENTIAL_ID` with your credential reference

Test with sample URL:

- Submit a Yelp business URL (e.g. `https://www.yelp.com/biz/example-business-city`)

Activate workflow:
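The Sheet ID step can be sketched as a one-line extraction. The helper name `sheetIdFromUrl` is hypothetical; the ID is simply the path segment between `/d/` and `/edit` in the spreadsheet URL.

```javascript
// Pull the Google Sheet ID out of a spreadsheet URL.
function sheetIdFromUrl(url) {
  const m = url.match(/\/d\/([a-zA-Z0-9-_]+)/);
  return m ? m[1] : null;
}

console.log(sheetIdFromUrl(
  "https://docs.google.com/spreadsheets/d/1AbC123xyz/edit#gid=0"
)); // → 1AbC123xyz
```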
The workflow captures comprehensive business information including:
| Category | Data Points |
|---|---|
| Basic Information | Name, category, location |
| Performance Metrics | Ratings, review counts, popularity |
| Contact Details | Phone, website, address |
| Visual Content | Photos, videos, gallery URLs |
| Operational Data | Hours, services, price range |
Modify the input to accept multiple URLs by updating the job creation body:
{
"Targets": [
"https://www.yelp.com/biz/business-1",
"https://www.yelp.com/biz/business-2",
"https://www.yelp.com/biz/business-3"
],
"Super": true,
"GeoCode": "us",
"Render": {
"WaitUntil": "networkidle2",
"CustomWait": 3000
}
}
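Building that `Targets` array from a multi-line or comma-separated form field could look like the sketch below. The helper name `buildBatchPayload` and the input format are assumptions, not part of the workflow as shipped.

```javascript
// Sketch: turn a comma- or newline-separated form input into the batch
// job body shown above, keeping only well-formed Yelp business URLs.
function buildBatchPayload(rawInput) {
  const targets = rawInput
    .split(/[\n,]+/)
    .map((u) => u.trim())
    .filter((u) => u.startsWith("https://www.yelp.com/biz/"));
  return {
    Targets: targets,
    Super: true,
    GeoCode: "us",
    Render: { WaitUntil: "networkidle2", CustomWait: 3000 }
  };
}

const body = buildBatchPayload(
  "https://www.yelp.com/biz/business-1\nhttps://www.yelp.com/biz/business-2\nnot-a-url"
);
console.log(body.Targets.length); // → 2
```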
For complex Yelp pages, add browser interactions:
{
"Render": {
"BlockResources": false,
"WaitUntil": "networkidle2",
"CustomWait": 5000,
"WaitSelector": ".biz-page-header",
"PlayWithBrowser": [
{ "Action": "Scroll", "Direction": "down" },
{ "Action": "Wait", "Timeout": 2000 }
]
}
}
Add alert mechanisms:
| Issue | Cause | Solution |
|---|---|---|
| Invalid URL | URL is not a valid Yelp business page | Ensure URL format: https://www.yelp.com/biz/... |
| 401 Unauthorized | Invalid or missing API token | Verify X-Token header value |
| Job Timeout | Page too complex or slow | Increase CustomWait value |
| Empty Data | HTML parsing failed | Check page structure, update regex patterns |
| Rate Limiting | Too many concurrent requests | Reduce request frequency or upgrade plan |
| Metric | Value |
|---|---|
| Processing Time | 15-45 seconds per business URL |
| Data Accuracy | 95%+ for publicly available information |
| Success Rate | 99.98% (with Scrape.do anti-bot bypass) |
| Concurrent Processing | Depends on Scrape.do plan limits |
| Storage Capacity | Unlimited (Google Sheets based) |
| Proxy Pool | 110M+ IPs across 150 countries |
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/v1/jobs` | POST | Create new scraping job |
| `/api/v1/jobs/{jobID}` | GET | Check job status |
| `/api/v1/jobs/{jobID}/{taskID}` | GET | Retrieve task results |
| `/api/v1/me` | GET | Get account information |
| Status | Description |
|---|---|
| `queuing` | Job is being prepared |
| `queued` | Job is in queue waiting to be processed |
| `pending` | Job is currently being processed |
| `rotating` | Job is retrying with different proxies |
| `success` | Job completed successfully |
| `error` | Job failed |
| `canceled` | Job was canceled by user |
For complete API documentation, visit: Scrape.do Documentation
This workflow is powered by Scrape.do - Reliable, Scalable, Unstoppable Web Scraping