Web Scraper
Scrape a URL and pass the extracted content to the next node, powered by Apify.
The Web Scraper node connects to your Apify account and runs a scraper actor for each URL it receives from an upstream node. The extracted content is passed downstream as structured JSON — ready to feed into an AI node, a CSV export, or any other step.
Setup
You need an Apify account and an API token. Connect it once in the node settings — the token is stored securely and reused for every run.
Configuration
| Field | Description |
|---|---|
| Scraper | Which Apify actor to use. See the Scrapers section below. Only applies when Smart URL Routing is off. |
| Smart URL Routing | When enabled, the node inspects the URL and automatically selects the best actor for that platform. Falls back to the Generic Web Scraper for all other URLs. |
Scrapers
| Scraper | Best for |
|---|---|
| Generic Web Scraper (default) | Any public website — company pages, blogs, landing pages, news articles |
| Twitter / X | Tweets and thread content from a twitter.com or x.com URL |
| YouTube | Video metadata, description, channel info, view and like counts |
| Google Maps | Business listings — name, address, rating, reviews, phone, website |
| Amazon Products | Product pages — title, price, brand, rating, review count, ASIN |
With Smart URL Routing enabled, the node makes this choice for you; you don't need to select a scraper manually.
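The routing behaves roughly like a hostname lookup with a generic fallback. A minimal Python sketch, assuming hypothetical detection rules — the node's real matching logic is internal to the product and may differ:

```python
from urllib.parse import urlparse

def pick_scraper(url: str) -> str:
    # Illustrative only: map a URL's hostname to one of the scrapers
    # listed above, falling back to the Generic Web Scraper.
    parsed = urlparse(url)
    host = (parsed.hostname or "").removeprefix("www.")
    if host in ("twitter.com", "x.com"):
        return "Twitter / X"
    if host in ("youtube.com", "youtu.be"):
        return "YouTube"
    if host == "google.com" and parsed.path.startswith("/maps"):
        return "Google Maps"
    if host.startswith("amazon."):
        return "Amazon Products"
    return "Generic Web Scraper"  # fallback for all other URLs
```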
Inputs & Outputs
| Port | Name | Description |
|---|---|---|
| Input | json_data | JSON from an upstream node containing a url field to scrape |
| Output | action_result | Structured JSON with the scraped content (fields vary by scraper; see below) |
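For example, a minimal json_data payload needs only the url field; the URL shown here is illustrative:

```json
{
  "url": "https://example.com/pricing"
}
```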
Output fields by scraper
Generic Web Scraper
| Field | Description |
|---|---|
| url | The scraped URL |
| title | Page title |
| description | Meta description |
| author | Page author if present |
| keywords | Meta keywords |
| language | Detected language code |
| text | Full plain-text content |
| markdown | Content as Markdown |
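A sample action_result from the Generic Web Scraper might look like the following; all values are illustrative, and fields such as author may be absent when the page doesn't provide them:

```json
{
  "url": "https://example.com/blog/launch",
  "title": "Launch announcement",
  "description": "A short meta description of the page.",
  "author": "Jane Doe",
  "keywords": "launch, product",
  "language": "en",
  "text": "Full plain-text content of the page...",
  "markdown": "# Launch announcement\n\nFull content as Markdown..."
}
```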
Twitter / X
| Field | Description |
|---|---|
| url | Tweet URL |
| author | Username |
| text | Tweet text |
| likes | Like count |
| retweets | Retweet count |
| replies | Reply count |
| posted_at | Post timestamp |
YouTube
| Field | Description |
|---|---|
| url | Video URL |
| title | Video title |
| channel | Channel name |
| channel_url | Channel URL |
| subscribers | Subscriber count |
| description | Video description |
| views | View count |
| likes | Like count |
| duration | Video duration |
| published_at | Publish date |
| hashtags | Comma-separated hashtags |
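Because hashtags arrives as a single comma-separated string rather than a list, a downstream step that wants individual tags can split it. A minimal sketch; the helper name is hypothetical:

```python
def parse_hashtags(raw: str) -> list[str]:
    # Split the comma-separated hashtags field into a list,
    # trimming whitespace and dropping empty entries.
    return [tag.strip() for tag in raw.split(",") if tag.strip()]
```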
Google Maps
| Field | Description |
|---|---|
| url | Place URL |
| title | Business name |
| address | Full address |
| rating | Average rating score |
| reviews_count | Total review count |
| website | Business website |
| phone | Phone number |
| category | Business category |
Amazon Products
| Field | Description |
|---|---|
| url | Product URL |
| title | Product title |
| price | Listed price |
| brand | Brand name |
| rating | Average star rating |
| reviews_count | Total review count |
| description | Product description |
| asin | Amazon product identifier |
Tips
- Connect a Google Sheets Trigger upstream to scrape one URL per new row automatically
- Pass the output to a Use AI Model node to extract structured fields from the raw scraped content
- Use Smart URL Routing when your sheet contains a mix of URLs from different platforms — the node picks the right scraper for each one
- The Generic Web Scraper uses a lightweight HTTP crawler by default. It works well for most public pages and keeps costs low on the free Apify tier.
- If a URL returns no results (e.g. a login-gated page), the node fails and stops the flow — the Flow Completion Event node can catch this and send you an alert