Field notes

The scraping journal.

Honest, practical writing on anti-bot bypass, managed scraping, and keeping data pipelines alive at scale.

Anti-bot bypass, honestly: what we crack and what's genuinely hard

A candid look at bypassing Cloudflare, DataDome, PerimeterX and Akamai at scale — what works reliably, what stays hard, and how we get partial wins on the toughest targets.

Jun 20, 2026 · 6 min read

Buyer's guide

Managed scraping vs DIY vs proxy providers: the real trade-offs

Build it in-house, buy proxies, or hire a managed team? An honest comparison of cost, reliability and maintenance for web data at scale.

Jun 12, 2026 · 5 min read

Pipelines

Reliable web-data pipelines: selector drift, QA, and delivery

Why scraping pipelines break and how to keep them running — selector drift, schema validation, quality checks, and getting data into your API, S3, SFTP or warehouse.

Jun 5, 2026 · 6 min read

Marketplace

Marketplace & pricing intelligence across Amazon, eBay and Shopee

Track price, stock, reviews and buy-box across marketplaces — what's straightforward, what's region-locked, and how to turn it into repricing and brand-protection signals.

May 29, 2026 · 6 min read

Data products

From job boards to LLM corpora: structured data for talent analytics and AI

Two ends of the web-data spectrum — hiring and compensation data from dozens of job boards, and clean, licensed training corpora for AI. How each is built and delivered.

May 22, 2026 · 6 min read

Legal

Is web scraping legal? A practical, non-lawyer overview

Public data, terms of service, copyright, personal data and robots.txt — a plain-English map of the web-scraping legal landscape. Not legal advice.

May 15, 2026 · 7 min read