Product Update

Tracking 14 Retailers Across Two Countries

February 1, 2026 · 6 min read · Crawlbot Team

When we first started building Crawlbot, we scraped one retailer: Currys. One retailer, a handful of laptop categories, once an hour. Today, we run 160 active Share of Voice schedules across 14 retailers in the United Kingdom and South Africa, covering 9 product categories with hourly granularity. Getting here required solving a different technical puzzle for nearly every retailer we added, because no two e-commerce sites are built the same way.

The Retailer Map

Our SoV monitoring now spans two distinct markets. In the United Kingdom, we track 12 retailers: Currys, Amazon, Argos, AO, Box, John Lewis, Laptops Direct, Overclockers, Scan, Very, Costco, and EE. In South Africa, we cover 6 retailers: Takealot, Incredible, ComputerMania, Makro, Game, and Amazon ZA.

Each retailer is monitored across up to 9 product categories: Laptops, Chromebooks, Gaming Laptops, Monitors, Gaming Monitors, Desktops, Gaming Desktops, All-in-Ones, and Projectors. Not every retailer carries every category -- EE does not sell desktops, for example -- but the total comes to 160 distinct schedule entries, each firing once per hour. That is 3,840 category page scrapes per day, every day.

Every Site Is Different

The core challenge of multi-retailer scraping is that every website structures its product data differently. There is no standard. We have encountered four fundamentally different extraction methods across our 14 retailers, and most sites require a mix of approaches.

JSON-LD and Structured Data

The cleanest sites embed product data as JSON-LD in their page source -- structured <script type="application/ld+json"> blocks that contain product names, prices, brands, and availability in a machine-readable format. When this works, it is fast and reliable. But many retailers only include JSON-LD on product detail pages, not on category listings, making it useless for SoV monitoring where we need to read an entire category grid.
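When JSON-LD is present on a listing page, extraction is little more than finding the script blocks and parsing them. A minimal sketch -- the sample HTML and field handling here are illustrative, not any particular retailer's real markup:

```typescript
// Minimal JSON-LD product extraction. The HTML below is synthetic --
// real pages carry many more fields and often nest products in ItemLists.

interface LdProduct {
  name: string;
  brand?: string;
  price?: string;
}

function extractJsonLdProducts(html: string): LdProduct[] {
  const products: LdProduct[] = [];
  const scriptRe = /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/g;
  for (const match of html.matchAll(scriptRe)) {
    let data: any;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // malformed JSON-LD is common in the wild; skip, don't fail the scrape
    }
    const nodes = Array.isArray(data) ? data : [data];
    for (const node of nodes) {
      if (node?.["@type"] === "Product") {
        products.push({
          name: node.name,
          brand: typeof node.brand === "object" ? node.brand?.name : node.brand,
          price: node.offers?.price,
        });
      }
    }
  }
  return products;
}

const html = `<script type="application/ld+json">
{"@type":"Product","name":"ExampleBook 14","brand":{"name":"Acme"},"offers":{"price":"499.00"}}
</script>`;
console.log(extractJsonLdProducts(html));
```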

API Interception

Some retailers load product grids dynamically via XHR or fetch requests to internal APIs. Scan, for instance, renders its category pages by calling a backend API and injecting the results into the DOM via JavaScript. For these retailers, we intercept the network response using Playwright's route interception, parse the JSON payload directly, and skip DOM parsing entirely. This is faster and more resilient to frontend redesigns because the API response format tends to be more stable than the HTML structure.
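The interception pattern splits naturally into a URL predicate and a payload parser. In this sketch the endpoint path and payload fields are placeholders, not Scan's real API, and the Playwright wiring is indicated in comments:

```typescript
// Interception side of an API-driven scraper, split into a URL predicate
// and a payload parser. Endpoint path and payload shape are placeholders.

interface GridProduct {
  position: number;
  title: string;
  price: number;
}

// Only parse responses from the internal grid endpoint (hypothetical path).
function isGridRequest(url: string): boolean {
  return url.includes("/api/catalog/grid");
}

function parseGridPayload(json: any): GridProduct[] {
  return (json.items ?? []).map((item: any, i: number) => ({
    position: i + 1, // grid order is the SoV position
    title: item.title,
    price: item.sellingPrice,
  }));
}

// Wired into Playwright, roughly:
//   page.on("response", async (res) => {
//     if (isGridRequest(res.url())) grid = parseGridPayload(await res.json());
//   });
//   await page.goto(categoryUrl, { waitUntil: "networkidle" });

const sample = { items: [{ title: "Example Monitor", sellingPrice: 199 }] };
console.log(parseGridPayload(sample));
```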

Pure DOM Parsing

Most retailers require us to parse the rendered DOM -- the actual HTML elements on the page after JavaScript has finished executing. This means identifying the correct CSS selectors for product cards, titles, prices, brand names, and position indicators. When Currys redesigned their product grid in February 2026, changing their product card class from .Product-display to .product-tile, we had to update our selectors within hours. This is the most common extraction method and the most fragile.
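One way to soften that fragility is to keep selectors in per-retailer config with ordered fallbacks, so a redesign becomes a config change rather than a code change. The card class names below are the ones from the Currys redesign; the title and price selectors are illustrative, and the substring check stands in for a real DOM query:

```typescript
// Per-retailer selector config with ordered fallbacks: current markup
// first, legacy markup after. Only the card class names are real; the
// title/price selectors are illustrative.

interface SelectorConfig {
  productCard: string[]; // tried in order
  title: string;
  price: string;
}

const currysSelectors: SelectorConfig = {
  productCard: [".product-tile", ".Product-display"],
  title: ".product-title", // illustrative
  price: ".product-price", // illustrative
};

// Pick the first card selector whose class name actually appears in the
// rendered HTML; a real implementation would query the DOM instead.
function pickCardSelector(html: string, config: SelectorConfig): string | null {
  for (const sel of config.productCard) {
    const className = sel.startsWith(".") ? sel.slice(1) : sel;
    if (html.includes(className)) return sel;
  }
  return null;
}

console.log(pickCardSelector('<div class="product-tile"></div>', currysSelectors));
```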

Direct API Calls

Some retailers expose their product catalog through public or semi-public APIs, making browser automation unnecessary. Game.co.za, a South African electronics retailer, runs on the SAP Hybris Commerce platform. Their product data is accessible through an OCC (Omni Commerce Connect) REST API at a predictable endpoint. We skip Playwright entirely for Game and make direct HTTP requests through Bright Data's Web Unlocker, fetching product data as clean JSON. No browser overhead, no rendering delays -- just API calls with rate limiting to avoid PX (PerimeterX) blocks.
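The request itself then reduces to building a query URL and reading JSON. This sketch follows the common OCC v2 URL shape, but the host, site id, and field list are assumptions, not Game's actual configuration:

```typescript
// Building an OCC-style catalog search URL. Host, site id, and field list
// are assumptions following the common OCC v2 shape, not Game's real config.

function buildOccSearchUrl(
  baseSite: string,
  query: string,
  page = 0,
  pageSize = 48
): string {
  const params = new URLSearchParams({
    query,
    currentPage: String(page),
    pageSize: String(pageSize),
    fields: "products(code,name,price(FULL),manufacturer)",
  });
  return `https://www.example-retailer.co.za/occ/v2/${baseSite}/products/search?${params}`;
}

// One plain HTTP request per page -- routed through a proxy/unlocker in
// production to dodge bot blocks -- then read json.products.
console.log(buildOccSearchUrl("mainSite", "laptops"));
```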

The Sponsored Detection Problem

Share of Voice monitoring is not just about which products appear on a category page. It is about understanding which of those placements are organic (earned through search relevance, reviews, and sales velocity) versus sponsored (paid for through the retailer's advertising platform). This distinction is critical for brands managing retail media budgets, but every retailer marks sponsored products differently.

Currys uses Criteo as its retail media platform. Sponsored products on Currys category pages contain Criteo-specific markers in the HTML -- data attributes and class names injected by the Criteo ad server. Our scraper looks for these markers to flag sponsored placements.

Amazon uses its own advertising system. Sponsored products are wrapped in elements with the AdHolder class, or contain the text "Sponsored" in a label above the product card. Our Amazon scraper checks for both patterns because Amazon A/B tests different markup structures across sessions.

John Lewis uses a cleaner approach with a semantic data attribute: [data-testid="sponsored-product-tag"]. This is the most developer-friendly sponsored marker we have encountered -- clear, consistent, and unlikely to change accidentally during a redesign.

Several retailers in our coverage -- Scan, Laptops Direct, Overclockers, and most South African retailers -- do not have sponsored product placements at all. Their category pages show products in a single organic ranking. For these retailers, every position is organic, and our SoV data reflects pure search and merchandising performance.
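In code, this becomes a per-retailer predicate over a product card's markup. The Amazon and John Lewis markers below are the ones described above; the Criteo attribute is a placeholder for Currys' actual marker, and the string checks stand in for real DOM queries:

```typescript
// Sponsored detection as per-retailer predicates. Amazon and John Lewis
// markers are real; the Criteo attribute name is a placeholder; substring
// checks stand in for DOM queries.

type SponsoredCheck = (cardHtml: string) => boolean;

const sponsoredChecks: Record<string, SponsoredCheck> = {
  currys: (html) => html.includes("data-criteo"), // placeholder Criteo marker
  amazon: (html) =>
    html.includes("AdHolder") || html.includes(">Sponsored<"), // A/B-tested markup: check both
  johnlewis: (html) => html.includes('data-testid="sponsored-product-tag"'),
  scan: () => false, // no retail media platform: every placement is organic
};

console.log(sponsoredChecks.amazon('<div class="AdHolder"></div>')); // true
```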

Retailer-Specific Technical Challenges

Beyond the extraction method and sponsored detection, each retailer presented unique technical hurdles that required custom solutions.

Amazon was our most difficult integration. Amazon's bot detection is aggressive -- their systems fingerprint browser instances, check for headless-mode indicators, and rate-limit suspicious request patterns. Our original Playwright-based Amazon scraper eventually returned zero products as Amazon began blocking it outright. We rewrote the scraper to bypass the browser altogether, using Bright Data's Web Unlocker to fetch raw HTML over HTTP with a UK country code, then parsing products from the response using regex patterns against data-asin attributes and s-result-item blocks. We also had to force a UK location by injecting a UB7 0DQ postcode (a West London address near Heathrow) to ensure we saw UK pricing and availability, and we implemented ASIN-based deduplication because Amazon frequently shows the same product in both organic and sponsored positions.
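The regex extraction and ASIN deduplication can be sketched like this -- the pattern and sample markup are heavily simplified relative to real Amazon result pages:

```typescript
// ASIN extraction from raw search HTML plus first-seen deduplication.
// Real result blocks are far richer; the regex and sample are minimal.

interface AmazonHit {
  asin: string;
  position: number;
}

function extractAsins(html: string): AmazonHit[] {
  const seen = new Set<string>();
  const hits: AmazonHit[] = [];
  for (const m of html.matchAll(/data-asin="([A-Z0-9]{10})"/g)) {
    const asin = m[1];
    if (seen.has(asin)) continue; // same ASIN shown sponsored and organic: keep first position
    seen.add(asin);
    hits.push({ asin, position: hits.length + 1 });
  }
  return hits;
}

const page =
  '<div data-asin="B0EXAMPLE1"></div><div data-asin="B0EXAMPLE2"></div><div data-asin="B0EXAMPLE1"></div>';
console.log(extractAsins(page));
```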

John Lewis loads products progressively. The initial page render shows approximately 30 products, but most categories contain 60 or more. To get the full product list, our scraper clicks the "Show more" button repeatedly until all results are loaded, waiting for the DOM to update between each click. When we expanded John Lewis from 1 category to 9 in February 2026, we had to update the category URL patterns as well -- their laptop category moved from a simple path to a faceted navigation URL with an /_/N-a8f suffix.
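The load-more loop is easiest to keep correct when it is written against two small callbacks, so the termination logic can be tested without a browser. This is a sketch, not our production scraper; the Playwright wiring is indicated in comments:

```typescript
// "Click Show more until everything is loaded", with a safety cap. With
// Playwright, clickMore would click the button (returning false once it is
// gone) and countItems would be page.locator(cardSelector).count().

async function loadAllProducts(
  clickMore: () => Promise<boolean>, // true if the button existed and was clicked
  countItems: () => Promise<number>,
  maxClicks = 20 // cap against a button that never disappears
): Promise<number> {
  for (let i = 0; i < maxClicks; i++) {
    const before = await countItems();
    if (!(await clickMore())) break; // no button left: everything is loaded
    // in Playwright: wait here for the grid to grow before re-counting
    if ((await countItems()) === before) break; // clicked, but nothing new appeared
  }
  return countItems();
}

// Simulated page: 30 items initially, +30 per click, 64 total.
let items = 30;
const demo = loadAllProducts(
  async () => (items < 64 ? ((items = Math.min(items + 30, 64)), true) : false),
  async () => items
);
demo.then((n) => console.log(n)); // 64
```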

Box.co.uk added Cloudflare Turnstile in February 2026, which blocked all headless browser access. We completely rewrote the Box scraper to use Bright Data's Web Unlocker instead of Playwright, fetching category pages over HTTP and parsing the product data from the response. Testing showed 24 products per page, with pagination support that successfully traversed all 34 pages of their laptop category (797 products total). A subtle but important detail: Box's API returns both minimum_price and final_price fields, and we had to learn through trial and error that final_price is the correct one to use for the displayed price.
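That price-field lesson is worth encoding explicitly, so the choice lives in one place where the data is normalized. The two field names come from Box's API as described above; the surrounding record shape is illustrative:

```typescript
// Normalizing Box's price fields: final_price is the displayed price;
// fall back to minimum_price only when final_price is absent. Field names
// are from Box's API; the record shape around them is illustrative.

interface BoxApiProduct {
  name: string;
  minimum_price?: number;
  final_price?: number;
}

function displayedPrice(p: BoxApiProduct): number | null {
  return p.final_price ?? p.minimum_price ?? null;
}

console.log(displayedPrice({ name: "Example Laptop", minimum_price: 899, final_price: 949 })); // 949
```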

ComputerMania runs on Shopify, and its collection handles turned out to be singular. We initially configured URLs with plurals -- /collections/chromebooks and /collections/gaming-laptops -- only to discover they returned 404 errors. The correct handles were /collections/chromebook and /collections/gaming-laptop. A small fix, but until we caught it, those categories silently reported zero products.

Why South Africa?

The South Africa expansion was driven entirely by client demand. Several of our customers are multinational consumer electronics brands that sell through both UK and South African retail channels. They were already using Crawlbot for UK visibility and asked whether we could extend coverage to South Africa, where they had no competitive intelligence tooling at all.

The South African market has its own characteristics. Retailer websites tend to be less sophisticated in their anti-bot measures (with the notable exception of PerimeterX on Game.co.za), but the platform diversity is greater. We are scraping sites built on Shopify (ComputerMania), SAP Hybris (Game), custom platforms (Takealot, Incredible), and WooCommerce-based stores (Makro). Each required a fresh scraper implementation, but the underlying BullMQ infrastructure handled the expansion without any changes -- we simply added schedule entries and scraper files.
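Adding a market then amounts to expanding a config matrix into schedule entries. A sketch of that expansion -- the config contents are illustrative, and in production each entry would be registered as a BullMQ repeatable job (roughly `queue.add(name, data, { repeat: { pattern: "0 * * * *" } })`):

```typescript
// Expand a retailer -> categories config into hourly schedule entries.
// Config contents are illustrative; in production each entry becomes a
// BullMQ repeatable job keyed on retailer + category.

interface ScheduleEntry {
  retailer: string;
  category: string;
  cron: string;
}

function expandSchedules(config: Record<string, string[]>): ScheduleEntry[] {
  const entries: ScheduleEntry[] = [];
  for (const [retailer, categories] of Object.entries(config)) {
    for (const category of categories) {
      entries.push({ retailer, category, cron: "0 * * * *" }); // top of every hour
    }
  }
  return entries;
}

const demoConfig = {
  computermania: ["chromebook", "gaming-laptop"],
  game: ["laptops", "monitors"],
};
console.log(expandSchedules(demoConfig).length); // 4
```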

160 Schedules, Hourly

The result of this expansion is a comprehensive, hourly picture of the consumer electronics retail landscape across two countries. For every category page, for every retailer, for every hour of every day, we capture: which products appear, in what position, at what price, whether the placement is sponsored or organic, and which brand owns that product. This data feeds into our SoV analytics dashboard, where brands can track their visibility over time, compare performance against competitors with full brand attribution (not anonymized placeholders), and identify the most competitive hours of the day for each retailer.

We started with one retailer and a simple question: "How visible are we on Currys?" Today we answer that question across 14 retailers, 9 categories, and two countries -- every hour, automatically, and with the kind of granularity that retail media platforms cannot provide. We are already working on extending this further, and the distributed architecture we built makes adding the next retailer as straightforward as writing one TypeScript file and adding a line to a JSON config.

See your visibility across all 14 retailers

Schedule a call and we will show you exactly how your brand performs against competitors on the retailers that matter to you.

Schedule a Demo