[Solved] BeautifulSoup – Amazon and Google identify me as a robot; how can I fix it?


  • Rotating proxies
  • Delays
  • Avoid repeating the same request pattern
  • IP rate limit (probably your issue)
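A minimal sketch of the first three points, using only the Python standard library: the URL order is shuffled, proxies are rotated round-robin, and each request gets a random User-Agent and a randomized delay. The proxy addresses, User-Agent strings, and the helper names `plan_requests` and `opener_for` are hypothetical placeholders, not part of any library, and rotating the User-Agent is an extra assumption on top of the list above. This lowers the chance of being blocked; it does not guarantee anything.

```python
import itertools
import random
import urllib.request

# Hypothetical placeholders: replace with your own proxy pool and UA strings.
PROXIES = [
    "http://proxy1.example:8080",
    "http://proxy2.example:8080",
    "http://proxy3.example:8080",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]


def plan_requests(urls, proxies, user_agents, base_delay=2.0, jitter=3.0, seed=None):
    """Build a request plan: shuffled URL order, round-robin proxies,
    a random User-Agent, and a randomized delay before each request."""
    rng = random.Random(seed)
    order = list(urls)
    rng.shuffle(order)                      # don't visit pages in the same order every run
    proxy_cycle = itertools.cycle(proxies)  # rotate proxies round-robin
    plan = []
    for url in order:
        plan.append({
            "url": url,
            "proxy": next(proxy_cycle),
            "user_agent": rng.choice(user_agents),
            # delay in [base_delay, base_delay + jitter] seconds
            "delay": base_delay + rng.uniform(0, jitter),
        })
    return plan


def opener_for(item):
    """Create a urllib opener that routes through the planned proxy
    and sends the planned User-Agent (no request is made here)."""
    handler = urllib.request.ProxyHandler({"http": item["proxy"], "https": item["proxy"]})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", item["user_agent"])]
    return opener
```

To use it, loop over the plan, call `time.sleep(item["delay"])`, then fetch with `opener_for(item).open(item["url"], timeout=10)` and pass the HTML to BeautifulSoup as before.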

IP rate limit. This is a basic security mechanism that can ban or block incoming requests from the same IP address. A regular user would not make 100 requests to the same domain within a few seconds following exactly the same pattern (for example: scroll, click, scroll, click, open), so traffic like that is easy to flag.
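To illustrate why bursts get flagged, here is a toy sliding-window limiter of the kind a server might apply per IP. Google and Amazon don't publish how their systems work, so the class name `SlidingWindowLimiter` and the thresholds are hypothetical, and real systems likely combine counts like this with other signals.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Toy per-IP limiter: allow at most `max_requests` per `window` seconds."""

    def __init__(self, max_requests, window):
        self.max_requests = max_requests
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip, now):
        q = self.hits[ip]
        # forget requests that have fallen out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over the limit: a real site might answer with 429 or a CAPTCHA
        q.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=5, window=10.0)
# 100 requests fired 0.05 s apart from one IP
burst = sum(limiter.allow("203.0.113.7", t * 0.05) for t in range(100))
# 10 requests spaced 3 s apart from another IP
spaced = sum(limiter.allow("198.51.100.2", t * 3.0) for t in range(10))
print(burst, spaced)  # 5 10
```

Only the first 5 requests of the burst get through, while the spaced-out requests never exceed the limit, which is why delays help.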

How to reduce the chance of being blocked while web scraping search engines.


Alternatively, you can use the Google Shopping Results API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you don't have to spend time figuring out how to bypass Google's blocks, since that's already handled for the end user.

Example code to parse product data from Google Shopping (there is also an example in the online IDE):

import os
from serpapi import GoogleSearch


params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "google_product",
    "product_id": "14506091995175728218", # can be iterated over multiple product ids
    "gl": "us",                           # country to search from
    "hl": "en"                            # language
}

search = GoogleSearch(params)
results = search.get_dict()   # JSON response as a Python dict

product = results['product_results']
title = product['title']
prices = product['prices']
reviews = product['reviews']              # total number of reviews
rating = product['rating']
extensions = product['extensions']
description = product['description']
user_reviews = product['reviews']         # same field as `reviews`, hence 526 is printed twice
reviews_results = results['reviews_results']['ratings']  # star-rating histogram

print(f'{title}\n'
    f'{prices}\n'
    f'{reviews}\n'
    f'{rating}\n'
    f'{extensions}\n'
    f'{description}\n'
    f'{user_reviews}\n'
    f'{reviews_results}')


'''
Google Pixel 4 White 64 GB, Unlocked
['$247.79', '$245.00', '$439.00']
526
3.7
['October 2019', 'Google', 'Pixel Family', 'Pixel 4', 'Android', '5.7″', 'Facial Recognition', '8 MP front camera', 'Smartphone', 'With Wireless Charging']
Point and shoot for the perfect photo. Capture brilliant color and control the exposure balance of different parts of your photos. Get the shot without the flash. Night Sight is now faster and easier to use it can even take photos of the Milky Way. Get more done with your voice. The new Google Assistant is the easiest way to send texts, share photos, and more. A new way to control your phone. Quick Gestures let you skip songs and silence calls – just by waving your hand above the screen. End the robocalls. With Call Screen, the Google Assistant helps you proactively filter our spam before your phone ever rings.
526
[{'stars': 1, 'amount': 101}, {'stars': 2, 'amount': 43}, {'stars': 3, 'amount': 39}, {'stars': 4, 'amount': 73}, {'stars': 5, 'amount': 270}]
'''
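As a quick sanity check on the output above, the star histogram in `reviews_results['ratings']` agrees with the other fields: the amounts add up to the review count (526), and their weighted mean rounds to the reported rating (3.7). Treating the rating as a plain weighted mean is an assumption that happens to match this product; how Google actually computes it isn't documented here.

```python
# Star histogram copied from the reviews_results['ratings'] output above
ratings = [
    {'stars': 1, 'amount': 101},
    {'stars': 2, 'amount': 43},
    {'stars': 3, 'amount': 39},
    {'stars': 4, 'amount': 73},
    {'stars': 5, 'amount': 270},
]

total_reviews = sum(r['amount'] for r in ratings)                # 101 + 43 + 39 + 73 + 270
weighted_sum = sum(r['stars'] * r['amount'] for r in ratings)    # 101 + 86 + 117 + 292 + 1350
average = weighted_sum / total_reviews

print(total_reviews)        # 526
print(round(average, 1))    # 3.7
```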

Example of iterating over multiple product IDs:

import os
from serpapi import GoogleSearch


# only the first ID is a real product; the other two are random placeholders
products = ['14506091995175728218', '1450609199517512118', '145129895175728218']


for product in products:
    params = {
        "api_key": os.getenv("API_KEY"),
        "engine": "google_product",
        "product_id": product,
        "gl": "us",
        "hl": "en"
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    # placeholder IDs may not resolve to a product, so avoid a KeyError
    product_results = results.get('product_results')
    if product_results is None:
        print(f'{product}: no product_results ({results.get("error", "unknown error")})')
        continue

    print(product_results['title'])  # one title per valid product ID

Disclaimer: I work for SerpApi.
