[Solved] I believe my scraper got blocked, but I can access the website via a regular browser, how can they do this? [closed]


I am wondering both how the website was able to do this without blocking my IP outright and …

By examining all manner of things about your request, some straightforward and some arcane. Straightforward items include User-Agent headers, cookies, and the correct spelling of dynamic URLs.

Arcane items include your IP address, the timing of your request, the frequency of related requests, and the content of other headers.
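To make that concrete, here is a rough sketch of the kind of check a server could run on each request. It is purely illustrative: I don't know what the site in question actually does, and the function name `looks_automated`, the thresholds, and the specific header rules are all assumptions of mine. It does show how a browser and a script coming from the same IP address can be told apart without blocking the IP itself.

```python
import time
from collections import defaultdict, deque

# Hypothetical thresholds, chosen only for illustration.
MAX_REQUESTS = 5
WINDOW_SECONDS = 10.0

# Client IP -> timestamps of that client's recent requests.
_recent = defaultdict(deque)


def looks_automated(ip, headers, now=None):
    """Return a list of reasons the request looks scripted (empty if none).

    `headers` is assumed to be a plain dict with canonical header names.
    """
    now = time.monotonic() if now is None else now
    reasons = []

    # Straightforward checks: things a default HTTP client gets "wrong".
    user_agent = headers.get("User-Agent", "").lower()
    if not user_agent or "python" in user_agent or "curl" in user_agent:
        reasons.append("suspicious user-agent")
    if "Cookie" not in headers:
        reasons.append("no cookies")
    if "Accept-Language" not in headers:
        reasons.append("missing Accept-Language")

    # Arcane check: how many requests arrived from this IP recently.
    window = _recent[ip]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) > MAX_REQUESTS:
        reasons.append("too many requests")

    return reasons
```

A request carrying browser-like headers passes until it exceeds the rate threshold, while a bare `python-requests` call is flagged on the first try. Real sites combine many more signals than this.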

… if anyone has any tips for avoiding this in the future.

Yes. Contact the owners of the website in question and cooperate with any restrictions they have in place. Examine the terms of your license to use their website (if it is a general public license, it is often called “Terms of Service”). Ensure that you operate exclusively within those terms.
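One restriction that can be checked mechanically is the site's robots.txt, if it publishes one. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the `my-scraper` agent name, and the example.com URLs are made up for illustration. This is not a substitute for reading the Terms of Service or asking the owners.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; a real scraper would load the site's own file,
# for example with RobotFileParser.set_url(...) followed by .read().
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask before fetching; "my-scraper" is a hypothetical user-agent name.
print(parser.can_fetch("my-scraper", "https://example.com/private/data"))
print(parser.can_fetch("my-scraper", "https://example.com/public/page"))

# Delay in seconds the site asks for between requests, or None if unspecified.
print(parser.crawl_delay("my-scraper"))
```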

If the website data is available via an API, and your use falls within the API’s license terms, use it instead of screen-scraping. The format of the data will be more consistent, your code will run faster, and you will be less of a burden (or threat) to the website owner.
