[Solved] Scrapy keeps getting blocked

Accepted Answer

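Your XPath expressions aren't correct. Relative XPath expressions need to start with "./". An expression that starts with "//" searches the whole document, even when you call it on a row selector, so it does not stay inside the current row. Also, selecting cells by their class attribute is much easier than indexing them by position, in my opinion.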
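To make the "./" versus "//" difference concrete, here is a minimal sketch. It uses Python's standard-library xml.etree.ElementTree instead of Scrapy selectors so it runs without Scrapy. ElementTree refuses absolute paths on an element, so the "//" case is emulated by searching from the root. The table markup is a hypothetical simplification, not cinematreasures.org's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing table: a header row plus two data rows
HTML = """
<table class="list">
  <tr><th>Name</th><th>Location</th></tr>
  <tr><td class="name"><a>Airdome</a></td><td class="location">Ardmore, OK</td></tr>
  <tr><td class="name"><a>Liberty Theatre</a></td><td class="location">Chickamauga, GA</td></tr>
</table>
"""

root = ET.fromstring(HTML.strip())
rows = root.findall(".//tr")


def first_name_in_document(row):
    # Stands in for row.xpath('//td[@class="name"]/a/text()').get():
    # the search starts at the root, so the row argument is effectively ignored.
    cell = root.find(".//td[@class='name']/a")
    return cell.text if cell is not None else None


def name_in_row(row):
    # Stands in for row.xpath('./td[@class="name"]/a/text()').get():
    # only the current row's own <td> children are searched.
    cell = row.find("./td[@class='name']/a")
    return cell.text if cell is not None else None


print([first_name_in_document(r) for r in rows])  # → ['Airdome', 'Airdome', 'Airdome']
print([name_in_row(r) for r in rows])             # → [None, 'Airdome', 'Liberty Theatre']
```

With the absolute path every row reports the same first match; with "./" each row reports its own cell, and a row without a matching cell (here the header row) yields None.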

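    import scrapy


    class TheatersSpider(scrapy.Spider):
        # Class and spider names are placeholders; the start URL is the one in the log below
        name = 'theaters'
        start_urls = ['http://cinematreasures.org/theaters/united-states?page=1&status=all']

        def parse(self, response):
            for row in response.xpath('//table[@class="list"]//tr'):
                # "./" keeps each lookup relative to the current row
                name = row.xpath('./td[@class="name"]/a/text()').get()
                address = row.xpath('./td[@class="location"]/text()').get()
                yield {
                    'Name': name,
                    'Address': address,
                }
            # Single quotes outside, so the double quotes in the XPath don't end the string
            next_page = response.xpath('//a[@class="next-page"]/@href').get()
            if next_page:
                yield scrapy.Request(response.urljoin(next_page))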
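The pagination step relies on response.urljoin(), which resolves the (possibly relative) href against the URL of the page being parsed; for HTML responses Scrapy uses the page's <base> tag instead if one is present. The sketch below approximates that with urllib.parse.urljoin. The helper name next_page_url and the href values are hypothetical, since the site's actual next-page link markup isn't shown here.

```python
from urllib.parse import urljoin

# Page URL as it appears in the log output; the hrefs passed below are hypothetical
PAGE_URL = "http://cinematreasures.org/theaters/united-states?page=1&status=all"


def next_page_url(page_url, href):
    # Mirrors `if next_page: ... response.urljoin(next_page)` in parse():
    # .get() returns None when no link matched, which ends the crawl.
    if not href:
        return None
    return urljoin(page_url, href)


print(next_page_url(PAGE_URL, "/theaters/united-states?page=2&status=all"))
print(next_page_url(PAGE_URL, "?page=2&status=all"))
print(next_page_url(PAGE_URL, None))
```

Both relative forms resolve to the same absolute page-2 URL, and a missing link stops the pagination loop.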

OUTPUT

...
...
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': None, 'Address': None}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': ' Airdome', 'Address': '\n                Ardmore, OK, United States\n              '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': ' Liberty Theatre', 'Address': '\n                Chickamauga, GA, United States\n              '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': ' Route 54 Drive-In', 'Address': '\n                Tularosa, NM, United States\n              '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '#1 Auto Theatre', 'Address': '\n                Daytona Beach, FL, United States\n              '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '#1 Drive-In', 'Address': '\n                Apalachicola, FL, United States\n              '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '$1.00 Cinema', 'Address': '\n                Sherman, TX, United States\n              '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '$uper Cinemas', 'Address': '\n                East Lansing, MI, United States\n              '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '0only Outdoor Theatre', 'Address': '\n                Little Chute, WI, United States\n              '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '10 Hi Drive-In', 'Address': '\n                St. Cloud, MN, United States\n              '}
...
...
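The output still contains an all-None item and values padded with spaces and newlines copied from the HTML source. The None item probably comes from the table's header row, which has no cells with those classes; that is an assumption, since the page markup isn't shown. A minimal post-processing sketch, using a hypothetical clean_item helper:

```python
def clean_item(item):
    # Hypothetical helper: strip the leading/trailing whitespace that text()
    # keeps from the HTML source, and drop rows where nothing was found.
    cleaned = {
        key: value.strip() if isinstance(value, str) else value
        for key, value in item.items()
    }
    if not any(cleaned.values()):
        return None  # e.g. the {'Name': None, 'Address': None} row
    return cleaned


# Raw items copied from the log above
raw_items = [
    {'Name': None, 'Address': None},
    {'Name': ' Airdome', 'Address': '\n                Ardmore, OK, United States\n              '},
    {'Name': '#1 Drive-In', 'Address': '\n                Apalachicola, FL, United States\n              '},
]

for item in raw_items:
    print(clean_item(item))
```

Inside parse() you would build the dict, pass it through clean_item, and only yield it when the result is not None. Another option is XPath's normalize-space() function, which trims the text in the selector itself but also collapses internal runs of whitespace and returns an empty string when no node matches.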