Your XPath expressions aren’t correct. Relative XPath expressions, evaluated against each row, need to start with "./"; a leading "//" searches the whole document instead of the current row. In my opinion, selecting by class is also much easier than indexing.
def parse(self, response):
    # Iterate over every row of the listing table.
    for row in response.xpath('//table[@class="list"]//tr'):
        # "./" keeps each query relative to the current row.
        name = row.xpath('./td[@class="name"]/a/text()').get()
        address = row.xpath('./td[@class="location"]/text()').get()
        yield {
            'Name': name,
            'Address': address,
        }
    # Follow the pagination link, if there is one.
    next_page = response.xpath('//a[@class="next-page"]/@href').get()
    if next_page:
        yield scrapy.Request(response.urljoin(next_page))
OUTPUT
...
...
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': None, 'Address': None}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': ' Airdome', 'Address': '\n Ardmore, OK, United States\n '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': ' Liberty Theatre', 'Address': '\n Chickamauga, GA, United States\n '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': ' Route 54 Drive-In', 'Address': '\n Tularosa, NM, United States\n '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '#1 Auto Theatre', 'Address': '\n Daytona Beach, FL, United States\n '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '#1 Drive-In', 'Address': '\n Apalachicola, FL, United States\n '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '$1.00 Cinema', 'Address': '\n Sherman, TX, United States\n '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '$uper Cinemas', 'Address': '\n East Lansing, MI, United States\n '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '0only Outdoor Theatre', 'Address': '\n Little Chute, WI, United States\n '}
2022-09-09 08:22:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://cinematreasures.org/theaters/united-states?page=1&status=all>
{'Name': '10 Hi Drive-In', 'Address': '\n St. Cloud, MN, United States\n '}
...
...
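The output above contains one item where both fields are None (presumably a row without td cells, such as a header row), and the other values carry leading and trailing whitespace. Here is a minimal sketch of one way to tidy that up. The helper name clean_item is hypothetical and not part of Scrapy, and the sample rows are copied from the log above.

```python
from typing import Optional


def clean_item(name: Optional[str], address: Optional[str]) -> Optional[dict]:
    """Return a tidied item, or None when the row had no data cells.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    # Rows where neither XPath matched carry no data, so skip them.
    if name is None and address is None:
        return None
    return {
        'Name': name.strip() if name else name,
        'Address': address.strip() if address else address,
    }


# Sample values taken from the log output above.
rows = [
    (None, None),
    (' Airdome', '\n Ardmore, OK, United States\n '),
]
items = [item for item in (clean_item(n, a) for n, a in rows) if item is not None]
print(items)
```

Inside parse, the same idea would mean calling the helper on name and address and yielding the result only when it is not None.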