[Solved] How to scrape HTML using Python for NOWTV available movies

You can mimic what the page does for paginated results (https://www.nowtv.com/stream/all-movies/page/1) and extract the movies from the script tag of each page. Although the code below could use some refactoring, it shows how to obtain the total number of films, calculate the number of films per page, and issue requests for all films using a requests Session … Read more
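As a rough sketch of that pagination approach, with a hypothetical selector and a placeholder film count (the original answer reads both from each page's script tag):

    import math
    import requests
    from bs4 import BeautifulSoup

    BASE_URL = 'https://www.nowtv.com/stream/all-movies/page/{}'

    def titles_on_page(session, page_number):
        # 'h3' is a placeholder selector; the answer actually parses the data
        # embedded in each page's script tag, so adjust to the real markup
        response = session.get(BASE_URL.format(page_number))
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        return [tag.get_text(strip=True) for tag in soup.find_all('h3')]

    with requests.Session() as session:  # one Session reuses the connection across requests
        first_page = titles_on_page(session, 1)
        films_per_page = max(len(first_page), 1)  # guard against an empty or changed page
        total_films = 1000  # placeholder: the answer reads the real total from page 1
        all_films = list(first_page)
        for page_number in range(2, math.ceil(total_films / films_per_page) + 1):
            all_films.extend(titles_on_page(session, page_number))
        print(f'collected {len(all_films)} films')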

[Solved] Skip the error while scraping a list of URLs from a CSV

Here is a working version:

    from bs4 import BeautifulSoup
    import requests
    import csv

    with open('urls.csv', 'r') as csvFile, open('results.csv', 'w', newline='') as results:
        reader = csv.reader(csvFile, delimiter=';')
        writer = csv.writer(results)
        for row in reader:
            # get the url
            url = row[0]
            # fetch content from server
            html = requests.get(url).content
            # soup fetched content
            soup = BeautifulSoup(html, … Read more
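Since the question is about skipping errors, the minimal variant below wraps the request in try/except so one bad URL does not abort the run. What gets written per row (here, the page title) is an assumption, as the answer's extraction code is truncated:

    import csv
    import requests
    from bs4 import BeautifulSoup

    with open('urls.csv', 'r') as csv_file, open('results.csv', 'w', newline='') as results:
        reader = csv.reader(csv_file, delimiter=';')
        writer = csv.writer(results)
        for row in reader:
            url = row[0]
            try:
                html = requests.get(url, timeout=10).content
            except requests.RequestException as error:
                # log the failure and move on instead of crashing the whole run
                writer.writerow([url, f'error: {error}'])
                continue
            soup = BeautifulSoup(html, 'html.parser')
            title = soup.title.string if soup.title else ''  # hypothetical output column
            writer.writerow([url, title])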

[Solved] Organizing data that I am pulling and saving to CSV

You can use pandas to do that. Collect all the data into a DataFrame, then just write the DataFrame to file.

    import pandas as pd
    import requests
    import bs4

    root_url = 'https://www.estatesales.net'
    url_list = ['https://www.estatesales.net/companies/NJ/Northern-New-Jersey']
    results = pd.DataFrame()
    for url in url_list:
        response = requests.get(url)
        soup = bs4.BeautifulSoup(response.text, 'html.parser')
        companies = soup.find_all('app-company-city-view-row')
        for company in companies:
            try:
                link = … Read more
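One caveat: appending to a DataFrame inside a loop is slow, and DataFrame.append was removed in pandas 2.0. A common alternative, sketched here with hypothetical field names standing in for the scraped values, is to collect plain dicts and build the DataFrame once at the end:

    import pandas as pd

    rows = []
    # inside the scraping loop you would append one dict per company;
    # the names and values below are hypothetical stand-ins
    rows.append({'name': 'Example Estate Sales',
                 'city': 'Paramus',
                 'link': 'https://www.estatesales.net/example'})
    results = pd.DataFrame(rows)
    results.to_csv('companies.csv', index=False)  # one call writes the whole table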

[Solved] Check if Python has written the targeted text

This is an alternative way to check whether Python found the text you are looking for:

    import requests
    from bs4 import BeautifulSoup

    urls = ['https://www.google.com']
    for url in urls:
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'lxml')
        items = soup.find_all('p')
        for item in items:
            if '2016 – Privacidad – Condiciones' in item.text:
                print('Python has found the … Read more
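A compact variant of the same check, reusing the target string from the answer above; any() stops at the first matching paragraph:

    import requests
    from bs4 import BeautifulSoup

    TARGET = '2016 – Privacidad – Condiciones'  # the footer text from the answer above
    urls = ['https://www.google.com']

    for url in urls:
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'lxml')
        # any() short-circuits at the first <p> containing the target text
        found = any(TARGET in p.get_text() for p in soup.find_all('p'))
        print(f"{url}: target text {'found' if found else 'not found'}")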