[Solved] How to scrape multiple result having same tags and class

You need to parse your data from the script tag rather than the spans and divs. Try this: import requests from bs4 import BeautifulSoup import re import pandas as pd from pandas import json_normalize import json def get_page(url): response = requests.get(url) if not response.ok: print('server responded:', response.status_code) else: soup = BeautifulSoup(response.text, 'lxml') return soup def … Read more
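The script-tag approach the answer describes can be sketched offline: locate the script whose body contains the data, cut out the JSON literal with a regex, and hand it to the json module. The page markup and the `__DATA__` variable name below are made up for illustration.

```python
import json
import re
from bs4 import BeautifulSoup

# Hypothetical page whose data lives in a <script> tag rather than the DOM.
html = """
<html><body>
<script>window.__DATA__ = {"results": [{"name": "Item A", "price": 9.99}]};</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Find the script tag whose text mentions the variable we want.
script = soup.find("script", string=re.compile(r"__DATA__"))
# Strip the JS assignment so only the JSON object literal remains.
raw = re.search(r"__DATA__\s*=\s*(\{.*\})", script.string, re.DOTALL).group(1)
data = json.loads(raw)
print(data["results"][0]["name"])  # Item A
```

Once decoded, the dictionary can go straight into json_normalize, as the full answer suggests.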

[Solved] Scraping data off site using 4 urls for one day using R

You can turn all the tables into a wide data frame with list operations: library(rvest) library(magrittr) library(dplyr) date <- 20130701 rng <- c(1:4) my_tabs <- lapply(rng, function(i) { url <- sprintf("http://apims.doe.gov.my/apims/hourly%d.php?date=%s", i, date) pg <- html(url) pg %>% html_nodes("table") %>% extract2(1) %>% html_table(header=TRUE) }) glimpse(plyr::join_all(my_tabs, by=colnames(my_tabs[[1]][1:2]))) ## Observations: 52 ## Variables: ## $ NEGERI / … Read more
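The same collect-then-join pattern translates to pandas if you are working in Python rather than R: scrape each page into its own frame, then merge them all on the shared key columns. The column names here are stand-ins, not the site's real headers.

```python
import pandas as pd

# Hypothetical per-page tables that share the first two key columns,
# mirroring the R answer's join_all over the four hourly URLs.
tables = [
    pd.DataFrame({"NEGERI": ["Johor"], "KAWASAN": ["A"], "h1": [50]}),
    pd.DataFrame({"NEGERI": ["Johor"], "KAWASAN": ["A"], "h2": [55]}),
]

# Join every table on the key columns of the first one.
keys = list(tables[0].columns[:2])
wide = tables[0]
for t in tables[1:]:
    wide = wide.merge(t, on=keys)
print(wide.shape)  # (1, 4)
```

functools.reduce with pd.merge does the same fold in one expression.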

[Solved] How to get data from a combobox using Beautifulsoup and Python?

From what I can see of the HTML, there is no span with id="sexo-button", so BeautifulSoup(login_request.text, 'lxml').find("span", id="sexo-button") would have returned None, which is why you got the error from get_text. As for your second attempt, I don't think bs4 Tags have a value property, which is why you'd be getting None that time. … Read more
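The two pitfalls the answer names — a missed find() returning None, and reading HTML attributes as if they were object properties — can be shown on a small snippet; the markup and ids below are illustrative.

```python
from bs4 import BeautifulSoup

html = '<span id="sexo-button">M</span><input name="sexo" value="F">'
soup = BeautifulSoup(html, "html.parser")

# Tags expose HTML attributes via dict-style access, not a .value property.
inp = soup.find("input", {"name": "sexo"})
print(inp.get("value"))  # F
print(inp["value"])      # F

# find() returns None on a miss; guard before calling get_text().
span = soup.find("span", id="missing-id")
print(span.get_text() if span else "not found")  # not found
```

Tag.get() is the safer of the two accessors, since it returns None instead of raising KeyError when the attribute is absent.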

[Solved] How to get a link with web scraping

In the future, provide some code to show what you have attempted. I have expanded on Fabix's answer. The following code gets the YouTube link, song name, and artist for all 20 pages on the source website. from bs4 import BeautifulSoup import requests master_url = "https://www.last.fm/tag/rock/tracks?page={}" headers = { "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like … Read more
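The paging idea — format the page number into a URL template, then parse each page — can be sketched without hitting the network. The td class names below are a stand-in for last.fm's real chart markup, which may differ.

```python
from bs4 import BeautifulSoup

# Build the 20 page URLs from the template used in the answer.
master_url = "https://www.last.fm/tag/rock/tracks?page={}"
urls = [master_url.format(page) for page in range(1, 21)]

# Offline stand-in for one chart row; real class names may vary.
sample_html = """
<td class="chartlist-play"><a href="https://www.youtube.com/watch?v=abc123"></a></td>
<td class="chartlist-name"><a>Song Title</a></td>
"""
soup = BeautifulSoup(sample_html, "html.parser")
link = soup.find("td", class_="chartlist-play").a["href"]
title = soup.find("td", class_="chartlist-name").a.get_text(strip=True)
print(urls[0], link, title)
```

In the real scraper each URL in `urls` would be fetched with requests.get before parsing.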

[Solved] organizing data that I am pulling and saving to CSV

You can use pandas to do that. Collect all the data into a dataframe, then just write the dataframe to file. import pandas as pd import requests import bs4 root_url = "https://www.estatesales.net" url_list = ['https://www.estatesales.net/companies/NJ/Northern-New-Jersey'] results = pd.DataFrame() for url in url_list: response = requests.get(url) soup = bs4.BeautifulSoup(response.text, 'html.parser') companies = soup.find_all('app-company-city-view-row') for company in companies: try: link = … Read more
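The collect-then-write shape of that answer looks like this in miniature; the company names and links are fabricated, and accumulating rows in a plain list before building one DataFrame is cheaper than appending to a DataFrame inside the loop.

```python
import pandas as pd

# Hypothetical scrape results accumulated row by row.
rows = []
for name, link in [("Estate Co A", "/companies/1"), ("Estate Co B", "/companies/2")]:
    rows.append({"company": name, "link": link})

# One DataFrame at the end, then one CSV write.
df = pd.DataFrame(rows)
csv_text = df.to_csv(index=False)  # pass a path instead to write to disk
print(csv_text)
```

df.to_csv("companies.csv", index=False) writes the same content to a file.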

[Solved] How to make this crawler more efficient [closed]

Provided your intentions are not nefarious: as mentioned in the comment, one way to achieve this is to execute the crawler in parallel (multithreading), as opposed to doing one domain at a time. Something like: exec('php crawler.php > /dev/null 2>&1 &'); exec('php crawler.php > /dev/null 2>&1 &'); exec('php crawler.php > /dev/null 2>&1 &'); exec('php crawler.php > /dev/null … Read more
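The answer's approach spawns several background PHP processes; the equivalent idea in Python, sketched here with a thread pool and a stubbed-out fetch function, crawls several domains concurrently instead of one at a time.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for fetching one domain; a real crawler would issue HTTP
# requests and parse the responses here.
def crawl(domain):
    return f"crawled {domain}"

domains = ["a.example", "b.example", "c.example", "d.example"]

# Four workers, mirroring the four exec() calls in the PHP answer.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(crawl, domains))
print(results)
```

pool.map preserves input order, so results line up with the domain list even though the fetches overlap in time.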

[Solved] I believe my scraper got blocked, but I can access the website via a regular browser, how can they do this? [closed]

I am wondering both how the website was able to do this without blocking my IP outright and … By examining all manner of things about your request, some straightforward and some arcane. Straightforward items include user-agent headers, cookies, and the correct spelling of dynamic URLs. Arcane items include your IP address, the timing of your request, … Read more
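The straightforward signals the answer lists (user-agent, cookies) are the easiest to address from the client side. A minimal sketch with requests, using an illustrative browser-like header set, keeps cookies across requests via a session:

```python
import random

import requests

# A Session persists cookies between requests and lets you set
# browser-like headers once; these header values are illustrative.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
})

# Randomized delays address the "timing of your request" signal:
# sleep this long between fetches rather than hammering the server.
delay = random.uniform(1.0, 3.0)
print(session.headers["User-Agent"], round(delay, 1))
```

None of this guarantees access — the arcane checks (IP reputation, request timing patterns) operate server-side and are out of the client's control.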

[Solved] Web Scraping From .asp URLs

I would recommend using JSoup for this. To do so, add the following to pom.xml: <dependency> <groupId>org.jsoup</groupId> <artifactId>jsoup</artifactId> <version>1.11.2</version> </dependency> Then you fire a first request just to get cookies: Connection.Response initialPage = Jsoup.connect("https://www.flightview.com/flighttracker/") .headers(headers) .method(Connection.Method.GET) .userAgent(userAgent) .execute(); Map<String, String> initialCookies = initialPage.cookies(); Then you fire the next request with these cookies: Connection.Response flights = Jsoup.connect("https://www.flightview.com/TravelTools/FlightTrackerQueryResults.asp") … Read more
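The same two-step cookie handoff works in Python with a requests.Session, which stores cookies from the first response and replays them automatically on the second request. This offline sketch sets a cookie manually to stand in for the first GET; the cookie name is hypothetical.

```python
import requests

session = requests.Session()
# In the real flow, session.get("https://www.flightview.com/flighttracker/")
# would populate the cookie jar from the server's Set-Cookie headers.
# Here we plant an illustrative cookie to stand in for that step.
session.cookies.set("ASPSESSIONID", "abc123", domain="www.flightview.com")

# A subsequent session.get() to the .asp results URL would send this
# cookie automatically, just like JSoup's .cookies(initialCookies).
print(session.cookies.get("ASPSESSIONID"))  # abc123
```

The key point in both languages is the same: the .asp results page only responds usefully when the session cookie from the landing page accompanies the request.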

[Solved] Extracting variables from Javascript inside HTML

You could use BeautifulSoup to extract the <script> tag, but you would still need an alternative approach to extract the information inside. Some Python can be used to first extract flashvars and then pass this to demjson to convert the Javascript dictionary into a Python one. For example: import demjson content = """<script type="text/javascript">/* <![CDATA[ … Read more
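The extraction step the answer describes is a regex pull of the flashvars object followed by decoding. In this sketch the object is JSON-valid so the standard json module can decode it; demjson's advantage is that it also accepts looser JavaScript syntax such as unquoted keys. The script content below is fabricated.

```python
import json
import re

# flashvars as it might appear inside a script tag; keys are quoted here
# so json.loads works -- demjson.decode would also handle unquoted JS keys.
content = '<script>var flashvars = {"file": "video.mp4", "autostart": "true"};</script>'

# Capture the object literal assigned to flashvars.
match = re.search(r"flashvars\s*=\s*(\{.*?\})", content, re.DOTALL)
flashvars = json.loads(match.group(1))
print(flashvars["file"])  # video.mp4
```

If the real page uses unquoted keys or single quotes, swap json.loads for demjson.decode as the original answer does.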

[Solved] Click on “Show more deals” in webpage with Selenium

To click on the element with text as Show 10 more deals on the page https://www.uswitch.com/broadband/compare/deals_and_offers/ you can use the following solution: Code Block: from selenium import webdriver from selenium.webdriver.support.wait import WebDriverWait from selenium.webdriver.support import expected_conditions as EC from selenium.webdriver.common.by import By url = "https://www.uswitch.com/broadband/compare/deals_and_offers/" options = webdriver.ChromeOptions() options.add_argument("start-maximized") options.add_argument('disable-infobars') browser = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe') browser.get(url) … Read more

[Solved] Python – ETFs Daily Data Web Scraping

Yes, I agree that Beautiful Soup is a good approach. Here is some Python code which uses the Beautiful Soup library to extract the intraday price from the IVV fund page: import requests from bs4 import BeautifulSoup r = requests.get("https://www.marketwatch.com/investing/fund/ivv") html = r.text soup = BeautifulSoup(html, "html.parser") if soup.h1.string == "Pardon Our Interruption…": print("They detected … Read more
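The parsing half of that answer can be exercised against a local snippet; the element and class names below are illustrative stand-ins for MarketWatch's markup, which changes over time, so treat the selectors as assumptions to verify against the live page.

```python
from bs4 import BeautifulSoup

# Offline stand-in for the fund page; real selectors may differ.
html = (
    '<h1 class="company__name">iShares Core S&P 500 ETF</h1>'
    '<bg-quote class="value">415.23</bg-quote>'
)
soup = BeautifulSoup(html, "html.parser")

# Custom elements like <bg-quote> parse like any other tag.
price = soup.find("bg-quote", class_="value").get_text()
name = soup.h1.get_text()
print(name, price)
```

The bot-detection check in the original answer (comparing soup.h1.string against the interstitial's heading) still belongs before any extraction on the live site.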