web-scraping Archives - Page 4 of 5

[Solved] Web Scraping & BeautifulSoup – Next Page parsing

September 28, 2022 by Kirat

Try this. If you want a CSV file, follow the print(df) line with df.to_csv("prod.csv"); the code below already includes that step. import requests from bs4 import BeautifulSoup import pandas as pd headers = {'User-Agent': 'Mozilla/5.0'} temp=[] for page in range(1, 20): response = requests.get("https://www.avbuyer.com/aircraft/private-jets/page-{page}".format(page=page),headers=headers,) soup = BeautifulSoup(response.content, 'html.parser') postings = soup.find_all('div', … Read more
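As a rough sketch of the two bookkeeping steps in that answer (building one URL per page and writing the collected rows to CSV), the following uses only the Python standard library in place of pandas. The URL pattern comes from the excerpt; the helper names and the sample rows are hypothetical, and no network request is made here.

```python
import csv
import io

# URL pattern taken from the excerpt; {page} is the page number.
URL_TEMPLATE = "https://www.avbuyer.com/aircraft/private-jets/page-{page}"


def page_urls(first, last):
    """Return the listing URLs for pages first..last, inclusive."""
    return [URL_TEMPLATE.format(page=page) for page in range(first, last + 1)]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows standing in for whatever the scraper collects per page.
sample_rows = [
    {"title": "Example Jet A", "price": "1000000"},
    {"title": "Example Jet B", "price": "2500000"},
]
csv_text = rows_to_csv(sample_rows, ["title", "price"])
```

Note that range(1, 20) in the excerpt covers pages 1 through 19, which corresponds to page_urls(1, 19) here.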

[Solved] Web Server Block libwww-perl requests

September 26, 2022 by Kirat

Use the agent method provided by LWP::UserAgent to change the “user agent identification string”. This should solve blocking based on the client identification string; it will not solve blocking based on abusive behavior. perldoc LWP::UserAgent agent my $agent = $ua->agent; $ua->agent('Checkbot/0.4 '); # append the default to the end $ua->agent('Mozilla/5.0'); $ua->agent(""); # don't identify Get/set the product token … Read more

[Solved] Web scraping program cannot find element which I can see in the browser

September 26, 2022 by Kirat

The element you’re interested in is dynamically generated, after the initial page load, which means that your browser executed JavaScript, made other network requests, etc. in order to build the page. Requests is just an HTTP library, and as such will not do those things. You could use a tool like Selenium, or perhaps even … Read more

[Solved] Data scraping from a list split into pages

September 24, 2022 by Kirat

I made a quick API to the site for you and managed to get more than 20 pages. If you visit the link below: https://import.io/data/mine/?id=01ac4491-e40a-4e2b-a427-c057692e3d96 you can see a button called next page that should get you the other search results after the 10th result. Let me know how you get on. … Read more

[Solved] How do I convert a web-scraped table into a csv?

September 21, 2022 by Kirat

You can use pd.read_html for this. import pandas as pd Data = pd.read_html(r'https://www.boxofficemojo.com/chart/top_lifetime_gross/') for data in Data: data.to_csv('Data.csv', sep=',') 2. Using bs4 import pandas as pd from bs4 import BeautifulSoup import requests URL = r'https://www.boxofficemojo.com/chart/top_lifetime_gross/' print('\n>> Extracting Data using Beautiful Soup for :'+ URL) try: res = requests.get(URL) except Exception as e: print(repr(e)) print('\n<> URL present … Read more
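To illustrate what a table-to-CSV conversion involves, here is a minimal sketch that swaps pandas and bs4 for the standard library's html.parser and csv modules. The TableParser class, the table_to_csv helper, and the sample table are all hypothetical; it assumes a simple table with no nested tables and no rowspan or colspan cells.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Parse the table rows in html and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical table standing in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Title</th></tr>
  <tr><td>1</td><td>Film A</td></tr>
  <tr><td>2</td><td>Film B, Part 2</td></tr>
</table>
"""
csv_text = table_to_csv(SAMPLE_HTML)
```

The csv module quotes the cell that contains a comma, which is the kind of detail that pd.read_html followed by to_csv would otherwise handle for you.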

[Solved] python requests only returning empty sets when scraping

September 21, 2022 by Kirat

Selenium treats frames as separate pages (because it has to load them separately) and it doesn't search inside frames. page_source doesn't return the HTML from a frame either. You have to find the <frame> and switch to the correct frame with switch_to.frame(..) to work with it. frames = driver.find_elements_by_tag_name('frame') driver.switch_to.frame(frames[0]) import urllib from bs4 import BeautifulSoup from selenium import webdriver … Read more

[Solved] Scraping data from a dynamic web database with Python [closed]

September 19, 2022 by Kirat

You can solve it with requests (for maintaining a web-scraping session) + BeautifulSoup (for HTML parsing) + a regex for extracting the value of a JavaScript variable containing the desired data inside a script tag, and ast.literal_eval() for making a Python list out of the JS list: from ast import literal_eval import re from bs4 import BeautifulSoup … Read more
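The regex plus ast.literal_eval() step can be sketched on its own, without the session handling or BeautifulSoup. The sample page, the variable name chartData, and the extract_js_list helper are hypothetical. The approach only works when the JavaScript literal is also a valid Python literal (no true, false, or null, for instance).

```python
import re
from ast import literal_eval

# Hypothetical page content with the data embedded in a script tag.
SAMPLE_HTML = """
<html><body>
<script>
  var chartData = [["2020", 12.5], ["2021", 14], ["2022", 9.75]];
</script>
</body></html>
"""


def extract_js_list(html, variable):
    """Return the list assigned to `var <variable> = [...];` as a Python list."""
    pattern = re.compile(
        r"var\s+" + re.escape(variable) + r"\s*=\s*(\[.*?\]);",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        raise ValueError("variable %r not found" % variable)
    # literal_eval only evaluates literals, so it never executes code.
    return literal_eval(match.group(1))


data = extract_js_list(SAMPLE_HTML, "chartData")
```

Using literal_eval rather than eval keeps the extraction safe even if the page content is untrusted.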

[Solved] Python Web Scraping – Failed to extract a list from the website

September 16, 2022 by Kirat

It most likely writes the data dynamically using JavaScript. You could use libraries like Selenium or dryscrape to get the data. You might want to look into Web Scraping JavaScript page with Python. Or, if you insist on using scrapy, look into Selecting dynamically-loaded content. … Read more

[Solved] Data scraping based on Search engines

September 15, 2022 by Kirat

You can do that using the Google API https://developers.google.com/custom-search/json-api/v1/overview and a related PHP client https://github.com/google/google-api-php-client. Later on you need to write a web scraper to download the websites (curl) and parse the HTML with a parser (e.g. https://github.com/paquettg/php-html-parser). I would, however, not recommend PHP for the latter task. There are much more sophisticated scraping tools available for Python … Read more

[Solved] WebRequest not returning HTML

September 14, 2022 by Kirat

You need a CookieCollection to hold the cookies, and you need to set UseCookies to true in HtmlWeb. CookieCollection cookieCollection = null; var web = new HtmlWeb { //AutoDetectEncoding = true, UseCookies = true, CacheOnly = false, PreRequest = request => { if (cookieCollection != null && cookieCollection.Count > 0) request.CookieContainer.Add(cookieCollection); return true; }, PostResponse = (request, response) => { … Read more

[Solved] Simple JS Full Page Web Scraping [closed]

September 10, 2022 by Kirat

Actually, it is possible to do this from JavaScript. If the other site has CORS enabled, you can use ajax to fetch the remote URL's contents. If it does not have CORS enabled, you can use your own server to fetch the remote URL's contents. So, you can send an ajax request to your server, your server will fetch the remote contents … Read more

[Solved] Scraping Project Euler site with scrapy [closed]

September 10, 2022 by Kirat

I think I have found the simplest solution that still fits (at least for my purpose), building on existing code written to scrape projecteuler: # -*- coding: utf-8 -*- import scrapy from eulerscraper.items import Problem from scrapy.loader import ItemLoader class EulerSpider(scrapy.Spider): name = "euler" allowed_domains = ['projecteuler.net'] start_urls = ["https://projecteuler.net/archives"] def parse(self, response): numpag = … Read more

[Solved] Python: Get data BeautifulSoup

September 4, 2022 by Kirat

You should use the bs4.Tag.find_all method or something similar. soup.find_all(attrs={"face":"arial","font-size":"16px","color":"navy"}) Example: >>>import bs4 >>>html='''<div id="accounts" class="elementoOculto"> <table align="center" border="0" cellspacing=0 width="90%"> <tr><th align="left" colspan=2> permisos </th></tr><tr> <td colspan=2> <table width=100% align=center border=0 cellspacing=1> <tr> <th align=center width="20%">cuen</th> <th align=center>Mods</th> </tr> </table> </td> </tr> </table> <table align="center" border="0" cellspacing=1 width="90%"> <tr bgcolor="whitesmoke" height="08"> <td align="left" width="20%"> … Read more

[Solved] Get exchange rates – help me update URL in Excel VBA code that used to work [closed]

September 1, 2022 by Kirat

Split: Now that you have obtained the JSON string, you can parse it with the Split function. Here I am reading the JSON in the comments from a cell. Option Explicit Public Sub GetExchangeRate() Dim json As String json = [A1] Debug.Print Split(Split(json, """5. Exchange Rate"": ")(1), ",")(0) End Sub JSON Parser: Here you can use a JSON … Read more
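For comparison, the same two approaches (string splitting versus a real JSON parser) can be sketched in Python. Only the key "5. Exchange Rate" comes from the excerpt; the rest of the sample payload, including the outer key and the other fields, is a hypothetical stand-in for the actual response, and the function names are made up for illustration.

```python
import json

# Hypothetical payload; only the "5. Exchange Rate" key is taken from the excerpt.
SAMPLE_JSON = """{
  "Realtime Currency Exchange Rate": {
    "1. From_Currency Code": "USD",
    "3. To_Currency Code": "EUR",
    "5. Exchange Rate": "0.90000000",
    "6. Last Refreshed": "2022-09-01 00:00:00"
  }
}"""


def rate_by_split(text):
    """Mirror the VBA Split approach: cut after the key, then before the comma."""
    after_key = text.split('"5. Exchange Rate": ')[1]
    return after_key.split(",")[0].strip().strip('"')


def rate_by_parser(text):
    """Parse the JSON properly and look the key up in the first nested object."""
    payload = json.loads(text)
    inner = next(iter(payload.values()))
    return inner["5. Exchange Rate"]


split_rate = rate_by_split(SAMPLE_JSON)
parsed_rate = rate_by_parser(SAMPLE_JSON)
```

The split version depends on the exact spacing after the colon and on a comma following the value, so the parser version is the more robust choice when the response format may change.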

[Solved] How to select a specific item on a drop down list on ASPX site

August 30, 2022 by Kirat

This particular web page isn’t using <select> and <option>. That suggests to me that they are using some custom JavaScript to simulate a drop-down list using the illustrated <div> and <span> elements. In addition, they are using onselect rather than onclick to trigger event handlers. I can’t replicate your test case. However, I did make … Read more