[Solved] python requests only returning empty sets when scraping

Selenium treats frames as separate pages (because it has to load them separately) and it doesn’t search inside frames. page_source doesn’t return the HTML from a frame either. You have to find the <frame> and switch to it with switch_to.frame(..) to work with it. frames = driver.find_elements_by_tag_name('frame') driver.switch_to.frame(frames[0]) import urllib from bs4 import BeautifulSoup from selenium import webdriver … Read more

[Solved] Scraping data from a dynamic web database with Python [closed]

You can solve it with requests (for maintaining a web-scraping session) + BeautifulSoup (for HTML parsing) + a regex for extracting the value of a JavaScript variable that holds the desired data inside a script tag, and ast.literal_eval() for making a Python list out of the JS list: from ast import literal_eval import re from bs4 import BeautifulSoup … Read more
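
The regex plus ast.literal_eval() step can be sketched as follows. This is a minimal illustration, not the original answer's code: the HTML sample, the variable name chartData, and the helper extract_js_list are all hypothetical, and to stay dependency-free it runs the regex over the raw HTML rather than fetching with requests and locating the script tag with BeautifulSoup.

```python
import re
from ast import literal_eval

# Hypothetical page content; a real scraper would download this with requests.
SAMPLE_HTML = """
<html><body>
<script>
var chartData = [["2019", 10], ["2020", 25], ["2021", 40]];
</script>
</body></html>
"""


def extract_js_list(html, var_name):
    """Return the JS array assigned to `var_name` as a Python list."""
    # Capture everything from the opening '[' up to the first '];'.
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\]);",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        raise ValueError("variable %r not found in page" % var_name)
    # literal_eval only evaluates literals, so it never executes page code.
    return literal_eval(match.group(1))


rows = extract_js_list(SAMPLE_HTML, "chartData")
print(rows)
```

This only works when the JavaScript literal is also a valid Python literal (strings, numbers, nested lists). Values such as true, false, or null would need a different approach, for example json.loads when the literal is valid JSON.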

[Solved] Python Web Scraping – Failed to extract a list from the website

It most likely writes the data dynamically using JavaScript. You could use libraries like Selenium or dryscrape to get the data. You might want to look into Web Scraping JavaScript page with Python. Or, if you insist on using Scrapy, look into Selecting dynamically-loaded content. … Read more

[Solved] Data scraping based on Search engines

You can do that using the Google API https://developers.google.com/custom-search/json-api/v1/overview and a related PHP client https://github.com/google/google-api-php-client. Later on you need to write a web scraper to download the websites (curl) and parse the HTML with a parser (e.g. https://github.com/paquettg/php-html-parser). I would, however, not recommend PHP for the latter task. There are much more sophisticated scraping tools available for Python … Read more

[Solved] WebRequest not returning HTML

You need a CookieCollection to hold the cookies, and you need to set UseCookies to true in HtmlWeb. CookieCollection cookieCollection = null; var web = new HtmlWeb { //AutoDetectEncoding = true, UseCookies = true, CacheOnly = false, PreRequest = request => { if (cookieCollection != null && cookieCollection.Count > 0) request.CookieContainer.Add(cookieCollection); return true; }, PostResponse = (request, response) => { … Read more

[Solved] Scraping Project Euler site with scrapy [closed]

I think I have found the simplest yet fitting solution (at least for my purpose), with respect to the existing code written to scrape projecteuler: # -*- coding: utf-8 -*- import scrapy from eulerscraper.items import Problem from scrapy.loader import ItemLoader class EulerSpider(scrapy.Spider): name = 'euler' allowed_domains = ['projecteuler.net'] start_urls = ["https://projecteuler.net/archives"] def parse(self, response): numpag = … Read more

[Solved] Python: Get data BeautifulSoup

You should use the bs4.Tag.find_all method or something similar. soup.find_all(attrs={"face":"arial","font-size":"16px","color":"navy"}) Example: >>> import bs4 >>> html='''<div id="accounts" class="elementoOculto"> <table align="center" border="0" cellspacing=0 width="90%"> <tr><th align="left" colspan=2> permisos </th></tr><tr> <td colspan=2> <table width=100% align=center border=0 cellspacing=1> <tr> <th align=center width="20%">cuen</th> <th align=center>Mods</th> </tr> </table> </td> </tr> </table> <table align="center" border="0" cellspacing=1 width="90%"> <tr bgcolor="whitesmoke" height="08"> <td align="left" width="20%"> … Read more

[Solved] Get exchange rates – help me update URL in Excel VBA code that used to work [closed]

Split: Now that you have obtained the JSON string, you can parse it with the Split function. Here I am reading the JSON from the comments out of a cell. Option Explicit Public Sub GetExchangeRate() Dim json As String json = [A1] Debug.Print Split(Split(json, """5. Exchange Rate"": ")(1), ",")(0) End Sub JSON Parser: Here you can use a JSON … Read more
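
For comparison, the same two approaches (string splitting versus a JSON parser) can be sketched in Python. This is only an illustration under stated assumptions: the sample payload, its outer key "Realtime Currency Exchange Rate", and the rate value are hypothetical stand-ins for whatever the exchange-rate service returns, and both helper functions are invented names. Only the inner key "5. Exchange Rate" comes from the VBA snippet above.

```python
import json

# Hypothetical response body; the real structure depends on the service used.
SAMPLE_JSON = (
    '{"Realtime Currency Exchange Rate": {'
    '"1. From_Currency Code": "EUR", '
    '"5. Exchange Rate": "1.10250000", '
    '"6. Last Refreshed": "2019-01-01 00:00:00"}}'
)


def rate_by_split(text):
    """Mirror the nested Split calls: take what follows the key, up to the next comma."""
    after_key = text.split('"5. Exchange Rate": ')[1]
    # Unlike the VBA one-liner, also strip the surrounding double quotes.
    return after_key.split(",")[0].strip('"')


def rate_by_parser(text):
    """Parse the whole document and look the value up by key."""
    payload = json.loads(text)
    # The outer key is an assumption about the payload layout.
    return payload["Realtime Currency Exchange Rate"]["5. Exchange Rate"]


print(rate_by_split(SAMPLE_JSON))
print(rate_by_parser(SAMPLE_JSON))
```

The split version breaks as soon as the spacing after the colon or the field order changes, which is the usual argument for preferring a real parser when one is available.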

[Solved] How to select an specific item on a drop down list on ASPX site

This particular web page isn’t using <select> and <option>. That suggests to me that they are using some custom JavaScript to simulate a drop-down list using the illustrated <div> and <span> elements. In addition, they are using onselect rather than onclick to trigger event handlers. I can’t replicate your test case. However, I did make … Read more