beautifulsoup Archives - Page 2 of 3

[Solved] Unable to communicate with API

October 22, 2022 by Kirat

CONCLUSION (07-25-2021): After looking at this problem in more detail, I believe that it is NOT technically possible to use Python Requests to scrape the website and table in your question, which means that your question cannot be solved in the way you would prefer. Why? The website employs anti-scraping mechanisms. The GBK …

[Solved] BeautifulSoup – Amazon and Google identify me as a robot; how can I fix it?

October 12, 2022 by Kirat

- Rotating proxies
- Delays
- Avoid repeating the same pattern
- IP rate limit (probably your issue)

IP rate limit: this is a basic security system that can ban or block incoming requests from the same IP. The reasoning is that a regular user would not make 100 requests to the same domain within a few seconds with the exact same …
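The delay and proxy-rotation advice can be sketched with the standard library alone. This is a minimal illustration, not the answer's own code: the proxy URLs and User-Agent strings are hypothetical placeholders, and the functions only plan and pace requests, leaving the actual fetch to whatever callable you pass in (for example requests.get, which accepts proxies= and headers= keyword arguments).

```python
import itertools
import random
import time

# Hypothetical placeholders: substitute proxies and user agents you actually control.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]


def plan_requests(urls, min_delay=2.0, max_delay=6.0, seed=None):
    """Pair each URL with a rotating proxy, a rotating User-Agent and a random delay."""
    rng = random.Random(seed)
    proxy_cycle = itertools.cycle(PROXIES)
    agent_cycle = itertools.cycle(USER_AGENTS)
    plan = []
    for url in urls:
        proxy = next(proxy_cycle)
        plan.append({
            "url": url,
            # Same shape as the proxies= argument of requests.get
            "proxies": {"http": proxy, "https": proxy},
            "headers": {"User-Agent": next(agent_cycle)},
            # Random jitter so requests don't arrive at a fixed rhythm
            "delay": rng.uniform(min_delay, max_delay),
        })
    return plan


def run(plan, fetch, sleep=time.sleep):
    """Sleep for each planned delay, then call fetch(url, proxies=..., headers=...)."""
    results = []
    for item in plan:
        sleep(item["delay"])
        results.append(fetch(item["url"], proxies=item["proxies"], headers=item["headers"]))
    return results
```

Randomizing the delay, rather than sleeping a fixed interval, is meant to avoid the "same pattern" a rate limiter can spot; it will not help if the proxies themselves are blocked.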

[Solved] BeautifulSoup table data extraction – data not showing up

October 11, 2022 by Kirat

As you found out yourself, the element is not present in the page source; it is loaded dynamically through an AJAX request. The urllib module (or requests) returns only the page source, which is why you can't get that value directly. Instead, go to Developer Tools > Network > XHR and refresh the page. …
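Once the XHR request is visible in Developer Tools, it can usually be called directly. A minimal sketch, assuming the endpoint returns JSON: the endpoint URL and query parameters are whatever you copy from the Network tab (nothing here is specific to the site in the question), and the X-Requested-With header is an assumption that many AJAX endpoints expect, not something the answer confirms.

```python
from urllib.parse import urlencode


def build_xhr_url(endpoint, params):
    """Append query parameters copied from the XHR request seen in Developer Tools."""
    return endpoint + "?" + urlencode(params)


def fetch_xhr_json(session, endpoint, params):
    """Call the AJAX endpoint directly and return its decoded JSON payload.

    `session` is anything with a requests-style .get(), e.g. requests.Session().
    """
    response = session.get(
        build_xhr_url(endpoint, params),
        # Assumed header; copy whatever headers the browser actually sent
        # if the endpoint rejects the request.
        headers={"X-Requested-With": "XMLHttpRequest"},
    )
    response.raise_for_status()
    return response.json()
```

With requests installed, usage would look like fetch_xhr_json(requests.Session(), endpoint, params).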

[Solved] How to scrape multiple results having the same tags and class

October 11, 2022 by Kirat

You need to parse your data from the script tag rather than from the spans and divs. Try this:

    import requests
    from bs4 import BeautifulSoup
    import re
    import pandas as pd
    from pandas import json_normalize
    import json

    def get_page(url):
        response = requests.get(url)
        if not response.ok:
            print('server responded:', response.status_code)
        else:
            soup = BeautifulSoup(response.text, 'lxml')
            return soup

    def …
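As a rough sketch of the "parse the script tag" idea, the function below reads JSON embedded in <script type="application/ld+json"> tags. That script type, the helper name items_from_script, and the sample markup are assumptions for illustration; the page in the question may embed its data differently (for example as a JavaScript variable), in which case the text would need to be cut out with a regex before json.loads.

```python
import json

from bs4 import BeautifulSoup


def items_from_script(html, script_type="application/ld+json"):
    """Return the JSON objects embedded in <script type=...> tags instead of scraping spans/divs."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for script in soup.find_all("script", type=script_type):
        # .string is the raw script text; it is None if the tag is empty
        if script.string:
            items.append(json.loads(script.string))
    return items
```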

[Solved] Can’t extract an email address from a webpage

October 10, 2022 by Kirat

There are no email addresses on that page. This is a common way to make contact possible without publishing an email address. When you press the "Send enquiry" button, your browser sends an HTTP POST request to some address*, on a web server, which then handles …
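To see where that POST goes, you can read the form itself. A minimal sketch, assuming the enquiry form is a plain HTML <form> in the page source (the helper name describe_form and the sample markup and URL in the test are hypothetical): it reports the method, the absolute action URL, and the names of the fields the browser would submit.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def describe_form(html, base_url):
    """Report where the first form on the page posts to and which fields it sends."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None:
        return None
    fields = [
        tag.get("name")
        for tag in form.find_all(["input", "textarea", "select"])
        if tag.get("name")
    ]
    return {
        "method": form.get("method", "get").upper(),  # HTML forms default to GET
        "action": urljoin(base_url, form.get("action", "")),  # resolve relative actions
        "fields": fields,
    }
```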

[Solved] How to get data from a combobox using BeautifulSoup and Python?

October 8, 2022 by Kirat

From what I can see of the HTML, there is no span with id="sexo- button", so BeautifulSoup(login_request.text, 'lxml').find("span", id="sexo- button") would have returned None, which is why you got the error from get_text. As for your second attempt, bs4 Tag objects don't have a value property (tag.value is treated as a search for a child <value> tag), which is why you got None that time. …
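A defensive version of the lookup checks for None before touching the result and reads attributes with .get(). This sketch assumes the page contains the underlying <select> element behind the styled combobox; the id "sexo" in the test and the helper name selected_option are hypothetical.

```python
from bs4 import BeautifulSoup


def selected_option(html, select_id):
    """Return (value, text) of the selected <option> in a <select>, or None if missing."""
    soup = BeautifulSoup(html, "html.parser")
    select = soup.find("select", id=select_id)
    if select is None:  # find() returns None instead of raising when nothing matches
        return None
    # Prefer the option marked selected; fall back to the first one, as browsers do
    option = select.find("option", selected=True) or select.find("option")
    if option is None:
        return None
    # Attributes are read with .get() or [] ; tag.value would look for a <value> child tag
    return option.get("value"), option.get_text(strip=True)
```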

[Solved] How to get a link with web scraping

October 8, 2022 by Kirat

In the future, please include some code showing what you have attempted. I have expanded on Fabix's answer. The following code gets the YouTube link, song name, and artist for all 20 pages of the source website:

    from bs4 import BeautifulSoup
    import requests

    master_url = "https://www.last.fm/tag/rock/tracks?page={}"
    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like …
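Since the rest of the code is cut off here, this is a separate sketch of the extraction step only. The row and cell classes (tr.track, td.name, td.artist) and the helper name extract_tracks are hypothetical stand-ins, not Last.fm's real markup; inspect the page to find the actual selectors. urljoin turns relative hrefs into absolute links.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_tracks(html, base_url):
    """Collect (song, artist, link) triples from track rows (selectors are hypothetical)."""
    soup = BeautifulSoup(html, "html.parser")
    tracks = []
    for row in soup.select("tr.track"):
        song = row.select_one("td.name a")
        artist = row.select_one("td.artist a")
        # Any link in the row whose href contains youtube.com
        video = row.select_one("a[href*='youtube.com']")
        tracks.append((
            song.get_text(strip=True) if song else None,
            artist.get_text(strip=True) if artist else None,
            urljoin(base_url, video["href"]) if video else None,
        ))
    return tracks
```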

[Solved] How to get similar tags in Beautiful Soup?

October 4, 2022 by Kirat

    for a in soup.select("#listing-details-list li span"):

There is no problem with this line, assuming you're trying to get all the span tags inside the list items under the listing-details-list id. See:

    for a in soup.select("#listing-details-list li span"):
        print(a)

which prints:

    <span> Property Reference: </span>
    <span> Furnished: </span>
    <span> Listed By: </span>
    <span> Rent Is Paid: </span>
    <span> Building: </span>
    <span> …
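If the goal is to read each label together with its value, a small sketch can build a dict from the list items under the same id. It assumes each item looks like <li><span>Label:</span> value</li>, which is a guess about the page's markup; the helper name listing_details and the test HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def listing_details(html):
    """Map each <span> label to the remaining text of its <li>."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for li in soup.select("#listing-details-list li"):
        span = li.find("span")
        if span is None:
            continue
        # extract() removes the label span from the <li> and returns it
        label = span.extract().get_text(strip=True).rstrip(":")
        # What is left in the <li> is treated as the value
        details[label] = li.get_text(" ", strip=True)
    return details
```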

[Solved] How to navigate through HTML pages that have paging for their content using Python? [closed]

October 3, 2022 by Kirat

As far as I can see on this page, you need to interact with the JavaScript that is triggered by the Go button or the Next Page button. For the Go button, you need to fill in the textbox each time. You can use different approaches to work around this:

1) Selenium – Web Browser Automation
2) spynner – Programmatic web …
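If the Go and Next Page buttons ultimately send a request that carries a page number (something to verify in the browser's Network tab; it is an assumption here), you can sometimes skip browser automation and drive the paging yourself. The loop below is that alternative in minimal form: fetch_page is a hypothetical callable you would write to request and parse page n, and iteration stops at the first empty page.

```python
def iterate_pages(fetch_page, start=1, max_pages=100):
    """Yield items page by page until a page comes back empty.

    fetch_page(n) is a hypothetical callable returning the list of items on page n.
    max_pages is a safety cap so a misbehaving site can't loop forever.
    """
    for page in range(start, start + max_pages):
        items = fetch_page(page)
        if not items:
            break
        yield from items
```

When the paging state lives only in JavaScript, browser automation such as Selenium, as suggested above, remains the fallback.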

[Solved] Extracting variables from JavaScript inside HTML

October 2, 2022 by Kirat

You could use BeautifulSoup to extract the <script> tag, but you would still need another approach to extract the information inside it. A little Python can first extract flashvars and then pass it to demjson to convert the JavaScript object into a Python dictionary. For example:

    import demjson

    content = """<script type="text/javascript">/* <![CDATA[ …
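For comparison, here is a standard-library sketch that does not use demjson. It assumes the script contains something like var flashvars = {...}; and that the object literal is strict JSON. json.loads rejects single-quoted strings, unquoted keys, and trailing commas, which is exactly the case where a lenient parser like demjson is worth having. The regex and the helper name extract_flashvars are illustrative assumptions.

```python
import json
import re

# Assumed pattern: `var flashvars = {...};` with the object literal captured.
FLASHVARS_RE = re.compile(r"var\s+flashvars\s*=\s*(\{.*?\})\s*;", re.DOTALL)


def extract_flashvars(script_text):
    """Pull the flashvars object out of inline JavaScript and parse it as JSON.

    Returns None when no flashvars assignment is found. Only works when the
    literal is strict JSON (double-quoted keys and strings, no trailing commas).
    """
    match = FLASHVARS_RE.search(script_text)
    if match is None:
        return None
    return json.loads(match.group(1))
```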

[Solved] Python – ETFs Daily Data Web Scraping

September 28, 2022 by Kirat

Yes, I agree that Beautiful Soup is a good approach. Here is some Python code which uses the Beautiful Soup library to extract the intraday price from the IVV fund page:

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("https://www.marketwatch.com/investing/fund/ivv")
    html = r.text
    soup = BeautifulSoup(html, "html.parser")
    if soup.h1.string == "Pardon Our Interruption...":
        print("They detected …
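One fragility in the soup.h1.string check: soup.h1 is None when the page has no <h1>, so .string raises AttributeError, and .string is also None when the heading has more than one child. A slightly more defensive sketch of the same block-page test follows; the title is taken from the answer, and the helper name looks_blocked is hypothetical.

```python
from bs4 import BeautifulSoup

# Title of the block page mentioned in the answer
BLOCK_TITLES = ("Pardon Our Interruption",)


def looks_blocked(html):
    """Return True if the page's first <h1> starts with a known bot-block title."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.h1
    if h1 is None:  # soup.h1.string would raise AttributeError here
        return False
    # get_text() works even when the <h1> has several children
    text = h1.get_text(strip=True)
    return any(text.startswith(title) for title in BLOCK_TITLES)
```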

[Solved] Web Scraping & BeautifulSoup – Next Page parsing

September 28, 2022 by Kirat

Try this. If you want a CSV file, add df.to_csv("prod.csv") after the print(df) line; I have included that in the code to produce the CSV file:

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    headers = {'User-Agent': 'Mozilla/5.0'}
    temp = []
    for page in range(1, 20):
        response = requests.get("https://www.avbuyer.com/aircraft/private-jets/page-{page}".format(page=page), headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        postings = soup.find_all('div', …
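If you'd rather not pull in pandas just to write the file, the standard csv module can do it. This sketch is an alternative to df.to_csv, not the answer's code: it builds the same page URLs (note that range(1, 20) stops at page 19) and renders a list of row dicts as CSV text. The helper names and field names are hypothetical.

```python
import csv
import io

URL_TEMPLATE = "https://www.avbuyer.com/aircraft/private-jets/page-{page}"


def page_urls(first=1, last=19):
    """Build the listing URLs; range(1, 20) in the answer covers pages 1 to 19."""
    return [URL_TEMPLATE.format(page=page) for page in range(first, last + 1)]


def rows_to_csv(rows, fieldnames):
    """Render a list of dicts (one per posting) as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To save it, open the file with newline="" (as the csv docs recommend) and write the returned text: with open("prod.csv", "w", newline="") as f: f.write(rows_to_csv(rows, fields)).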

[Solved] Parse an HTML file with a table using Python

September 26, 2022 by Kirat

Find all tr tags and get td tags by class attribute:

    # encoding: utf-8
    from bs4 import BeautifulSoup

    data = u"""
    <table>
    <tr>
    <td class="zeit"><div>03.12. 10:45:00</div></td>
    <td class="system"><div><a target="_blank" href="detail.php?host=CG&factor=2&delay=1&Y=15">CG</div></a></td>
    <td class="fehlertext"><div>System steht nicht zur Verfügung!</div></td>
    </tr>
    <tr>
    <td class="zeit"><div>03.12. 10:10:01</div></td>
    <td class="system"><div><a target="_blank" href="detail.php?host=DEXProd&factor=2&delay=5&Y=15">DEX</div></a></td>
    <td class="fehlertext"><div>ssh: Connection refused Couldn't read packet: Connection reset by …
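Continuing the idea for the sample table, a minimal sketch that turns each <tr> into a dict keyed by the td classes zeit, system, and fehlertext (German for time, system, and error text). parse_status_table is a hypothetical name, and the test uses simplified markup rather than the full sample.

```python
from bs4 import BeautifulSoup

CLASSES = ("zeit", "system", "fehlertext")


def parse_status_table(html):
    """Return one dict per <tr>, mapping each td class to its stripped text (or None)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        row = {}
        for cls in CLASSES:
            # class_ (with underscore) filters on the CSS class attribute
            td = tr.find("td", class_=cls)
            row[cls] = td.get_text(strip=True) if td else None
        rows.append(row)
    return rows
```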

[Solved] Web scraping program cannot find element which I can see in the browser

September 26, 2022 by Kirat

The element you're interested in is generated dynamically after the initial page load, which means that your browser executed JavaScript, made other network requests, etc., in order to build the page. Requests is just an HTTP library, so it will not do those things. You could use a tool like Selenium, or perhaps even …

[Solved] How do I convert a web-scraped table into a csv?

September 21, 2022 by Kirat

1. You can use pd.read_html for this:

    import pandas as pd

    Data = pd.read_html(r'https://www.boxofficemojo.com/chart/top_lifetime_gross/')
    for data in Data:
        # each table overwrites Data.csv; use distinct file names if there are several
        data.to_csv('Data.csv', sep=',')

2. Using bs4:

    import pandas as pd
    from bs4 import BeautifulSoup
    import requests

    URL = r'https://www.boxofficemojo.com/chart/top_lifetime_gross/'
    print('\n>> Extracting Data using Beautiful Soup for :' + URL)
    try:
        res = requests.get(URL)
    except Exception as e:
        print(repr(e))
    print('\n<> URL present …
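The bs4 route can also skip pandas entirely. A minimal sketch using the standard csv module: it writes every <th>/<td> row of one table as CSV text. The helper name table_to_csv and the sample table in the test are made up; on the real page you would pass res.text, and which table index holds the chart is an assumption you'd need to check.

```python
import csv
import io

from bs4 import BeautifulSoup


def table_to_csv(html, table_index=0):
    """Convert the n-th <table> in the HTML to CSV text (header and data cells alike)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[table_index]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows with no cells
            writer.writerow(cells)
    return buffer.getvalue()
```

csv.writer quotes values that contain commas (such as dollar amounts), so the output stays one column per cell.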