[Solved] Python – ETFs Daily Data Web Scraping


Yes, I agree that Beautiful Soup is a good approach. Here is some Python code that uses the Beautiful Soup library to extract the intraday price from the MarketWatch page for the IVV fund:

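import requests
from bs4 import BeautifulSoup

# A browser-like User-Agent; the default "python-requests/..." one is easy to flag
# (on its own this may still not get past the bot check).
headers = {"User-Agent": "Mozilla/5.0"}
r = requests.get("https://www.marketwatch.com/investing/fund/ivv", headers=headers, timeout=10)
html = r.text

soup = BeautifulSoup(html, "html.parser")

# soup.h1 is None when the page has no <h1>, so guard before reading .string
if soup.h1 is not None and soup.h1.string == "Pardon Our Interruption...":
    print("They detected we are a bot. We hit a captcha.")
else:
    # The price is the <bg-quote> element inside <h3 class="intraday__price">
    container = soup.find("h3", class_="intraday__price")
    quote = container.find("bg-quote") if container is not None else None
    print(quote.string if quote is not None else "Price element not found; the page layout may have changed.")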

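The fact that the price changes frequently is not a problem. The script finds the price by tag names and class (the bg-quote element inside h3 with class intraday__price), and that structure stays the same while the number inside it changes. That stable structure is all Beautiful Soup needs. If MarketWatch redesigns the page, though, the selectors will need updating.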
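To illustrate why a changing price does not matter, here is a minimal sketch that runs the same lookup against two snapshots of simplified, hypothetical markup. This is not MarketWatch's real page, and the prices are made up. The two snapshots differ only in the number inside bg-quote. `extract_price` is a hypothetical helper name.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_price(html: str) -> Optional[str]:
    """Return the text of <bg-quote> inside <h3 class="intraday__price">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("h3", class_="intraday__price")
    if container is None:
        return None  # structure not found (captcha page, redesign, ...)
    quote = container.find("bg-quote")
    return quote.get_text(strip=True) if quote is not None else None


# Hypothetical, simplified markup: identical tags and class, different values.
TEMPLATE = '<h3 class="intraday__price"><bg-quote>{}</bg-quote></h3>'
morning = TEMPLATE.format("412.30")
afternoon = TEMPLATE.format("415.87")

print(extract_price(morning))    # 412.30
print(extract_price(afternoon))  # 415.87
```

The selector never mentions the price itself, so the same code keeps working as the value changes. It returns None instead of raising when the expected tags are missing.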

Your main challenge is that the website can detect that the request is not coming from a web browser, and it will serve a captcha page to your Python script instead of the quote. So you will need to find a way around this. I also recommend checking whether scraping the site is legal and whether it violates their terms of service.
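One partial, technical check you can automate is the site's robots.txt, using Python's standard `urllib.robotparser`. This does not replace reading the terms of service and is not a legal determination. robots.txt only states which paths the site asks crawlers to avoid. The sketch below parses hypothetical robots.txt content offline. The rules shown are invented for illustration, and `allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; it is NOT
# MarketWatch's actual file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /private/
"""


def allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


print(allowed(SAMPLE_ROBOTS_TXT, "https://www.marketwatch.com/investing/fund/ivv"))  # True
print(allowed(SAMPLE_ROBOTS_TXT, "https://www.marketwatch.com/search?q=ivv"))  # False
```

To check the live file instead, call `parser.set_url("https://www.marketwatch.com/robots.txt")` followed by `parser.read()` before `can_fetch`.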

You can learn more about Beautiful Soup here:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/
