[Solved] Can’t deal with some complicated laid-out content from a webpage


You can take advantage of two CSS selectors: span[id$=lblResultsRaceName], which finds all spans whose id ends with lblResultsRaceName, and 'td > span', which finds all spans whose direct parent is a <td>.
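To see the two selectors in isolation, here is a minimal sketch run against a made-up HTML fragment. The ids only imitate the naming style of the real page, and html.parser is used here merely to avoid an extra dependency; both are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment; the ids only imitate the ASP.NET naming style.
html = """
<table>
  <tr>
    <td><span id="ctl00_row1_lblResultsRaceName"> Sale </span></td>
    <td><div><span id="ctl00_row1_lblOther">nested, not a direct child of td</span></div></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# [id$=...] matches elements whose id attribute ends with the given suffix.
names = [s.text.strip() for s in soup.select("span[id$=lblResultsRaceName]")]

# "td > span" matches only spans that are direct children of a <td>.
direct = [s["id"] for s in soup.select("td > span")]

print(names)
print(direct)
```

The second span is skipped by 'td > span' because its direct parent is a <div>, not the <td>.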

This code snippet goes through all racing results and prints every race:

import requests
from bs4 import BeautifulSoup

url = "https://www.thedogs.com.au/Racing/Results.aspx?SearchDate=3-Jun-2018"

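def get_info(session, link):
    session.headers['User-Agent'] = "Mozilla/5.0"
    res = session.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    # Collect the page's hidden form fields; they are resent with every postback.
    formdata = {i['name']: i.get('value', '') for i in soup.select('input[type=hidden]')}
    # Pair each meeting name with its "Results" button.
    race_names = soup.select('span[id$=lblResultsRaceName]')
    buttons = soup.select('input[id$=btnViewResults]')
    for race_name, button in zip(race_names, buttons):
        print(race_name.text.strip())
        # Send only the current button, so earlier buttons do not pile up in the form data.
        payload = {**formdata, button['name']: 'Results'}
        req = session.post(link, data=payload)
        results = BeautifulSoup(req.text, "lxml")
        # Each race tab is a div whose id starts with the tab-container prefix.
        for panel in results.select('div[id^=ctl00_ContentPlaceHolder1_tabContainerRaces_tabRace]'):
            spans = panel.select('td > span')
            print(spans[0].text.strip(), spans[1].text.strip())
        print('#' * 80)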

if __name__ == '__main__':
    with requests.Session() as session:
        get_info(session,url)

Prints:

Healsville
Race 1 Grade:  M   300 metres
Race 2 Grade:  M   350 metres
Race 3 Grade:  6/7   350 metres
Race 4 Grade:  R/W   300 metres
Race 5 Grade:  5   350 metres
Race 6 Grade:  SE   350 metres
Race 7 Grade:  4/5   350 metres
Race 8 Grade:  SE   350 metres
Race 9 Grade:  7   300 metres
Race 10 Grade:  6/7   300 metres
Race 11 Grade:  4/5   300 metres
Race 12 Grade:  5   300 metres
################################################################################
Sale
Race 1 Grade:  M   440 metres
Race 2 Grade:  M   440 metres
Race 3 Grade:  R/W   520 metres
Race 4 Grade:  7   440 metres
Race 5 Grade:  R/W   440 metres
Race 6 Grade:  4/5   520 metres
Race 7 Grade:  R/W   440 metres
Race 8 Grade:  4/5   440 metres
Race 9 Grade:  6/7   440 metres
Race 10 Grade:  R/W   440 metres
Race 11 Grade:  R/W   440 metres
Race 12 Grade:  5   520 metres
################################################################################
...and so on.
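The script depends on resending the page's hidden fields together with the name of the button being "clicked". The sketch below shows that step offline on a made-up form. The field names __VIEWSTATE and __EVENTVALIDATION are typical of ASP.NET Web Forms pages but are assumed here, and all values and button names are placeholders. This variant builds a separate payload per button.

```python
from bs4 import BeautifulSoup

# Made-up form; field names and values are placeholders, not taken from the real page.
html = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789">
  <input type="submit" id="row0_btnViewResults" name="row0$btnViewResults" value="Results">
  <input type="submit" id="row1_btnViewResults" name="row1$btnViewResults" value="Results">
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# Hidden fields are resent unchanged with every postback.
hidden = {i["name"]: i.get("value", "") for i in soup.select("input[type=hidden]")}

# One payload per button: the hidden fields plus only that button's name/value pair.
payloads = [
    {**hidden, b["name"]: b["value"]}
    for b in soup.select("input[id$=btnViewResults]")
]

for p in payloads:
    print(sorted(p))
```

Each dictionary in payloads could then be passed as the data argument of session.post, one request per meeting.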
