[Solved] python requests only returning empty sets when scraping

Accepted Answer

Selenium treats each frame as a separate page (because the browser has to load it separately), so it doesn't search inside frames, and page_source doesn't return the HTML from a frame.

You have to find the <frame> element and switch to it with switch_to.frame(..) before you can work with its content.

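from selenium.webdriver.common.by import By

# Selenium 4 syntax; the find_elements_by_* helpers were removed in Selenium 4.3
frames = driver.find_elements(By.TAG_NAME, 'frame')
driver.switch_to.frame(frames[0])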

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = "http://oulim.kr/"

# Selenium 4 syntax: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service('./driver/chromedriver'))
driver.get(url)

# --- switch frame ---

frames = driver.find_elements(By.TAG_NAME, 'frame')
driver.switch_to.frame(frames[0])

# --- CSS without BeautifulSoup ---

a = driver.find_element(By.CSS_SELECTOR, "#divAlba > table:nth-child(3) > tbody > tr:nth-child(2) > td:nth-child(5) > a > font > b")
print(a.text)

# --- CSS with BeautifulSoup ---

# after the switch, page_source returns the HTML of the current frame
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning

a = soup.select("#divAlba > table:nth-child(3) > tbody > tr:nth-child(2) > td:nth-child(5) > a > font > b")
print(a[0].text)
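
The same frame behaviour probably explains the empty results from requests mentioned in the title: a request for the outer URL returns only the frameset document, and the content sits in a separate document referenced by the frame's src. Below is a minimal sketch, using only the standard library, of finding those frame URLs so they can be fetched directly. The FrameSrcParser class, the frame_urls helper, and the sample HTML are all made up for illustration; the real markup of the site is not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(outer_html, base_url):
    """Return absolute URLs of the frames referenced by outer_html."""
    parser = FrameSrcParser()
    parser.feed(outer_html)
    # Frame src values are often relative, so resolve them against the page URL
    return [urljoin(base_url, src) for src in parser.sources]


# Hypothetical outer document, invented for this example only
SAMPLE_OUTER_HTML = """
<html>
  <frameset rows="100%">
    <frame name="main" src="main/index.html">
  </frameset>
</html>
"""

urls = frame_urls(SAMPLE_OUTER_HTML, "http://example.com/")
print(urls)
```

Each returned URL could then be requested on its own (for example with requests.get) and parsed with BeautifulSoup, provided the frame content is not generated by JavaScript. If it is, the Selenium approach above is the safer route.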