[Solved] How to scrape HTML using Python for NOWTV available movies


You can mimic the page's own pagination (https://www.nowtv.com/stream/all-movies/page/1) and extract the movies from the script tag embedded in each page. The code below could use some refactoring, but it shows how to read the total number of films, work out how many films are listed per page, calculate the number of pages from those two values, and request every page with a Session for efficiency (the underlying connection is reused across requests). The result is 1425 movies.

import requests
import re
import json
import math
import pandas as pd

titles = []
links = []
base="https://www.nowtv.com"
headers = {'User-Agent' : 'Mozilla/5.0'}

# Each page embeds its data as JSON assigned to `var propStore` in a script tag.
prop_store = re.compile(r"var propStore = (.*);")

def get_page_data(session, page):
    # Fetch one results page and return the dict holding 'list' and 'count'.
    res = session.get('https://www.nowtv.com/stream/all-movies/page/{}'.format(page))
    res.raise_for_status()
    data = json.loads(prop_store.findall(res.text)[0])
    first_section = data[next(iter(data))]
    return first_section['props']['data']

with requests.Session() as s:
    s.headers.update(headers)  # send the User-Agent defined above with every request
    first_page = get_page_data(s, 1)
    movies_per_page = len(first_page['list'])
    total_movies = int(first_page['count'])
    pages = math.ceil(total_movies / movies_per_page)

    for page in range(1, pages + 1):
        # Page 1 is already downloaded, so reuse it instead of requesting it again.
        page_data = first_page if page == 1 else get_page_data(s, page)
        for movie in page_data['list']:
            titles.append(movie['title'])
            links.append(base + movie['slug'])

df = pd.DataFrame(list(zip(titles, links)), columns = ['Title', 'Link'])
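
If you want to check the extraction step without hitting the site, a minimal sketch along these lines may help. It pulls the propStore JSON out of an HTML string and derives the page count from it. The function name parse_page and the sample HTML are made up for illustration; the sample only mirrors the keys used above ('props', 'data', 'list', 'count') and is not real NOWTV output.

```python
import json
import math
import re

# Same pattern as in the answer: the JSON object assigned to `var propStore`.
PROP_STORE = re.compile(r"var propStore = (.*);")

def parse_page(html):
    """Return (movies, total_count) parsed from one page's HTML (hypothetical helper)."""
    match = PROP_STORE.search(html)
    if match is None:
        raise ValueError("propStore script not found; the page layout may have changed")
    data = json.loads(match.group(1))
    # The first key of the store holds the section with the movie data.
    payload = data[next(iter(data))]["props"]["data"]
    return payload["list"], int(payload["count"])

# Made-up sample that only imitates the structure assumed above.
sample_html = """<script>
var propStore = {"section-1": {"props": {"data": {"count": "5", "list": [{"title": "Film A", "slug": "/film-a"}, {"title": "Film B", "slug": "/film-b"}]}}}};
</script>"""

movies, total = parse_page(sample_html)
# Number of pages = total films divided by films per page, rounded up.
pages = math.ceil(total / len(movies))
print(len(movies), total, pages)
```

Keeping the parsing in a function that takes a string means a layout change on the site shows up as a clear error rather than an IndexError deep inside the request loop.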
