{"id":21876,"date":"2022-11-16T11:23:04","date_gmt":"2022-11-16T05:53:04","guid":{"rendered":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/"},"modified":"2022-11-16T11:23:04","modified_gmt":"2022-11-16T05:53:04","slug":"solved-extract-all-pages-from-a-table","status":"publish","type":"post","link":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/","title":{"rendered":"[Solved] Extract all pages from a Table"},"content":{"rendered":"<p> [ad_1]<br \/>\n<\/p>\n<div id=\"answer-50668024\" class=\"answer js-answer accepted-answer js-accepted-answer\" data-answerid=\"50668024\" data-parentid=\"50667127\" data-score=\"0\" data-position-on-page=\"1\" data-highest-scored=\"1\" data-question-has-accepted-highest-score=\"1\" itemprop=\"acceptedAnswer\" itemscope itemtype=\"https:\/\/schema.org\/Answer\">\n<div class=\"post-layout\">\n<div class=\"votecell post-layout--left\"><\/div>\n<div class=\"answercell post-layout--right\">\n<div class=\"s-prose js-post-body\" itemprop=\"text\">\n<p>To scrape all the pages, observe that trailing parameter in the url increments by <code>2<\/code>, rather than <code>1<\/code>. Thus, the code below finds the maximum page in the listing, multiples the latter result by 2, and utilizes the result as a range:<\/p>\n<pre><code>import requests, re, contextlib\nfrom bs4 import BeautifulSoup as soup\nimport csv\n\n@contextlib.contextmanager\ndef scrape_table(url):\n  d = soup(requests.get(url).text, 'html.parser')\n  headers = [i.text for i in d.find_all('td', {'class':re.compile('table\\-top')})]\n  full_table = [i.text for i in d.find_all('td', {'class':re.compile('screener-body-table-nw')})]\n  grouped_table = [full_table[i:i+len(headers)] for i in range(0, len(full_table), len(headers))]\n  yield [dict(zip(headers, i)) for i in grouped_table]\n\nstart=\"https:\/\/www.finviz.com\/screener.ashx?v=161\"\nmax_link = int(max(soup(requests.get(start).text, 'html.parser').find_all('a', {'class':'screener-pages'}), key=lambda x:int(x.text)).text)\nheaders = [i.text for i in soup(requests.get(start).text, 'html.parser').find_all('td', {'class':re.compile('table\\-top')})]\nwith open('filename.csv', 'a') as f:\n  write = csv.writer(f)\n  write.writerow(headers)\n  with scrape_table(start) as r1:\n    write.writerows(list(filter(None, [[i[b] for b in headers] for i in r1])))\n  for i in range(1, max_link):\n    url=\"https:\/\/www.finviz.com\/screener.ashx?v=161&amp;r={}1\".format(i*2)\n    with scrape_table(url) as result:\n      _r = list(filter(None, [[i[b] for b in headers] for i in result]))\n      if _r:\n        write.writerows(_r)\n<\/code><\/pre>\n<p>Result (first page):<\/p>\n<pre><code>{'Profit M': '4.60%', 'Ticker': 'ABAC', 'Price': '2.17', 'ROI': '1.10%', 'Quick R': '17.40', 'Market Cap': '17.60M', 'Curr R': '19.00', 'Gross M': '16.30%', 'ROA': '1.40%', 'Dividend': '-', 'Earnings': 'Apr 02\/a', 'LTDebt\/Eq': '0.00', 'No.': '21', 'Volume': '1,744', 'ROE': '1.40%', 'Debt\/Eq': '0.02', 'Change': '-1.36%', 'Oper M': '4.20%'}, {'Profit M': '11.10%', 'Ticker': 'ABAX', 'Price': '82.74', 'ROI': '8.90%', 'Quick R': '5.00', 'Market Cap': '1.88B', 'Curr R': '6.00', 'Gross M': '54.60%', 'ROA': '8.30%', 'Dividend': '0.87%', 'Earnings': 'Apr 26\/a', 'LTDebt\/Eq': '0.00', 'No.': '22', 'Volume': '253,661', 'ROE': '9.70%', 'Debt\/Eq': '0.00', 'Change': '-0.07%', 'Oper M': '15.80%'}, {'Profit M': '5.90%', 'Ticker': 'ABB', 'Price': '23.23', 'ROI': '11.50%', 'Quick R': '0.90', 'Market Cap': '48.87B', 'Curr R': '1.20', 'Gross M': '30.40%', 'ROA': '4.90%', 'Dividend': '3.57%', 'Earnings': 'Apr 19\/b', 'LTDebt\/Eq': '0.39', 'No.': '23', 'Volume': '1,371,355', 'ROE': '14.80%', 'Debt\/Eq': '0.58', 'Change': '2.15%', 'Oper M': '9.40%'}, {'Profit M': '21.50%', 'Ticker': 'ABBV', 'Price': '98.05', 'ROI': '25.40%', 'Quick R': '1.10', 'Market Cap': '157.01B', 'Curr R': '1.20', 'Gross M': '75.80%', 'ROA': '9.20%', 'Dividend': '3.92%', 'Earnings': 'Apr 26\/b', 'LTDebt\/Eq': '8.70', 'No.': '24', 'Volume': '14,832,723', 'ROE': '119.30%', 'Debt\/Eq': '10.49', 'Change': '-0.90%', 'Oper M': '34.00%'}, {'Profit M': '0.50%', 'Ticker': 'ABC', 'Price': '83.34', 'ROI': '8.70%', 'Quick R': '0.50', 'Market Cap': '18.05B', 'Curr R': '0.90', 'Gross M': '2.90%', 'ROA': '2.40%', 'Dividend': '1.82%', 'Earnings': 'May 02\/b', 'LTDebt\/Eq': '1.46', 'No.': '25', 'Volume': '1,020,497', 'ROE': '32.00%', 'Debt\/Eq': '1.53', 'Change': '1.46%', 'Oper M': '0.60%'}\n<\/code><\/pre>\n<\/p><\/div>\n<div class=\"mt24\"><\/div>\n<\/div>\n<p>            <span class=\"d-none\" itemprop=\"commentCount\">0<\/span> <\/p><\/div>\n<\/div>\n<p>[ad_2]<\/p>\n<p>solved Extract all pages from a Table <\/p>\n","protected":false},"excerpt":{"rendered":"<p>[ad_1] To scrape all the pages, observe that trailing parameter in the url increments by 2, rather than 1. Thus, the code below finds the maximum page in the listing, multiples the latter result by 2, and utilizes the result as a range: import requests, re, contextlib from bs4 import BeautifulSoup as soup import csv &#8230; <a title=\"[Solved] Extract all pages from a Table\" class=\"read-more\" href=\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/\" aria-label=\"More on [Solved] Extract all pages from a Table\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[320],"tags":[622,483,349],"class_list":["post-21876","post","type-post","status-publish","format-standard","hentry","category-solved","tag-beautifulsoup","tag-csv","tag-python"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>[Solved] Extract all pages from a Table - JassWeb<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"[Solved] Extract all pages from a Table - JassWeb\" \/>\n<meta property=\"og:description\" content=\"[ad_1] To scrape all the pages, observe that trailing parameter in the url increments by 2, rather than 1. Thus, the code below finds the maximum page in the listing, multiples the latter result by 2, and utilizes the result as a range: import requests, re, contextlib from bs4 import BeautifulSoup as soup import csv ... Read more\" \/>\n<meta property=\"og:url\" content=\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/\" \/>\n<meta property=\"og:site_name\" content=\"JassWeb\" \/>\n<meta property=\"article:published_time\" content=\"2022-11-16T05:53:04+00:00\" \/>\n<meta name=\"author\" content=\"Kirat\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kirat\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/\"},\"author\":{\"name\":\"Kirat\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31\"},\"headline\":\"[Solved] Extract all pages from a Table\",\"datePublished\":\"2022-11-16T05:53:04+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/\"},\"wordCount\":58,\"publisher\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#organization\"},\"keywords\":[\"beautifulsoup\",\"csv\",\"python\"],\"articleSection\":[\"Solved\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/\",\"url\":\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/\",\"name\":\"[Solved] Extract all pages from a Table - JassWeb\",\"isPartOf\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#website\"},\"datePublished\":\"2022-11-16T05:53:04+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/jassweb.com\/solved\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"[Solved] Extract all pages from a Table\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/jassweb.com\/solved\/#website\",\"url\":\"https:\/\/jassweb.com\/solved\/\",\"name\":\"JassWeb\",\"description\":\"Build High-quality Websites\",\"publisher\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/jassweb.com\/solved\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/jassweb.com\/solved\/#organization\",\"name\":\"Jass Web\",\"url\":\"https:\/\/jassweb.com\/solved\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png\",\"contentUrl\":\"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png\",\"width\":693,\"height\":132,\"caption\":\"Jass Web\"},\"image\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31\",\"name\":\"Kirat\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1776403586\",\"contentUrl\":\"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1776403586\",\"caption\":\"Kirat\"},\"sameAs\":[\"http:\/\/jassweb.com\"],\"url\":\"https:\/\/jassweb.com\/solved\/author\/jaspritsinghghumangmail-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"[Solved] Extract all pages from a Table - JassWeb","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/","og_locale":"en_US","og_type":"article","og_title":"[Solved] Extract all pages from a Table - JassWeb","og_description":"[ad_1] To scrape all the pages, observe that trailing parameter in the url increments by 2, rather than 1. Thus, the code below finds the maximum page in the listing, multiples the latter result by 2, and utilizes the result as a range: import requests, re, contextlib from bs4 import BeautifulSoup as soup import csv ... Read more","og_url":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/","og_site_name":"JassWeb","article_published_time":"2022-11-16T05:53:04+00:00","author":"Kirat","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kirat","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/#article","isPartOf":{"@id":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/"},"author":{"name":"Kirat","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31"},"headline":"[Solved] Extract all pages from a Table","datePublished":"2022-11-16T05:53:04+00:00","mainEntityOfPage":{"@id":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/"},"wordCount":58,"publisher":{"@id":"https:\/\/jassweb.com\/solved\/#organization"},"keywords":["beautifulsoup","csv","python"],"articleSection":["Solved"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/","url":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/","name":"[Solved] Extract all pages from a Table - JassWeb","isPartOf":{"@id":"https:\/\/jassweb.com\/solved\/#website"},"datePublished":"2022-11-16T05:53:04+00:00","breadcrumb":{"@id":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/jassweb.com\/solved\/solved-extract-all-pages-from-a-table\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/jassweb.com\/solved\/"},{"@type":"ListItem","position":2,"name":"[Solved] Extract all pages from a Table"}]},{"@type":"WebSite","@id":"https:\/\/jassweb.com\/solved\/#website","url":"https:\/\/jassweb.com\/solved\/","name":"JassWeb","description":"Build High-quality Websites","publisher":{"@id":"https:\/\/jassweb.com\/solved\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/jassweb.com\/solved\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/jassweb.com\/solved\/#organization","name":"Jass Web","url":"https:\/\/jassweb.com\/solved\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/","url":"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png","contentUrl":"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png","width":693,"height":132,"caption":"Jass Web"},"image":{"@id":"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31","name":"Kirat","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/image\/","url":"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1776403586","contentUrl":"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1776403586","caption":"Kirat"},"sameAs":["http:\/\/jassweb.com"],"url":"https:\/\/jassweb.com\/solved\/author\/jaspritsinghghumangmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts\/21876","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/comments?post=21876"}],"version-history":[{"count":0,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts\/21876\/revisions"}],"wp:attachment":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/media?parent=21876"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/categories?post=21876"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/tags?post=21876"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}