{"id":4516,"date":"2022-08-23T05:13:16","date_gmt":"2022-08-22T23:43:16","guid":{"rendered":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/"},"modified":"2022-08-23T05:13:16","modified_gmt":"2022-08-22T23:43:16","slug":"solved-scrape-the-about-page-of-websites-with-python-closed","status":"publish","type":"post","link":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/","title":{"rendered":"[Solved] scrape the about page of websites with Python [closed]"},"content":{"rendered":"<div id=\"answer-11707760\" class=\"answer js-answer accepted-answer js-accepted-answer\" data-answerid=\"11707760\" data-parentid=\"11707709\" data-score=\"3\" data-position-on-page=\"1\" data-highest-scored=\"1\" data-question-has-accepted-highest-score=\"1\" itemprop=\"acceptedAnswer\" itemscope itemtype=\"https:\/\/schema.org\/Answer\">\n<div class=\"post-layout\">\n<div class=\"votecell post-layout--left\"><\/div>\n<div class=\"answercell post-layout--right\">\n<div class=\"s-prose js-post-body\" itemprop=\"text\">\n<p>Depending on how consistent the structure of the data you want to extract is, you can choose among several tools.<\/p>\n<ul>\n<li>If the data you want is always stored in the same DOM structure, <a rel=\"nofollow noopener\" target=\"_blank\" href=\"http:\/\/scrapy.org\">Scrapy<\/a> could do the job.<\/li>\n<li>If the data is sparse and stored in various places, <a rel=\"nofollow noopener\" target=\"_blank\" href=\"http:\/\/www.crummy.com\/software\/BeautifulSoup\/bs4\/doc\/\">BeautifulSoup4<\/a> or <a rel=\"nofollow noopener\" target=\"_blank\" href=\"http:\/\/lxml.de\/index.html#documentation\">lxml<\/a> could help you.<\/li>\n<li>If the data is generated by JavaScript code, have a look at <a rel=\"nofollow noopener\" target=\"_blank\" 
href=\"https:\/\/selenium.googlecode.com\/svn\/trunk\/docs\/api\/py\/index.html\">Selenium<\/a>.<\/li>\n<\/ul>\n<p>Here are some resources you might find useful:<\/p>\n<ul>\n<li>PyCon 2012 tutorial on web scraping: <a rel=\"nofollow noopener\" target=\"_blank\" href=\"http:\/\/pyvideo.org\/video\/609\/web-scraping-reliably-and-efficiently-pull-data\/\">http:\/\/pyvideo.org\/video\/609\/web-scraping-reliably-and-efficiently-pull-data\/<\/a><\/li>\n<li><a rel=\"nofollow noopener\" target=\"_blank\" href=\"http:\/\/isbullsh.it\/2012\/04\/Web-crawling-with-scrapy\/\">http:\/\/isbullsh.it\/2012\/04\/Web-crawling-with-scrapy\/<\/a> (full disclosure: I wrote that)<\/li>\n<li><a rel=\"nofollow noopener\" target=\"_blank\" href=\"http:\/\/www.packtpub.com\/article\/web-scraping-with-python\">http:\/\/www.packtpub.com\/article\/web-scraping-with-python<\/a><\/li>\n<li><a rel=\"nofollow noopener\" target=\"_blank\" href=\"http:\/\/wwwsearch.sourceforge.net\/mechanize\/\">http:\/\/wwwsearch.sourceforge.net\/mechanize\/<\/a><\/li>\n<\/ul>\n<\/div>\n<div class=\"mt24\"><\/div>\n<\/div>\n<p>            <span class=\"d-none\" itemprop=\"commentCount\">1<\/span> <\/p><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Depending on how consistent the structure of the data you want to extract is, you can choose among several tools. If the data you want is always stored in the same DOM structure, Scrapy could do the job. 
If the data is sparse and stored in various places, BeautifulSoup4 or lxml could help &#8230; <a title=\"[Solved] scrape the about page of websites with Python [closed]\" class=\"read-more\" href=\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/\" aria-label=\"More on [Solved] scrape the about page of websites with Python [closed]\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[320],"tags":[349],"class_list":["post-4516","post","type-post","status-publish","format-standard","hentry","category-solved","tag-python"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>[Solved] scrape the about page of websites with Python [closed] - JassWeb<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"[Solved] scrape the about page of websites with Python [closed] - JassWeb\" \/>\n<meta property=\"og:description\" content=\"Depending on how consistent the structure of the data you want to extract is, you can choose among several tools. If the data you want is always stored in the same DOM structure, Scrapy could do the job. If the data is sparse and stored in various places, BeautifulSoup4 or lxml could help ... 
Read more\" \/>\n<meta property=\"og:url\" content=\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/\" \/>\n<meta property=\"og:site_name\" content=\"JassWeb\" \/>\n<meta property=\"article:published_time\" content=\"2022-08-22T23:43:16+00:00\" \/>\n<meta name=\"author\" content=\"Kirat\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kirat\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/\"},\"author\":{\"name\":\"Kirat\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31\"},\"headline\":\"[Solved] scrape the about page of websites with Python [closed]\",\"datePublished\":\"2022-08-22T23:43:16+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/\"},\"wordCount\":131,\"publisher\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#organization\"},\"keywords\":[\"python\"],\"articleSection\":[\"Solved\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/\",\"url\":\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/\",\"name\":\"[Solved] scrape the about page of websites with Python [closed] - 
JassWeb\",\"isPartOf\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#website\"},\"datePublished\":\"2022-08-22T23:43:16+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/jassweb.com\/solved\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"[Solved] scrape the about page of websites with Python [closed]\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/jassweb.com\/solved\/#website\",\"url\":\"https:\/\/jassweb.com\/solved\/\",\"name\":\"JassWeb\",\"description\":\"Build High-quality Websites\",\"publisher\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/jassweb.com\/solved\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/jassweb.com\/solved\/#organization\",\"name\":\"Jass Web\",\"url\":\"https:\/\/jassweb.com\/solved\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png\",\"contentUrl\":\"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png\",\"width\":693,\"height\":132,\"caption\":\"Jass 
Web\"},\"image\":{\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31\",\"name\":\"Kirat\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jassweb.com\/solved\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1775798750\",\"contentUrl\":\"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1775798750\",\"caption\":\"Kirat\"},\"sameAs\":[\"http:\/\/jassweb.com\"],\"url\":\"https:\/\/jassweb.com\/solved\/author\/jaspritsinghghumangmail-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"[Solved] scrape the about page of websites with Python [closed] - JassWeb","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/","og_locale":"en_US","og_type":"article","og_title":"[Solved] scrape the about page of websites with Python [closed] - JassWeb","og_description":"Depending on how consistent the structure of the data you want to extract is, you can choose among several tools. If the data you want is always stored in the same DOM structure, Scrapy could do the job. If the data is sparse and stored in various places, BeautifulSoup4 or lxml could help ... Read more","og_url":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/","og_site_name":"JassWeb","article_published_time":"2022-08-22T23:43:16+00:00","author":"Kirat","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kirat","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/#article","isPartOf":{"@id":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/"},"author":{"name":"Kirat","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31"},"headline":"[Solved] scrape the about page of websites with Python [closed]","datePublished":"2022-08-22T23:43:16+00:00","mainEntityOfPage":{"@id":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/"},"wordCount":131,"publisher":{"@id":"https:\/\/jassweb.com\/solved\/#organization"},"keywords":["python"],"articleSection":["Solved"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/","url":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/","name":"[Solved] scrape the about page of websites with Python [closed] - JassWeb","isPartOf":{"@id":"https:\/\/jassweb.com\/solved\/#website"},"datePublished":"2022-08-22T23:43:16+00:00","breadcrumb":{"@id":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/jassweb.com\/solved\/solved-scrape-the-about-page-of-websites-with-python-closed\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/jassweb.com\/solved\/"},{"@type":"ListItem","position":2,"name":"[Solved] scrape the about page of websites with Python 
[closed]"}]},{"@type":"WebSite","@id":"https:\/\/jassweb.com\/solved\/#website","url":"https:\/\/jassweb.com\/solved\/","name":"JassWeb","description":"Build High-quality Websites","publisher":{"@id":"https:\/\/jassweb.com\/solved\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/jassweb.com\/solved\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/jassweb.com\/solved\/#organization","name":"Jass Web","url":"https:\/\/jassweb.com\/solved\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/","url":"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png","contentUrl":"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png","width":693,"height":132,"caption":"Jass Web"},"image":{"@id":"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31","name":"Kirat","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/image\/","url":"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1775798750","contentUrl":"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1775798750","caption":"Kirat"},"sameAs":["http:\/\/jassweb.com"],"url":"https:\/\/jassweb.com\/solved\/author\/jaspritsinghghumangmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts\/4516","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/types\/post"}],"author":[{"em
beddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/comments?post=4516"}],"version-history":[{"count":0,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts\/4516\/revisions"}],"wp:attachment":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/media?parent=4516"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/categories?post=4516"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/tags?post=4516"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}