{"id":6005,"date":"2022-08-31T21:48:05","date_gmt":"2022-08-31T16:18:05","guid":{"rendered":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/"},"modified":"2022-08-31T21:48:05","modified_gmt":"2022-08-31T16:18:05","slug":"solved-convert-pdf-to-excel-closed","status":"publish","type":"post","link":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/","title":{"rendered":"[Solved] Convert PDF to Excel [closed]"},"content":{"rendered":"<p> [ad_1]<br \/>\n<\/p>\n<div id=\"answer-30371480\" class=\"answer js-answer accepted-answer js-accepted-answer\" data-answerid=\"30371480\" data-parentid=\"30370507\" data-score=\"2\" data-position-on-page=\"1\" data-highest-scored=\"1\" data-question-has-accepted-highest-score=\"1\" itemprop=\"acceptedAnswer\" itemscope itemtype=\"https:\/\/schema.org\/Answer\">\n<div class=\"post-layout\">\n<div class=\"votecell post-layout--left\"><\/div>\n<div class=\"answercell post-layout--right\">\n<div class=\"s-prose js-post-body\" itemprop=\"text\">\n<p>Getting data out from a pdf file is pretty messy. If the pdf table is ordered and has got a unique pattern embedded along with it, the best way to get the data is by converting the pdf to xml. For this you can use: <strong>pdftohtml<\/strong>. <\/p>\n<p>Installation: <code>sudo apt-get install pdftohtml<\/code><\/p>\n<p>Usage: <code>pdftohtml -xml *Your File.pdf* *Output File.xml*<\/code><\/p>\n<p>You can run this command directly in the terminal.<\/p>\n<p>The xml file which you will get now will have tags just like html which you can use to get the data from the generated xml output.<\/p>\n<p>PS: One thing to be noted if the pdf table is not ordered then it becomes very difficult to get the data out from that xml because the tags will have some attributes which will not match the pattern. In that case you will need to hard code things.<\/p>\n<\/p><\/div>\n<div class=\"mt24\"><\/div>\n<\/div>\n<p>            <span class=\"d-none\" itemprop=\"commentCount\">2<\/span> <\/p><\/div>\n<\/div>\n<p>[ad_2]<\/p>\n<p>solved Convert PDF to Excel [closed] <\/p>\n","protected":false},"excerpt":{"rendered":"<p>[ad_1] Getting data out from a pdf file is pretty messy. If the pdf table is ordered and has got a unique pattern embedded along with it, the best way to get the data is by converting the pdf to xml. For this you can use: pdftohtml. Installation: sudo apt-get install pdftohtml Usage: pdftohtml -xml &#8230; <a title=\"[Solved] Convert PDF to Excel [closed]\" class=\"read-more\" href=\"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/\" aria-label=\"More on [Solved] Convert PDF to Excel [closed]\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[320],"tags":[323,489,349],"class_list":["post-6005","post","type-post","status-publish","format-standard","hentry","category-solved","tag-java","tag-pdf","tag-python"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>[Solved] Convert PDF to Excel [closed] - JassWeb<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"[Solved] Convert PDF to Excel [closed] - JassWeb\" \/>\n<meta property=\"og:description\" content=\"[ad_1] Getting data out from a pdf file is pretty messy. If the pdf table is ordered and has got a unique pattern embedded along with it, the best way to get the data is by converting the pdf to xml. For this you can use: pdftohtml. Installation: sudo apt-get install pdftohtml Usage: pdftohtml -xml ... Read more\" \/>\n<meta property=\"og:url\" content=\"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/\" \/>\n<meta property=\"og:site_name\" content=\"JassWeb\" \/>\n<meta property=\"article:published_time\" content=\"2022-08-31T16:18:05+00:00\" \/>\n<meta name=\"author\" content=\"Kirat\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kirat\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/solved-convert-pdf-to-excel-closed\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/solved-convert-pdf-to-excel-closed\\\/\"},\"author\":{\"name\":\"Kirat\",\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/#\\\/schema\\\/person\\\/65c9c7b7958150c0dc8371fa35dd7c31\"},\"headline\":\"[Solved] Convert PDF to Excel [closed]\",\"datePublished\":\"2022-08-31T16:18:05+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/solved-convert-pdf-to-excel-closed\\\/\"},\"wordCount\":147,\"publisher\":{\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/#organization\"},\"keywords\":[\"java\",\"pdf\",\"python\"],\"articleSection\":[\"Solved\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/solved-convert-pdf-to-excel-closed\\\/\",\"url\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/solved-convert-pdf-to-excel-closed\\\/\",\"name\":\"[Solved] Convert PDF to Excel [closed] - JassWeb\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/#website\"},\"datePublished\":\"2022-08-31T16:18:05+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/solved-convert-pdf-to-excel-closed\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/jassweb.com\\\/solved\\\/solved-convert-pdf-to-excel-closed\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/solved-convert-pdf-to-excel-closed\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"[Solved] Convert PDF to Excel [closed]\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/#website\",\"url\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/\",\"name\":\"JassWeb\",\"description\":\"Build High-quality Websites\",\"publisher\":{\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/#organization\",\"name\":\"Jass Web\",\"url\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/jassweb.com\\\/wp-content\\\/uploads\\\/2021\\\/02\\\/jass-website-logo-1.png\",\"contentUrl\":\"https:\\\/\\\/jassweb.com\\\/wp-content\\\/uploads\\\/2021\\\/02\\\/jass-website-logo-1.png\",\"width\":693,\"height\":132,\"caption\":\"Jass Web\"},\"image\":{\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/#\\\/schema\\\/person\\\/65c9c7b7958150c0dc8371fa35dd7c31\",\"name\":\"Kirat\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/wp-content\\\/litespeed\\\/avatar\\\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1776403586\",\"url\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/wp-content\\\/litespeed\\\/avatar\\\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1776403586\",\"contentUrl\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/wp-content\\\/litespeed\\\/avatar\\\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1776403586\",\"caption\":\"Kirat\"},\"sameAs\":[\"http:\\\/\\\/jassweb.com\"],\"url\":\"https:\\\/\\\/jassweb.com\\\/solved\\\/author\\\/jaspritsinghghumangmail-com\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"[Solved] Convert PDF to Excel [closed] - JassWeb","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/","og_locale":"en_US","og_type":"article","og_title":"[Solved] Convert PDF to Excel [closed] - JassWeb","og_description":"[ad_1] Getting data out from a pdf file is pretty messy. If the pdf table is ordered and has got a unique pattern embedded along with it, the best way to get the data is by converting the pdf to xml. For this you can use: pdftohtml. Installation: sudo apt-get install pdftohtml Usage: pdftohtml -xml ... Read more","og_url":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/","og_site_name":"JassWeb","article_published_time":"2022-08-31T16:18:05+00:00","author":"Kirat","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kirat","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/#article","isPartOf":{"@id":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/"},"author":{"name":"Kirat","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31"},"headline":"[Solved] Convert PDF to Excel [closed]","datePublished":"2022-08-31T16:18:05+00:00","mainEntityOfPage":{"@id":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/"},"wordCount":147,"publisher":{"@id":"https:\/\/jassweb.com\/solved\/#organization"},"keywords":["java","pdf","python"],"articleSection":["Solved"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/","url":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/","name":"[Solved] Convert PDF to Excel [closed] - JassWeb","isPartOf":{"@id":"https:\/\/jassweb.com\/solved\/#website"},"datePublished":"2022-08-31T16:18:05+00:00","breadcrumb":{"@id":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/jassweb.com\/solved\/solved-convert-pdf-to-excel-closed\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/jassweb.com\/solved\/"},{"@type":"ListItem","position":2,"name":"[Solved] Convert PDF to Excel [closed]"}]},{"@type":"WebSite","@id":"https:\/\/jassweb.com\/solved\/#website","url":"https:\/\/jassweb.com\/solved\/","name":"JassWeb","description":"Build High-quality Websites","publisher":{"@id":"https:\/\/jassweb.com\/solved\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/jassweb.com\/solved\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/jassweb.com\/solved\/#organization","name":"Jass Web","url":"https:\/\/jassweb.com\/solved\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/","url":"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png","contentUrl":"https:\/\/jassweb.com\/wp-content\/uploads\/2021\/02\/jass-website-logo-1.png","width":693,"height":132,"caption":"Jass Web"},"image":{"@id":"https:\/\/jassweb.com\/solved\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/jassweb.com\/solved\/#\/schema\/person\/65c9c7b7958150c0dc8371fa35dd7c31","name":"Kirat","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1776403586","url":"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1776403586","contentUrl":"https:\/\/jassweb.com\/solved\/wp-content\/litespeed\/avatar\/1261af3c9451399fa1336d28b98ea3bb.jpg?ver=1776403586","caption":"Kirat"},"sameAs":["http:\/\/jassweb.com"],"url":"https:\/\/jassweb.com\/solved\/author\/jaspritsinghghumangmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts\/6005","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/comments?post=6005"}],"version-history":[{"count":0,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/posts\/6005\/revisions"}],"wp:attachment":[{"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/media?parent=6005"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/categories?post=6005"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jassweb.com\/solved\/wp-json\/wp\/v2\/tags?post=6005"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}