[Solved] Convert PDF to Excel [closed]


Getting data out of a PDF file is pretty messy. If the table in the PDF is regular and follows a consistent pattern, the best way to get at the data is to convert the PDF to XML. For this you can use pdftohtml.

Installation: sudo apt-get install poppler-utils (on current Debian and Ubuntu releases, pdftohtml ships as part of the poppler-utils package)

Usage: pdftohtml -xml "Your File.pdf" "Output File.xml" (quote the file names if they contain spaces)

You can run this command directly in the terminal.

The resulting XML file contains HTML-like tags, which you can parse to pull the data out of the generated output.
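As a rough illustration of that parsing step, here is a minimal Python sketch using only the standard library. It assumes the XML has `<page>` elements containing `<text>` elements with integer `top` and `left` attributes, which is what pdftohtml's XML output normally looks like. The sample data and the function names `xml_to_rows` and `rows_to_csv` are made up for this example. Cells sharing the same `top` are treated as one table row and ordered by `left`, then written out as CSV, which Excel can open.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def xml_to_rows(xml_text):
    """Group <text> elements into rows by page and `top`, ordered by `left`."""
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        by_top = defaultdict(list)
        for el in page.iter("text"):
            # itertext() also picks up text nested in <b> or <i> tags.
            content = "".join(el.itertext()).strip()
            if content:
                by_top[int(el.get("top"))].append((int(el.get("left")), content))
        for top in sorted(by_top):
            rows.append([cell for _, cell in sorted(by_top[top])])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical sample shaped like pdftohtml -xml output.
SAMPLE = """<pdf2xml>
<page number="1" height="1188" width="918">
<text top="100" left="300" width="40" height="12" font="0">Qty</text>
<text top="100" left="50" width="60" height="12" font="0"><b>Item</b></text>
<text top="120" left="50" width="60" height="12" font="0">Apple</text>
<text top="120" left="300" width="40" height="12" font="0">3</text>
</page>
</pdf2xml>"""

rows = xml_to_rows(SAMPLE)
print(rows_to_csv(rows))
```

For a real file you would read the generated XML from disk and pass its contents to `xml_to_rows`, then write the CSV text to a file instead of printing it.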

PS: Note that if the table in the PDF is not regular, getting the data out of the XML becomes very difficult, because the tag attributes will not follow a consistent pattern. In that case you will need to hard-code the handling for that particular layout.
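One common form of irregularity, offered here only as an assumption about what might go wrong, is that cells on the same visual row get slightly different `top` values. The sketch below groups cells whose `top` lies within a small tolerance of the first cell in the row. The `tolerance` value, the sample data and the function names are hypothetical, and the right threshold has to be tuned per document.

```python
import xml.etree.ElementTree as ET


def cells_from_xml(xml_text):
    """Return (top, left, text) tuples for every non-empty <text> element."""
    root = ET.fromstring(xml_text)
    cells = []
    for el in root.iter("text"):
        content = "".join(el.itertext()).strip()
        if content:
            cells.append((int(el.get("top")), int(el.get("left")), content))
    return cells


def group_rows(cells, tolerance=3):
    """Cluster cells into rows; a new row starts when `top` moves more than
    `tolerance` units past the first cell of the current row."""
    rows, current, anchor = [], [], None
    for top, left, text in sorted(cells):
        if anchor is None or top - anchor > tolerance:
            if current:
                rows.append([t for _, t in sorted(current)])
            current, anchor = [], top
        current.append((left, text))
    if current:
        rows.append([t for _, t in sorted(current)])
    return rows


# Hypothetical single-page sample where `top` values drift by a unit or two.
SAMPLE_DRIFT = """<pdf2xml>
<page number="1" height="1188" width="918">
<text top="100" left="300" width="40" height="12" font="0">Qty</text>
<text top="102" left="50" width="60" height="12" font="0">Item</text>
<text top="120" left="50" width="60" height="12" font="0">Apple</text>
<text top="119" left="300" width="40" height="12" font="0">3</text>
</page>
</pdf2xml>"""

grouped = group_rows(cells_from_xml(SAMPLE_DRIFT))
print(grouped)
```

This only handles vertical drift on a single page. Merged cells, wrapped text or multi-page tables would still need layout-specific handling.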
