[Solved] PDF Type Detection [closed]

Accepted Answer

You can convert the PDF to HTML using PDFMiner.
Then use BeautifulSoup to inspect the result: if it contains only <img> tags and no text, the PDF is entirely scanned; if any text is found, it is electronic (text-based).
You can also base the decision on the percentage of text extracted, which helps when a PDF mixes scanned and text content.
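
Here is a rough sketch of that check, not a definitive implementation. To stay dependency-free it parses the HTML with Python's standard `html.parser` instead of BeautifulSoup; with BeautifulSoup the same two checks would be `soup.find_all("img")` and the text of the `<span>` elements. The names `classify_pdf_html`, `pdf_to_html`, `min_text_chars` and `image_dir` are made up for this example. It assumes pdfminer.six puts extracted text inside `<span>` elements and only writes `<img>` tags when an image output directory is given, so check that against the HTML your version produces.

```python
from html.parser import HTMLParser
from io import BytesIO


class _PdfHtmlStats(HTMLParser):
    """Count <img> tags and non-whitespace characters inside text tags."""

    def __init__(self, text_tags=("span",)):
        super().__init__()
        # Assumption: extracted text lives in <span>; page labels such as
        # "Page 1" sit in <a>/<div> and are ignored on purpose.
        self.text_tags = set(text_tags)
        self.img_count = 0
        self.text_chars = 0
        self._depth = 0  # nesting depth inside text-bearing tags

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.img_count += 1
        elif tag in self.text_tags:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.text_tags and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.text_chars += len("".join(data.split()))


def classify_pdf_html(html, min_text_chars=1):
    """Return 'electronic', 'scanned', or 'unknown' for converted HTML."""
    stats = _PdfHtmlStats()
    stats.feed(html)
    stats.close()  # flush any buffered trailing text
    if stats.text_chars >= min_text_chars:
        return "electronic"
    if stats.img_count > 0:
        return "scanned"
    return "unknown"  # neither text nor images were found


def pdf_to_html(pdf_path, image_dir=None):
    """Convert a PDF to an HTML string with pdfminer.six (hypothetical helper)."""
    # Imported lazily so the classifier above works without pdfminer.six.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    out = BytesIO()
    with open(pdf_path, "rb") as fh:
        extract_text_to_fp(fh, out, output_type="html", codec="utf-8",
                           laparams=LAParams(), output_dir=image_dir)
    return out.getvalue().decode("utf-8")
```

You would call it as `classify_pdf_html(pdf_to_html("input.pdf", image_dir="images"))`, where both paths are placeholders. Raising `min_text_chars` is one crude way to act on how much text was extracted, for example to treat a scan that carries only a short stamped text layer as scanned.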