[Solved] PDF Type Detection [closed]

You can convert the PDF to HTML using PDFMiner.
Then use BeautifulSoup to inspect the result: if the HTML contains only <img> tags, the PDF is a scanned one; if any text is found, it is electronic.
Alternatively, base the decision on how much text is extracted, using a threshold rather than requiring the text to be completely absent.
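
Here is a minimal sketch of this approach in Python. It assumes the pdfminer.six package, which provides `pdfminer.high_level.extract_text_to_fp`. So that the classification step runs without extra dependencies, it uses the standard library's `html.parser` in place of BeautifulSoup. The helper names `TextImageCounter`, `classify_html`, and `pdf_to_html` and the `min_text_chars=50` threshold are hypothetical choices made for illustration. The check uses a small threshold instead of "any text at all" because pdfminer's HTML output may include page-number labels even for a purely scanned document.

```python
import tempfile
from html.parser import HTMLParser
from io import BytesIO


class TextImageCounter(HTMLParser):
    """Count <img> tags and visible non-whitespace characters in HTML."""

    def __init__(self):
        super().__init__()
        self.img_count = 0
        self.text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            self.img_count += 1
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_chars += sum(1 for ch in data if not ch.isspace())


def classify_html(html, min_text_chars=50):
    """Return "electronic", "scanned", or "unknown" for PDF-derived HTML.

    min_text_chars is a hypothetical tolerance for page labels and stray
    characters; tune it for your own documents.
    """
    counter = TextImageCounter()
    counter.feed(html)
    counter.close()
    if counter.text_chars >= min_text_chars:
        return "electronic"  # enough real text was extracted
    if counter.img_count > 0:
        return "scanned"  # (almost) only images
    return "unknown"  # no text and no images


def pdf_to_html(path):
    """Convert a PDF file to an HTML string with pdfminer.six."""
    # Imported here so classify_html works even without pdfminer.six installed.
    from pdfminer.high_level import extract_text_to_fp

    out = BytesIO()  # binary buffer, so the default utf-8 codec applies
    # Passing output_dir makes pdfminer export embedded images and emit
    # <img> tags for them (some image formats may need Pillow installed).
    with tempfile.TemporaryDirectory() as img_dir, open(path, "rb") as pdf:
        extract_text_to_fp(pdf, out, output_type="html", output_dir=img_dir)
    return out.getvalue().decode("utf-8")
```

For example, `classify_html(pdf_to_html("report.pdf"))`, with a hypothetical file name, should return "scanned" for a PDF made only of page images. A scanned PDF that already carries an OCR text layer will be reported as electronic by this check, because its text is extractable.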
