[Solved] PDF Type Detection [closed]


You can convert the PDF to HTML using PDFMiner.
Then use BeautifulSoup to inspect the result: if it contains only <img> tags, the PDF is entirely scanned; if any text is found, it is electronic.
You can also base the decision on the percentage of text extracted rather than on a strict all-or-nothing check.
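
As a rough sketch of the check described above, assuming the HTML has already been produced by PDFMiner and is available as a string: the snippet below uses the standard library's `html.parser` in place of BeautifulSoup, only to stay dependency-free. The names `classify_pdf_html` and `min_text_chars` are hypothetical, and the character threshold is just one possible stand-in for the "percentage of text" idea. A threshold above 1 may be needed if the converter adds text of its own, such as page labels.

```python
from html.parser import HTMLParser


class _PageStats(HTMLParser):
    """Count <img> tags and non-whitespace text characters in an HTML string."""

    def __init__(self):
        super().__init__()
        self.img_count = 0
        self.text_chars = 0
        # Depth counter for tags whose text is not page content.
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.img_count += 1
        elif tag in ("style", "script", "title"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script", "title") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            # Ignore whitespace so indentation does not count as text.
            self.text_chars += len("".join(data.split()))


def classify_pdf_html(html, min_text_chars=1):
    """Return 'electronic', 'scanned', or 'unknown' for converted PDF HTML.

    min_text_chars is an assumed, tunable threshold: the amount of text
    required before the document is treated as electronic.
    """
    stats = _PageStats()
    stats.feed(html)
    stats.close()
    if stats.text_chars >= min_text_chars:
        return "electronic"
    if stats.img_count > 0:
        return "scanned"
    return "unknown"  # neither text nor images were found
```

Calling `classify_pdf_html(html)` with the default threshold follows the strict rule (any text means electronic); raising `min_text_chars` makes it tolerate a small amount of stray text in an otherwise image-only document.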
