You have to flatten your nested lists as shown below, and then use classification_report
from scikit-learn:
correct = [['*','*'], ['*','PER','*','GPE','ORG'], ['GPE','*','*','*','ORG']]
predicted = [['PER','*'], ['*','ORG','*','GPE','ORG'], ['PER','*','*','*','MISC']]
target_names = ['PER','ORG','MISC','LOC','GPE']  # leave out '*'

# flatten the nested lists into one flat list of labels each
correct_flat = [item for sublist in correct for item in sublist]
predicted_flat = [item for sublist in predicted for item in sublist]

from sklearn.metrics import classification_report
# pass labels= as well, otherwise '*' is included and the target_names are
# simply matched against the sorted labels found in the data
print(classification_report(correct_flat, predicted_flat,
                            labels=target_names, target_names=target_names))
Result:
             precision    recall  f1-score   support

        PER       0.00      0.00      0.00         1
        ORG       0.50      0.50      0.50         2
       MISC       0.00      0.00      0.00         0
        LOC       0.00      0.00      0.00         0
        GPE       1.00      0.50      0.67         2

avg / total       0.60      0.40      0.47         5
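If you prefer raw numbers to the formatted report, the same per-class figures can be obtained with precision_recall_fscore_support; a minimal sketch, reusing the variables from the snippet above:

from sklearn.metrics import precision_recall_fscore_support

# per-class precision, recall, F1 and support, in the order of target_names
p, r, f1, support = precision_recall_fscore_support(
    correct_flat, predicted_flat, labels=target_names)

for name, prec, rec, f, sup in zip(target_names, p, r, f1, support):
    print(name, prec, rec, f, sup)

For example, GPE comes out as precision 1.00 (the single predicted GPE tag is correct) and recall 0.50 (only one of the two true GPE tags is found).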
In this particular example, you will also get a warning:
UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples.
which is due to 'MISC' and 'LOC' not being present in the true labels here (correct); since 'LOC' is never predicted either, you will get a corresponding warning about precision as well. Arguably this should not happen in your real data.
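If you want to get rid of the warning rather than just explain it, one option is to restrict the report to the classes that actually occur in the gold data; a minimal sketch, still ignoring '*' (the present variable is just an illustrative helper):

# only report on classes that have at least one true sample
present = sorted(set(correct_flat) - {'*'})
print(classification_report(correct_flat, predicted_flat, labels=present))

In newer scikit-learn versions (0.22 or later) you can instead keep all classes and pass zero_division=0 to classification_report to silence the warning explicitly.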