[Solved] How to calculate precision and recall for two nested arrays [closed]

Accepted Answer

Flatten both nested lists into flat label sequences, then use classification_report from scikit-learn:

correct = [['*','*'],['*','PER','*','GPE','ORG'],['GPE','*','*','*','ORG']]
predicted = [['PER','*'],['*','ORG','*','GPE','ORG'],['PER','*','*','*','MISC']]
target_names = ['PER','ORG','MISC','LOC','GPE'] # leave out '*'

correct_flat = [item for sublist in correct for item in sublist]
predicted_flat = [item for sublist in predicted for item in sublist]

from sklearn.metrics import classification_report
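print(classification_report(correct_flat, predicted_flat, labels=target_names))

Note that the list is passed as labels, not as target_names. The target_names argument only renames the rows of the labels found in the data, taken in sorted order (here '*', 'GPE', 'MISC', 'ORG', 'PER'), so passing the list that way would print the scores of '*' in the row named 'PER', those of 'GPE' in the row named 'ORG', and so on. With labels, the report is restricted to the listed classes and '*' is left out.

Result (recent scikit-learn versions print separate micro, macro and weighted average rows instead of the single support-weighted "avg / total" row):

             precision    recall  f1-score   support

        PER       0.00      0.00      0.00         1
        ORG       0.50      0.50      0.50         2
       MISC       0.00      0.00      0.00         0
        LOC       0.00      0.00      0.00         0
        GPE       1.00      0.50      0.67         2

avg / total       0.60      0.40      0.47         5

In this particular example, you will also get a warning:

UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples.

which is due to 'MISC' and 'LOC' not being present in the true labels here (correct); a similar warning about precision may appear for 'LOC', which is never predicted either. Arguably this should not happen in your real data.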
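
As a cross-check that does not depend on scikit-learn, the per-label precision and recall can be recomputed directly from the two nested lists. This is only a minimal sketch: per_label_scores is a hypothetical helper name, and reporting an undefined ratio (zero denominator) as 0.0 is an assumption chosen to mirror what the warning above describes.

```python
from collections import Counter
from itertools import chain

correct = [['*', '*'], ['*', 'PER', '*', 'GPE', 'ORG'], ['GPE', '*', '*', '*', 'ORG']]
predicted = [['PER', '*'], ['*', 'ORG', '*', 'GPE', 'ORG'], ['PER', '*', '*', '*', 'MISC']]


def per_label_scores(y_true_nested, y_pred_nested, labels):
    """Return {label: (precision, recall, support)}, computed token by token."""
    # Flatten both nested lists; they must line up one-to-one.
    y_true = list(chain.from_iterable(y_true_nested))
    y_pred = list(chain.from_iterable(y_pred_nested))
    if len(y_true) != len(y_pred):
        raise ValueError("true and predicted sequences differ in length")

    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_true = Counter(y_true)  # support per label
    n_pred = Counter(y_pred)  # number of predictions per label

    scores = {}
    for label in labels:
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        precision = true_pos[label] / n_pred[label] if n_pred[label] else 0.0
        recall = true_pos[label] / n_true[label] if n_true[label] else 0.0
        scores[label] = (precision, recall, n_true[label])
    return scores


# '*' is left out simply by not listing it among the labels of interest.
scores = per_label_scores(correct, predicted, ['PER', 'ORG', 'MISC', 'LOC', 'GPE'])
for label, (precision, recall, support) in scores.items():
    print(f"{label:>5}  precision={precision:.2f}  recall={recall:.2f}  support={support}")
```

Because the support of each label is counted from the true sequence, comparing it with a report's support column is a quick way to confirm that each row really refers to the label it is named after.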