Document Matching Report

Generated on 2024-12-19T14:26:36

This report evaluates how accurately our system extracts and matches information from legal documents by comparing system predictions against verified ground truth data.


Document & Field Counts

Key counts for the documents and fields processed by our system.

  • Total Documents: Number of unique documents processed.
  • Total Fields: Total individual pieces of information we attempted to extract.
  • Correct Fields: Fields where our extraction met matching criteria.

Matching Process

The methodology used to compare system predictions with ground truth data. A sketch of the comparison loop follows the list below.

  • Ground Truth vs Prediction: We compare each extracted field against verified ground truth data.
  • Different field types have different matching criteria.
  • Some fields (like legal descriptions) may have valid extra predictions.
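For illustration, the comparison can be sketched as a loop that routes each ground-truth field to a type-appropriate matcher and keeps extra predictions aside rather than counting them as wrong. The names and structure below are a hypothetical sketch, not our system's actual API:

```python
def exact_match(gt, pred):
    # Fallback matcher: strict equality after trimming whitespace.
    return str(gt).strip() == str(pred).strip()

def compare_document(ground_truth, prediction, matchers):
    # `matchers` maps a field name to a match function; fields without
    # an entry fall back to exact matching.
    correct, total, extras = 0, 0, {}
    for field, gt_value in ground_truth.items():
        total += 1
        matcher = matchers.get(field, exact_match)
        if field in prediction and matcher(gt_value, prediction[field]):
            correct += 1
    # Predictions with no ground-truth counterpart (e.g. extra legal
    # description fields) are reported separately, not penalized.
    for field, value in prediction.items():
        if field not in ground_truth:
            extras[field] = value
    return correct, total, extras
```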

Field Type-Specific Matching

Specific criteria applied to different types of fields to ensure accurate matching; a sketch of these rules as match functions follows the list below.

  • Dates (recorded_date, document_date): Must be exact matches.
  • Identifiers (book, page, instrument): Must be exact matches after normalization.
  • Party Names: Uses fuzzy matching with strict thresholds.
  • Amounts: Must match exactly after normalizing format.
  • Legal Descriptions: Special handling that:
    • Compares standard fields against ground truth.
    • Captures additional valid fields not in ground truth.
    • Shows extra predicted fields separately in the UI.
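The per-type rules above can be sketched as small match functions. These are illustrative only: the normalization steps and the 0.9 fuzzy threshold are assumptions, and `difflib` stands in for whatever string-similarity algorithm the system actually uses:

```python
from decimal import Decimal
from difflib import SequenceMatcher

def match_date(gt, pred):
    # Dates must be exact matches.
    return gt == pred

def match_identifier(gt, pred):
    # Exact match after normalization; stripping leading zeros and
    # case-folding are assumed normalizations, for illustration.
    norm = lambda s: str(s).strip().lstrip("0").casefold()
    return norm(gt) == norm(pred)

def match_party_name(gt, pred, threshold=0.9):
    # Fuzzy match with a strict threshold (0.9 is an assumed value).
    ratio = SequenceMatcher(None, gt.casefold(), pred.casefold()).ratio()
    return ratio >= threshold

def match_amount(gt, pred):
    # Exact match after normalizing format (currency symbols, commas).
    norm = lambda s: Decimal(str(s).replace("$", "").replace(",", ""))
    return norm(gt) == norm(pred)
```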

Metadata Fields

The system automatically filters out internal matching metadata:

  • Match Info Matched To
  • Match Info Match Tier
  • Match Info Confidence
  • Match Info Similarity Score

These fields are excluded from comparisons because they are system metadata.
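A minimal sketch of this filtering step, assuming the internal keys are snake_case versions of the labels above (the real key names may differ):

```python
# Assumed internal key names, inferred from the display labels above.
METADATA_KEYS = {
    "match_info_matched_to",
    "match_info_match_tier",
    "match_info_confidence",
    "match_info_similarity_score",
}

def strip_metadata(fields):
    # Drop internal matching metadata before any comparison is run.
    return {k: v for k, v in fields.items() if k not in METADATA_KEYS}
```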


Quality Metrics

Precision

How accurate are our successful extractions?

Calculation: (True Positives) / (True Positives + False Positives)

Example: If we extract 100 fields and 78 meet criteria, precision is 78%.

Recall

How comprehensively do we capture available information?

Calculation: (True Positives) / (True Positives + False Negatives)

Example: If there are 100 fields and we capture 96, recall is 96%.

F1 Score

Balanced measure of precision and recall.

Calculation: 2 * (Precision * Recall) / (Precision + Recall)

Example: Combining precision of 78% and recall of 96% results in an F1 score of approximately 86.1%.

A high F1 score indicates both accurate and comprehensive extraction.
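The three formulas above, applied to the worked examples, can be checked with a few lines of code (the counts here are the example figures, not this report's totals):

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    return 2 * p * r / (p + r)

# Worked examples from above: 78 of 100 extractions meet criteria
# (precision), 96 of 100 available fields captured (recall).
p = precision(78, 22)          # 0.78
r = recall(96, 4)              # 0.96
print(f"F1 = {f1(p, r):.1%}")  # F1 = 86.1%
```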

Similarity

  • For exact match fields: 100% if exact, 0% if not.
  • For fuzzy match fields: Uses string similarity algorithms.
  • Legal descriptions: May show high similarity even with different field names.
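A minimal sketch of this scoring, assuming the exact-match field set from the criteria above and `difflib` as a stand-in similarity algorithm:

```python
from difflib import SequenceMatcher

# Exact-match fields inferred from the criteria listed earlier.
EXACT_FIELDS = {"recorded_date", "document_date", "book", "page", "instrument"}

def similarity(field, gt, pred):
    # Exact-match fields score 1.0 or 0.0; fuzzy fields use a string
    # similarity ratio in [0, 1]. Normalization is omitted for brevity.
    if field in EXACT_FIELDS:
        return 1.0 if gt == pred else 0.0
    return SequenceMatcher(None, gt, pred).ratio()
```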

Summary

Total Documents: 116

Total Fields: 815

Correct Fields: 718

Metrics

Precision: 89.64%

Recall: 98.09%

F1 Score: 93.67%

Accuracy: 88.10%

Similarity: 87.87%

Document Type Metrics

Deed

Total Documents: 74

Total Fields: 499

Correct Fields: 441

Precision: 90.93%

Recall: 96.92%

F1 Score: 93.83%

Accuracy: 88.38%

Similarity: 88.14%

Mortgage

Total Documents: 42

Total Fields: 316

Correct Fields: 277

Precision: 87.66%

Recall: 100.00%

F1 Score: 93.42%

Accuracy: 87.66%

Similarity: 87.44%

Detailed Results