Document Matching Report
Generated on 2024-12-19T14:26:36
This report evaluates how accurately our system extracts and matches information from legal documents by comparing system predictions against verified ground truth data.
Document & Field Counts
Overview of the key metrics related to documents and fields processed by our system.
- Total Documents: Number of unique documents processed.
- Total Fields: Total individual pieces of information we attempted to extract.
- Correct Fields: Fields where our extraction met matching criteria.
Matching Process
The methodology used to compare system predictions with ground truth data.
- Ground Truth vs Prediction: We compare each extracted field against verified ground truth data.
- Different field types have different matching criteria.
- Some fields (like legal descriptions) may have valid extra predictions.
Field Type-Specific Matching
Specific criteria applied to each field type to ensure accurate matching (a code sketch follows the list).
- Dates (recorded_date, document_date): Must be exact matches.
- Identifiers (book, page, instrument): Must be exact matches after normalization.
- Party Names: Uses fuzzy matching with strict thresholds.
- Amounts: Must match exactly after normalizing format.
- Legal Descriptions: Special handling that:
  - Compares standard fields against ground truth.
  - Captures additional valid fields not in ground truth.
  - Shows extra predicted fields separately in the UI.
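To make these rules concrete, here is a minimal Python sketch of per-type comparison under the criteria above. The function names, the normalization details, and the 0.90 fuzzy threshold are illustrative assumptions, not the system's actual implementation.

```python
import re
from difflib import SequenceMatcher

def normalize_identifier(value: str) -> str:
    # Strip punctuation/whitespace and leading zeros so that, e.g.,
    # "0042" and "42" compare as equal.
    return re.sub(r"\W+", "", value).upper().lstrip("0")

def normalize_amount(value: str) -> str:
    # Drop currency symbols and thousands separators: "$1,250.00" -> "1250.00".
    return re.sub(r"[^0-9.]", "", value)

def fields_match(field_type: str, truth: str, pred: str,
                 name_threshold: float = 0.90) -> bool:
    if field_type in ("recorded_date", "document_date"):
        return truth == pred                                   # exact match
    if field_type in ("book", "page", "instrument"):
        return normalize_identifier(truth) == normalize_identifier(pred)
    if field_type == "amount":
        return normalize_amount(truth) == normalize_amount(pred)
    if field_type == "party_name":
        # Fuzzy matching with a strict threshold.
        ratio = SequenceMatcher(None, truth.lower(), pred.lower()).ratio()
        return ratio >= name_threshold
    return truth == pred                                       # default: exact
```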
Metadata Fields
System automatically filters out internal matching metadata:
- Match Info Matched To
- Match Info Match Tier
- Match Info Confidence
- Match Info Similarity Score
These fields are excluded from comparisons because they are system metadata rather than document content (see the filtering sketch below).
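A minimal sketch of this filtering step, assuming extracted fields arrive as a flat dict keyed by display name (the helper name is hypothetical):

```python
METADATA_FIELDS = {
    "Match Info Matched To",
    "Match Info Match Tier",
    "Match Info Confidence",
    "Match Info Similarity Score",
}

def drop_metadata(fields: dict) -> dict:
    # Keep only substantive document fields for ground-truth comparison.
    return {k: v for k, v in fields.items() if k not in METADATA_FIELDS}
```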
Quality Metrics
Precision
How accurate are the fields we extract?
Calculation: (True Positives) / (True Positives + False Positives)
Example: If we extract 100 fields and 78 meet criteria, precision is 78%.
Recall
How comprehensively do we capture available information?
Calculation: (True Positives) / (True Positives + False Negatives)
Example: If there are 100 fields and we capture 96, recall is 96%.
F1 Score
Balanced measure of precision and recall.
Calculation: 2 * (Precision * Recall) / (Precision + Recall)
Example: Combining precision of 78% and recall of 96% gives an F1 score of approximately 86.1%.
A high F1 score indicates both accurate and comprehensive extraction.
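As a quick sanity check of the three formulas, a throwaway Python snippet using the example counts above (78 correct out of 100 extracted; 96 captured out of 100 available):

```python
def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

precision = 78 / (78 + 22)   # TP / (TP + FP) = 0.78
recall = 96 / (96 + 4)       # TP / (TP + FN) = 0.96
print(f"F1 = {f1_score(precision, recall):.3f}")  # prints "F1 = 0.861"
```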
Similarity
- For exact match fields: 100% if exact, 0% if not.
- For fuzzy match fields: Uses string similarity algorithms.
- Legal descriptions: May show high similarity even when field names differ (a scoring sketch follows).
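A sketch of this scoring scheme, using Python's standard-library SequenceMatcher as a stand-in for the string-similarity algorithm (the actual algorithm and the per-field exact/fuzzy flag are assumptions):

```python
from difflib import SequenceMatcher

def field_similarity(truth: str, pred: str, exact: bool) -> float:
    if exact:
        # Exact-match fields score all or nothing.
        return 1.0 if truth == pred else 0.0
    # Fuzzy-match fields score a string-similarity ratio in [0, 1].
    return SequenceMatcher(None, truth.lower(), pred.lower()).ratio()

field_similarity("2024-01-05", "2024-01-05", exact=True)      # 1.0
field_similarity("John Q. Smith", "John Smith", exact=False)  # ~0.87
```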
Summary
Total Documents: 116
Total Fields: 815
Correct Fields: 718
Metrics
Precision: 89.64%
Recall: 98.09%
F1 Score: 93.67%
Accuracy: 88.10%
Similarity: 87.87%
Document Type Metrics
Deed
Total Documents: 74
Total Fields: 499
Correct Fields: 441
Precision: 90.93%
Recall: 96.92%
F1 Score: 93.83%
Accuracy: 88.38%
Similarity: 88.14%
Mortgage
Total Documents: 42
Total Fields: 316
Correct Fields: 277
Precision: 87.66%
Recall: 100.00%
F1 Score: 93.42%
Accuracy: 87.66%
Similarity: 87.44%
Detailed Results
Per-document entries report Similarity, Precision, Recall, F1 Score, and Accuracy, followed by a Field Comparisons table with columns Field, Ground Truth, Prediction, Similarity, and Status; extra predicted fields (those captured beyond the ground truth) appear as separate rows marked [Extra].