financial-SOTA (spaCy PII NER for financial documents)

A spaCy NER model trained from a blank English pipeline on synthetic financial documents generated with Faker. Based on Kush-Mishra-403/PII-Detection-and-Anonymization.

Labels

name, company, address, email, phone, url, credit_card, ssn

Reported metrics (on the author's synthetic test set)

  • Precision: 0.913
  • Recall: 0.867
  • F1: 0.881
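
The card does not say how these scores were computed or aggregated. The reported F1 is not the harmonic mean of the reported precision and recall (that would be about 0.889), so the figures were probably averaged some other way, for example per label. As a rough illustration only, here is a minimal sketch of exact-match, span-level scoring, a common way to evaluate NER. The function name span_prf and the example spans are hypothetical and do not come from the author's evaluation code.

```python
from typing import Iterable, Tuple

# A span is (start_char, end_char, label); this tuple layout is an assumption
# made for the sketch, not a format defined by the model card.
Span = Tuple[int, int, str]


def span_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    gold_set, pred_set = set(gold), set(pred)
    # A prediction counts only if offsets and label all match a gold span.
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hand-written toy annotations (not model output).
gold = [(0, 10, "name"), (20, 28, "company"), (36, 49, "email")]
pred = [(0, 10, "name"), (20, 28, "company"), (36, 48, "email"), (50, 55, "url")]

p, r, f = span_prf(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here the email prediction is off by one character, so under exact matching it counts as both a false positive and a false negative.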

Usage

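from huggingface_hub import snapshot_download
import spacy

# Download the pipeline files from the Hub, then load them as a spaCy pipeline.
local_dir = snapshot_download(repo_id="MinhTriet/financial-SOTA")
nlp = spacy.load(local_dir)

doc = nlp("John Smith works at Acme LLC, email john@acme.com")
# Print each detected entity with its label and character offsets.
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)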

Requires spacy==3.8.5, the version the pipeline was trained with. The snippet above also needs huggingface_hub for the download.
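
Since the upstream project targets anonymization, the character offsets on each entity can be used to mask the detected text. The following is a minimal sketch under stated assumptions: the redact helper and the [LABEL] placeholder format are hypothetical, and the spans are hand-written for the example sentence rather than actual model output. With a loaded pipeline, the spans could instead be built from doc.ents as shown in the comment.

```python
from typing import Iterable, Tuple

# (start_char, end_char, label), mirroring ent.start_char, ent.end_char, ent.label_
Span = Tuple[int, int, str]


def redact(text: str, spans: Iterable[Span]) -> str:
    """Replace each span with an upper-cased [LABEL] placeholder."""
    out = text
    # Work from the last span to the first so earlier offsets stay valid.
    # Assumes spans do not overlap, which holds for spaCy's doc.ents.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        out = out[:start] + f"[{label.upper()}]" + out[end:]
    return out


text = "John Smith works at Acme LLC, email john@acme.com"

# Hand-written spans for illustration. With the pipeline loaded:
#   spans = [(e.start_char, e.end_char, e.label_) for e in nlp(text).ents]
spans = [(0, 10, "name"), (20, 28, "company"), (36, 49, "email")]

print(redact(text, spans))
```

Any entity the model misses is left in the output unchanged, so recall matters more than precision for this use.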
