Problem Statement
Transaction categorisation at scale requires underwriters to read, interpret, and classify thousands of records manually. Manual overrides are common and time-consuming, and they represent valuable institutional knowledge that is lost once the decision is made. The business needed automation that could replicate that knowledge and improve as it received underwriter feedback.
Proposed Solution & Architecture
- Two-phase architecture: rule-based categorisation first, ML-powered classification second
- Amazon Textract extracting transaction description text from PDFs, receipts, and other unstructured documents
- SageMaker ML model trained on historical transactions and manual underwriter corrections, then retrained as new corrections accumulate
- Underwriter interface for manual review and override — every correction feeds the training dataset
- Secure data ingestion via API or batch upload to S3 with encrypted storage and transmission
- FCA and data protection regulatory compliance maintained throughout processing and storage
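The two-phase flow described above can be sketched roughly as follows. This is a minimal illustration, not the delivered implementation: the rule patterns, the category names, the `classify` callable standing in for the SageMaker model, and the `min_confidence` threshold that routes low-confidence predictions to underwriter review are all assumptions made for the example.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical Phase 1 rules: (pattern, category). Illustrative values only.
RULES = [
    (re.compile(r"\b(tesco|sainsbury|asda)\b", re.IGNORECASE), "groceries"),
    (re.compile(r"\b(salary|payroll)\b", re.IGNORECASE), "income"),
    (re.compile(r"\bhmrc\b", re.IGNORECASE), "tax"),
]


def rule_category(description: str) -> Optional[str]:
    """Phase 1: return the category of the first matching rule, or None."""
    for pattern, category in RULES:
        if pattern.search(description):
            return category
    return None


def categorise(
    description: str,
    classify: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.8,  # assumed threshold, not from the project
) -> Tuple[str, str]:
    """Return (category, source), where source is 'rule', 'model' or 'review'."""
    category = rule_category(description)
    if category is not None:
        return category, "rule"

    # Phase 2: fall back to the ML classifier (any callable returning
    # a label and a confidence between 0 and 1).
    label, confidence = classify(description)
    if confidence >= min_confidence:
        return label, "model"

    # Low confidence: keep the suggestion but flag it for an underwriter.
    return label, "review"
```

Passing the classifier in as a callable keeps the routing logic testable without a deployed endpoint; in practice the callable would wrap whatever inference call the model exposes.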
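AWS Services & Technologies
- Amazon Textract for text extraction from documents
- Amazon SageMaker for model training and classification
- AWS Lambda for rule-based text matching
- Amazon S3 for encrypted storage of ingested data
- IAM for access control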
What We Delivered
- Built Phase 1 rule-based categorisation — Lambda-based text matching rules applied to transaction descriptions extracted by Textract.
- Designed and built the underwriter override interface — allowing corrections to be applied easily and captured for model training.
- Trained and deployed Phase 2 SageMaker classification model on historical transaction data and accumulated manual corrections.
- Implemented an automated retraining pipeline, so the model is retrained as the corrections dataset grows.
- Provisioned secure infrastructure: encrypted storage, encrypted API transmission, IAM access controls, and FCA-aligned data governance.
- Validated accuracy and performance through rigorous testing under production-representative transaction volumes.
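As a rough illustration of the Phase 1 step listed above, the sketch below pulls line text out of a Textract-style response and applies keyword rules in a Lambda-shaped handler. The `Blocks`, `BlockType`, `Text`, and `Confidence` fields follow the Textract text-detection response format; the event shape (`textract_response`), the keyword table, and the category names are hypothetical and chosen only for the example.

```python
import re
from typing import Any, Dict, List

# Hypothetical keyword-to-category rules; the real rule set is not shown here.
KEYWORD_RULES = {"salary": "income", "rent": "housing", "hmrc": "tax"}


def description_lines(response: Dict[str, Any], min_confidence: float = 0.0) -> List[str]:
    """Collect the text of LINE blocks from a Textract-style response."""
    lines = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks
        if block.get("Confidence", 100.0) < min_confidence:
            continue  # Textract confidence is reported on a 0-100 scale
        text = block.get("Text", "").strip()
        if text:
            lines.append(text)
    return lines


def match_rules(line: str) -> str:
    """Match whole words so that 'rent' does not fire on 'current'."""
    words = set(re.findall(r"[a-z0-9]+", line.lower()))
    for keyword, category in KEYWORD_RULES.items():
        if keyword in words:
            return category
    return "uncategorised"


def handler(event: Dict[str, Any], context: Any = None) -> List[Dict[str, str]]:
    """Lambda-style entry point; the event layout is an assumption."""
    lines = description_lines(event["textract_response"])
    return [{"description": line, "category": match_rules(line)} for line in lines]
```

Lines that no rule matches come back as `uncategorised`, which is where a second-phase classifier or an underwriter would pick them up.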
Outcomes & Success Metrics
- Transaction categorisation automated — underwriters review exceptions rather than processing every record manually.
- ML model retrained on underwriter feedback: each correction is added to the training dataset used by the next retraining run.
- FCA-compliant data handling in place — secure, encrypted, and auditable throughout.
- System tested at production-representative transaction volumes without performance degradation.
- Underwriters freed from routine categorisation to focus on complex decisions that require human judgement.
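One way the correction feedback loop and an exception-style metric could be expressed is sketched below. It is an assumption-laden example rather than the project's pipeline: the tuple layouts, the function names `build_training_rows` and `override_rate`, and the choice to let the most recent correction win are all hypothetical, and the override rate shown is not a figure reported by the project.

```python
from typing import Dict, Iterable, List, Tuple

Historical = Tuple[str, str, str]  # (transaction_id, description, category)
Correction = Tuple[str, str]       # (transaction_id, corrected_category)


def latest_corrections(corrections: Iterable[Correction]) -> Dict[str, str]:
    """Keep the most recent correction per transaction (input is in applied order)."""
    latest: Dict[str, str] = {}
    for txn_id, category in corrections:
        latest[txn_id] = category
    return latest


def build_training_rows(
    historical: Iterable[Historical], corrections: Iterable[Correction]
) -> List[Tuple[str, str]]:
    """Return (description, label) rows, with underwriter corrections taking priority."""
    latest = latest_corrections(corrections)
    return [
        (description, latest.get(txn_id, category))
        for txn_id, description, category in historical
    ]


def override_rate(
    historical: Iterable[Historical], corrections: Iterable[Correction]
) -> float:
    """Fraction of categorised transactions that an underwriter corrected."""
    ids = [txn_id for txn_id, _, _ in historical]
    if not ids:
        return 0.0
    corrected = set(latest_corrections(corrections))
    return sum(1 for txn_id in ids if txn_id in corrected) / len(ids)
```

Tracking the override rate across retraining runs would give a simple signal of whether underwriters are in fact reviewing fewer records over time.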