Financial fraud costs the global economy over $5 trillion annually. Our client, a major payment processor, needed a system that could score 15 million+ transactions daily with sub-50ms latency while catching more fraud than their existing rules-based system. Here's how we built it.
The Architecture: Ensemble of Specialists
Instead of relying on a single model, we built an ensemble of two complementary models. XGBoost handles tabular features - transaction amount, merchant category, time of day, velocity checks - with exceptional speed and accuracy. An LSTM network captures sequential patterns in user behavior over time, detecting anomalies in spending sequences that point-in-time features miss.
The two models' scores are combined by a learned meta-classifier that weights each model's contribution based on the transaction context. For example, the LSTM gets more weight for card-present transactions, where behavioral patterns matter most.
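A minimal sketch of that stacking step, assuming out-of-fold base-model scores are already available (the function names, the context features, and the choice of logistic regression as the meta-learner are illustrative, not the production implementation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_meta_classifier(xgb_scores, lstm_scores, context, y):
    """xgb_scores, lstm_scores: out-of-fold fraud probabilities, shape (n,).
    context: shape (n, k) transaction-context features, e.g. a card-present
    flag or channel indicator. y: shape (n,), 1 = confirmed fraud."""
    # The meta-classifier sees both base scores plus the context features,
    # so it can learn context-dependent weightings of the two models.
    X_meta = np.column_stack([xgb_scores, lstm_scores, context])
    meta = LogisticRegression(max_iter=1000)
    meta.fit(X_meta, y)
    return meta

def ensemble_score(meta, xgb_score, lstm_score, context_row):
    x = np.concatenate([[xgb_score, lstm_score], context_row]).reshape(1, -1)
    return meta.predict_proba(x)[0, 1]  # final fraud probability
```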
Feature Engineering at Scale
We engineered over 200 features across multiple time windows (1 hour, 24 hours, 7 days, 30 days). These include velocity features (transaction count and amount over time windows), behavioral features (deviation from typical spending patterns), network features (merchant risk scores, geographic patterns), and device fingerprinting signals.
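For illustration, here is roughly how the velocity and behavioral features could be derived with pandas in an offline/batch setting (column names are hypothetical; the production versions are computed incrementally in the streaming feature store):

```python
import pandas as pd

WINDOWS = ["1h", "24h", "7d", "30d"]  # the four time horizons described above

def add_velocity_features(df: pd.DataFrame) -> pd.DataFrame:
    """df: one row per transaction with columns user_id, timestamp (datetime),
    and amount."""
    df = df.sort_values("timestamp").set_index("timestamp")
    pieces = []
    for _, g in df.groupby("user_id"):
        g = g.copy()
        for w in WINDOWS:
            roll = g["amount"].rolling(w)
            g[f"txn_count_{w}"] = roll.count()   # velocity: transactions in window
            g[f"txn_amount_{w}"] = roll.sum()    # velocity: total spend in window
        # behavioral: deviation from this user's typical spend over 30 days
        g["amount_zscore_30d"] = (
            (g["amount"] - g["amount"].rolling("30d").mean())
            / g["amount"].rolling("30d").std()
        )
        pieces.append(g)
    return pd.concat(pieces).reset_index()
```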
The real-time feature store, built on Redis and Apache Kafka, keeps these features continuously updated from the event stream and serves them with sub-millisecond lookup latency, which keeps the overall scoring pipeline under 50ms.
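At scoring time the model server only performs key lookups against Redis. A stripped-down sketch of that read path, with an assumed key layout and feature names:

```python
import redis

r = redis.Redis(host="feature-store", port=6379, decode_responses=True)

FEATURE_NAMES = ["txn_count_1h", "txn_amount_24h", "amount_zscore_30d"]

def fetch_features(user_id: str) -> dict:
    # A single HMGET per transaction keeps lookups sub-millisecond; a Kafka
    # consumer updates these hashes as new transaction events arrive.
    raw = r.hmget(f"features:user:{user_id}", FEATURE_NAMES)
    return {
        name: float(value) if value is not None else 0.0  # default for cold users
        for name, value in zip(FEATURE_NAMES, raw)
    }
```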
Handling Class Imbalance
Fraud is inherently rare - only 0.1-0.5% of transactions are fraudulent. We addressed this through a multi-pronged approach: SMOTE oversampling of the minority class during training, cost-sensitive learning with asymmetric loss functions (false negatives cost 10x more than false positives), and stratified sampling for train/test splits that preserve the class ratio.
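In code, two of those prongs might look roughly like the following, using imbalanced-learn's SMOTE and XGBoost's scale_pos_weight to approximate the 10x asymmetric cost (the dataset and hyperparameters here are synthetic stand-ins, not the production values):

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for transaction features with a ~0.2% fraud rate.
X, y = make_classification(n_samples=100_000, weights=[0.998], random_state=42)

# Stratified split preserves the rare-class ratio in both train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = Pipeline([
    # SMOTE synthesizes minority-class examples during training only;
    # the held-out test set keeps its natural class ratio.
    ("smote", SMOTE(sampling_strategy=0.1, random_state=42)),
    # scale_pos_weight penalizes missed fraud ~10x more than false alarms,
    # approximating the asymmetric loss described above.
    ("xgb", XGBClassifier(scale_pos_weight=10, n_estimators=500,
                          max_depth=6, learning_rate=0.05)),
])
model.fit(X_train, y_train)
```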
Deployment & Monitoring
The system runs on AWS SageMaker with auto-scaling inference endpoints behind a Kafka-based event stream. Every prediction is logged with full feature vectors for audit trails and model debugging. We run shadow deployments of new model versions alongside production, comparing predictions before promoting.
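The promotion decision can come down to comparing logged shadow and production scores on the same traffic once labels arrive. A hypothetical comparison helper:

```python
import numpy as np

def compare_shadow(prod_scores, shadow_scores, labels, threshold=0.5):
    """Both models scored the same logged transactions; labels arrive later
    from chargeback and confirmation data."""
    prod = np.asarray(prod_scores) >= threshold
    shadow = np.asarray(shadow_scores) >= threshold
    fraud = np.asarray(labels).astype(bool)
    return {
        # recall: share of true fraud each model catches
        "prod_recall": prod[fraud].mean(),
        "shadow_recall": shadow[fraud].mean(),
        # false positive rate: share of legitimate transactions flagged
        "prod_fpr": prod[~fraud].mean(),
        "shadow_fpr": shadow[~fraud].mean(),
        # where the two models disagree is where to look before promoting
        "disagreement_rate": (prod != shadow).mean(),
    }
```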
We monitor data drift daily using the Population Stability Index (PSI) on every input feature. When drift on any feature exceeds its threshold, the pipeline triggers automated retraining on the most recent 90 days of labeled data, keeping the model current as fraud patterns evolve.
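PSI itself is a small computation. Here is a reference implementation (the quantile binning and the conventional ~0.2 alert threshold are common practice, not necessarily our exact production settings):

```python
import numpy as np

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline feature distribution
    (e.g. the training set) and the current serving distribution."""
    # Bin edges come from the baseline so both samples share one grid;
    # quantile bins cope with skewed features like transaction amount.
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# Rule of thumb: PSI above ~0.2 signals significant drift worth acting on.
```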
Key Takeaways
- Ensemble models outperform single models. The XGBoost + LSTM combination caught 23% more fraud than either model alone.
- Feature engineering matters more than model complexity. 80% of our accuracy gains came from better features, not bigger models.
- Real-time feature stores are critical. Precomputed features in Redis eliminated the latency bottleneck that kills most real-time ML systems.
- Monitor for drift, retrain often. Fraud patterns change constantly - a model trained 90 days ago is already degrading.