Architecture overview
Building a production ML system for real-time fraud scoring requires solving three problems simultaneously: feature freshness (signals must reflect the current state of the world, not a stale snapshot), inference latency (decisions must arrive in under 23ms to be actionable at payment time), and model freshness (the model must adapt to the adversarial distribution without manual retraining cycles). Our ML pipeline is designed to solve all three at scale.
Feature store
The online feature store is the performance-critical component of the pipeline. It maintains pre-computed values for all 340 features used by the production model, updated asynchronously as events arrive. Features are partitioned by entity type (account, device, IP, payment method) and stored in a Redis cluster with a write-through cache backed by a ClickHouse OLAP store for historical queries.
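To make the write-through pattern concrete, here is a minimal sketch of the store's write and read paths. The class name, key layout, and the in-memory dict standing in for the Redis cluster (and the queue standing in for the asynchronous ClickHouse writer) are all illustrative, not the production implementation:

```python
import time
from typing import Dict, List, Tuple

class OnlineFeatureStore:
    """Write-through sketch: updates land in the online store (Redis in
    production; a dict stands in here) and are also queued for the OLAP
    backfill (ClickHouse in production)."""

    def __init__(self) -> None:
        self._online: Dict[str, Dict[str, float]] = {}          # stand-in for Redis hashes
        self._olap_queue: List[Tuple[float, str, Dict[str, float]]] = []  # stand-in for async OLAP writer

    @staticmethod
    def key(entity_type: str, entity_id: str) -> str:
        # Features are partitioned by entity type, e.g. "account:42"
        return f"{entity_type}:{entity_id}"

    def write_through(self, entity_type: str, entity_id: str,
                      features: Dict[str, float]) -> None:
        k = self.key(entity_type, entity_id)
        self._online.setdefault(k, {}).update(features)         # serving-path update
        self._olap_queue.append((time.time(), k, features))     # historical copy

    def read(self, entity_type: str, entity_id: str) -> Dict[str, float]:
        # Inference-time lookup: one key fetch per entity
        return self._online.get(self.key(entity_type, entity_id), {})

store = OnlineFeatureStore()
store.write_through("account", "42", {"txn_count_1h": 3.0})
store.write_through("account", "42", {"txn_amount_1h": 120.5})
print(store.read("account", "42"))
```

The point of the pattern is that the serving path never waits on the OLAP store: the historical write is queued and flushed asynchronously, keeping feature reads on the low-latency path.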
# Feature computation (simplified)
class AccountVelocityFeature(Feature):
    entity_type = "account"
    windows = ["1h", "24h", "7d", "30d"]

    def compute(self, events: List[Event]) -> Dict[str, float]:
        return {
            f"txn_count_{w}": count_events(events, window=w)
            for w in self.windows
        } | {
            f"txn_amount_{w}": sum_amounts(events, window=w)
            for w in self.windows
        }

Model training
The production model is a gradient-boosted decision tree ensemble (XGBoost) trained on 90 days of labelled events. Labels are assigned through a combination of confirmed fraud (chargebacks and manual reviews), ground truth enrichment from operator confirmations, and a proprietary semi-supervised labelling pipeline that propagates confirmed fraud labels through the identity graph to unlabelled but connected accounts.
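The label-propagation step can be sketched as a bounded breadth-first walk from confirmed-fraud accounts through identity-graph edges, assigning weak labels whose confidence decays per hop. The function name, decay factor, hop limit, and confidence floor below are all illustrative assumptions, not the proprietary pipeline:

```python
from collections import deque
from typing import Dict, Iterable, Set

def propagate_labels(graph: Dict[str, Set[str]],
                     confirmed_fraud: Iterable[str],
                     decay: float = 0.5,        # assumed per-hop confidence decay
                     max_hops: int = 2,         # assumed propagation radius
                     min_confidence: float = 0.2) -> Dict[str, float]:
    """BFS from confirmed-fraud accounts through shared-identity edges;
    connected but unlabelled accounts receive decayed weak labels."""
    labels = {node: 1.0 for node in confirmed_fraud}   # confirmed fraud = full confidence
    hops = {node: 0 for node in confirmed_fraud}
    frontier = deque((node, 1.0) for node in confirmed_fraud)
    while frontier:
        node, conf = frontier.popleft()
        if hops[node] >= max_hops:
            continue
        for neighbour in graph.get(node, set()):
            new_conf = conf * decay
            # Skip weak or already-stronger labels to guarantee termination
            if new_conf < min_confidence or labels.get(neighbour, 0.0) >= new_conf:
                continue
            labels[neighbour] = new_conf
            hops[neighbour] = hops[node] + 1
            frontier.append((neighbour, new_conf))
    return labels

identity_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(propagate_labels(identity_graph, ["a"]))
```

Weak labels produced this way would typically enter training with reduced sample weight, so a mistaken propagation cannot outvote confirmed chargeback labels.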
Production inference
A/B testing
Every model update goes through a shadow deployment phase (7 days, 100% of traffic scored but not actioned), followed by a canary phase (3 days, 5% of traffic actioned by the new model), followed by a full rollout. Rollout is automatically halted if fraud rate, false positive rate, or score distribution drift exceeds predefined thresholds.
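The automated halt can be sketched as a guardrail check comparing canary metrics against the baseline, with score-distribution drift measured here via the population stability index (PSI). The thresholds and function names are illustrative assumptions, not the production values:

```python
import math
from typing import Dict, List

def population_stability_index(expected: List[float], actual: List[float]) -> float:
    """PSI between two score histograms (same bins, given as proportions)."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, 1e-6), max(a, 1e-6)   # guard against log(0)
        psi += (a - e) * math.log(a / e)
    return psi

def should_halt(baseline: Dict[str, float], canary: Dict[str, float],
                baseline_hist: List[float], canary_hist: List[float],
                max_fraud_delta: float = 0.002,   # assumed threshold
                max_fp_delta: float = 0.005,      # assumed threshold
                max_psi: float = 0.2) -> bool:    # assumed threshold
    """Halt the rollout if fraud rate, false-positive rate, or score
    distribution drift exceeds its predefined threshold."""
    if canary["fraud_rate"] - baseline["fraud_rate"] > max_fraud_delta:
        return True
    if canary["false_positive_rate"] - baseline["false_positive_rate"] > max_fp_delta:
        return True
    return population_stability_index(baseline_hist, canary_hist) > max_psi

base = {"fraud_rate": 0.001, "false_positive_rate": 0.010}
same = dict(base)
print(should_halt(base, same, [0.25] * 4, [0.25] * 4))
```

Checking the score distribution as well as the outcome metrics matters because drift shows up in scores immediately, while chargeback-based outcome metrics lag by weeks.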
Lessons learned
Five lessons from three years of operating this pipeline: the feature store is worth more than the model; label quality matters more than label quantity; shadow deployment is not optional; monitor score distributions not just outcome metrics; and the adversary is always adapting — your retraining cadence must be faster than their adaptation cadence.