Inside Serixo's ML Pipeline: From Feature Store to Production Inference
Engineering · November 5, 2025 · 15 min read

Feature store, model training, A/B testing, and sub-23ms inference at scale.

Serixo Research
Fraud Intelligence Team

Architecture overview

Building a production ML system for real-time fraud scoring requires solving three problems simultaneously: feature freshness (signals must reflect the current state of the world, not a stale snapshot), inference latency (decisions must arrive in under 23ms to be actionable at payment time), and model freshness (the model must adapt to the adversarial distribution without manual retraining cycles). Our ML pipeline is designed to solve all three at scale.

Feature store

The online feature store is the performance-critical component of the pipeline. It maintains pre-computed values for all 340 features used by the production model, updated asynchronously as events arrive. Features are partitioned by entity type (account, device, IP, payment method) and stored in a Redis cluster with a write-through cache backed by a ClickHouse OLAP store for historical queries.

# Feature computation (simplified)
class AccountVelocityFeature(Feature):
    entity_type = "account"
    windows = ["1h", "24h", "7d", "30d"]

    def compute(self, events: List[Event]) -> Dict[str, float]:
        return {
            f"txn_count_{w}": count_events(events, window=w)
            for w in self.windows
        } | {
            f"txn_amount_{w}": sum_amounts(events, window=w)
            for w in self.windows
        }
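For concreteness, the `count_events` and `sum_amounts` helpers referenced above can be sketched as pure functions over timestamped events. This is a minimal illustration, assuming each event carries a `timestamp` and an `amount`; the real helpers operate against the store's event schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical Event shape; the production schema is richer.
@dataclass
class Event:
    timestamp: datetime
    amount: float

WINDOWS = {
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}

def count_events(events: List[Event], window: str,
                 now: Optional[datetime] = None) -> float:
    """Count events whose timestamp falls inside the trailing window."""
    now = now or datetime.utcnow()
    cutoff = now - WINDOWS[window]
    return float(sum(1 for e in events if e.timestamp >= cutoff))

def sum_amounts(events: List[Event], window: str,
                now: Optional[datetime] = None) -> float:
    """Sum transaction amounts inside the trailing window."""
    now = now or datetime.utcnow()
    cutoff = now - WINDOWS[window]
    return sum(e.amount for e in events if e.timestamp >= cutoff)
```

In the asynchronous update path, these would typically be maintained as incremental counters rather than recomputed over raw events on every write.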

Model training

The production model is a gradient-boosted decision tree ensemble (XGBoost) trained on 90 days of labelled events. Labels are assigned through a combination of confirmed fraud (chargebacks and manual reviews), ground truth enrichment from operator confirmations, and a proprietary semi-supervised labelling pipeline that propagates confirmed fraud labels through the identity graph to unlabelled but connected accounts.
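The semi-supervised step can be illustrated as bounded label propagation over the identity graph: confirmed-fraud labels spread to accounts reachable within a few hops. This is a simplified sketch; the production propagation rules (hop limits, edge weights, decay) are proprietary and assumed here.

```python
from collections import deque

def propagate_labels(graph: dict, confirmed_fraud: set,
                     max_hops: int = 2) -> dict:
    """Spread confirmed-fraud labels (1) to accounts within max_hops of a
    confirmed account in the identity graph; all others stay unlabelled (0).
    graph maps account -> list of connected accounts (shared device, IP,
    payment method)."""
    labels = {node: 0 for node in graph}
    queue = deque((n, 0) for n in confirmed_fraud)
    seen = set(confirmed_fraud)
    while queue:
        node, hops = queue.popleft()
        labels[node] = 1
        if hops == max_hops:
            continue  # stop expanding beyond the hop budget
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, hops + 1))
    return labels
```

A real pipeline would attach a confidence that decays with hop distance rather than a hard 0/1 label, so downstream training can weight propagated labels below confirmed ones.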

Production inference

23ms p99 inference latency
340 features per inference
99.97% inference service availability
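The inference hot path reduces to: fetch the pre-computed features for the entities on the payment, assemble them into a fixed-order vector, and score within the latency budget. A minimal sketch, with hypothetical feature names and a stand-in model:

```python
import time

# 340 features per inference; names here are hypothetical placeholders.
FEATURE_NAMES = [f"f{i}" for i in range(340)]

def assemble_vector(features: dict, default: float = 0.0) -> list:
    """Order fetched features for the model; missing values fall back to a
    default so a cold entity never blocks scoring."""
    return [features.get(name, default) for name in FEATURE_NAMES]

def score(vector, model, deadline_ms: float = 23.0):
    """Score with a latency budget; report whether the deadline was met so
    the caller can fall back to a rules-only decision on a miss."""
    start = time.perf_counter()
    s = model(vector)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return s, elapsed_ms <= deadline_ms
```

Keeping the vector layout fixed at deploy time is what lets the feature fetch be a single batched store lookup rather than per-feature calls.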

A/B testing

Every model update goes through a shadow deployment phase (7 days, 100% of traffic scored but not actioned), followed by a canary phase (3 days, 5% of traffic actioned by the new model), followed by a full rollout. Rollout is automatically halted if fraud rate, false positive rate, or score distribution drift exceeds predefined thresholds.
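The automatic halt can be sketched as a guardrail check evaluated against canary-vs-control metrics. The threshold names and values below are illustrative assumptions, not the production configuration:

```python
# Hypothetical guardrail thresholds; production values are not public.
THRESHOLDS = {
    "fraud_rate_delta": 0.002,      # absolute increase vs. control
    "false_positive_delta": 0.005,  # absolute FP-rate increase vs. control
    "score_psi": 0.2,               # score distribution drift (PSI)
}

def should_halt(metrics: dict) -> bool:
    """Halt the rollout if any guardrail metric exceeds its threshold.
    Missing metrics are treated as zero (no evidence of regression)."""
    return any(metrics.get(k, 0.0) > v for k, v in THRESHOLDS.items())
```

Note the asymmetry: a halt needs only one tripped guardrail, while promotion to the next phase requires all of them to stay clear for the full phase duration.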

Lessons learned

Five lessons from three years of operating this pipeline: the feature store is worth more than the model; label quality matters more than label quantity; shadow deployment is not optional; monitor score distributions, not just outcome metrics; and the adversary is always adapting, so your retraining cadence must be faster than their adaptation cadence.

Tags: ML Pipeline · Engineering · Feature Store · Inference · Architecture

