Architecture
Serixo's real-time analytics pipeline is built on a multi-tier streaming architecture that separates event ingestion, feature computation, model inference, and decision delivery into independently scalable components. This separation is what allows us to achieve sub-23ms end-to-end latency while processing over 12,000 events per minute across our operator network.
The core design principle is that latency and throughput are not fundamentally in tension: they are both solved by eliminating shared mutable state. Every computation in the pipeline is either stateless (and therefore trivially parallelisable) or operates on an immutable snapshot of a pre-computed feature store.
Data pipeline
Events enter the pipeline via a Kafka-backed ingestion layer that provides at-least-once delivery guarantees with deduplication at the consumer. From ingestion, events fan out to three parallel processing tracks: the feature computation track (updates the online feature store), the model inference track (scores the event using the current production model), and the audit track (appends to the immutable event ledger for compliance).
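At-least-once delivery means redeliveries can reach the consumer, so each track filters on eventId before doing any work. A minimal sketch of that consumer-side guard, assuming an in-memory seen-set (a production deployment would bound this with a TTL store; all names here are illustrative, not the pipeline's actual types):

```typescript
// Minimal consumer-side deduplication guard (illustrative).
// Assumes every event carries a globally unique eventId (UUIDv7 here).
interface IngestedEvent {
  eventId: string;
  ts: number; // Unix ms
}

class DedupFilter {
  private seen = new Set<string>();

  // Returns true exactly once per eventId; later redeliveries are dropped.
  admit(event: IngestedEvent): boolean {
    if (this.seen.has(event.eventId)) return false;
    this.seen.add(event.eventId);
    return true;
  }
}

// Usage: fan a deduplicated stream out to the three processing tracks.
const filter = new DedupFilter();
const incoming: IngestedEvent[] = [
  { eventId: 'a', ts: 1 },
  { eventId: 'a', ts: 1 }, // at-least-once redelivery
  { eventId: 'b', ts: 2 },
];
const unique = incoming.filter((e) => filter.admit(e));
// unique contains one event per eventId
```

In a real deployment the seen-set would be shared (or partitioned consistently with Kafka partitions) and expired after the redelivery window, but the admit-once semantics are the same.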
// Event structure entering the pipeline
interface SerixoEvent {
  eventId: string;    // UUIDv7 (time-sortable)
  operatorId: string;
  accountId: string;
  eventType: 'transaction' | 'login' | 'bonus_claim' | 'withdrawal';
  payload: Record<string, unknown>;
  signals: SignalBundle;
  ts: number;         // Unix ms
}

// Signal bundle (pre-computed client-side)
interface SignalBundle {
  deviceId: string;
  sessionId: string;
  behavioralHash: string;  // 64-byte Blake3 of behavioral signals
  agentClass: 'human' | 'bot' | 'ai_agent';
  agentConfidence: number;
}

Latency optimization
The biggest single contributor to latency was synchronous feature lookups during model inference. We eliminated this by pre-computing all 340 features used by our production model into a Redis-backed online feature store, updated asynchronously as events arrive. Model inference now reads from a local memory-mapped snapshot of the feature store, eliminating network round-trips entirely.
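The read path can be thought of as an immutable map that inference queries while the feature-computation track swaps fresh snapshots in behind it. The sketch below models only those swap semantics (the real store is Redis-backed and memory-mapped; every name here is illustrative):

```typescript
// Illustrative snapshot-swap pattern: readers always see one consistent,
// immutable view; writers publish a whole new snapshot atomically.
type FeatureVector = ReadonlyMap<string, number>;

class FeatureSnapshotStore {
  private snapshot: ReadonlyMap<string, FeatureVector> = new Map();

  // Inference path: pure local read, no network round-trip.
  getFeatures(accountId: string): FeatureVector | undefined {
    return this.snapshot.get(accountId);
  }

  // Async update path: build the next snapshot off-line, then swap it in.
  publish(next: ReadonlyMap<string, FeatureVector>): void {
    this.snapshot = next; // single reference assignment
  }
}
```

Because readers never observe a half-updated snapshot, inference needs no locks on its hot path, which is the property that makes the stateless/immutable split from the design principle above pay off.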
Scaling
Lessons learned
Three lessons from building this pipeline: First, measure latency at the 99th percentile, not the median; the median will flatter you and hide the tail latencies that matter. Second, invest in observability before you invest in optimisation; you cannot optimise what you cannot measure. Third, the biggest latency wins come from eliminating synchronous network calls, not from micro-optimising compute.
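The p99 point is easy to operationalise: keep a window of latency samples and alert on the 99th percentile rather than the median. A minimal percentile helper using the nearest-rank method (a sketch of ours, not code from the pipeline):

```typescript
// Nearest-rank percentile: p in (0, 1], samples in ms.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0 || p <= 0 || p > 1) {
    throw new RangeError('need at least one sample and p in (0, 1]');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length); // 1-based nearest rank
  return sorted[rank - 1];
}

// A small tail of slow requests barely moves the median but dominates p99:
const latencies = [...Array(98).fill(10), 450, 500];
// percentile(latencies, 0.5)  → 10  (the median flatters you)
// percentile(latencies, 0.99) → 450 (the tail that users actually feel)
```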