Building a fraud detection engine with sub-10ms latency

Fraud detection is one of the hardest engineering problems in payments. A system has to be fast (sub-10ms), accurate (low false positives), and adaptive (new fraud patterns emerge daily), all at once. Most systems sacrifice one of the three. We set out to achieve all of them.

Why existing approaches fall short

Traditional rule-based fraud engines are fast but brittle. They catch known patterns but miss novel attacks. ML models are more adaptive but typically run in separate microservices, adding 50–200ms of network overhead. Cloud-based fraud APIs are even slower and introduce a third-party dependency on every transaction.

Median score time: 8ms

False positive rate: 0.01%

Features per transaction: 340+

Our model architecture

We use an ensemble of three models that run in parallel within the same process as our routing engine:

A gradient-boosted tree model (XGBoost) — excellent at structured tabular features like transaction amount, time of day, and card BIN.
A graph neural network (GNN) — detects fraud rings by analysing relationship patterns across accounts, devices, and payment methods.
A sequence model (lightweight LSTM) — flags anomalous behavioural sequences for a given card or account.

The three scores are combined using a learned meta-classifier that has been trained to weight each model's output based on the transaction context. Card-present fraud? The GNN gets higher weight. Velocity attacks? The LSTM leads.
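The article does not say what form the meta-classifier takes. As a minimal sketch, assume a logistic model whose weights are selected by transaction context. The context names, weight values, and the `combine_scores` function are hypothetical and serve only to illustrate context-dependent weighting.

```python
import math

# Hypothetical learned parameters, one weight vector per transaction context.
# Weight order: (xgboost, gnn, lstm). Values are illustrative only.
CONTEXT_WEIGHTS = {
    "card_present": {"weights": (1.0, 2.5, 0.8), "bias": -2.0},
    "velocity": {"weights": (1.0, 0.8, 2.5), "bias": -2.0},
    "default": {"weights": (1.5, 1.5, 1.5), "bias": -2.0},
}


def combine_scores(scores, context):
    """Logistic meta-classifier over the three base-model scores.

    scores  -- tuple of (xgboost, gnn, lstm) scores, each in [0, 1]
    context -- key into CONTEXT_WEIGHTS; unknown contexts use "default"
    """
    params = CONTEXT_WEIGHTS.get(context, CONTEXT_WEIGHTS["default"])
    z = params["bias"] + sum(w * s for w, s in zip(params["weights"], scores))
    return 1.0 / (1.0 + math.exp(-z))


# The same base scores yield a higher fraud probability when the context
# favours the model that raised the alarm (here the GNN).
base = (0.2, 0.9, 0.2)
card_present_score = combine_scores(base, "card_present")
velocity_score = combine_scores(base, "velocity")
```

In this sketch a strong GNN signal moves the final score more for a card-present transaction than for a velocity-style one, which mirrors the weighting behaviour described above.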

Critical design decision: We convert all models to ONNX format and run them using ONNX Runtime, which is significantly faster than Python-based inference. Median inference time per model: 2.1ms. Total ensemble: 7.8ms including feature computation.
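Running the three models in parallel inside one process can be sketched as below. The scorer functions are hypothetical stand-ins that return fixed values. In a real deployment each would presumably wrap a call to an ONNX Runtime session (`onnxruntime.InferenceSession.run`), which is left out here so the example runs without model files.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in scorers. Each would wrap an ONNX Runtime session
# in practice; the fixed return values are placeholders.
def score_xgboost(features):
    return 0.12


def score_gnn(features):
    return 0.40


def score_lstm(features):
    return 0.05


MODELS = {"xgboost": score_xgboost, "gnn": score_gnn, "lstm": score_lstm}

# One long-lived pool, so no threads are created on the request path.
_POOL = ThreadPoolExecutor(max_workers=len(MODELS))


def score_ensemble(features):
    """Run all base models concurrently; return (scores, elapsed milliseconds)."""
    start = time.perf_counter()
    futures = {name: _POOL.submit(fn, features) for name, fn in MODELS.items()}
    scores = {name: fut.result() for name, fut in futures.items()}
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return scores, elapsed_ms


scores, elapsed_ms = score_ensemble({"amount": 42.0})
```

With this layout the wall-clock cost of inference tends toward that of the slowest model rather than the sum of all three, provided the underlying inference call does not hold the interpreter lock while it runs.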

Feature engineering at the edge

340 features sounds like a lot to compute in real time. The trick is pre-materialisation. We maintain a rolling feature store (a fast in-memory cache at each edge node) that holds pre-computed aggregate features per card, device, merchant, and IP. When a transaction arrives, we only need to compute the delta features, those that depend on the incoming transaction itself, in real time. The expensive aggregations (7-day spend velocity, cross-merchant correlation) are pre-computed and refreshed every 30 seconds.
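A minimal sketch of the pre-materialisation idea follows. The class name, the `loader` callable, and the feature names (`avg_amount_7d`, `spend_7d`, `amount_vs_7d_avg`) are all hypothetical. As a simplification, the cache here refreshes lazily on read once an entry is older than 30 seconds, whereas a production store would more likely refresh in the background so that the loader never sits on the request path.

```python
import time

REFRESH_SECONDS = 30  # refresh interval stated in the article


class RollingFeatureStore:
    """In-memory cache of pre-computed aggregates keyed by (entity_type, entity_id)."""

    def __init__(self, loader, clock=time.monotonic):
        self._loader = loader  # hypothetical callable: key -> dict of aggregates
        self._clock = clock    # injectable so the refresh logic can be tested
        self._cache = {}       # key -> (loaded_at, aggregates)

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        # Reload when the entry is missing or older than the refresh interval.
        if entry is None or now - entry[0] >= REFRESH_SECONDS:
            entry = (now, self._loader(key))
            self._cache[key] = entry
        return entry[1]


def build_features(txn, store):
    """Merge cached aggregates with cheap per-transaction delta features."""
    card = store.get(("card", txn["card_id"]))
    features = dict(card)  # copy so the cached aggregates are not mutated
    avg = card["avg_amount_7d"]
    # Delta features: derived at request time from the transaction itself.
    features["amount"] = txn["amount"]
    features["amount_vs_7d_avg"] = txn["amount"] / avg if avg else 0.0
    return features
```

Only the last few lines of `build_features` run per transaction. Everything read from the store was computed ahead of time.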

Continuous retraining

Fraud patterns evolve faster than any static model can handle. Our retraining pipeline runs every 6 hours. New labels from confirmed fraud reports and chargebacks are fed back into the training pipeline automatically. A challenger model is deployed in shadow mode (scoring but not acting) for 24 hours before being promoted to production, with automatic rollback if key metrics degrade.
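The promotion gate and rollback check can be sketched as two small predicates. The article does not name the "key metrics" or any thresholds, so the choice of recall and false positive rate, the tolerance values, and the function names below are assumptions made for illustration.

```python
from dataclasses import dataclass

SHADOW_HOURS = 24  # shadow period stated in the article


@dataclass
class Metrics:
    recall: float               # share of confirmed fraud that was flagged
    false_positive_rate: float  # share of legitimate payments that were flagged


def should_promote(champion, challenger, hours_in_shadow):
    """Promote only after the full shadow window and with no metric regression."""
    if hours_in_shadow < SHADOW_HOURS:
        return False
    return (challenger.recall >= champion.recall
            and challenger.false_positive_rate <= champion.false_positive_rate)


def should_roll_back(baseline, live, max_recall_drop=0.02, max_fpr_increase=0.0001):
    """Roll back if the promoted model degrades beyond the (assumed) tolerances."""
    recall_degraded = live.recall < baseline.recall - max_recall_drop
    fpr_degraded = live.false_positive_rate > baseline.false_positive_rate + max_fpr_increase
    return recall_degraded or fpr_degraded


champion = Metrics(recall=0.90, false_positive_rate=0.0001)
challenger = Metrics(recall=0.92, false_positive_rate=0.0001)
promote_now = should_promote(champion, challenger, hours_in_shadow=24)
```

Keeping the gate as a pure function of recorded metrics makes the promotion decision easy to audit and replay against historical shadow runs.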

Outcome

Since deploying this system, our fraud rate has dropped 73% while our false positive rate sits at 0.01% — meaning legitimate payments rarely get blocked. The 8ms scoring overhead is invisible to end users but has saved our merchants millions in avoided chargebacks.
