MTE Relay Performance
MTE Relay is a drop-in encrypted HTTP gateway. It is the same engine behind both MTE Relay Server and MTE API Relay, so the performance characteristics on this page apply to either deployment. An application sends ordinary HTTP to an outbound relay; the relay MTE-encrypts the entire request; a peer relay in front of the origin decrypts it and forwards plaintext to the origin; and the response makes the same trip in reverse. Every byte of every request and response is encrypted in transit between the two relays, with no changes to the client or the origin service.
This page has two goals: to show that MTE Relay is highly performant, and to help you choose the right deployment size for your traffic.
Performance at a glance
A pair of standard 4 CPU / 4 GB relay instances moves about 7,700 encrypted requests per second of 10 KB JSON — roughly 77 MB/s (≈0.6 Gbit/s) of fully encrypted application payload, or about 665 million requests per day — with a measured error rate of zero. In the certification sweep for this configuration, 2,705,692 consecutive requests completed with 0.000% errors and 100% payload integrity (every response byte-identical to what was sent).
The price of the encryption is small and predictable:
- At light load, the full encrypt → transmit → decrypt → origin → encrypt → transmit → decrypt round trip adds about 1 millisecond per request (1.2 ms through the relay pair vs 0.15 ms talking to the origin directly). For interactive APIs this is imperceptible.
- Past its ceiling the gateway saturates gracefully, never destructively. When offered more work than it can encrypt, requests queue: latency rises linearly and predictably, throughput holds flat, and nothing fails. At 8× saturating demand, 99% of requests still completed in under 43 ms.
The system is compute-bound on encryption (about 50% of relay CPU time is spent inside the MTE cryptographic library), so it scales with CPU: roughly 955 requests/second per relay core, counting the cores of both relays in the pair, and linear across the 2- and 4-CPU shapes. Memory is not a throughput factor: it tracks payload size, not demand.
How MTE Relay works
Because the client and origin are untouched, MTE Relay drops into an existing architecture as an encrypted hop. The only resource it consumes is CPU to encrypt and decrypt; everything else on this page follows from that.
How we measured it
Load was generated with k6, the open-source load-testing tool from Grafana Labs. k6 reports latency percentiles (p90, p95, p99) directly in its summary output, so every latency figure on this page is taken straight from the tool rather than computed by us.
The test uses maximum-pressure connections, not realistic users. Each connection is a k6 virtual user (VU) running closed-loop with zero think time: it fires its next request the instant the previous response arrives. A single such connection is itself a heavy load source — one connection alone extracts about 800 requests per second from the gateway. These connections are best thought of as units of offered pressure, each equivalent to hundreds of real API clients.
| Test connections | Rate per connection (req/s) | Total throughput (req/s) |
|---|---|---|
| 8 | 806 req/s | 6,445 req/s |
| 16 | 493 req/s | 7,892 req/s |
| 32 | 241 req/s | 7,711 req/s |
| 64 | 121 req/s | 7,775 req/s |
| 128 | 60 req/s | 7,617 req/s |
The gateway reaches its ~7,700–7,900 req/s ceiling once total demand exceeds it (about 16 maximum-pressure connections); beyond that point the same total throughput is simply shared across more connections. The sweep then pushed to 128 simultaneous saturating connections — roughly 8× the demand needed to saturate the gateway — to prove overload safety, and the error count stayed at zero throughout.
Measured throughput and latency
Workload: a POST of ~10 KB JSON, sustained 60-second holds per load level, with a co-located k6 load generator over plain HTTP (no TLS) to isolate the cost of MTE encryption.
| Offered load (vs capacity) | Throughput (req/s) | Avg latency | p90 latency | p99 latency | Worst case | Errors |
|---|---|---|---|---|---|---|
| Light (~80%) | 6,445 | 1.2 ms | 1.5 ms | 2.0 ms | 7 ms | 0 |
| Saturating (100%) | 7,892 | 2.0 ms | 2.8 ms | 3.9 ms | 23 ms | 0 |
| Overload (~2×) | 7,711 | 4.1 ms | 6.2 ms | 9.1 ms | 37 ms | 0 |
| Overload (~4×) | 7,775 | 8.1 ms | 13.2 ms | 19.5 ms | 41 ms | 0 |
| Overload (~8×) | 7,617 | 16.7 ms | 28.8 ms | 42.9 ms | 154 ms | 0 |
Across the full certification sweep: 2.71 million requests, 0 failures, 0 corrupted payloads.
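Throughput holds flat at capacity no matter how hard the gateway is pushed: 7,617–7,892 req/s from saturation through 8× overload.
Under overload, the only thing that grows is queueing delay, and it grows linearly: average latency roughly doubles with each doubling of offered load (2.0 → 4.1 → 8.1 → 16.7 ms), with no cliffs and no error storms.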
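The linear growth follows from the test design. Each k6 connection is closed-loop with zero think time, so it always has exactly one request in flight; by Little's law the mean latency is then the connection count divided by throughput. The throughput figures in the latency table match the connection table row for row, so the two can be paired. The sketch below is an illustrative consistency check, not part of the authors' methodology; the function name is hypothetical.

```python
# Rows pair the connection table with the latency table
# (the throughput figures are identical in both).
# (k6 connections, measured throughput in req/s, measured avg latency in ms)
MEASURED = [
    (8, 6445, 1.2),
    (16, 7892, 2.0),
    (32, 7711, 4.1),
    (64, 7775, 8.1),
    (128, 7617, 16.7),
]


def littles_law_latency_ms(connections: int, throughput_rps: float) -> float:
    """Mean latency of a closed loop with zero think time.

    Every connection always has one request outstanding, so the number of
    requests in the system N equals throughput X times mean latency R
    (N = X * R), which gives R = N / X.
    """
    return connections / throughput_rps * 1000.0


for n, x, measured in MEASURED:
    predicted = littles_law_latency_ms(n, x)
    print(f"{n:>4} connections: predicted {predicted:5.2f} ms, measured {measured:4.1f} ms")
```

Every prediction lands within about 4% of the published average. Because the identity holds for any closed loop, this confirms that the k6 figures are internally consistent rather than proving anything about the relay itself; what the relay contributes is the flat throughput X.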
What this means in real terms
7,700 encrypted requests/second is, equivalently:
| Client profile | Request rate per client | Clients supported by one relay pair |
|---|---|---|
| Mobile/web user (6 requests per minute) | 0.1 req/s | ~77,000 concurrent users |
| Active interactive session (20 requests per minute) | 0.33 req/s | ~23,000 concurrent sessions |
| Busy machine-to-machine integration | 10 req/s | ~770 concurrent integrations |
| Maximum-pressure benchmark connection | 60–800 req/s | ~16 (the saturation point above) |
- For an interactive API: the relay adds about one millisecond. A user cannot perceive it.
- For a high-volume service: one standard relay pair carries ~665 million encrypted API calls per day at 10 KB each.
- For reliability engineering: overload manifests as added queueing delay, not as errors. At 8× saturating demand the gateway did not drop, corrupt, or reject a single request.
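The conversions in the table and bullets above are plain arithmetic. A minimal sketch, assuming decimal units (1 MB = 1,000 KB) and the per-minute request rates from the client-profile table; the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def clients_supported(capacity_rps: float, requests_per_minute: float) -> float:
    """Concurrent clients a given capacity can serve at a per-client request rate."""
    return capacity_rps * 60 / requests_per_minute


def requests_per_day(capacity_rps: float) -> float:
    """Requests carried in 24 hours at a sustained rate."""
    return capacity_rps * SECONDS_PER_DAY


def payload_gbit_per_s(capacity_rps: float, payload_kb: float) -> float:
    """Encrypted application payload in Gbit/s, using decimal units."""
    megabytes_per_s = capacity_rps * payload_kb / 1000
    return megabytes_per_s * 8 / 1000


capacity = 7700  # req/s: one standard 4 CPU / 4 GB relay pair at 10 KB
print(clients_supported(capacity, 6))     # 77000.0 (mobile/web user, 6 req/min)
print(clients_supported(capacity, 600))   # 770.0 (M2M integration at 10 req/s)
print(requests_per_day(capacity))         # 665280000
print(payload_gbit_per_s(capacity, 10))   # 0.616
```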
Resource footprint
Two instance shapes are certified (zero errors through full overload sweeps):
| Per relay instance | CPU at full load | Memory at full load (10 KB payloads) | Saturation |
|---|---|---|---|
| 4 CPU / 4 GB (standard) | ~4 cores (the limiter) | 1.9–2.4 GB | ~7,700 req/s |
| 2 CPU / 2 GB (small) | ~2 cores (the limiter) | ~1.9 GB | ~3,820 req/s |
Memory demand does not grow with offered load: it tracks payload size (the churn from buffering request and response bodies), not request rate. At 10 KB payloads the working set is ~1.9 GB; at 2 KB payloads it is only ~0.2 GB. Under overload the gateway degrades in latency only; it does not balloon toward an out-of-memory failure. On the small shape, bound the runtime's memory slightly below the container limit so it stays within its allocation.
Sizing by hardware: cores → throughput
The gateway was measured at three instance sizes with the identical build and workload (10 KB payloads), giving a calibrated cores→throughput curve:
| Per-relay CPU | Sustained ceiling per relay pair (req/s) | Per-core efficiency (cores of both relays) |
|---|---|---|
| 2 (measured) | ~3,820 | ~955 req/s per relay core |
| 4 (measured) | ~7,700 | ~960 req/s per relay core |
| 8 (measured) | ~12,600 | ~790 req/s per relay core |
Scaling is essentially linear from 2 to 4 cores (×2.02) at ~955 req/s per relay core. Only the 4→8 step measured sublinear (×1.64); at that size the test deployment approaches the capacity of the single test node, so the roll-off is at least partly testbed contention rather than a software limit.
| Per-relay instance size | 100% saturation (10 KB) | Confidence |
|---|---|---|
| 1 CPU | ~1,900 req/s | extrapolated (linear ~955/core), ±20% |
| 2 CPU | 3,820 req/s | measured |
| 4 CPU | 7,700 req/s | measured |
| 6 CPU | ~10,400 req/s | interpolated, ±10% |
| 8 CPU | 12,600 req/s | measured (conservative) |
| 12 CPU | ~16,800 req/s | extrapolated, ±20% |
| 16 CPU | ~20,600 req/s | extrapolated, ±20%+ |
Practical guidance: treat CPU as the only capacity lever. Per-core efficiency is flat at ~955 req/s through the 2- and 4-CPU shapes, so size by arithmetic and prefer adding more small instances (horizontal scaling, which also adds redundancy) over growing a single instance. Projections beyond the measured 2–8 core range should be confirmed by measurement before being relied on for production sizing.
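"Size by arithmetic" can be sketched as follows, assuming the ~955 req/s per core figure counts the cores of both relays in the pair (which is how the table works out, e.g. 3,820 ÷ 4 cores) and applying the ≤60% planning rule from the deployment-tier section. The sketch deliberately refuses shapes above 4 CPU per relay, where scaling was not measured to be linear. It counts capacity in relay-pair units and ignores the 5–7% shared-state cost measured for load-balanced deployments; the names and the `utilization` parameter are hypothetical.

```python
import math

PER_CORE_RPS = 955      # measured, counting both relays' cores in the pair
RELAYS_PER_PAIR = 2
MAX_LINEAR_CPU = 4      # 2 and 4 CPU measured linear; 8 CPU measured sublinear


def pair_ceiling_rps(cpu_per_relay: int) -> int:
    """Estimated 10 KB saturation of one relay pair from per-core arithmetic."""
    if not 1 <= cpu_per_relay <= MAX_LINEAR_CPU:
        raise ValueError("per-core arithmetic only holds up to 4 CPU per relay; measure larger shapes")
    return PER_CORE_RPS * RELAYS_PER_PAIR * cpu_per_relay


def relay_pairs_needed(target_rps: float, cpu_per_relay: int, utilization: float = 0.6) -> int:
    """Relay pairs required to carry target_rps at or below the given utilization."""
    usable_rps = pair_ceiling_rps(cpu_per_relay) * utilization
    return math.ceil(target_rps / usable_rps)


print(pair_ceiling_rps(2))          # 3820
print(pair_ceiling_rps(4))          # 7640, vs ~7,700 measured
print(relay_pairs_needed(5000, 2))  # 3
```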
On memory: it is not a throughput variable. 2 GB is certified for the 2-CPU shape (including 10 KB payloads); 4 GB is recommended for 4-CPU-and-larger shapes; 8–16 GB show no measurable throughput or latency benefit.
Payload-size sensitivity
Three full overload sweeps at 2 KB, 5 KB, and 10 KB bodies — 5.0 million requests, 0.000% errors in every run — give a simple, accurate cost model:
| Payload | Saturation (req/s) | Payload moved | Light-load avg latency | p90 latency | p99 latency |
|---|---|---|---|---|---|
| 2 KB | 5,537 | ~11 MB/s | 1.5 ms | 36.1 ms | 47.6 ms |
| 5 KB | 4,762 | ~23 MB/s | 1.7 ms | 43.1 ms | 59.7 ms |
| 10 KB | 3,856 | ~38 MB/s | 2.0 ms | 50.9 ms | 67.6 ms |
The p90 and p99 columns are measured under heavy overload (128 max-pressure k6 connections); the average latency is at light load. The three points fit a simple linear cost model almost exactly:
Relay-pair capacity cost per request ≈ 161 µs fixed + 9.8 µs per KB, where cost is the reciprocal of saturation throughput at the 2 CPU / 2 GB shape (the model predicts the 5 KB measurement to within 0.1 µs)
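Because cost per request here is the reciprocal of saturation throughput, the two constants can be recovered from the table. The sketch below fits an ordinary least-squares line through the three points; that fitting method is an assumption, not necessarily how the published figures were produced, but it lands on the same constants.

```python
# Saturation (req/s) from the payload sweep at the 2 CPU / 2 GB shape.
payload_kb = [2.0, 5.0, 10.0]
saturation_rps = [5537.0, 4762.0, 3856.0]

# Capacity cost per request in microseconds: the reciprocal of saturation.
cost_us = [1e6 / s for s in saturation_rps]


def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


fixed_us, per_kb_us = least_squares(payload_kb, cost_us)
print(f"cost ≈ {fixed_us:.0f} µs + {per_kb_us:.1f} µs/KB")  # cost ≈ 161 µs + 9.8 µs/KB
```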
What this means in practice:
- Per-request overhead dominates. Even at 10 KB, 62% of the cost is payload-independent. Shrinking payloads 5× raises request throughput only ~1.4× — so for small-payload APIs, plan capacity by request rate, not by bandwidth.
- Larger payloads are relatively cheaper to encrypt: moving the same data in 10 KB requests costs ~3.4× less gateway capacity than moving it in 2 KB requests.
- To estimate saturation for any payload size and shape: take 1 ÷ (161 µs + 9.8 µs × KB) and scale by the CPU curve above.
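A minimal sketch of that estimate, assuming the cost model describes the 2 CPU / 2 GB shape it was measured on and that payload cost scales with CPU in proportion to the measured 10 KB ceilings (3,820, 7,700 and 12,600 req/s at 2, 4 and 8 CPU per relay). That proportional scaling is an assumption the page does not test directly; the function name is hypothetical.

```python
FIXED_US = 161.0   # payload-independent cost per request, µs
PER_KB_US = 9.8    # additional cost per KB of payload, µs

# Measured 10 KB saturation (req/s) by CPU per relay.
MEASURED_CEILING_RPS = {2: 3820, 4: 7700, 8: 12600}


def saturation_rps(payload_kb: float, cpu_per_relay: int = 2) -> float:
    """Estimated relay-pair saturation for a payload size on a measured CPU shape."""
    if cpu_per_relay not in MEASURED_CEILING_RPS:
        raise ValueError(f"no measured ceiling for {cpu_per_relay} CPU per relay")
    base = 1e6 / (FIXED_US + PER_KB_US * payload_kb)  # req/s at 2 CPU / 2 GB
    scale = MEASURED_CEILING_RPS[cpu_per_relay] / MEASURED_CEILING_RPS[2]
    return base * scale


for kb in (2, 5, 10):
    print(kb, round(saturation_rps(kb)))
```

At the default shape this gives 5537, 4762 and 3861 req/s for 2, 5 and 10 KB, against the measured 5,537, 4,762 and 3,856.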
High availability and horizontal scaling
For production resilience the gateway runs multiple relay instances per side behind a load balancer, with each side's instances sharing their encryption state through a dedicated coordination store (Redis, one per side). Any instance can then serve any client's traffic — an instance can fail or be replaced without breaking sessions — and capacity grows by adding instances.
Four full overload sweeps were run in this configuration (two instances per side; 12.85 million requests including saturation probes), all at 0.000% errors with traffic split evenly across instances:
| Instances per side | Payload | Saturation | p90 (heavy overload) | p99 (heavy overload) |
|---|---|---|---|---|
| 2 × 2 CPU / 2 GB | 5 KB | 7,845 req/s | 20.9 ms | 31.6 ms |
| 2 × 2 CPU / 2 GB | 10 KB | 7,301 req/s | 23.3 ms | 32.2 ms |
| 2 × 4 CPU / 4 GB | 5 KB | 13,558 req/s | 12.7 ms | 18.0 ms |
| 2 × 4 CPU / 4 GB | 10 KB | 11,706 req/s | 14.8 ms | 21.6 ms |
Latency percentiles are at 128 max-pressure k6 connections (the heaviest level shown on this page).
- The cost of high availability is small and bounded: versus a single instance with the same total cores, sharing state costs 5–7% of throughput and adds ~0.6–0.9 ms to light-load latency (~1.8 ms total through the full encrypted chain — still imperceptible for interactive APIs).
- The shared coordination store is not a practical limit: at peak it used under half of its modest 2-core / 2 GB allocation — clear headroom to serve more than two relay instances. Relay CPU remains the capacity lever.
- Relay memory drops sharply in this mode (state lives in the shared store): 0.4–1.2 GB per instance versus 1.9–2.4 GB self-contained.
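The 5–7% figure can be checked against the 10 KB rows: each two-instance result is compared with the single instance of the same total cores from the sizing curve (4 CPU at ~7,700 req/s and 8 CPU at ~12,600 req/s). This pairing is inferred from the table values; the 5 KB rows are left out because no single-instance 5 KB point at those sizes is listed. The function name is illustrative.

```python
# (configuration, two-instance saturation, single instance with the same total cores), 10 KB
COMPARISONS = [
    ("2 x 2 CPU vs 1 x 4 CPU", 7301, 7700),
    ("2 x 4 CPU vs 1 x 8 CPU", 11706, 12600),
]


def ha_throughput_cost(ha_rps: float, single_rps: float) -> float:
    """Fraction of throughput given up by sharing state across instances."""
    return 1 - ha_rps / single_rps


for label, ha, single in COMPARISONS:
    print(f"{label}: {ha_throughput_cost(ha, single):.1%}")
```

This prints 5.2% and 7.1%, the two ends of the quoted range.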
Deployment tiers: Small, Medium, Large, Enterprise
The measured results reduce to four standard deployment tiers. Two planning rules are baked into the recommendations:
- Run at ≤60% of measured saturation. This rule is about latency, not safety: the gateway has been shown error-free far beyond that (zero errors at 8× overload), and at ≤60% utilization p99 latency stays in the single-digit-to-low-tens of milliseconds.
- Capacities below are quoted for ~10 KB payloads — the most demanding case measured. Smaller payloads raise request capacity (×1.23 at 5 KB, ×1.43 at 2 KB); adjust before picking a tier.
| | Small | Medium | Large | Enterprise |
|---|---|---|---|---|
| Relay deployment (per side) | 1 × 2 CPU / 2 GB, self-contained state | 2 × 2 CPU / 2 GB + shared store (2 CPU / 2 GB) | 2 × 4 CPU / 4 GB + shared store (2 CPU / 2 GB) | N independent "cells" of the Large tier |
| Total hardware (both sides) | 4 CPU / 4 GB | 12 CPU / 12 GB | 20 CPU / 20 GB | 20 CPU / 20 GB per cell |
| Measured saturation | 3,856 req/s | 7,301 req/s | 11,706 req/s | ~11,700 req/s per cell |
| Recommended sustained load | ≤ 2,300 req/s | ≤ 4,400 req/s | ≤ 7,000 req/s | ≤ 7,000 req/s × N |
| Requests per day | ≤ 200 M | ≤ 380 M | ≤ 600 M | 600 M × N |
| Encrypted data per day (10 KB) | ≤ ~2 TB | ≤ ~3.8 TB | ≤ ~6 TB | ~6 TB × N |
| Concurrent users (1 req/min) | ~140,000 | ~260,000 | ~420,000 | ~420,000 × N |
| High availability | No (single instance per side; sessions recover after a brief re-pair) | Yes — instance loss is transparent | Yes | Yes, plus cell-level isolation |
| Typical fit | Departmental APIs, internal services, pilots | Consumer app or B2B platform | High-volume consumer platform | National-scale / multi-tenant / multi-region |
Choosing a tier from your numbers:
- Take your peak sustained request rate (not daily average — size for the busiest hour).
- Adjust for payload: capacity ≈ tier rating × (161 + 9.8×10) ÷ (161 + 9.8×KB).
- Pick the smallest tier whose recommended load covers it; step up one tier if you need high availability (Medium is the smallest highly available tier).
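The three steps can be sketched as a small function. It uses the recommended-load row of the tier table (60% of measured saturation at ~10 KB) and the payload formula above, applied uniformly to every tier; the names and the `need_ha` flag are illustrative, and the Enterprise branch simply counts Large-tier cells.

```python
import math
from typing import Tuple

FIXED_US = 161.0   # payload-independent cost per request, µs
PER_KB_US = 9.8    # additional cost per KB, µs

# (tier, recommended sustained load at ~10 KB in req/s, highly available?)
TIERS = [
    ("Small", 2300, False),
    ("Medium", 4400, True),
    ("Large", 7000, True),
]
LARGE_CELL_RPS = 7000  # Enterprise grows in Large-tier cells


def payload_factor(payload_kb: float) -> float:
    """Capacity multiplier relative to the 10 KB tier ratings."""
    return (FIXED_US + PER_KB_US * 10) / (FIXED_US + PER_KB_US * payload_kb)


def pick_tier(peak_rps: float, payload_kb: float, need_ha: bool = False) -> Tuple[str, int]:
    """Return (tier, cell count) for a peak sustained request rate."""
    factor = payload_factor(payload_kb)
    for name, rating, highly_available in TIERS:
        if need_ha and not highly_available:
            continue  # skip tiers without HA when HA is required
        if rating * factor >= peak_rps:
            return name, 1
    # Beyond Large: add independent Large-tier cells.
    return "Enterprise", math.ceil(peak_rps / (LARGE_CELL_RPS * factor))


print(pick_tier(3000, 5))                 # ('Medium', 1)
print(pick_tier(2000, 10, need_ha=True))  # ('Medium', 1)
print(pick_tier(20000, 10))               # ('Enterprise', 3)
```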
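Worked example: a consumer API peaking at 3,000 req/s with ~5 KB payloads. The 5 KB capacity factor is ×1.23, so Medium's effective rating is ~5,400 req/s. The measured two-instance 5 KB saturation (7,845 req/s) gives a more conservative ~4,700 req/s at 60%; either way Medium fits with headroom, and it provides HA.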
Enterprise scaling notes:
- Grow by adding cells (an independent Large-tier gateway pair with its own shared store), splitting traffic by client population, region, or DNS. Cells share nothing, so capacity grows linearly and a cell failure is contained.
- Within a cell, the per-side coordination store has measured headroom to coordinate 3–4 relay instances instead of 2 (certify before relying on it).
Test conditions and scope
- Topology: client → outbound relay → encrypted hop → inbound relay → origin, with a pre-warmed pair pool and on-demand growth. Headline numbers are a single 4 CPU / 4 GB instance per side at 10 KB with in-process state; the sizing curve adds measured 2-CPU and 8-CPU points; the payload model was measured at the 2 CPU / 2 GB shape with 2/5/10 KB bodies; the load-balanced results use two instances per side with a per-side shared store — each configuration a separate certified sweep.
- Tooling: load was generated and measured with k6 (Grafana Labs); the load-balanced tier coordinates relay state through Redis; everything ran on Kubernetes. All latency percentiles (p90, p95, p99) are reported directly by k6.
- Environment: single-node Kubernetes with the k6 load generator co-located in-cluster, so the absolute numbers are conservative. A direct-to-origin reference on the same rig reached ~56,000 req/s — the origin and generator were never the limiting factor.
- Isolation: all traffic was plain HTTP by design, so the figures isolate MTE encryption cost. TLS termination would add its usual, separate cost.
- Where the time goes: per-stage instrumentation at saturation shows pair acquisition ~0.001 ms and MTE encode + decode under 0.5 ms combined per relay — the remainder of high-load latency is ordinary queueing for CPU.
Bottom line: at the standard 4 CPU / 4 GB shape, MTE Relay delivers fully encrypted HTTP at ~7,700 requests/second per relay pair (enough for roughly 77,000 typical concurrent users) with about 1 ms of added latency at light load, perfect payload integrity, and zero-error behavior at up to 8× overload. Capacity follows CPU on a measured curve, payload size on a measured cost model (~161 µs + ~9.8 µs/KB per request), and instance count on measured scale-out data. Pick Small (≤200 M req/day), Medium (≤380 M/day, HA), Large (≤600 M/day, HA), or Enterprise (linear growth by cells) from your peak traffic and payload size, then confirm with a one-hour certification sweep.