Version: 4.6.x

MTE Relay Performance

MTE Relay is a drop-in encrypted HTTP gateway. It is the same engine behind both MTE Relay Server and MTE API Relay, so the performance characteristics on this page apply to either deployment. An application sends ordinary HTTP to an outbound relay; the relay MTE-encrypts the entire request; a peer relay in front of the origin decrypts it and forwards plaintext to the origin; and the response makes the same trip in reverse. Every byte of every request and response is encrypted in transit between the two relays, with no changes to the client or the origin service.

This page has two goals: to show that MTE Relay is highly performant, and to help you choose the right deployment size for your traffic.

Performance at a glance

A pair of standard 4 CPU / 4 GB relay instances moves about 7,700 encrypted requests per second of 10 KB JSON — roughly 77 MB/s (≈0.6 Gbit/s) of fully encrypted application payload, or about 665 million requests per day — with a measured error rate of zero. In the certification sweep for this configuration, 2,705,692 consecutive requests completed with 0.000% errors and 100% payload integrity (every response byte-identical to what was sent).
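The headline figures above follow from simple arithmetic on the measured request rate. The sketch below reproduces them; the function names are illustrative, and it assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000 KB), which is what the rounded figures in the text imply.

```python
# Headline figures from the text: ~7,700 req/s of 10 KB JSON per relay pair.
REQUESTS_PER_SECOND = 7_700
PAYLOAD_KB = 10            # assumed decimal kilobytes (1 KB = 1,000 bytes)
SECONDS_PER_DAY = 86_400


def payload_rate_mb_s(req_per_s: float, payload_kb: float) -> float:
    """Application payload moved per second, in MB/s."""
    return req_per_s * payload_kb / 1_000


def payload_rate_gbit_s(req_per_s: float, payload_kb: float) -> float:
    """Same payload rate expressed in Gbit/s (8 bits per byte)."""
    return payload_rate_mb_s(req_per_s, payload_kb) * 8 / 1_000


def requests_per_day(req_per_s: float) -> float:
    """Requests completed in 24 hours at a constant rate."""
    return req_per_s * SECONDS_PER_DAY


if __name__ == "__main__":
    print(payload_rate_mb_s(REQUESTS_PER_SECOND, PAYLOAD_KB))    # MB/s
    print(payload_rate_gbit_s(REQUESTS_PER_SECOND, PAYLOAD_KB))  # Gbit/s
    print(requests_per_day(REQUESTS_PER_SECOND))                 # requests/day
```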

~7,700 req/s: encrypted 10 KB requests per standard 4 CPU / 4 GB relay pair
~665 M/day: encrypted API calls per day on one relay pair
~1 ms: added latency at light load across the full encrypted round trip
0.000%: errors across 2.7 M requests, with 100% payload integrity

The price of the encryption is small and predictable:

At light load, the full encrypt → transmit → decrypt → origin → encrypt → transmit → decrypt round trip adds about 1 millisecond per request (1.2 ms through the relay pair vs 0.15 ms talking to the origin directly). For interactive APIs this is imperceptible.
Past its ceiling the gateway saturates gracefully, never destructively. When offered more work than it can encrypt, requests queue: latency rises linearly and predictably, throughput holds flat, and nothing fails. At 8× saturating demand, 99% of requests still completed in under 43 ms.

The system is compute-bound on encryption (about 50% of relay CPU time is spent inside the MTE cryptographic library) and therefore scales with CPU: roughly 955 requests/second per relay core, linear across the 2- and 4-CPU shapes. Memory is not a throughput factor: it tracks payload size, not demand.

How MTE Relay works

Each side runs its own relay inside its own environment. Applications speak ordinary HTTP to the nearby relay; every byte is MTE-encrypted for the hop across the public network and delivered as plaintext to the origin — no changes to the client or the origin service. Responses make the same trip in reverse.

Because the client and origin are untouched, MTE Relay drops into an existing architecture as an encrypted hop. The only resource it consumes is CPU to encrypt and decrypt; everything else on this page follows from that.

How we measured it

Load was generated with k6, the open-source load-testing tool from Grafana Labs. k6 reports latency percentiles (p90, p95, p99) directly in its summary output, so every latency figure on this page is taken straight from the tool rather than computed by us.

The test uses maximum-pressure connections, not realistic users. Each connection is a k6 virtual user (VU) running closed-loop with zero think time: it fires its next request the instant the previous response arrives. A single such connection is itself a heavy load source — one connection alone extracts about 800 requests per second from the gateway. These connections are best thought of as units of offered pressure, each equivalent to hundreds of real API clients.

Test connections	Each connection was sending	Total demand delivered
8	806 req/s	6,445 req/s
16	493 req/s	7,892 req/s
32	241 req/s	7,711 req/s
64	121 req/s	7,775 req/s
128	60 req/s	7,617 req/s

The gateway reaches its ~7,700–7,900 req/s ceiling once total demand exceeds it (about 16 maximum-pressure connections); beyond that point the same total throughput is simply shared across more connections. The sweep then pushed to 128 simultaneous saturating connections — roughly 8× the demand needed to saturate the gateway — to prove overload safety, and the error count stayed at zero throughout.
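The "each connection was sending" column in the table above is just total delivered demand divided by the number of connections. A small sketch, using only the table's own numbers (the names are illustrative), makes the sharing explicit:

```python
# Connections -> total demand delivered (req/s), copied from the table above.
SWEEP = {8: 6_445, 16: 7_892, 32: 7_711, 64: 7_775, 128: 7_617}

# Per the text, the gateway saturates at about 16 maximum-pressure connections.
SATURATING_CONNECTIONS = 16


def per_connection_rate(connections: int, total_rps: float) -> float:
    """Share of the total throughput that each connection receives."""
    return total_rps / connections


def overload_multiple(connections: int) -> float:
    """Offered demand as a multiple of the saturating connection count."""
    return connections / SATURATING_CONNECTIONS


if __name__ == "__main__":
    for connections, total in SWEEP.items():
        rate = per_connection_rate(connections, total)
        print(f"{connections:>3} connections: {rate:6.1f} req/s each, "
              f"{overload_multiple(connections):.1f}x saturating demand")
```

Once the ceiling is reached, doubling the connections roughly halves each connection's share while the total stays flat.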

Measured throughput and latency

Workload: a POST of ~10 KB JSON, sustained 60-second holds per load level, with a co-located k6 load generator over plain HTTP (no TLS) to isolate the cost of MTE encryption.

Offered load (vs capacity)	Throughput (req/s)	Avg latency	p90 latency	p99 latency	Worst case
Light (~80%)	6,445	1.2 ms	1.5 ms	2.0 ms	7 ms
Saturating (100%)	7,892	2.0 ms	2.8 ms	3.9 ms	23 ms
Overload (~2×)	7,711	4.1 ms	6.2 ms	9.1 ms	37 ms
Overload (~4×)	7,775	8.1 ms	13.2 ms	19.5 ms	41 ms
Overload (~8×)	7,617	16.7 ms	28.8 ms	42.9 ms	154 ms

Across the full certification sweep: 2.71 million requests, 0 failures, 0 corrupted payloads.
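One way to sanity-check these figures is Little's Law. For a closed loop with zero think time, average latency should equal the number of connections divided by throughput. The sketch below applies that identity to the table; pairing each load level with the connection counts from the earlier table (8 to 128) is taken from this page, while the function name is illustrative.

```python
# (connections, measured throughput in req/s, measured average latency in ms),
# copied from the sweep tables on this page.
SWEEP = [
    (8, 6_445, 1.2),
    (16, 7_892, 2.0),
    (32, 7_711, 4.1),
    (64, 7_775, 8.1),
    (128, 7_617, 16.7),
]


def littles_law_latency_ms(connections: int, throughput_rps: float) -> float:
    """Average latency for a closed loop with zero think time: W = N / X."""
    return connections / throughput_rps * 1_000


if __name__ == "__main__":
    for n, rps, measured_ms in SWEEP:
        predicted_ms = littles_law_latency_ms(n, rps)
        print(f"{n:>3} connections: predicted {predicted_ms:5.2f} ms, "
              f"measured {measured_ms:5.1f} ms")
```

The predicted and measured averages agree to within about 0.15 ms at every level, which is consistent with the added latency under overload being queueing delay.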

Throughput holds flat at capacity no matter how hard the gateway is pushed:

[Figure: Encrypted throughput delivered (req/s) at each offered-load level. Throughput holds at capacity even at 8× saturating demand.]

Under overload, the only thing that grows is queueing delay — and it grows linearly, with no cliffs and no error storms:

[Figure: p90 and p99 latency (ms) by offered demand. Both percentiles grow linearly as demand doubles: queueing delay only, with no cliffs and no error storms.]

What this means in real terms

7,700 encrypted requests/second is, equivalently:

Client profile	Request rate per client	Clients supported by one relay pair
Mobile/web user (6 requests per minute)	0.1 req/s	~77,000 concurrent users
Active interactive session (20 requests per minute)	0.33 req/s	~23,000 concurrent sessions
Busy machine-to-machine integration	10 req/s	~770 concurrent integrations
Maximum-pressure benchmark connection	30–800 req/s	~16 (the saturation point above)
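The "clients supported" column is the relay-pair capacity divided by each client's request rate. A minimal sketch of that conversion follows; the profile names mirror the table and the helper name is illustrative.

```python
# Capacity of one standard relay pair at 10 KB, from the text.
PAIR_CAPACITY_RPS = 7_700

# Client profile -> requests per minute, from the table above.
PROFILES = {
    "Mobile/web user": 6,
    "Active interactive session": 20,
    "Busy machine-to-machine integration": 600,  # 10 req/s
}


def clients_supported(capacity_rps: float, requests_per_minute: float) -> float:
    """How many clients of a given request rate fit within the capacity."""
    return capacity_rps * 60 / requests_per_minute


if __name__ == "__main__":
    for name, rpm in PROFILES.items():
        print(f"{name}: ~{clients_supported(PAIR_CAPACITY_RPS, rpm):,.0f}")
```

The table rounds the interactive-session figure (23,100) down to ~23,000.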

For an interactive API: the relay adds about one millisecond. A user cannot perceive it.
For a high-volume service: one standard relay pair carries ~665 million encrypted API calls per day at 10 KB each.
For reliability engineering: overload manifests as added queueing delay, not as errors. At 8× saturating demand the gateway did not drop, corrupt, or reject a single request.

Resource footprint

Two instance shapes are certified (zero errors through full overload sweeps):

Per relay instance	CPU at full load	Memory at full load (10 KB payloads)	Saturation
4 CPU / 4 GB (standard)	~4 cores (the limiter)	1.9–2.4 GB	~7,700 req/s
2 CPU / 2 GB (small)	~2 cores (the limiter)	~1.9 GB	~3,820 req/s

Memory demand does not grow with offered load — it tracks payload size (response/body buffering churn), not request rate: at 10 KB payloads the working set is ~1.9 GB, at 2 KB payloads it is only ~0.2 GB. Under overload the gateway degrades in latency only; it does not balloon toward an out-of-memory failure. On the small shape, bound the runtime's memory slightly below the container limit so it stays within its allocation.

Sizing by hardware: cores → throughput

The gateway was measured at three instance sizes with the identical build and workload (10 KB payloads), giving a calibrated cores→throughput curve:

Per-relay CPU	Sustained ceiling (req/s)	Per-core efficiency
2 (measured)	~3,820	~955 req/s per relay core
4 (measured)	~7,700	~960 req/s per relay core
8 (measured)	~12,600	~790 req/s per relay core

Scaling is essentially linear from 2 to 4 cores (×2.02) at ~955 req/s per relay core, where relay cores are counted across both relays of the pair (a 2-CPU pair has four relay cores). Only the 4→8 step measured sublinear (×1.64); at that size the test deployment approaches the capacity of the single test node, so the roll-off is at least partly testbed contention rather than a software limit.
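The per-core efficiency and scaling factors can be recomputed from the three measured points. The sketch below assumes that "per relay core" counts the cores of both relays in the pair, which is the reading that reproduces the ~955 figure from the table; the names are illustrative.

```python
# Per-relay CPU count -> measured sustained ceiling (req/s), from the table.
MEASURED_CEILING = {2: 3_820, 4: 7_700, 8: 12_600}

RELAYS_PER_PAIR = 2  # assumption: efficiency is quoted per core across both relays


def per_core_efficiency(cpus_per_relay: int) -> float:
    """Throughput divided by the total relay cores in the pair."""
    return MEASURED_CEILING[cpus_per_relay] / (RELAYS_PER_PAIR * cpus_per_relay)


def scaling_factor(from_cpus: int, to_cpus: int) -> float:
    """Throughput ratio when moving between two measured shapes."""
    return MEASURED_CEILING[to_cpus] / MEASURED_CEILING[from_cpus]


if __name__ == "__main__":
    for cpus in MEASURED_CEILING:
        print(f"{cpus} CPU: {per_core_efficiency(cpus):.1f} req/s per relay core")
    print(f"2 -> 4 CPU: x{scaling_factor(2, 4):.2f}")
    print(f"4 -> 8 CPU: x{scaling_factor(4, 8):.2f}")
```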

[Figure: Saturation at 10 KB by per-relay CPU, from 1 to 16 CPU. Solid bars are measured (2, 4, and 8 CPU); dashed bars are projected from the ~955 req/s-per-core curve. Linear through 4 CPU; several small pairs beat one large instance.]

Per-relay instance size	100% saturation (10 KB)	Confidence
1 CPU	~1,900 req/s	extrapolated (linear ~955/core), ±20%
2 CPU	3,820 req/s	measured
4 CPU	7,700 req/s	measured
6 CPU	~10,400 req/s	interpolated, ±10%
8 CPU	12,600 req/s	measured (conservative)
12 CPU	~16,800 req/s	extrapolated, ±20%
16 CPU	~20,600 req/s	extrapolated, ±20%+

Practical guidance: treat CPU as the only capacity lever. Per-core efficiency is flat at ~955 req/s through the 2- and 4-CPU shapes, so size by arithmetic and prefer adding more small instances (horizontal scaling, which also adds redundancy) over growing a single instance. Projections beyond the measured 2–8 core range should be confirmed by measurement before being relied on for production sizing.

On memory: it is not a throughput variable. 2 GB is certified for the 2-CPU shape (including 10 KB payloads); 4 GB is recommended for 4-CPU-and-larger shapes; 8–16 GB show no measurable throughput or latency benefit.

Payload-size sensitivity

Three full overload sweeps at 2 KB, 5 KB, and 10 KB bodies — 5.0 million requests, 0.000% errors in every run — give a simple, accurate cost model:

Payload	Saturation (req/s)	Payload moved	Light-load avg latency	p90 latency	p99 latency
2 KB	5,537	~11 MB/s	1.5 ms	36.1 ms	47.6 ms
5 KB	4,762	~23 MB/s	1.7 ms	43.1 ms	59.7 ms
10 KB	3,856	~38 MB/s	2.0 ms	50.9 ms	67.6 ms

The p90 and p99 columns are measured under heavy overload (128 max-pressure k6 connections); the average latency is at light load. The three points fit a simple linear cost model almost exactly:

Relay-pair capacity cost per request ≈ 161 µs fixed + 9.8 µs per KB (predicts the 5 KB measurement to within 0.1 µs)
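The coefficients can be checked with an ordinary least-squares line through the three measured points, taking the cost per request as the reciprocal of the saturation rate. This is a sketch of one plausible way to arrive at the model, not necessarily the fitting procedure used for this page.

```python
# Payload (KB) -> measured saturation (req/s) on the 2-CPU shape, from the table.
MEASURED = {2: 5_537, 5: 4_762, 10: 3_856}


def cost_us(saturation_rps: float) -> float:
    """Relay-pair capacity consumed per request, in microseconds."""
    return 1_000_000 / saturation_rps


def fit_line(points):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


POINTS = [(kb, cost_us(rps)) for kb, rps in MEASURED.items()]
FIXED_US, PER_KB_US = fit_line(POINTS)

if __name__ == "__main__":
    print(f"fixed cost  ~ {FIXED_US:.1f} us per request")
    print(f"payload cost ~ {PER_KB_US:.2f} us per KB")
```

The fit lands within rounding of the quoted 161 µs fixed and 9.8 µs per KB.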

Payload	Cost per request (relay-pair capacity)
2 KB	181 µs
5 KB	210 µs
10 KB	259 µs

Cost per request in µs of relay-pair capacity (2-CPU shape), split in the original chart into a fixed per-request cost and payload-dependent crypto. Even at 10 KB, most of the cost is payload-independent.

What this means in practice:

Per-request overhead dominates. Even at 10 KB, 62% of the cost is payload-independent. Shrinking payloads 5× raises request throughput only ~1.4× — so for small-payload APIs, plan capacity by request rate, not by bandwidth.
Larger payloads are relatively cheaper to encrypt: by the cost model above, moving the same data in 10 KB requests costs about 3.5× less gateway capacity than moving it in 2 KB requests (≈26 µs per KB versus ≈90 µs per KB).
To estimate saturation for any payload size and shape: take 1 ÷ (161 µs + 9.8 µs × KB) and scale by the CPU curve above.
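The estimate described above can be sketched as a small helper. The cost model was measured on the 2-CPU shape, so this sketch interprets "scale by the CPU curve" as multiplying by the ratio of measured ceilings relative to 2 CPU; that interpretation, and the function names, are assumptions rather than a documented formula.

```python
FIXED_US = 161.0   # fixed cost per request, from the cost model
PER_KB_US = 9.8    # payload-dependent cost per KB, from the cost model

# Per-relay CPU count -> measured ceiling at 10 KB (req/s), from the sizing table.
MEASURED_CEILING = {2: 3_820, 4: 7_700, 8: 12_600}


def saturation_2cpu(payload_kb: float) -> float:
    """Estimated saturation (req/s) on the 2-CPU shape for a given payload."""
    return 1_000_000 / (FIXED_US + PER_KB_US * payload_kb)


def estimate_saturation(payload_kb: float, cpus_per_relay: int) -> float:
    """Scale the 2-CPU estimate by the measured CPU curve (assumed method)."""
    scale = MEASURED_CEILING[cpus_per_relay] / MEASURED_CEILING[2]
    return saturation_2cpu(payload_kb) * scale


if __name__ == "__main__":
    for kb in (2, 5, 10):
        print(f"{kb:>2} KB on 2 CPU: ~{saturation_2cpu(kb):,.0f} req/s")
    print(f"10 KB on 4 CPU: ~{estimate_saturation(10, 4):,.0f} req/s")
```

Only the measured shapes (2, 4, and 8 CPU) are accepted here; other sizes would need the projected values from the sizing table, with their stated uncertainty.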

High availability and horizontal scaling

For production resilience the gateway runs multiple relay instances per side behind a load balancer, with each side's instances sharing their encryption state through a dedicated coordination store (Redis, one per side). Any instance can then serve any client's traffic — an instance can fail or be replaced without breaking sessions — and capacity grows by adding instances.

Four full overload sweeps were run in this configuration (two instances per side; 12.85 million requests including saturation probes), all at 0.000% errors with traffic split evenly across instances:

Instances per side	Payload	Saturation	p90 (heavy overload)	p99 (heavy overload)
2 × 2 CPU / 2 GB	5 KB	7,845 req/s	20.9 ms	31.6 ms
2 × 2 CPU / 2 GB	10 KB	7,301 req/s	23.3 ms	32.2 ms
2 × 4 CPU / 4 GB	5 KB	13,558 req/s	12.7 ms	18.0 ms
2 × 4 CPU / 4 GB	10 KB	11,706 req/s	14.8 ms	21.6 ms

Latency percentiles are at 128 max-pressure k6 connections (the heaviest level shown on this page).

Configuration (per side)	10 KB saturation	Versus one instance with the same total cores
1 × 4 CPU (in-memory state)	7,700 req/s	baseline
2 × 2 CPU + shared state	7,301 req/s	−5%
1 × 8 CPU (in-memory state)	12,600 req/s	baseline
2 × 4 CPU + shared state	11,706 req/s	−7%

10 KB saturation: distributing across two instances costs only ~5–7% of throughput versus a single instance with the same total cores, and buys high availability plus horizontal scaling.

The cost of high availability is small and bounded: versus a single instance with the same total cores, sharing state costs 5–7% of throughput and adds ~0.6–0.9 ms to light-load latency (~1.8 ms total through the full encrypted chain — still imperceptible for interactive APIs).
The shared coordination store is not a practical limit: at peak it used under half of its modest 2-core / 2 GB allocation — clear headroom to serve more than two relay instances. Relay CPU remains the capacity lever.
Relay memory drops sharply in this mode (state lives in the shared store): 0.4–1.2 GB per instance versus 1.9–2.4 GB self-contained.
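The 5–7% figure comes from comparing each load-balanced result with the single-instance result that has the same total cores per side. A short sketch of that comparison, using the 10 KB saturation numbers from this page (names are illustrative):

```python
# (label, single-instance saturation, two-instance saturation) at 10 KB, req/s.
SAME_CORE_COMPARISONS = [
    ("4 cores per side", 7_700, 7_301),    # 1 x 4 CPU vs 2 x 2 CPU + shared state
    ("8 cores per side", 12_600, 11_706),  # 1 x 8 CPU vs 2 x 4 CPU + shared state
]


def ha_overhead(single_rps: float, distributed_rps: float) -> float:
    """Fraction of throughput given up by distributing across instances."""
    return 1 - distributed_rps / single_rps


if __name__ == "__main__":
    for label, single, distributed in SAME_CORE_COMPARISONS:
        print(f"{label}: {ha_overhead(single, distributed):.1%} throughput cost")
```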

Deployment tiers: Small, Medium, Large, Enterprise

The measured results reduce to four standard deployment tiers. Two planning rules are baked into the recommendations:

Run at ≤60% of measured saturation. The gateway is provably safe far beyond that (zero errors at 8× overload), but at ≤60% utilization p99 latency stays in the single-digit-to-low-tens of milliseconds.
Capacities below are quoted for ~10 KB payloads — the most demanding case measured. Smaller payloads raise request capacity (×1.23 at 5 KB, ×1.43 at 2 KB); adjust before picking a tier.

	Small	Medium	Large	Enterprise
Relay deployment (per side)	1 × 2 CPU / 2 GB, self-contained state	2 × 2 CPU / 2 GB + shared store (2 CPU / 2 GB)	2 × 4 CPU / 4 GB + shared store (2 CPU / 2 GB)	N independent "cells" of the Large tier
Total hardware (both sides)	4 CPU / 4 GB	12 CPU / 12 GB	20 CPU / 20 GB	20 CPU / 20 GB per cell
Measured saturation	3,856 req/s	7,301 req/s	11,706 req/s	~11,700 req/s per cell
Recommended sustained load	≤ 2,300 req/s	≤ 4,400 req/s	≤ 7,000 req/s	≤ 7,000 req/s × N
Requests per day	≤ 200 M	≤ 380 M	≤ 600 M	600 M × N
Encrypted data per day (10 KB)	≤ ~2 TB	≤ ~3.8 TB	≤ ~6 TB	~6 TB × N
Concurrent users (1 req/min)	~140,000	~260,000	~420,000	~420,000 × N
High availability	No (single instance per side; sessions recover after a brief re-pair)	Yes — instance loss is transparent	Yes	Yes, plus cell-level isolation
Typical fit	Departmental APIs, internal services, pilots	Consumer app or B2B platform	High-volume consumer platform	National-scale / multi-tenant / multi-region
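The derived rows of the table follow from each tier's measured saturation and the 60% planning rule. The sketch below reproduces them; rounding the recommended load to the nearest 100 req/s is an assumption that happens to match the published values, and the table then rounds the daily figures further.

```python
SECONDS_PER_DAY = 86_400
UTILIZATION_TARGET = 0.60   # "run at <= 60% of measured saturation"
PAYLOAD_KB = 10             # tiers are quoted for ~10 KB payloads

# Tier -> measured saturation (req/s), from the table above.
MEASURED_SATURATION = {"Small": 3_856, "Medium": 7_301, "Large": 11_706}


def recommended_load(saturation_rps: float) -> int:
    """60% of saturation, rounded to the nearest 100 req/s (assumed rounding)."""
    return int(round(saturation_rps * UTILIZATION_TARGET, -2))


def tier_summary(saturation_rps: float) -> dict:
    """Derive the per-day and per-user rows from the recommended load."""
    load = recommended_load(saturation_rps)
    per_day = load * SECONDS_PER_DAY
    return {
        "recommended_rps": load,
        "requests_per_day": per_day,
        "tb_per_day": per_day * PAYLOAD_KB / 1e9,   # 1 TB = 1e9 KB (decimal)
        "users_at_1_per_min": load * 60,
    }


if __name__ == "__main__":
    for tier, saturation in MEASURED_SATURATION.items():
        print(tier, tier_summary(saturation))
```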

[Figure: Recommended sustained encrypted traffic per tier (req/s at 10 KB). Every tier is certified at 0.000% errors through full overload; Enterprise grows linearly by adding independent cells.]

Choosing a tier from your numbers:

Take your peak sustained request rate (not daily average — size for the busiest hour).
Adjust for payload: capacity ≈ tier rating × (161 + 9.8×10) ÷ (161 + 9.8×KB).
Pick the smallest tier whose recommended load covers it; step up one tier if you need high availability (Medium is the smallest highly-available tier).

Worked example: a consumer API peaking at 3,000 req/s with ~5 KB payloads. The 5 KB capacity factor is ×1.23, so Medium's effective rating is ~5,400 req/s — Medium fits with headroom, and provides HA.
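The selection steps and the worked example can be expressed as a short function. This is a sketch under the page's own numbers; `choose_tier` and its return shape are illustrative, and treating anything beyond Large as Enterprise with a cell count of ceil(peak ÷ adjusted Large rating) is an assumption based on the tier table.

```python
from math import ceil

# (tier, recommended sustained load at 10 KB in req/s, highly available?)
TIERS = [("Small", 2_300, False), ("Medium", 4_400, True), ("Large", 7_000, True)]


def payload_factor(payload_kb: float) -> float:
    """Capacity multiplier relative to the 10 KB rating, from the cost model."""
    return (161 + 9.8 * 10) / (161 + 9.8 * payload_kb)


def choose_tier(peak_rps: float, payload_kb: float, need_ha: bool = False):
    """Return (tier name, cell count) for a peak sustained request rate."""
    factor = payload_factor(payload_kb)
    for name, rating, is_ha in TIERS:
        if need_ha and not is_ha:
            continue  # Small has a single instance per side
        if peak_rps <= rating * factor:
            return name, 1
    # Beyond Large: Enterprise is N independent Large-tier cells.
    large_rating = TIERS[-1][1] * factor
    return "Enterprise", ceil(peak_rps / large_rating)


if __name__ == "__main__":
    # Worked example from the text: 3,000 req/s peak with ~5 KB payloads.
    print(choose_tier(3_000, 5))
    print(f"Medium effective rating at 5 KB: ~{4_400 * payload_factor(5):,.0f} req/s")
```

Whatever the function returns, the page's closing advice still applies: confirm the choice with a certification sweep on your own traffic.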

Enterprise scaling notes:

Grow by adding cells (an independent Large-tier gateway pair with its own shared store), splitting traffic by client population, region, or DNS. Cells share nothing, so capacity grows linearly and a cell failure is contained.
Within a cell, the per-side coordination store has measured headroom to coordinate 3–4 relay instances instead of 2 (certify before relying on it).

Test conditions and scope

Topology: client → outbound relay → encrypted hop → inbound relay → origin, with a pre-warmed pair pool and on-demand growth. Headline numbers are a single 4 CPU / 4 GB instance per side at 10 KB with in-process state; the sizing curve adds measured 2-CPU and 8-CPU points; the payload model was measured at the 2 CPU / 2 GB shape with 2/5/10 KB bodies; the load-balanced results use two instances per side with a per-side shared store — each configuration a separate certified sweep.
Tooling: load was generated and measured with k6 (Grafana Labs); the load-balanced tier coordinates relay state through Redis; everything ran on Kubernetes. All latency percentiles (p90, p95, p99) are reported directly by k6.
Environment: single-node Kubernetes with the k6 load generator co-located in-cluster, so the absolute numbers are conservative. A direct-to-origin reference on the same rig reached ~56,000 req/s — the origin and generator were never the limiting factor.
Isolation: all traffic was plain HTTP by design, so the figures isolate MTE encryption cost. TLS termination would add its usual, separate cost.
Where the time goes: per-stage instrumentation at saturation shows pair acquisition ~0.001 ms and MTE encode + decode under 0.5 ms combined per relay — the remainder of high-load latency is ordinary queueing for CPU.

Bottom line: at the standard 4 CPU / 4 GB shape, MTE Relay delivers fully-encrypted HTTP at ~7,700 requests/second per relay pair — enough for roughly 77,000 typical concurrent users — with about 1 ms of added latency at light load, perfect payload integrity, and zero-error behavior at up to 8× overload. Capacity follows CPU on a measured curve, payload size on a measured cost model (~161 µs + ~9.8 µs/KB per request), and instance count on measured scale-out data. Pick Small (≤200 M req/day), Medium (≤380 M/day, HA), Large (≤600 M/day, HA), or Enterprise (linear growth by cells) from your peak traffic and payload size, then confirm with a one-hour certification sweep.
