State Management Considerations
Introduction
As a companion document to Client Server Design Patterns, this document describes
some considerations for State management.
State is the current internal image of an MTE Instance. It can easily be saved and restored
by an application, giving your application the ability to persist across
environmental events such as a loss of power or connectivity. State also allows an endpoint
to manage conversations with multiple paired endpoints, and is therefore crucial in many
server endpoint implementations.
Since State is the current internal image of an MTE Instance, it must be treated as
sensitive data and secured accordingly.
A pair describes the relationship between the States of an Encoder and a Decoder and
is critical to the endpoints' ongoing ability to communicate successfully.
This document outlines strategies for ensuring that the States associated with pairs are effectively managed.
State Management
In a typical client/server system, the servers are most likely farmed (multiple servers) to
improve performance and scalability.
This presents a challenge, since client endpoints may not always converse with the
same server.
As noted in the document referenced above, saving and restoring State is crucial to successfully encoding and decoding data in a web conversation. This requires that the State on the server always be in sync with the State on the client (or at least within the sequencing tolerance).
After the server finishes decoding an incoming transmission or encoding
an outgoing transmission, the State must be persisted.
Many systems rely on sticky sessions (maintained by a proxy such as NGINX),
but that is not always desirable.
In Memory
If you only have a single server in your complex, it may be possible to persist the States in memory. However, this can be a challenge if the number of clients is large, since the States must be kept in some sort of concurrent dictionary object and may consume large amounts of memory. You will also need to know when it is safe to purge a State, since most web applications are only connected for a short time span. Affinity and concurrency must be considered as well: the server needs a way to identify each incoming client so that it re-hydrates the State for the proper pair.
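The concurrent dictionary and purge requirements above can be sketched as follows. This is a minimal illustration (the class and method names are hypothetical, not an Eclypses API), assuming the State bytes are already encrypted before they reach the store:

```python
# Hypothetical in-memory State store with a purge timeout.
from __future__ import annotations

import threading
import time


class StateStore:
    """Keeps encrypted MTE States keyed by a client's unique identifier."""

    def __init__(self, ttl_seconds: float = 1800.0):
        self._lock = threading.Lock()  # guards concurrent access
        self._states: dict[str, tuple[bytes, float]] = {}
        self._ttl = ttl_seconds

    def save(self, client_id: str, state: bytes) -> None:
        with self._lock:
            self._states[client_id] = (state, time.monotonic())

    def load(self, client_id: str) -> bytes | None:
        with self._lock:
            entry = self._states.get(client_id)
            if entry is None:
                return None
            state, stored_at = entry
            if time.monotonic() - stored_at > self._ttl:
                del self._states[client_id]  # stale: treat as timed out
                return None
            return state

    def purge(self) -> int:
        """Remove all stale entries; returns how many were dropped."""
        now = time.monotonic()
        with self._lock:
            stale = [k for k, (_, t) in self._states.items() if now - t > self._ttl]
            for k in stale:
                del self._states[k]
            return len(stale)
```

A `load` that returns `None` corresponds to the timed-out case discussed later, where an appropriate error should be returned to the client.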
In a Database
Many server technologies allow for a session database to persist artifacts associated with multiple transmissions within a conversation (session). In a situation such as this, the States are stored in that database and must be retrieved before, and persisted after, each usage of the MTE for the specific paired client.
In a Distributed Cache
Many modern web architectures take advantage of a distributed cache mechanism. Two of the more popular implementations are memcached and Redis. These technologies are mature and perform quite well, which makes them an excellent choice for managing the States of the Eclypses Web MTE.
.Net 6.0
Eclypses has extensive experience with .Net, which provides an excellent interface for distributed caching over the above technologies. The IDistributedCache interface has implementations for many specific backends, including (but not limited to):
- In Memory
- Memcached
- Redis
- MS SQL Server
This makes it possible to utilize any (or all) of these technologies without
changing application code. In fact, the server application can
use a runtime configuration switch to choose which implementation to use when
your server starts up.
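In .Net this switch is done by registering the chosen IDistributedCache implementation at startup; the same runtime-switch idea can be sketched language-agnostically. The backend classes below are empty stand-ins, and the `cache_backend` setting name is an assumption:

```python
# Sketch of choosing a cache backend from configuration at startup.
# The backend classes are stand-ins for the IDistributedCache
# implementations of each technology listed above.

class InMemoryCache: ...
class MemcachedCache: ...
class RedisCache: ...
class SqlServerCache: ...


BACKENDS = {
    "memory": InMemoryCache,
    "memcached": MemcachedCache,
    "redis": RedisCache,
    "sqlserver": SqlServerCache,
}


def build_cache(settings: dict):
    """Instantiate the distributed-cache backend named in settings."""
    name = settings.get("cache_backend", "memory")
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown cache backend: {name!r}")
```

The rest of the application only ever sees the common cache interface, so switching backends is purely a configuration change.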
Eclypses has tested this internally within a server farm of three Ubuntu servers
with satisfactory results. Your results may vary, but in order of performance
we ranked them as follows:
- In Memory -- obviously the fastest, and it was able to keep the pairs properly associated based on the client's unique identifier. However, it only works with a single server, so it is best suited to development and unit testing.
- Memcached -- in our server farm, we deployed memcached as a Docker image on its own server and found very little overhead versus running In Memory on a single server.
- Redis -- in our server farm, this was slightly slower than memcached, and we found the configuration a little more challenging to set up. It is, however, an excellent choice.
- MS SQL Server -- the overhead inherent in a relational database made this the slowest of the configurations we tested. We do not recommend it for a high-throughput environment, but it does work well and has the additional advantage that you can use SQL tools to manage the individual States.
Things to consider
- Initial Pairing should require that both ends of a conversation supply at least one of the Pairing Values. A good pattern is that the client provides a unique personalization string, the server provides the nonce (a timestamp is a good option), and both ends derive the entropy through a public key exchange.
- Since the identity of the client is crucial for the server to select the proper pair, there are several ways to communicate it:
- Custom HTTP Header -- a custom HTTP header (such as X-EclypsesMTE) can contain the value of the endpoint identifier. During initial pairing, this can be the personalization string. It must be unique for each client endpoint.
- JWT property -- if you are using JSON web tokens as a bearer token in your authorization header, you can populate a claim (such as the NameIdentifier claim) with your unique endpoint id (again, the pair's personalization string is a good candidate). Then on the server side when you deconstruct the JWT, you will have the value needed to retrieve the proper State from your persistent store.
- Payload property -- you can choose to send your endpoint identifier as a property of each and every payload; however, this requires modifying every payload to include the property and is not recommended.
- Query parameter -- you can choose to send your endpoint identifier as a query parameter, but this requires application changes on every route within your server and places an undue burden on your application code.
- The State is the internal representation of the MTE, so it must be encrypted prior to persistence, and decrypted after retrieval.
- Your server-side cache of State should have a timeout option to purge stale paired States, and an appropriate error should be returned to the client if the State has timed out. Most web applications already have a session timeout, so this should also be the cache timeout. Each of the technologies detailed above has that timeout ability.
- Concurrency may be an issue if you are rolling your own State management. Servers must be architected to ensure conversation affinity when accessing a shared resource such as distributed cache.
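The encrypt-before-persist requirement can be illustrated with a small sketch. The cipher below (an HMAC-SHA256 keystream plus an authentication tag) is a stdlib-only stand-in chosen so the example is self-contained; a production system should use a vetted authenticated cipher such as AES-GCM:

```python
# Illustrative only: encrypt a State blob before persisting it, and
# authenticate it on retrieval. Do NOT use this construction in
# production -- use a real authenticated cipher (e.g. AES-GCM).
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream derived with HMAC-SHA256 (stand-in cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_state(key: bytes, state: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(state, _keystream(key, nonce, len(state))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + tag + ct


def decrypt_state(key: bytes, blob: bytes) -> bytes:
    nonce, tag, ct = blob[:16], blob[16:48], blob[48:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("State blob failed authentication")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))
```

Only the output of `encrypt_state` ever reaches the distributed cache; the plaintext State exists in memory just long enough to hydrate the MTE.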
Scalable Architecture
The architecture that Eclypses has tested consists of the following components on the server side.
- Load Balancer -- A load balancer presents a single IP address for the connected clients to attach to. We have used NGINX as that proxy and set it to use a round robin approach to our server farm.
- Server Farm -- We used Ubuntu running inside of Hyper-V VMs on three different servers all connected to the NGINX front end. These all run .Net 6.0 server instances.
- Individual Servers -- Each server instance runs .Net 6.0 with a registered IDistributedCache service that has a concrete implementation of the four technologies listed above. An appsetting instructs our startup class to register one of the four technologies.
- Distributed Cache -- Docker images for memcached, Redis, and MS SQL Server all run on a separate VM; we recommend that whichever caching technology you choose, you separate it onto its own VM.
Process Flow
The process flow is broken into two distinct routes.
Handshake (pairing)
- For the pairing route, when a client needs to pair, it sends a POST request to the server with its unique personalization string and a public key to use as a seed to generate entropy.
- The server then clears out any preceding artifacts associated with the personalization string
from the distributed cache and follows this process:
- It establishes a Nonce value (we use current time since unix epoch).
- It generates its own public key.
- It uses the values from the client and its public key to generate entropy.
- It creates an MTE Encoder and MTE Decoder with the three Pair Values.
- It captures the initial States, encrypts them, and stores them in distributed cache, using the personalization string as the lookup key.
- It clears out (zero-fills) the memory associated with the public keys and the entropy.
- It returns its public key and Nonce to the client.
- The client receives the result of the POST and does the following:
- It generates entropy with its private key and the server's public key.
- It uses the returned nonce, the generated entropy and the known personalization string to create the MTE Encoder and MTE Decoder.
- It clears out (zero-fills) the memory associated with the public keys and the entropy and nonce (the personalization string must be kept so that it can be used in subsequent transactions).
- It captures the initial States, encrypts them, and stores them in session storage. Be sure to use session storage since it is flushed when the browser tab closes. This is required because on a page load, the WASM object that implements MTE is reloaded.
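The handshake steps above can be sketched end to end. The Diffie-Hellman group below is a toy choice made so the example runs with the standard library alone; a real deployment would use a vetted key-exchange implementation, and all function names are hypothetical:

```python
# Toy sketch of the pairing handshake: both ends derive the same entropy
# from a public key exchange, then hold the three Pair Values
# (personalization string, nonce, entropy). The DH parameters here are
# NOT a recommendation -- use a vetted ECDH library in production.
from __future__ import annotations

import hashlib
import secrets
import time

P = 2**255 - 19  # a prime modulus, chosen only so this sketch is self-contained
G = 2


def dh_keypair() -> tuple[int, int]:
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def derive_entropy(my_priv: int, their_pub: int) -> bytes:
    shared = pow(their_pub, my_priv, P)
    return hashlib.sha256(shared.to_bytes(32, "big")).digest()


def simulate_pairing(personalization: str) -> tuple[bytes, bytes, int]:
    # Client: unique personalization string plus an ephemeral key pair.
    client_priv, client_pub = dh_keypair()

    # Server: its own key pair and a nonce (seconds since the unix epoch).
    server_priv, server_pub = dh_keypair()
    nonce = int(time.time())
    server_entropy = derive_entropy(server_priv, client_pub)

    # Client completes the exchange with the server's returned public key.
    client_entropy = derive_entropy(client_priv, server_pub)

    # Both ends can now create their Encoder/Decoder from the same three
    # Pair Values: personalization, nonce, entropy.
    return client_entropy, server_entropy, nonce
```

After this point each side zero-fills its key material and keeps only the personalization string (client) or the cached, encrypted States (server).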
Transacting
- The client captures some information to send to the server and does the following:
- It stringifies this information into JSON.
- It fetches the State information from session storage and decrypts it.
- It hydrates the State of the MTE Encoder.
- It encodes the JSON string and POSTs it to the server for processing along with its endpoint identifier.
- It captures the State of the MTE Encoder, encrypts it, and overwrites the session storage.
- The server receives the payload and does the following:
- It identifies the client based on the incoming endpoint identifier.
- It retrieves the MTE Decoder State from the distributed cache storage.
- It hydrates an MTE Decoder with the decrypted State.
- It decodes the incoming message body.
- It captures the State of the MTE Decoder, encrypts it and overwrites the distributed cache with the endpoint identifier as the lookup key.
- It clears out (zero-fills) the memory associated with the MTE Decoder State that was just used.
- It presents the decoded information to the application for processing.
If the server needs to send back a response payload, the same process happens in reverse except that the server will then work with its MTE Encoder State and the client will work with its MTE Decoder State.
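The transacting flow can be illustrated with a stand-in for the MTE itself: a rolling state that drives a keystream. This is not the MTE algorithm; it only demonstrates why both endpoints must persist their advanced States after every encode or decode:

```python
# Stand-in for the transacting flow. A rolling "state" drives a keystream,
# and both ends must persist the advanced state after every message or
# they fall out of sync. This is NOT the MTE algorithm -- purely an
# illustration of why State save/restore matters on both endpoints.
from __future__ import annotations

import hashlib


def step(state: bytes, length: int) -> tuple[bytes, bytes]:
    """Derive (keystream, next_state) from the current state."""
    keystream, s = b"", state
    while len(keystream) < length:
        s = hashlib.sha256(s).digest()
        keystream += s
    return keystream[:length], hashlib.sha256(b"advance" + s).digest()


def encode(state: bytes, message: bytes) -> tuple[bytes, bytes]:
    ks, next_state = step(state, len(message))
    return bytes(a ^ b for a, b in zip(message, ks)), next_state


def decode(state: bytes, encoded: bytes) -> tuple[bytes, bytes]:
    ks, next_state = step(state, len(encoded))
    return bytes(a ^ b for a, b in zip(encoded, ks)), next_state
```

A round trip then looks like this: the client encodes with its Encoder State and saves the advanced state to session storage; the server decodes with the matching Decoder State from distributed cache and saves its advanced state. If either side skips the save, the next message fails to decode.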
Summary
MTE Web is a high-performing, extremely secure way for a client and server to ensure
that the information exchanged between endpoints is tamper-proof.
It requires only minor changes to your implementation methodology, yet it
is perfectly suited for large, complex, multi-server systems.