When retrofitting security onto legacy applications, dealing with the binary blobs you get from standard encryption techniques can be inconvenient.
Your database schema might be expecting a particular data type for a field that you now want to keep secret.
The expectations regarding the contents of a piece of data are called an alphabet.
Examples are phone or credit card numbers, which might be expected to have a specific length and contain only digits, or e-mail addresses, which should contain a local part (such as info), followed by an @ symbol, followed by a domain.
If you could transform your data while keeping it in the same alphabet, your legacy application’s requirements would still be met and your data would stay private.
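To make the idea of an alphabet concrete, here is a minimal sketch of how a legacy field's expectations might be written down and checked. The `Alphabet` class, its fields, and `CARD_NUMBER` are hypothetical names chosen for illustration, and the sketch only covers the simple case of a fixed length plus an allowed character set, not structured formats like e-mail addresses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alphabet:
    """Hypothetical description of what a field is allowed to contain."""
    symbols: str                  # characters the field may contain
    length: Optional[int] = None  # required length, or None for any length

    def accepts(self, value: str) -> bool:
        # Reject values of the wrong length, then check every character.
        if self.length is not None and len(value) != self.length:
            return False
        return all(ch in self.symbols for ch in value)


# Assumed example: a legacy column that expects 16-digit card numbers.
CARD_NUMBER = Alphabet(symbols="0123456789", length=16)
```

A conventional ciphertext, stored as raw bytes or Base64 text, would typically fail a check like `CARD_NUMBER.accepts(...)`, which is exactly the inconvenience described above.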
Format-preserving tokenization and encryption are two different but similar techniques to do just that.
Format-preserving encryption (FPE) is a way to use encryption while specifying the alphabet of the input and output. Modern FPE algorithms are based on strong and ubiquitous cryptographic ciphers, such as AES. Some algorithms, namely FF1 and FF3, are specified in NIST Special Publication 800-38G.
To encrypt with FPE, configure the encryption algorithm with your alphabet and secret key, and run it on your input data. To decrypt, simply configure the decryption algorithm with the same alphabet and secret key to do the reverse operation. The alphabet and secret key are small amounts of static data.
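The encrypt and decrypt flow can be sketched with a toy Feistel construction. This is not FF1, FF3, or any other standardized algorithm, and it is not how any particular product implements FPE: it swaps in HMAC-SHA256 as the round function so that it runs with the Python standard library alone. The function names, the round count, and the key are all assumptions made for illustration, and the code should not be used to protect real data.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ROUNDS = 10  # assumed round count for this toy sketch


def _to_int(digits, radix):
    value = 0
    for d in digits:
        value = value * radix + d
    return value


def _to_digits(value, radix, length):
    digits = [0] * length
    for i in range(length - 1, -1, -1):
        value, digits[i] = divmod(value, radix)
    return digits


def _round_value(key, round_no, total_len, radix, half):
    # Keyed pseudorandom number derived from the round and one half of the data.
    msg = f"{round_no}|{total_len}|{radix}|{','.join(map(str, half))}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _feistel(key, alphabet, text, decrypt):
    radix = len(alphabet)
    index = {ch: i for i, ch in enumerate(alphabet)}
    if len(text) < 2 or any(ch not in index for ch in text):
        raise ValueError("input must have 2+ characters, all from the alphabet")
    digits = [index[ch] for ch in text]
    n, u = len(digits), len(digits) // 2
    a, b = digits[:u], digits[u:]
    order = reversed(range(ROUNDS)) if decrypt else range(ROUNDS)
    for i in order:
        m = u if i % 2 == 0 else n - u  # length of the half rewritten this round
        if decrypt:
            y = _round_value(key, i, n, radix, a)
            a, b = _to_digits((_to_int(b, radix) - y) % radix**m, radix, m), a
        else:
            y = _round_value(key, i, n, radix, b)
            a, b = b, _to_digits((_to_int(a, radix) + y) % radix**m, radix, m)
    return "".join(alphabet[d] for d in a + b)


def fpe_encrypt(key, alphabet, plaintext):
    return _feistel(key, alphabet, plaintext, decrypt=False)


def fpe_decrypt(key, alphabet, ciphertext):
    return _feistel(key, alphabet, ciphertext, decrypt=True)


# Usage: the only state needed on either side is the alphabet and the key.
key = b"demo key, not for production"
card = "4111111111111111"
encrypted = fpe_encrypt(key, DIGITS, card)
assert len(encrypted) == len(card) and encrypted.isdigit()
assert fpe_decrypt(key, DIGITS, encrypted) == card
```

Because nothing random enters the computation, calling `fpe_encrypt` twice with the same key, alphabet, and input returns the same output.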
A drawback of most FPE algorithms is that they’re deterministic. This means that if you encrypt the exact same data twice, you’ll get the same ciphertext. Most non-format-preserving encryption schemes are randomized, such that this doesn’t happen. Some formats simply don’t have enough space to add randomization. For example, if you’re encrypting 16-digit credit card numbers as 16-digit credit card numbers, every possible output is already used by some input, so there is no room to add randomness. But when using arbitrary-length e-mail addresses, it would be possible to use randomized FPE.
Tokenization is a way to substitute real data with apparently meaningless identifiers (tokens), widely used in the payment industry. The tokens are not derived from the input data, but randomly generated and stored in a backend database. Format-preserving tokenization makes sure that the tokens are in the same alphabet as the regular data.
Tokenization schemes can use either single-use or multi-use tokens. This is similar to the distinction between randomized and deterministic encryption. With single-use tokens, every time the same input data is used, a new token is generated. With multi-use tokens, using the same input data results in the same token.
To have a token issued, send the input data to the tokenization service for a particular alphabet. For single-use tokens, the service will store a newly generated token in the database along with the input, and return the token. For multi-use tokens, the service will either return an existing token from the database or generate a new token. To exchange the token back for the real data, send it to the tokenization service, which will look up the token in the database and return the associated data. The alphabet is a small amount of static data, but the database will grow over time.
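The issue and exchange steps can be sketched as a small in-memory service. The `TokenizationService` class and its method names are hypothetical, a pair of dictionaries stands in for the backend database, and tokens are simply random strings of the same length as the input. A real service would add persistence, access controls, and a strategy for small domains where unused tokens can run out.

```python
import secrets


class TokenizationService:
    """Toy format-preserving tokenization service (illustrative only)."""

    def __init__(self, alphabet, multi_use=False):
        self.alphabet = alphabet
        self.multi_use = multi_use
        self._data_by_token = {}  # stands in for the token database
        self._token_by_data = {}  # reverse index, used for multi-use tokens

    def tokenize(self, data):
        if not data or any(ch not in self.alphabet for ch in data):
            raise ValueError("input must be non-empty and within the alphabet")
        # Multi-use: hand back the existing token if this input was seen before.
        if self.multi_use and data in self._token_by_data:
            return self._token_by_data[data]
        # Draw random tokens in the same alphabet until one is unused.
        while True:
            token = "".join(secrets.choice(self.alphabet) for _ in data)
            if token not in self._data_by_token:
                break
        self._data_by_token[token] = data
        if self.multi_use:
            self._token_by_data[data] = token
        return token

    def detokenize(self, token):
        # Exchange the token for the real data; raises KeyError if unknown.
        return self._data_by_token[token]


# Usage: the same input tokenized twice under each scheme.
single = TokenizationService("0123456789")
multi = TokenizationService("0123456789", multi_use=True)
card = "4111111111111111"
t1, t2 = single.tokenize(card), single.tokenize(card)
m1, m2 = multi.tokenize(card), multi.tokenize(card)
```

Here `t1` and `t2` are different tokens that both map back to `card`, while `m1` and `m2` are the same token. Note that the single-use service's dictionary gains an entry on every call, which is the database growth mentioned above.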
Should you use tokenization or encryption? It depends on your use case. For a quick comparison, just look at the differences in the figures above. The infrastructural complexity of tokenization can be costly to maintain, but in some cases the different security profile and added randomization can be beneficial.
| |Encryption|Tokenization|
|---|---|---|
|Security|Depends on secrecy of key and security of cryptographic algorithm|Depends on access controls to token database|
|Storage|Small static data only|Large dynamic database|
|Complexity/Availability|Can be performed offline with access to secret key|Needs live connection with service/backend database|
Start using SDKMS for your encryption and tokenization needs: request a quote today!