Problem
Historical data generated by long-running, mission-critical systems in the course of important business functions contains comprehensive business knowledge and sensitive information. While that data resides in legacy on-premises databases, inside the perimeter-based security of firewalls, organizations have a sense of control over the data and its storage. This dynamic changes when the data needs to be migrated to modern forms of storage such as the cloud. Because of regulatory requirements, a lack of digital trust in cloud providers, and/or limited visibility into the data-security protocols applied to cloud data, organizations feel the need to encrypt all or selected sensitive data in their databases. The requirement is to encrypt, and thereby protect, legacy data at rest that is migrated from an on-premises database to an external environment such as the cloud.
Solution
Fortanix Transparent Database Encryption, or external encryption using Fortanix DSM-Accelerator, is a method specifically useful for “data at rest” in tables and tablespaces. The solution can efficiently tokenize and/or detokenize potentially terabytes of data that is not currently in use or in transit, while minimizing user impact. It is termed “transparent” because it is invisible to the users and applications that query the data. Key management and encryption are decoupled from the application, so the solution can be adopted without significant changes to the existing application. The data is decrypted for authorized users or applications when in use but stays protected at rest.
Data Ingestion
Fig 1: Data Ingestion with TEP/DSM-A
Fig 1 shows the process of data ingestion with Transparent Encryption using the Fortanix Data Security Manager Accelerator (DSM-A). The source of data could be a legacy database table, a CSV file, or even an application that generates data. When an insert operation is requested, the Transparent Encryption Proxy (TEP) is invoked. In the case of a data migration, this proxy could be a script that connects to the source, authenticates to the DSM-A, and tokenizes the sensitive information before redirecting the data to the destination. The destination could be an on-cloud database, an on-premises database or storage system, or any other target as required.
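The ingestion flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual proxy: the function names `tokenize_rows` and `migrate_csv` are hypothetical, the destination is a CSV stream rather than a database, and the `tokenize` argument is a placeholder for whatever function calls the DSM-A in a real deployment.

```python
import csv
from typing import Callable, Iterable, Iterator, Set


def tokenize_rows(
    rows: Iterable[dict],
    sensitive_fields: Set[str],
    tokenize: Callable[[str], str],
) -> Iterator[dict]:
    """Yield a copy of each row with only the sensitive fields tokenized."""
    for row in rows:
        yield {
            # Empty values and non-sensitive fields pass through unchanged.
            name: tokenize(value) if name in sensitive_fields and value else value
            for name, value in row.items()
        }


def migrate_csv(source, destination, sensitive_fields, tokenize) -> int:
    """Stream rows from a CSV source to a CSV destination, tokenizing on the way.

    `source` and `destination` are open text streams. `tokenize` is a
    placeholder (assumption) for the call into the DSM-A service.
    Returns the number of rows written.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(destination, fieldnames=reader.fieldnames)
    writer.writeheader()
    count = 0
    for row in tokenize_rows(reader, sensitive_fields, tokenize):
        writer.writerow(row)
        count += 1
    return count
```

Passing the tokenizer in as a parameter keeps the cryptographic call decoupled from the migration logic, so the same loop could be pointed at a different source or destination without touching the tokenization step.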
Data Retrieval
Fig 2: Data Retrieval with TEP/DSM-A.
Querying and retrieval of data are handled efficiently using the reverse approach. Fig 2 shows how data can be searched by providing selection criteria in plaintext. When an authorized CLI/GUI search application provides a search term, the TEP is invoked; it encrypts the term using DSM-A and then searches the destination using the encrypted value. The result set obtained is decrypted before the search results are returned in plaintext to the requesting application.
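The retrieval flow can be sketched as follows. This is only an illustration under several assumptions: an in-memory SQLite database stands in for the destination, the table name `players` and the function `search_plaintext` are hypothetical, and `tokenize`/`detokenize` are placeholders for the calls into DSM-A. Note that an exact-match search on the stored value presupposes that the same plaintext always maps to the same token under a given key.

```python
import sqlite3
from typing import Callable, List, Set

TABLE = "players"  # hypothetical table name, for illustration only


def search_plaintext(
    conn: sqlite3.Connection,
    column: str,
    term,
    tokenized_columns: Set[str],
    tokenize: Callable[[str], str],
    detokenize: Callable[[str], str],
) -> List[dict]:
    """Search tokenized data with a plaintext term and return plaintext rows."""
    # Column names cannot be bound as SQL parameters, so validate the
    # requested column against the table's actual columns first.
    names = [d[0] for d in conn.execute(f"SELECT * FROM {TABLE} LIMIT 0").description]
    if column not in names:
        raise ValueError(f"unknown column: {column}")

    # Tokenize the search term only if the column holds tokenized values.
    needle = tokenize(term) if column in tokenized_columns else term
    cursor = conn.execute(f'SELECT * FROM {TABLE} WHERE "{column}" = ?', (needle,))

    results = []
    for row in cursor:
        record = dict(zip(names, row))
        # Detokenize the protected columns before returning the row.
        for name in tokenized_columns:
            if record.get(name) is not None:
                record[name] = detokenize(record[name])
        results.append(record)
    return results
```

The requesting application only ever sees plaintext in and plaintext out; the stored values and the query predicate stay tokenized.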
Benefits
Transparent Database Encryption is extremely useful because the data remains unreadable even if the destination storage files are stolen, or if the cloud provider's security is flawed and the physical data is compromised. Only authorized users and applications can successfully query the plaintext data, which removes much of the incentive for attackers to steal the data in the first place.
Some of the striking features and benefits of Transparent Encryption using the Data Security Manager-Accelerator are:
- Encrypt or de-identify the sensitive data collected before it hits external networks.
- Follows a Zero-trust approach:
- Data gets de-identified right at the source.
- Role-based access control.
- Decouple cryptographic operations from business logic.
- The encryption key is always securely stored at rest inside the central DSM cluster.
- Works best when there is a need for a very high rate of data tokenization and detokenization with negligible latency. It also supports specific database UDFs.
- Available as Java, JCE, and PKCS#11 libraries, or as a more convenient web service that can be consumed in microservices or serverless functions such as AWS Lambda or Azure Functions.
- High throughput for data transformation of large datasets sitting at rest in data lakes.
- Libraries load in memory, keys are cached in memory, and encryption is performed in memory, enabling a high rate of cryptographic operations.
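To illustrate why caching keys in memory raises throughput, here is a minimal sketch of a time-limited key cache. It is not how DSM-A is implemented: the class `KeyCache`, the `fetch_key` callback, and the expiry interval are all assumptions made for the example. The point is only that repeated operations with the same key avoid a round trip to the central cluster.

```python
import time
from typing import Callable, Dict, Tuple


class KeyCache:
    """Keep fetched keys in memory for a limited time (illustrative only)."""

    def __init__(
        self,
        fetch_key: Callable[[str], bytes],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_key = fetch_key  # hypothetical call to the key manager
        self._ttl = ttl_seconds      # assumed expiry interval
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key_id: str) -> bytes:
        """Return the cached key, fetching it again only after it expires."""
        now = self._clock()
        entry = self._entries.get(key_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        key = self._fetch_key(key_id)
        self._entries[key_id] = (now, key)
        return key
```

With a cache like this, only the first operation for a given key (and the first after expiry) pays the cost of contacting the key manager; every other operation runs locally.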
Working example:
In this example, we demonstrate DSM-Accelerator running as a web service, driven by a Python script. Python is used solely for its popularity and simplicity; any language that can connect to the desired source and call REST API services can be used.
Prerequisites:
Install Docker and docker-compose on the local machine, or run on a VM that has them installed. Install Python.
Input:
Any combination of source and destination is possible, such as CSV files, database tables, etc. In this case, the Python script uses a CSV file, NFLdraftclass.csv, with selected fields to be tokenized.
Output:
The tokenized data is inserted into an MSSQL database table and plaintext querying of encrypted data is demonstrated.
Set up steps:
- CREATE KEYS TO BE USED FOR TOKENIZATION/DETOKENIZATION IN THE DSM.
The key should be “Exportable” for it to be used with DSM-A.
- COPY THE API KEY OF THE APPLICATION TO WHICH THE KEY CREATED ABOVE BELONGS. THIS KEY WILL BE USED BY THE SCRIPT FOR AUTHENTICATING THROUGH THE RESTAPI CALLS TO DSM-A.
- CHECK FOR CONTAINER LOGS AS SEEN BELOW TO CONFIRM THAT THE SERVICE IS STARTED
We now have a DSM-A service available on port 8443 running on our localhost. The service can be accessed at http://localhost:8443/ from the system on which the service is running.
The DSM-A service is also accessible at http://192.168.0.3:8443/ (IP address of the container, obtained using “docker inspect” as shown above) from inside a container.
Accessing the DSM-A through the python script:
Tokenize/Detokenize: tokenization and detokenization are each a simple POST request, calling tokenize() or detokenize() on the service we just started.
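A POST of this kind could be assembled with the Python standard library roughly as shown here. This is a hedged sketch, not the script used in this example: the helper names `build_post` and `call_service` are hypothetical, and the endpoint path, the JSON payload shape, and the `Basic` authorization scheme are assumptions that should be replaced with whatever the DSM-A documentation specifies.

```python
import json
import urllib.request


def build_post(base_url: str, path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the service.

    `path` and `payload` are placeholders: the real endpoint and body
    format are defined by the service, not by this sketch.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the application's API key is sent as a Basic credential.
            "Authorization": f"Basic {api_key}",
        },
    )


def call_service(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect or log exactly what would be posted before pointing the script at a running service.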
Other operations that facilitate Transparent Data Encryption include encryption and decryption of large CSV files, or of all or selected columns of database tables.
Querying Data:
Below are a few examples of searches done on the encrypted database:
- Search Address = 4000 W. North Ave
- Search Player = Bradley Chubb
- Search Email = BarkSA00@nfl.com
- Search DraftAge = 23