Secrets are used in software applications for many different things from connecting to a database to signing a JWT passed between services. It is considered a security best practice to rotate these types of secrets on a regular basis. Some reasons for this are:
If a secret is accidentally exposed in logs, accidentally committed to version control, etc., having a process in place that smoothly rotates the secret turns what would be a fire drill into a much less risky operation
Rotating secrets regularly helps ensure that if a malicious insider acquires a secret, the time it remains usable is minimized
Applications commonly have secrets provided to them through configuration or the environment, and those secrets rarely change, if ever. Some secrets are harder to rotate than others, but with a little planning the tools exist to make secret rotation possible, possibly even with zero downtime. AWS provides some good documentation on retrieving secrets from Secrets Manager and Parameter Store as well as some examples of dynamic credential rotation. We can use this information to put together a solution for rotating credentials within our applications. Let’s get started.
A Shared Secret
Let’s suppose we have application A and application B. In the spirit of zero trust, we aren’t going to just let the applications trust the network and communicate without any form of authentication or authorization. Instead, the services will use a shared secret. This could be the key for a JWT HMAC signature or simply a token used for bearer authentication. To keep things simple, we’ll use bearer authentication in this demonstration.
Storing and Sharing the Secret
For this example we will use AWS SSM Parameter Store for storing the shared secret. This could also be AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, etc. The important thing is that the secret is stored centrally in a secrets manager and the applications have the appropriate privileges so that they can access it.
NOTE: AWS Parameter Store and Secrets Manager each have their own pluses and minuses, so read up on the differences to select the one that best fits your use case.
We’ll use the SecureString type when configuring the parameter so that the secret is encrypted with KMS, and we’ll set its value to MyInitialSecret.
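As a rough sketch, the parameter could be created with boto3 along these lines. The parameter name /demo/shared-secret and the helper names are assumptions made for illustration; the request fields (Name, Value, Type, Overwrite) are those of the SSM PutParameter call. Building the request is kept separate from sending it, so only store_secret needs boto3 and AWS credentials.

```python
def secure_string_request(name: str, value: str, overwrite: bool = False) -> dict:
    """Build the keyword arguments for an SSM PutParameter call."""
    return {
        "Name": name,
        "Value": value,
        "Type": "SecureString",  # stored encrypted with a KMS key
        "Overwrite": overwrite,  # must be True to update an existing parameter
    }


def store_secret(name: str, value: str, overwrite: bool = False) -> int:
    """Send the request to Parameter Store and return the new parameter version."""
    import boto3  # imported here so the request builder has no AWS dependency

    ssm = boto3.client("ssm")
    response = ssm.put_parameter(**secure_string_request(name, value, overwrite))
    return response["Version"]


# Hypothetical parameter name, shown only to illustrate the request shape.
request = secure_string_request("/demo/shared-secret", "MyInitialSecret")
print(request["Type"])
```

Calling store_secret("/demo/shared-secret", "MyInitialSecret") would create the parameter; calling it again later with overwrite=True writes a new version of the same parameter, which is what a rotation amounts to.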
The API
The first thing we will do is create a small Python Flask API that has an endpoint protected with bearer authentication. The secret is acquired from the environment, the way it is in most applications:
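A minimal sketch of such an API might look like the following. The environment variable name API_SECRET, the route, and the JSON bodies are assumptions for illustration, not details taken from the original application. The header check is plain Python, and Flask is imported inside create_app so the check can be exercised without a running server.

```python
import hmac
import os


def bearer_token(auth_header: str) -> str:
    """Return the token from a 'Bearer <token>' header value, or '' if absent."""
    scheme, _, token = auth_header.partition(" ")
    return token.strip() if scheme.lower() == "bearer" else ""


def is_authorized(auth_header: str, secret: str) -> bool:
    """Compare the presented token with the secret in constant time."""
    token = bearer_token(auth_header)
    return bool(token) and hmac.compare_digest(token.encode(), secret.encode())


def create_app():
    """Build the Flask app (requires Flask to be installed)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    # Hypothetical variable name. The secret is read once, at boot, so a
    # rotated value is not seen until the process restarts.
    secret = os.environ["API_SECRET"]

    @app.route("/")
    def index():
        if not is_authorized(request.headers.get("Authorization", ""), secret):
            return jsonify(error="unauthorized"), 401
        return jsonify(message="ok")

    return app


# To start the development server: create_app().run()
```

Because the secret is captured when the app is created, this version only picks up a rotated secret after a restart, which is the limitation the rest of the post addresses.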
Next, we’ll create a SecretsCache class that:
Has a method to pull / refresh credentials from AWS Parameter Store
Has a method to get a secret from the object
Retrieves the secrets from AWS Parameter Store when the object is instantiated
Has a ttl (time to live) feature to automatically refresh secrets when they become stale
Leverages “current” and “previous” versions of secrets to allow for rolling updates
The reason for the ttl in the class is that if the server is never rebooted, it will never know that a secret has been rotated. It needs something to wake it up and refresh its secrets. The reason for fetching and using current and previous versions of a secret is that in a production environment not all clients and servers will necessarily receive updated credentials at the same time. By having both the previous and current secrets, the server can support a client that hasn’t received the updated secret yet.
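The list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original class: the secrets are loaded through an injected fetch function (a hypothetical design that keeps the cache usable without AWS), and ssm_fetcher assumes the previous secret is simply the parameter version before the current one, read with the name:version selector that GetParameter accepts.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical signature: returns (current_secret, previous_secret_or_None).
Fetcher = Callable[[], Tuple[str, Optional[str]]]


def ssm_fetcher(name: str) -> Fetcher:
    """Build a fetcher backed by Parameter Store (needs boto3 and AWS credentials)."""
    def fetch() -> Tuple[str, Optional[str]]:
        import boto3  # imported lazily so the cache itself has no AWS dependency

        ssm = boto3.client("ssm")
        current = ssm.get_parameter(Name=name, WithDecryption=True)["Parameter"]
        previous = None
        if current["Version"] > 1:
            # Assumption: the previous secret is the version just before current.
            older = ssm.get_parameter(
                Name=f"{name}:{current['Version'] - 1}", WithDecryption=True
            )
            previous = older["Parameter"]["Value"]
        return current["Value"], previous

    return fetch


class SecretsCache:
    """Holds current and previous secrets and reloads them once the ttl expires."""

    def __init__(self, fetch: Fetcher, ttl: float = 60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self.refresh()  # load the secrets as soon as the object is created

    def refresh(self) -> None:
        """Pull both secrets again and restart the ttl countdown."""
        self._current, self._previous = self._fetch()
        self._loaded_at = self._clock()

    def get(self, which: str = "current") -> Optional[str]:
        """Return the 'current' or 'previous' secret, refreshing first if stale."""
        if self._clock() - self._loaded_at >= self._ttl:
            self.refresh()
        return self._current if which == "current" else self._previous
```

With AWS credentials available, SecretsCache(ssm_fetcher("/demo/shared-secret"), ttl=60) would build a cache for a parameter with that (hypothetical) name. Passing a different fetch function is one way the same class could sit in front of another secrets manager.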
Now we’ll update the API code from before to instantiate this class when the app boots up and to compare the token against the secrets held in the object instead of a value from the environment. We’ll also update the token validation to try the current secret, try the previous secret, and if both fail, refresh and retry the current secret:
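A minimal sketch of that validation order is shown here. It assumes a cache object exposing get("current"), get("previous"), and refresh(); those method names, and the RotatingCache stand-in used to demonstrate the flow, are hypothetical.

```python
import hmac
from typing import Optional


def token_is_valid(token: str, cache) -> bool:
    """Try the current secret, then the previous one, then refresh and retry current."""
    def matches(secret: Optional[str]) -> bool:
        # Constant-time comparison; a missing secret never matches.
        return secret is not None and hmac.compare_digest(
            token.encode(), secret.encode()
        )

    if matches(cache.get("current")) or matches(cache.get("previous")):
        return True
    # The client may already hold a newer secret than the server has cached.
    cache.refresh()
    return matches(cache.get("current"))


class RotatingCache:
    """Stand-in cache for the demo: refresh() picks up a rotated secret."""

    def __init__(self):
        self.current, self.previous = "MyInitialSecret", None

    def get(self, which: str = "current") -> Optional[str]:
        return self.current if which == "current" else self.previous

    def refresh(self) -> None:
        self.current, self.previous = "MyNewSecret", "MyInitialSecret"


cache = RotatingCache()
print(token_is_valid("MyInitialSecret", cache))  # matches the cached current secret
print(token_is_valid("MyNewSecret", cache))      # matches only after the refresh
print(token_is_valid("bogus", cache))            # matches nothing
```

One consequence of this order is that every request carrying a wrong token triggers a refresh, so a production version might want to limit how often that final step can run.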
The reason we refresh and retry the current secret after trying both the current and previous secrets is that a client may have received an updated secret before the server did. In that case neither the current nor the previous secret would match the new secret the client holds. By refreshing the secrets on the server at this point, the server should receive the newly updated secret, which becomes the new current secret, and the request should succeed.
Putting it all together, the final code looks like this:
secretscache.py
server.py
Alright! Let’s give it a shot! Let’s boot up our Flask API:
In the output we see the “Retrieving secrets from Parameter Store” message, so presumably the secrets are pulled correctly when the app boots. Now we’ll use curl to make a call to our API with the initial secret we set, MyInitialSecret, to test things out:
The call succeeds as expected as the app pulled the secret when it booted and it matches the secret we send in the authorization header. In the output on the server side we see:
Now for the magic. We will go to Parameter Store and update the secret to a new value of MyNewSecret:
And without restarting our server (because that would be cheating ;)) we use curl again with the updated secret:
The request succeeds with the new secret! Let’s look at the server logs. We see:
The output may differ depending on whether the secret is rotated and curl is executed before the ttl has expired. In this case the ttl had expired, so when the new secret was tried the server pulled the new secrets and compared the client’s secret against the refreshed secret. They match, so the request succeeds.
And just to prove we don’t have anything up our sleeve, we’ll make another request with a bogus secret:
And as we would expect, we are denied access.
What About The Client
You might wonder what the client-side code might look like to support the secret rotation. In practice, the client-side code could simply use the SecretsCache class as is. For example:
client.py
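A minimal sketch of such a client, using only the standard library, might look like this. The URL, the helper names, and the get_secret callable are assumptions for illustration; in practice get_secret would be something like a SecretsCache lookup, so each request automatically carries whatever secret the cache currently holds.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def build_request(url: str, token: str) -> urllib.request.Request:
    """Create a GET request carrying the secret as a bearer token."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def call_once(url: str, get_secret: Callable[[], str]) -> int:
    """Send one request with the latest secret and return the HTTP status code."""
    try:
        request = build_request(url, get_secret())
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401 when the server rejects the token


def run(url: str, get_secret: Callable[[], str], interval: float = 5.0) -> None:
    """Call the API in a loop, asking for the secret again on every iteration."""
    while True:
        print(call_once(url, get_secret))
        time.sleep(interval)


# Example wiring with a hypothetical local server and a cache object:
# run("http://localhost:5000/", lambda: cache.get("current"))
```

The only rotation-related detail is that the secret is looked up on every iteration instead of once at startup, which lets the cache’s ttl decide when a new value appears.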
Here we have a simple Python script that makes a request in a loop every 5 seconds. The script leverages our SecretsCache class again but needs no other special code to support the secret rotation. The class has the ttl logic built into it so it will refresh the secrets every so often. The client doesn’t need to be aware of current and previous versions so long as the server handles it, which it is doing in our example. We can start up our server and client and rotate the secret in AWS and when the ttl expires, we can see the magic happen:
In the client console when we rotate the secret, we see:
And in the server console we see:
We can see that after rotating the secret from Abracadabra to AlaKazaam!, when the ttl expires on the client, the new secret is picked up and tried. On the server side, the ttl has expired, so the new secret is picked up and compared with AlaKazaam!, which matches, so the request succeeds.
Conclusion
I hope this post has helped to make secret rotation a little less intimidating and more approachable. The SecretsCache class described in the post can easily be modified to work with AWS Secrets Manager or another cloud provider’s secrets management service. In our example code we have used a very small default ttl of 1 minute, but you likely wouldn’t need to refresh the secrets this frequently. This was simply done to speed up my demonstration. Somewhere between 10 and 60 minutes might be a more reasonable default, allowing you to rotate the secret in case of exposure and have it rolled over on all servers in a reasonable amount of time. The sample code from the post can be found over on my GitHub. Happy secret rotating!