From small businesses to large multinationals, companies struggle with managing secret or “sensitive” information. Attackers can find exposed secrets and make use of them within minutes.
Given the complexity of the cloud ecosystem, it’s inevitable that you will find yourself looking for a secret storage solution. Fortunately, HashiCorp, the creators of Terraform, developed a product called Vault.
Almost 1,000 companies already use Vault to store their secrets safely in a centralized way. Not only does this help them keep everything secure, but it also makes porting secrets across a hybrid environment easy.
Keep reading to see how you can implement best practices for using Terraform with Vault Enterprise.
Why Vault?
As we mentioned previously, breaches are widespread. So if you’re not worrying about your security yet, you really should be.
Vault comes in and fills a niche that’s been a pain point for both management and developers. It’s designed to fix the problem of decentralized secret storage and management.
Having a consistent way to store and access secrets across the organization makes sense. When you’re dealing with hybrid cloud offerings and trying to reconcile your secrets across teams, you need somewhere central to turn to.
Robust APIs mean that you’re able to codify your secret management in almost any environment. It’s also very likely that the identity provider you use is already supported by Vault.
Of course, should you wish to roll your own authentication or identity provider, that’s supported as well. In addition, Vault Enterprise supports custom secrets engines written in Go.
Free vs. Enterprise
HashiCorp published Vault as a free, open-source secrets keeper. You’ll find support for all the usual secrets engines in the open-source version, but Enterprise adds some useful features.
Namespaces
Generally, if you’re implementing Vault, you’re deploying it for multiple teams across your organization. Namespaces provide an easy way to segregate functions and teams.
Namespaces are functionally isolated environments. They have separate paths for authentication and authorization, and data is kept only in the namespace.
This means, practically, that you can isolate secrets engines, policies, identities, and tokens at the namespace level. Child namespaces can share policies with their parents, and namespaces can be managed by delegated users.
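As a sketch of managing these as code, the Terraform Vault provider offers a vault_namespace resource; the team paths below are illustrative, and the nested-namespace argument assumes a recent provider version:

```hcl
# Create an isolated namespace for a team (path is a placeholder)
resource "vault_namespace" "team_a" {
  path = "team-a"
}

# A child namespace nests under the parent's path
resource "vault_namespace" "team_a_dev" {
  namespace = vault_namespace.team_a.path
  path      = "dev"
}
```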
Replication
Vault Enterprise separates the concepts of replication quite neatly. Performance replication allows secondary instances of Vault to satisfy client requests if they’re able to do so.
A secondary Vault maintains its own Tokens and Leases but shares the same config, policy set, and encryption keys. If anything you’re doing will modify the shared state, the secondary will pass the request to the primary.
Disaster Recovery
Like most DR setups, Vault Enterprise will elect a new Primary in case of a failure. Your secondaries will share all the same configs, policies, and encryption keys as the primary.
In a change from the performance scenario, Secondaries also share the same root tokens and leases. This makes sense as they’re designed to ensure continuous operation if the Primary has a failure.
HSM Support
Hardware Security Modules (HSMs) are becoming more common in enterprise use as companies look for stronger ways to secure their keys and payloads.
Vault Enterprise supports using your existing HSM to encrypt its master key. This allows for automatic unsealing of the vault using the in-storage wrapped key.
There are many more advantages to using an HSM with Vault. They’re detailed inside the Vault Enterprise documentation.
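As a sketch, HSM auto-unseal is configured with a pkcs11 seal stanza in the server config (Vault Enterprise HSM binaries only; every value below is a placeholder for your HSM’s library path, slot, PIN, and key label):

```hcl
seal "pkcs11" {
  lib       = "/usr/lib/softhsm/libsofthsm2.so"  # placeholder: your PKCS#11 library
  slot      = "0"                                 # placeholder slot number
  pin       = "AAAA-BBBB-CCCC-DDDD"               # placeholder; prefer an env var over plain text
  key_label = "vault-hsm-key"                     # placeholder key label
}
```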
Seal Wrapping
Right on the heels of HSM support, Vault Enterprise supplies a mechanism for wrapping values with extra encryption. As you saw in the HSM Support section, Vault can wrap the master key via HSM.
Seals allow you to implement a number of extra security layers. Depending on your local jurisdiction, this may include FIPS 140-2.
It’s trivial to use the seal stanza to provide these extra layers. As demonstrated with the OCI KMS seal below (reference values, do not copy these):
seal "ocikms" {
  key_id              = "my_key_ocid"
  crypto_endpoint     = "endpoint.oraclecloud.com"
  management_endpoint = "endpoint-management.oraclecloud.com"
  auth_type_api_key   = "true"
}
As you can see, we’re simply providing Vault with the keys, endpoints, and auth type that we’d like to implement. If this seal is activated, it will seamlessly integrate with Oracle Cloud’s KMS.
FIPS 140-2 Compliance
Vault’s seal wrapping has been independently evaluated by Leidos and marked FIPS 140-2 compliant. Vault stores Critical Security Parameters (CSPs) in a manner that complies with the standard’s Key Storage and Key Transport requirements.
Storage Snapshots
It’s essential to have a constant backup process, and Vault Enterprise provides for this automatically. Depending on your storage backend, you’ll be able to configure Vault to run periodic backups.
If at all possible, you should not configure Vault to store backups locally. In a proper DR setup, only the active node will take snapshots. This means you can’t predict which node would have the backups in its storage.
Using distributed storage gives you an additional layer of protection. Cloud storage such as AWS S3 supports redundancy and staged replicas.
Lease Quotas
One of the more interesting features of Vault, Lease Quotas, allows you to cap the number of leases that the server will grant. Once the max_leases limit is hit, clients will receive an error response until leases expire or are revoked.
All nodes in the cluster will share the same max_leases counter, so it doesn’t matter which node is serving the request. Root tokens are exempt from the max_leases counter.
You’re able to scope the lease counters across the whole of Vault or drill down to a namespace or mount level. Namespaces support inheritance for limits as well.
Inheritance
Let’s take a moment to talk about inheritance. The best practice is to define a limit at the root level first. This will then apply to all child namespaces, restricting by default.
Limits defined at the namespace level will override inherited values. Likewise, limits defined at a mount point level will override namespace values.
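As a hedged sketch, lease count quotas can also be codified with the Terraform Vault provider’s vault_quota_lease_count resource; the names, paths, and limits here are illustrative:

```hcl
# Global default: an empty path applies the quota everywhere
resource "vault_quota_lease_count" "global" {
  name       = "global-default"
  path       = ""
  max_leases = 5000
}

# Tighter override for a single mount inside a namespace
resource "vault_quota_lease_count" "team_a_kv" {
  name       = "team-a-kv"
  path       = "team-a/secret/"
  max_leases = 500
}
```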
Things to Consider First
Vault is not a magic bullet. If you’ve been in the technology world for long enough, you soon realize that everything has flaws.
While Hashicorp has done its best to ensure that everything works out of the box, you still need to plan. Getting a good reference for your project will help in the long run.
Plan Your Deployment
Having a solid idea of what you want your Vault deployments to look like is crucial to a successful project. In addition, you’ll need to know exactly how you plan to fold Vault into your existing authentication schemes.
It’s important to know which applications will interact with Vault directly and which are going to use a pass-through model. Not only is this useful for your initial planning, but it will be invaluable in a DR scenario.
We suggest going through the process of making a design diagram. This will let you spot any potential errors and show you where your security concerns are.
Keep track of which services will be running in which enclave and track traffic through all paths. Make a note of which ports you will need to have access to in each deployment. Remember, apply Least Privilege!
Figure Out Storage
Every secret you store will take up additional space. Often, people opt to have a retention strategy for secrets as well. Storing 30 days of history for all your secrets across your entire organization can get costly.
It’s important to get your storage questions out of the way early so that there’s no contention for developer time later on. Although Vault supports many storage mechanisms, it’s a good idea to consider using Consul.
At scale, you’re going to want to have a solution that scales with you and is as easy to use as possible. Don’t forget to add in cost calculations when comparing storage solutions.
APIs vs Terraform for Management
Part of the adventure of Infrastructure as Code is codifying everything that you can. In keeping with that paradigm, it makes sense to use Terraform to manage your Vault deployments.
Yes, you can use the API and CLI to manage every operation, but at the cost of repeatability and visibility. In particular, Terraform is a powerful way of managing Vault namespaces and state across the whole organization.
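As a sketch, mounts and policies can be declared with the Terraform Vault provider; the names and file paths below are illustrative:

```hcl
# Mount a KV v2 secrets engine at a path of your choosing
resource "vault_mount" "kv" {
  path = "secret"
  type = "kv-v2"
}

# Manage a policy from a local HCL file kept in version control
resource "vault_policy" "admin" {
  name   = "admin"
  policy = file("${path.module}/admin-policy.hcl")
}
```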
Roll Your Own Plugins
The larger the organization, the more complex entitlements and identity become. It’s very likely that you’ve already got a chosen solution in place for all your other authentication.
Instead of adopting yet another standard, why not simply write your own identity plugin for Vault? It’s relatively straightforward, and there are countless tutorials. You don’t have to sacrifice security for the sake of new technology.
Best Practices for Using Terraform With Vault Enterprise
Let’s dive right into the best practices and recommendations. Of course, not all of these will apply to every organization, but most are generalized enough to work.
Least Privilege
When talking about security best practices, the first principle we need to discuss is Least Privilege. This will form the cornerstone for our discussions throughout this article.
Very simply, Least Privilege means that every entity should only have minimum access rights. That is, enough rights to function but not enough to cause trouble.
Generally speaking, this is implemented by using rigid boundaries for each service or entity. Keeping your secrets management project and deployments logically separate from the rest of your system is a good place to start.
The Secrets Deployment
In keeping with security best practices, always try to isolate your secrets management project completely. Deployments should go onto their own separate cluster, with no other apps or workloads running besides Vault.
Every additional app or workload that you bring into your secure enclave is a potential target for exploits. Even disregarding exploits, you don’t want an unrelated service to bring down your Vault cluster, so keep this enclave sacred and clean.
It’s possible to set up strict ingress and egress rules for each aspect of Vault. You should never need to allow unfettered public access to your Vault endpoints.
If at all possible, it’s recommended to keep only port 443 open all the way along. This allows you to craft end-to-end encryption through all your components.
Always Use a Bastion
You may not always need a bastion, but you’ll regret not integrating one if the need arises. Ideally, the bastion host should only be responsible for handling SSH connections.
If you wish to handle public requests, make sure that you have a load balancer in place. Most large providers allow ingress over a secure HTTPS endpoint and even offer tunneling and DDoS protection.
All of this can be overwhelming to start with. However, if you go through the paces and plan well, you’ll find it easy to implement best practices for using Terraform with Vault Enterprise.
Restricted Storage
One of the bonuses of using Vault Enterprise is that it encrypts all data at rest by default. In addition, Vault is entirely storage agnostic, so it doesn’t matter where you’re ultimately storing your data – Vault will handle it and encrypt it.
However, any attacker that gains access to your infrastructure would still be able to wreak havoc by corrupting the stored data. Mitigate this by applying the principle of least privilege again. Give access to storage only to processes that require it to function.
HA Is Good
Running your Vault cluster in HA mode gives you some excellent advantages. Besides ensuring your clusters and storage are used as efficiently as possible, it also protects against outages.
Interestingly, Vault will automatically enable HA mode if your chosen storage provider supports it.
Tokens and Keys
When initializing your vault, you will need to take note of the “Unseal Keys” and “Root Token.” Without these, you will not be able to access your vault.
This raises the usual problem – how do you store these without human interaction? Easily, it turns out. Simply use an application sidecar to immediately capture the values when they are produced.
Choose Your Authentication
By default, Vault uses token authentication. But, of course, that’s not practical in some scenarios. Fortunately, Vault also supports quite a few extra authentication methods.
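For example, extra auth methods can be enabled declaratively via the Terraform Vault provider’s vault_auth_backend resource; this is a sketch, and the method types shown are Vault built-ins:

```hcl
# Enable AWS IAM-based authentication for workloads running in AWS
resource "vault_auth_backend" "aws" {
  type = "aws"
}

# Enable AppRole for machine-to-machine authentication
resource "vault_auth_backend" "approle" {
  type = "approle"
  path = "approle"
}
```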
Vault Policies
Policies in Vault govern how clients are allowed to behave. For example, a common pattern is to use two personas when setting up policies.
The Admin persona manages the vault infrastructure for a team. The provisioner persona configures backends and creates policies.
Policies are written in HCL format. For example, a typical Admin policy will contain capabilities like:
# Manage auth backends broadly across Vault
path "auth/*" {
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# List, create, update, and delete auth backends
path "sys/auth/*" {
  capabilities = ["create", "read", "update", "delete", "sudo"]
}

# List existing policies
path "sys/policy" {
  capabilities = ["read"]
}

# Create and manage ACL policies broadly across Vault
path "sys/policy/*" {
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# List, create, update, and delete key/value secrets
path "secret/*" {
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# Create and manage secret backends broadly across Vault
path "sys/mounts/*" {
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# Read health checks
path "sys/health" {
  capabilities = ["read", "sudo"]
}
We are creating policies that allow the admin persona to manage broad areas of Vault. Admin is given the create/read/update/delete/list/sudo capabilities in almost all areas.
In contrast, the provisioner persona only needs to be able to mount and manage backends and create/manage ACL policies. So, again, we are applying the least privilege principle.
# Manage auth backends broadly across Vault
path "auth/*" {
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# List, create, update, and delete auth backends
path "sys/auth/*" {
  capabilities = ["create", "read", "update", "delete", "sudo"]
}

# List existing policies
path "sys/policy" {
  capabilities = ["read"]
}

# Create and manage ACL policies
path "sys/policy/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

# List, create, update, and delete key/value secrets
path "secret/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}
Deploying the policies to Vault is simple. For example, you might write the admin policy with:
vault policy write admin admin-policy.hcl
The Application Deployment
Step one – apply the principle of least privilege. When you’re designing your overall system schematics, keep this in mind. There are a few quick wins here:
- Never store credentials in source code
- Always use service accounts where possible
- Segregate traffic from your applications
- Use infrastructure that can be templatized
Credential Storage
Fortunately, there are hundreds of ways to store credentials. For example, Google Cloud gives the ability to attach service accounts directly to compute services.
One part of credential storage that many organizations forget about is state. Remember always to encrypt your state when possible. Even better, never output credentials to state so unauthorized parties can never retrieve them.
Service Accounts
All newly created service accounts should be unable to perform any tasks. This means that, by default, they apply the principle of least privilege.
This is the default position when creating resources with AWS and many large cloud providers follow the same course. It can be difficult to lock down automation users using least privilege, but it’s worth it to safeguard your deployments.
Segregate Traffic
When you’re deploying something as sensitive as Vault, you should segregate all traffic. Your Vault cluster should be behind its own internal load balancer that only listens on port 443, and it should only be directly reachable through your bastion host.
You should also keep traffic from your own internal services segregated. A noisy neighbor could easily degrade service in the Vault cluster at a crucial time.
Templatized Infrastructure
If at all possible, all your infrastructure should be as close to the base image as possible. It’s far more secure to let Vault handle bringing credentials down to an instance or application.
Keeping your infrastructure as close to base means you can replace it simply if something goes wrong. Nobody wants to hunt through old conversations for old configuration files.
How Does Terraform Fit In?
Terraform interacts very tightly with Vault. That’s great because deploying anything using Vault becomes a lot easier when you work with Terraform.
You Must Encrypt Your State
There has been an open issue in GitHub for more than seven years related to secrets in state. There is work ongoing to solve the problem but, for now, it’s essential that you use a storage provider that supports encryption.
Almost all the large cloud providers are supported natively by Terraform, and many implement encryption by default. There are numerous guides out there for implementing secure Terraform state; they’re out of the scope of this article but worth reading.
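As one hedged example, Terraform’s S3 backend supports encryption at rest and state locking; every name below is a placeholder:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"      # placeholder bucket name
    key            = "vault/terraform.tfstate" # placeholder state path
    region         = "us-east-1"
    encrypt        = true                      # server-side encryption at rest
    kms_key_id     = "alias/terraform-state"   # optional customer-managed KMS key
    dynamodb_table = "terraform-locks"         # state locking table
  }
}
```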
Diving Into It
Let’s take a look at a potential architecture for your Vault deployment. In this scenario, we’re choosing to have three Vault replicas: one active and two on standby.
The backend we choose to store our Vault data is Consul, with five Consul instances.
Our active Vault replica will be reachable via a bastion host, which will be disabled unless required for maintenance.
There are some Vault-specific terms that you should know ahead of time. They’re fairly close to industry standards, but there are some differences.
Vault Cluster
A collection of Vault processes running a single Vault service. These could be hosted on bare metal, virtual machines, or containers.
Vault Backend Storage
Due to the nature of the information stored in Vault, it requires a persistent storage mechanism. The simplest is local storage, though this is strongly discouraged in production.
Availability Zone
AZs are physically segregated areas hosting either all of or a section of a Vault deployment. These should be either in separate data centers, isolated cages, or zones in your Cloud Provider.
Regions
Regions geographically separate Vault deployments. A region may contain multiple Availability Zones.
It’s important to know the difference between AZs and regions. You would never spread a single Vault deployment across multiple regions. You could have multiple Vault deployments in a single region.
Reference Design
The reference design is intended to provide both security and flexibility. It is ideally suited for a production installation.
While Vault is designed to handle failure scenarios, it’s important that you work this into your design process. The recommended number of nodes in a single cluster is three for both Open Source and Enterprise Vault.
However, Enterprise Vault supports more nodes to be provisioned as warm standby or failovers. In addition, Vault Enterprise allows up to seven Consul instances to be used as backend storage.
Ideal Sizing
You can configure Vault Enterprise to allow for n-2 redundancy depending on your replication, failover, and regulatory requirements. The ideal number of nodes/instances differs between Vault and Consul.
Vault determines which node is active by acquiring a lock in the storage backend. Should the leader be lost, the lock passes to another node, which becomes the new leader. This means that the ideal size for a Vault cluster is three nodes.
In comparison, Consul uses a consensus vote to elect a leader. This means that the ideal size for a Consul cluster is five, as this allows for a quorum.
Availability Zone Best Practices
As mentioned above, Consul relies on a consensus model to elect a leader. This means that a simple majority of Consul instances must be available at any one time.
With a sizing of three instances spread between two availability zones, there is around a 50 percent chance that losing a single AZ would cause an outage.
Typically, it’s a good idea to spread your nodes across multiple AZs. Latency concerns mean that practically, you should spread your nodes and instances across all the AZs you can.
Regional Redundancy
As we mentioned, Vault Enterprise supports both disaster recovery and performance replication. This means that ultimately, your Vault deployment should be distributed across:
- Three Availability Zones
- Regions via replication
Open-source Vault and Consul have recommended topologies too, but without Enterprise replication they cannot meet performance or disaster recovery best practices. That said, if you cannot make use of the recommended number of AZs or regions, a smaller deployment will still work well enough.
Steps to Deploy
We’re finally ready to look at a deployment strategy. This is purposefully going to be a more generalized approach but the principles hold true.
Step 1: Terraform Creates Vault
In this step, you would use your initial Terraform structure to create the Vault clusters. This means that the vault is completely empty. It’s a good idea to set up your Consul backends now as well.
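This step might be sketched as a minimal Vault server configuration pointing at Consul; the addresses and hostnames below are placeholders:

```hcl
# Vault server config: Consul as the storage backend
storage "consul" {
  address = "127.0.0.1:8500"  # local Consul agent
  path    = "vault/"          # KV prefix used by Vault
}

api_addr     = "https://vault.example.internal:8200"  # placeholder
cluster_addr = "https://vault.example.internal:8201"  # placeholder
```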
Now we have a fully functional Vault cluster but no data. Let’s move on to bringing data into Vault.
Step 2: Vault is Populated
How you choose to populate your Vault is up to you. The simplest method to start with is using the Vault CLI.
By default, any Vault CLI commands will appear in your command history, so ensure that you follow the Static Secrets guide. For now, we’ll keep things simple.
In our example, we will use the Key/Value secrets engine. These commands are for illustration; avoid using them as-is in a production environment.
vault kv put secret/test foo=bar
That’s it. That’s the simplest way to store a secret. So, what did we do in the background? We stored a secret at the path secret/test, with the key “foo” and the value “bar”.
The Key/Value secrets engine uses simple paths to store secrets. Retrieving your new secret is as simple as:
vault kv get secret/test
If we wish to amend the secret and add a new value, we can also do that fairly simply:
vault kv put secret/test foo=bar fizz=buzz
Now our secret contains two values. You’ll also notice that the version has incremented if you rerun your get command.
Of course, there are many other, more efficient ways to populate your vault with secrets. If you need to fill Vault with more than a few simple secrets, you will want to explore other secrets engines.
Secrets Engines
Like the key/value secrets engine, each of these either stores, generates, or encrypts secrets. Again, depending on the provider you choose, you will find a plethora of features to make use of.
Popular secrets engines connect to external data sources or generate dynamic credentials on demand.
Step 3: Terraform Pulls Secrets
Now that Vault is in place, it’s possible to pull secrets using Terraform. There’s a stable Vault provider that makes this easy.
Once again, make sure not to store secrets in your state file or config.
provider "vault" {
  address = "http://127.0.0.1:8200"

  auth_login {
    path   = "auth/aws/login"
    method = "aws"

    parameters = {
      role = "dev-role-iam"
    }
  }
}
We’re initializing our Vault provider. In this case, we’re using AWS IAM authentication.
data "vault_generic_secret" "test-secret" {
  path = "secret/test"
}
As you can see, this data source allows us to read data from an arbitrary path. Of course, this is not the most complex example, but it illustrates how easy using Terraform with Vault Enterprise really is.
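The individual values are then available through the data source’s data map. As a sketch (remember that referenced values will land in your state file, so use an encrypted backend):

```hcl
# Expose the "foo" key from the secret we read above.
# Anything referencing this value ends up in Terraform state.
locals {
  test_foo = data.vault_generic_secret.test-secret.data["foo"]
}
```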
Hardening Best Practices
Every environment is different, and your needs are not the same as another organization. However, several hardening best practices will make your production Vault deployment more secure and are generally applicable.
Root Be Gone
Services should never be run as root or privileged users if possible. Remember the Principle of Least Privilege? That applies more than ever here.
You should create a dedicated user that has just enough access to run Vault. Vault Enterprise is designed to run as an unprivileged user, so there’s no advantage to using root.
Minimal Write
We touched on Service Accounts higher up. It’s important that when you create your service account, you understand that it should have minimal write privileges.
The Vault service account should not be able to modify Vault’s configuration or binaries. Only directories that are absolutely critical for the service to function, such as local storage and audit log directories, should be writable.
TLS Everywhere
There’s been a lot of noise lately about end-to-end encryption. There’s no reason not to adopt this standard within your deployments. In fact, it’s harder to run Vault Enterprise unencrypted.
Specifically, you should use TLS for all communication between all internal components. You should also encrypt traffic coming to a load balancer. This ensures that all traffic in transit to and from Vault is properly secure.
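In the server configuration, that translates to a TCP listener with TLS material supplied rather than tls_disable. A sketch, with placeholder certificate paths:

```hcl
listener "tcp" {
  address         = "0.0.0.0:8200"
  tls_cert_file   = "/etc/vault/tls/vault.crt"  # placeholder paths
  tls_key_file    = "/etc/vault/tls/vault.key"
  tls_min_version = "tls12"                     # refuse older protocol versions
}
```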
No Swap
This requirement is aimed squarely at anyone wanting to use the integrated storage backend. Inevitably, Vault (or any system process) must sometimes store sensitive information in memory.
However, we absolutely can and should avoid ever storing that sensitive data on local disks. Disabling swap prevents this from happening.
No Core Dumps
Any user that can force a core dump on your host could have access to sensitive information. This is not specifically related to Vault, but there’s a possibility that a user could access Vault’s encryption keys via Core Dump.
Single Focus
The Vault Cluster should be used for Vault alone. We’ve touched on this a little before, but some more detail is required to harden properly.
Any process running on your node could potentially have memory leaks or bugs that you cannot foresee. Reduce your attack surface by ensuring that your nodes are clean and dedicated to Vault.
If possible, running Vault Enterprise on bare metal is preferable to running it in a virtualized environment, as this forces physical and logical segregation. In the same way, running Vault in a VM is better than running it in a container.
Firewall Everything
Start with a zero access approach to firewalling your Vault Cluster. That means no traffic in or out except local (via a Bastion).
Then, start opening up your firewall until Vault is working properly. A good approach is to look at the design diagram you made earlier (see above: Plan Your Deployment).
If a port, traffic path, or application is not on your original design diagram, question if it needs to be used now. If it does, amend your diagram and update your firewall rules.
Revoke Root Tokens
Once Vault has been set up, you no longer need the root tokens. That may seem counter-intuitive, but remember that you have an incident if anyone gets hold of your root tokens.
You can use Vault Enterprise to generate new Vault Tokens if required. If you’ve followed the guidance to set up policies, all privileges are secured by Vault itself.
Audit Everything
Audit logging should be turned on by default and only disabled if there is a legitimate business case for it. Vault Enterprise supports several audit mechanisms, including file, Syslog, and Socket.
Your audit logs are securely stored in transit and at rest. Once generated, you have a full forensic trail in case of misuse or breakages.
It’s simple to set up audit logging in Vault. Remember to set your log rotation mechanisms as well.
vault audit enable file file_path=/var/log/vault_audit.log
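If you’re managing Vault with Terraform, the same device can be codified with the provider’s vault_audit resource, as a sketch:

```hcl
# Enable the file audit device; pair this with external log rotation
resource "vault_audit" "file" {
  type = "file"
  options = {
    file_path = "/var/log/vault_audit.log"
  }
}
```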
Upgrade
Updating frequently is necessary if you want to keep your Vault deployment secure. HashiCorp’s mailing list is an excellent source for release details. However, as with all enterprise installations, don’t just upgrade blindly.
No Clock Skew
As you can imagine, a system that relies on clocks to generate TTL and certificate validity doesn’t do well with Clock Skew. Imagine a failover scenario where your root certificates seem to have expired because your clocks aren’t in sync.
Running industry-standard solutions such as NTP is enough to keep your node clocks in sync. This is the one exception to the single-focus rule above.
Never Plain Text
No matter the backend mechanism, security is only as strong as its weakest link. Never store any credential in plain text, whether on your HSM, in your seal stanza, or on disk.
In some circumstances, you may need to store the HSM PIN in an environment variable so that Vault can access it. Again though, apply the Least Privilege Principle and don’t allow arbitrary access to environment variables.
Disable Remote Access
There is no reason for anyone to have ssh or remote desktop to your Vault Cluster. A bastion should always be used if required for some on-host process (see our section about Bastions above).
Vault has a robust API that you should use to perform almost all operations once Vault has been set up. For example, you should never need to log in locally to access logs; these should be piped to a central system or repository.
Use systemd
systemd builds in a lot of security features by default, and it makes sense to use them in your Vault deployment. In fact, if you look at the standard unit file provided with Vault, you’ll see some of these in action:
ProtectSystem=full
PrivateTmp=yes
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
You can check out the full specification for systemd if you’re interested in enhancing this. There are many options to choose from.
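For reference, a hardened [Service] section might include directives like the following. This is a sketch based on commonly used systemd sandboxing options; verify each against your distribution:

```ini
[Service]
User=vault
Group=vault
ProtectSystem=full
ProtectHome=read-only
PrivateTmp=yes
PrivateDevices=yes
NoNewPrivileges=yes
SecureBits=keep-caps
AmbientCapabilities=CAP_IPC_LOCK
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
```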
Side-Along Upgrades
Upgrading is a sensitive subject in most organizations. You want to have zero downtime but also need to be running the latest software for compliance reasons.
The recommended way of upgrading Vault is to bring up new infrastructure in parallel first. Once that’s done, attach the new nodes to your shared storage backends and demote the old nodes.
Be sure to actually destroy your old nodes rather than simply archiving them. They still contain sensitive information and could compromise your security.
Start Using Vault Enterprise
Now that you’ve got a good grasp of the best practices for using Terraform with Vault Enterprise, it’s time to put your knowledge to use. To briefly recap, your road to success will look similar to this:
- Design Diagram and planning
- Storage Backend selection
- Identity plugin selection
- Secrets Deployment
- Application Deployment
We’ve covered all the major approaches to highly available Vault deployments. But, of course, your situation may differ, so don’t be afraid to adjust the processes to fit.
Securing Terraform is an almost endless process, and it’s not going to be solved in a single project. Check out HashiCorp Cloud and sign up for a free trial today.