Datasource block

Terraform data sources allow you to use information defined outside of Terraform. A good example is information defined in a separate Terraform configuration. Every provider, such as AWS, Azure, and Google Cloud, supports a set of data sources in addition to its resource types.

Data sources are read-only resources. Terraform uses them only to read information from some external source and will not create, update, or delete infrastructure objects.

Datasource block syntax

To use a data source, you declare a data block, which is a special kind of Terraform resource called a "data resource". The syntax looks like this:

data "data_source_name" "local_name" {
    # Datasource Specific Arguments 
}

Here is a breakdown of syntax:

data is the keyword to declare a data resource

data_source_name is the type of the data source, as defined by the provider (for example, aws_instances).

local_name is the local name given to the data source by which it is referred in the same Terraform module.

block body (between {}) defines the query constraints and arguments specific to the data source in use.
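
As a concrete illustration of this syntax, the minimal sketch below declares a data resource using the AWS provider's aws_availability_zones data source. The local name available is an arbitrary choice made for this example, and the block assumes an AWS provider is already configured.

```hcl
# "aws_availability_zones" is the data source type; "available" is a
# hypothetical local name chosen for this sketch.
data "aws_availability_zones" "available" {
    # Data-source-specific argument: only return zones that are currently available.
    state = "available"
}
```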

Working of Datasource Block

The Datasource block instructs Terraform to read information from your infrastructure. This can be about an existing resource, for instance, an AWS EC2 instance, a database, or any other resource. Terraform identifies the resource in your infrastructure based on the arguments that you specify in the data block. Terraform fetches the details of the selected resource and exports them under the local name you specified in the data block. You can then use this information in other resources in the same module with an expression of the form data.<datasource_name>.<local_name>.<attribute>.

A data source has no scope outside the module in which it is defined. Make sure the combination of data source type and local name is unique within the module; Terraform reports an error if two data blocks share the same type and name.
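
The reference expression described above might be used as in the following sketch. It assumes that a default VPC exists in the configured AWS region; the local names default and example, and the security group name, are hypothetical.

```hcl
# Look up the default VPC of the configured region (assumed to exist).
data "aws_vpc" "default" {
    default = true
}

# Use an attribute exported by the data source in a managed resource.
resource "aws_security_group" "example" {
    name   = "example-sg"
    vpc_id = data.aws_vpc.default.id
}
```

Here data.aws_vpc.default.id follows the data.<datasource_name>.<local_name>.<attribute> pattern.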

Example: Using a Datasource Block to Retrieve an AWS EC2 Instance

For instance, consider fetching details of an already existing AWS EC2 instance using Terraform. You can do this by using a Datasource block.

data "aws_instances" "test" {
    instance_state_names = ["running"]
    instance_tags = {
        Env = "Production"
    }
    filter {
        # Example zone name; replace it with a zone from your own region.
        name   = "availability-zone"
        values = ["us-east-1a"]
    }
}

In this example,

aws_instances datasource fetches details from one or more AWS EC2 instances.

The Datasource block has been named as test

instance_state_names argument specifies that we're interested in instances that are in the running state.

instance_tags argument specifies that we are interested in instances with a tag named Env whose value is Production.

filter block further narrows the result. Each filter block takes a name and a list of values, and here we filter on the availability zone us-east-1a.

When Terraform executes this configuration, it makes a request on the AWS API to list all the EC2 instances that match the above filters. The AWS API returns the list of instances that matched against those filters, and Terraform will store the retrieved data in the "test" datasource.

Once the data is retrieved, you can then reference the data within your Terraform configuration to build resources dependent upon the EC2 instance. Here's an example:

resource "aws_eip" "eips" {
    count = length(data.aws_instances.test.ids)
    instance = data.aws_instances.test.ids[count.index]
}

In the above example, Terraform will create one aws_eip resource for each EC2 instance ID returned by the "test" datasource. The count argument is set to the length of the ids attribute of the "test" datasource, so an aws_eip resource will be created for each instance ID in the list. The instance argument for each aws_eip resource is set from the corresponding instance ID in the "test" datasource, using the count.index variable to access the right element in the list.
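
To inspect what the data source returned before building resources on top of it, you could expose its attributes as outputs. The following is a minimal sketch that repeats a simplified version of the "test" data block; the output names are hypothetical.

```hcl
data "aws_instances" "test" {
    instance_state_names = ["running"]
    instance_tags = {
        Env = "Production"
    }
}

# List of matching instance IDs.
output "production_instance_ids" {
    value = data.aws_instances.test.ids
}

# Private IP addresses of the matching instances.
output "production_private_ips" {
    value = data.aws_instances.test.private_ips
}
```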

When is a Data Resource Resolved?

When applying your configuration, Terraform tries to read Data Resources during the planning phase whenever possible. In some cases Terraform has to delay reading a Data Resource until the apply phase. This ensures that all of the information the query needs is available before Terraform makes the API call.

When Does Terraform Defer Reading Data Resources?

Terraform defers, or delays, reading Data Resources in two circumstances:

1. Indirect dependence using managed resource attributes:

When a Data Resource uses an attribute of a managed resource whose value is not known until apply, Terraform defers reading the Data Resource to the apply phase, because the managed resource must first be created or updated before its attributes are available.

resource "aws_s3_bucket" "s3_bucket" {
    bucket = "my-bucket"
}
data "aws_s3_bucket" "s3_data" {
    bucket = aws_s3_bucket.s3_bucket.id
}

In this example, the Data Resource s3_data depends upon the id attribute from the managed resource s3_bucket. Terraform will delay the reading of s3_data until s3_bucket has been created and its id attribute is available.

2. Direct dependence using the depends_on meta-argument:

When a Data Resource depends on a managed resource via the depends_on meta-argument, Terraform will also delay reading the Data Resource until the apply phase.

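resource "aws_instance" "my_instance" {
    # Assumes an aliased provider configuration named "europeregion" is defined in this module.
    provider = aws.europeregion
    ami = "ami-0123456789"
    instance_type = "t2.micro"
    tags = {
        Name = "my-ec2-instance"
    }
}
data "aws_instance" "my_instance_data" {
    # Use the same provider configuration so the lookup runs in the same region.
    provider = aws.europeregion
    filter {
        name   = "instance-state-name"
        values = ["running"]
    }
    instance_tags = {
        Name = "my-ec2-instance"
    }
    depends_on = [aws_instance.my_instance]
}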

In the above example, the Data Resource my_instance_data depends on the managed resource my_instance by using the depends_on meta-argument. Terraform will defer the reading of my_instance_data until the creation or update of my_instance.

Local-only Data Sources

While many of the data sources in Terraform interact with external infrastructure objects through remote network APIs, there are some more specialized data sources that operate entirely within Terraform itself, often referred to as local-only datasources. These local-only data sources perform calculations and return values which can be used elsewhere in the same module of your Terraform configuration.

Local-only data sources behave like every other data source in Terraform, with one key difference: the data they provide exists only temporarily during a Terraform operation. It is not persisted across Terraform invocations and is recalculated every time you run terraform plan or terraform apply.

Non-local data sources, on the other hand, read from some external infrastructure object. Their results reflect the state of that object, so unless the underlying infrastructure has changed, the data will be the same in every run of terraform plan or terraform apply.

Examples of local-only data sources include rendering templates, reading local files, and rendering AWS IAM policies.

Example 1: template_file data source

data "template_file" "example" {
    template = "ls -al $${location}"
    vars = {
        location = "/var/logs"
    }
}
resource "terraform_data" "provisioner_wrapper" {
    provisioner "local-exec" {
        command = data.template_file.example.rendered
    }
}

Here, the template_file data source is used to render a minimal template containing one variable. The template is defined as ls -al ${location}, while the variable location is set to be /var/logs. The rendered template will be ls -al /var/logs. The result of the template_file data source is then used in a local-exec provisioner, running the command in the local machine: ls -al /var/logs.
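
Note that the hashicorp/template provider, which supplies template_file, has been deprecated in favor of Terraform's built-in templating features. As a rough sketch of an alternative that needs no extra provider, the same command can be built with ordinary string interpolation. The local value names below are hypothetical.

```hcl
locals {
    location = "/var/logs"
    # Plain interpolation produces the same string the template rendered.
    command  = "ls -al ${local.location}"
}

resource "terraform_data" "provisioner_wrapper" {
    provisioner "local-exec" {
        command = local.command
    }
}
```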

Example 2: local_file data source

data "local_file" "local_data" {
    filename = "/home/ubuntu/content.txt"
}
resource "aws_s3_object" "zip" {
    bucket = "local-data-bucket"
    key = "local"
    content = data.local_file.local_data.content
}

Here, the local_file data source is used to read the contents of the local file /home/ubuntu/content.txt. The output of this data source is then used to upload the content of the file to an S3 bucket.

Example 3: aws_iam_policy_document data source

data "aws_iam_policy_document" "policy_document" {
    statement {
        effect = "Allow"
        actions = ["s3:ListAllMyBuckets", "s3:GetBucketLocation"]
        resources = ["arn:aws:s3:::*"]
    }
}
resource "aws_iam_policy" "my-policy" {
    name = "my-policy"
    policy = data.aws_iam_policy_document.policy_document.json
}

The above example uses the aws_iam_policy_document data source to generate an AWS IAM policy document in JSON format. This policy document gives permissions to perform the actions s3:ListAllMyBuckets and s3:GetBucketLocation on all S3 buckets (arn:aws:s3:::*). Finally, the output is passed to an aws_iam_policy resource named my-policy.

Meta Arguments of Data Source

In Terraform, a meta-argument is a special kind of argument that can be used to customize the behavior of a resource block, module block, or data block.

Meta-arguments are the way to give you more control over how Terraform creates, updates, and destroys your infrastructure. Using them, you will be able to build complex and flexible configurations of the infrastructure.

Types of Meta-Arguments for Data Resources

Terraform provides several meta-arguments that can be used to customise Data source resource behaviour. The meta-arguments used are as follows:

depends_on

count

for_each

provider

lifecycle (with some exceptions)

depends_on Meta-Argument

depends_on is a meta-argument in Terraform that explicitly specifies dependencies between resources. It is used where Terraform cannot, by itself, automatically infer the dependency between resources.

Why do I need depends_on?

Terraform does an outstanding job of automatically inferring dependency relationships between resources from the way they reference each other's data. That is to say: if resource B uses resource A's output to configure itself, Terraform will automatically create A before B.

However, there are scenarios in which one resource depends on the behaviour of another resource without using any data of that other resource in its configuration. In such cases, Terraform can't automatically compute the dependency between the two resources. That's where depends_on comes in.

When would I use depends_on?

Now, suppose you have two resources, A and B. Resource B depends on the behaviour of resource A, but it doesn't, in fact, use any of the data or output from resource A in its configuration. That is to say, B requires A to have been created, or otherwise configured, prior to its own creation/configuration, but it does not reference A's data in any way.

In such a case, you will want to use depends_on to tell Terraform that B depends on A's behavior. This says, in effect, even though B did not use any data from A, Terraform needs to make sure A is created before B is created.

Example: Creating an EC2 Instance and Fetching its Data

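data "aws_instance" "my_instance_data" {
    # The aws_instance data source selects by instance_id, instance_tags, or filter blocks.
    filter {
        name   = "instance-state-name"
        values = ["running"]
    }
    filter {
        name   = "vpc-id"
        values = [aws_vpc.example.id]
    }
    instance_tags = {
        Name = "my-ec2-instance"
    }
    depends_on = [aws_instance.my_instance]
}

resource "aws_instance" "my_instance" {
    ami = "ami-0123456789"
    instance_type = "t2.micro"
    tags = {
        Name = "my-ec2-instance"
    }
    # aws_security_group.my_sg and aws_vpc.example are assumed to be defined elsewhere in the module.
    depends_on = [aws_security_group.my_sg]
}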

In the above example, the depends_on meta-argument is used to create a dependency between the my_instance resource and the my_instance_data data resource. This ensures that Terraform creates the EC2 instance before it fetches its data. If the my_instance resource fails to create, the my_instance_data data resource will not be executed.

The Count Meta-Argument

In Terraform, when you define a data resource block, it usually reads only one infrastructure object. Sometimes you need to read multiple similar objects, for instance a fixed pool of compute instances, without writing a separate block for each.

For this purpose, Terraform provides two meta-arguments: count and for_each. In this explanation, we will explain the count meta-argument.

How count Works

Adding a count argument to a data resource informs Terraform to read several instances of this resource. The value of count must be a whole number and indicates how many instances to read. Each instance has an associated infrastructure object, and each is read separately as this configuration is applied.

Here is an example using count on an AWS EC2 instance data resource:

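data "aws_instance" "read_ec2_instances" {
    count = 4
    # Each instance of this block must match exactly one EC2 instance,
    # so the selection criteria vary with count.index.
    instance_tags = {
        Name = "Ec2-${count.index}"
    }
}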

The count Object

When using the count meta-argument with a data resource, Terraform also exposes an additional object, count, in expressions. This object can be used to customize the configuration of each instance that the count meta-argument reads.

The count object has only one attribute:

count.index: This attribute returns the distinct index number starting with 0 for each instance that the count meta-argument reads.

For instance, assume that, earlier in your infrastructure, you declared a resource block which utilized the count meta-argument to create three instances of an identical compute instance.

resource "aws_instance" "server" {
    count = 3
    ami = "ami-0123456789"
    instance_type = "t2.micro"
    tags = {
        Name = "Ec2-${count.index}"
    }
}

Now you want to read information about these instances. You can do this with a data resource that uses the same tags as filters.

data "aws_instance" "server_data" {
    count = 3
    instance_tags = {
        Name = "Ec2-${count.index}"
    }
}

Using count and count.index in both resource and data blocks, Terraform provisions multiple instances with the same resource and fetches information about those instances using the same tagging scheme.

Referencing Instances of count

In Terraform, the count argument is used to read a data resource multiple times. Terraform distinguishes the resulting instances by assigning each one an index number, starting from 0.

Data resources can be referred to in Terraform in the following two ways:

Entire Data Resource Block: To refer to the entire data block, it should be in the format data.<TYPE>.<NAME>. For instance, data.aws_instance.server_data refers to the whole data block.

Individual Instances: An individual instance is referenced by adding the index number in square brackets. Syntax is data.<TYPE>.<NAME>[<INDEX>]. For example, data.aws_instance.server_data[0] refers to the first instance, while data.aws_instance.server_data[1] refers to the second, and so on.
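
Both reference styles can be sketched as follows. The data block repeats the server_data example so that the snippet is complete on its own; the output names are hypothetical, and private_ip is one of the attributes exported by the aws_instance data source.

```hcl
data "aws_instance" "server_data" {
    count = 3
    instance_tags = {
        Name = "Ec2-${count.index}"
    }
}

# A single instance, addressed by its index.
output "first_server_private_ip" {
    value = data.aws_instance.server_data[0].private_ip
}

# All instances at once, using a splat expression over the whole block.
output "all_server_private_ips" {
    value = data.aws_instance.server_data[*].private_ip
}
```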

The for_each Meta-Argument

The for_each meta-argument is a way to read multiple instances of a data resource block according to a collection of values. If a data resource block includes a for_each argument, whose value must be either a map or a set of strings, Terraform will read one instance for each member of that map or set.

Each instance is handled independently, with its own infrastructural object associated with it. This means each instance will be independently read when applying/planning your Terraform configuration.

NOTE: A given data resource block can't use the count and for_each at the same time.

Map Example

In the following example, we use the for_each meta-argument with a map to read AWS S3 buckets. The map defines a collection of bucket names and their regions, and Terraform reads one bucket for each entry.

data "aws_s3_bucket" "buckets" {
    bucket = each.key
    # Setting region as an argument assumes AWS provider v6 or later.
    region = each.value
    for_each = tomap ({
        "my-bucket-1" = "us-east-1"
        "my-bucket-2" = "us-west-2"
        "my-bucket-3" = "eu-west-1"
    })
}

The tomap() here is a Terraform function that converts its argument, in this case an object written as key-value pairs, into a map. The for_each meta-argument will step through the map, reading one instance of the aws_s3_bucket data resource for each key-value pair. The each.key and each.value expressions are used to access the key and value of each pair, respectively.

Set Example

Here is an example where for_each is used on a set of strings to read several AWS RDS instances:

data "aws_db_instance" "database" {
    for_each = toset(["database-1", "database-2", "database-3"])
    # The data source looks up an existing RDS instance by its identifier.
    db_instance_identifier = each.key
}

Here, toset() is a Terraform function that takes a list of strings and returns a set. The for_each meta-argument will iterate over that set and read one instance of the aws_db_instance data resource for every string in that set. The each.key expression accesses the string value of each item in the set, which is used here as the identifier of the database instance to look up.

The each Object

When you use the for_each inside of a Terraform block, you get this special object called each inside that block. So, it can be used to modify the configuration of each instance.

The each object has only two attributes:

each.key: The identifier of each entity within the collection. If you are working with a map, that is a collection of key-value pairs, then each.key will return the key in the pair. If you're working with a set, meaning a collection of just unique values, then each.key yields the actual value.

each.value: This is the value associated with your key in the map. In case of a set, each.value is same as each.key as sets do not have key-value pairs.

Following is an example using Data resource of AWS EC2:

data "aws_instance" "data_instance" {
    for_each = tomap ({
      "dev" = "t2.micro"
      "prod" = "t2.large"
    })
    filter {
        name   = "instance-state-name"
        values = ["running"]
    }
    instance_tags = {
        Env = each.key
        Instance_type = each.value
    }
}

In the above example, Terraform will read two instances of the aws_instance data resource: one for "dev" and one for "prod". Within this block, the each object is available, whose properties can be used to configure each instance individually.

In the "dev" instance, each.key is "dev" and each.value is "t2.micro", so Terraform looks for a running instance tagged Env = "dev" and Instance_type = "t2.micro". For the "prod" instance, each.key is "prod" and each.value is "t2.large", so Terraform looks for a running instance tagged Env = "prod" and Instance_type = "t2.large".

Referencing Resource Blocks and Instances of for_each

When you attach for_each to a data block, Terraform reads several instances of that data resource. To distinguish between them, Terraform uses the map key (or set member) from the value provided to for_each.

There are two ways to reference resources in Terraform:

Entire Data Resource Block: If you want to refer to the block itself, you can use the syntax data.<TYPE>.<NAME>. Example: data.aws_s3_bucket.buckets refers to the whole data block, which is a map of its instances.

Individual Instances: To address an individual instance, add the map key or the set member to the reference. The syntax is data.<TYPE>.<NAME>[<KEY>]. Example: data.aws_s3_bucket.buckets["my-bucket-1"] refers to the instance read for the key "my-bucket-1".
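
The two reference styles might be used as in the sketch below. It uses a simplified version of the buckets data block with a set of hypothetical bucket names; the output names are also hypothetical, and arn is an attribute exported by the aws_s3_bucket data source.

```hcl
data "aws_s3_bucket" "buckets" {
    for_each = toset(["my-bucket-1", "my-bucket-2"])
    bucket   = each.key
}

# A single instance, addressed by its key.
output "first_bucket_arn" {
    value = data.aws_s3_bucket.buckets["my-bucket-1"].arn
}

# The whole block is a map, so a for expression can walk over every instance.
output "bucket_arns" {
    value = { for name, b in data.aws_s3_bucket.buckets : name => b.arn }
}
```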

Chaining for_each Between Resources

The for_each meta-argument is a powerful feature in Terraform. It allows you to read resources multiple times, based on a map or set of values. A resource or data block that uses for_each is itself exposed as a map of objects, where each key identifies an individual instance and the corresponding value holds the attributes of that instance.

You could establish one-to-one relationships among resources by using the output from one for_each block as the input for another for_each block. This approach is referred to as "Chaining for_each between resources."

Example: Reading EC2 Instances and Elastic IP Addresses

Consider a scenario in which we need to read multiple EC2 instances and assign an Elastic IP address uniquely to each of the read instances. One can use for_each for reading EC2 instances, then make the output of that block as an input for another for_each block creating Elastic IP addresses.

data "aws_instance" "data_instance" {
    for_each = tomap ({
        "t2.micro" = {
            "ami" = "ami-abc123"
            "instance_type" = "t2.micro"
        },
        "t2.small" = {
            "ami" = "ami-def456"
            "instance_type" = "t2.small"
        }
    })
    # ami and instance_type are exported attributes of this data source, not arguments,
    # so the lookup is done through tags.
    instance_tags = {
        ami = each.value.ami
        Instance_type = each.value.instance_type
    }
}

The above code passes for_each a map of objects with two keys: "t2.micro" and "t2.small". Each of these keys represents an individual EC2 instance, and the corresponding value holds the tag values used to look up that instance.

The output of the above block is a map of objects, which looks like this:

{ 
"t2.micro" = {
    "arn" = "arn:aws:ec2:ap-southeast-2:accountid:instance/i-id"
    "id" = "i-04f36f59dc445df86115ec"
    "availability_zone" = "ap-southeast-2a"
    "cpu_core_count" = 1
    "cpu_threads_per_core" = 2
    "ebs_block_device" = []
    "ami" = "ami-abc123"
    "instance_type" = "t2.micro"
},

"t2.small" = {
    "arn" = "arn:aws:ec2:ap-southeast-2:accountid:instance/i-id"
    "id" = "i-8738673ddv3t381h63y765"
    "availability_zone" = "ap-southeast-2b"
    "cpu_core_count" = 1
    "cpu_threads_per_core" = 2
    "ebs_block_device" = []
    "ami" = "ami-def456"
    "instance_type" = "t2.small"
}
}

Now we can use this output to be the input for another for_each block that creates the Elastic IP addresses.

resource "aws_eip" "eip" {
    for_each = data.aws_instance.data_instance
    instance = each.value.id
}

Here, the for_each argument takes its input from the output of the previous block. Each each.value.id refers to the ID of an EC2 instance read by the previous block, and the aws_eip resource creates one Elastic IP address for each of them.

Limitations on values used in for_each

1. Values to be known in advance

When using for_each with a map or set of values, the keys of the map, or all the values in the case of a set of strings, must be known values. That is, they cannot depend on results that Terraform only learns during the apply phase, such as attributes of resources that have not been created yet. If you try to use unknown values, you get an error message indicating that for_each has dependencies that cannot be determined before apply.
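
A minimal sketch of this rule, using hypothetical bucket and local names: keys built from literal values are known during planning, while keys derived from a not-yet-created resource are not.

```hcl
# Keys derived from a literal list are known during planning.
locals {
    bucket_names = toset(["my-bucket-1", "my-bucket-2"])
}

data "aws_s3_bucket" "known" {
    for_each = local.bucket_names
    bucket   = each.key
}

# By contrast, keys built from an attribute that only exists after apply,
# such as the id of an instance that has not been created yet, would be rejected:
#   for_each = toset([aws_instance.my_instance.id])
```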

2. Sensitive Values Not Allowed

Sensitive values, such as sensitive input variables, sensitive outputs, or sensitive resource attributes, cannot be used as arguments to for_each. The value used in for_each identifies the resource instance and is always disclosed in UI output during plan, which could reveal sensitive information. Using a sensitive value as an argument to for_each results in an error.

The provider Meta-Argument

The provider meta-argument in Terraform is special and is used to choose which provider configuration to use with a particular data resource. It overrides Terraform's default behavior, which looks for a provider configuration with a matching resource type name.

By default, Terraform will automatically select a provider configuration based on the name of the resource type. For instance, if you declare the aws_instance data resource, it will automatically select the default AWS provider configuration. However, in some cases, you may wish to use a different provider configuration.

The provider meta-argument is available to set explicitly which provider configuration is used for a data resource. It takes the form of adding the provider name and alias (when using multiple configurations of the one provider type) to the data resource declaration.

Provider Syntax

The syntax for using the provider meta-argument is:

<PROVIDER>.<ALIAS>

<PROVIDER>: This is the name of the provider, such as "google", "aws", or "azurerm".

<ALIAS>: is the name of the configuration or alias for this provider. This is an optional part of this syntax. It is utilized to differentiate between configurations for the same provider, such as us-east1, europe, staging, and so on.

Example

Suppose you have two configurations of provider for AWS Cloud:

First, there is the default AWS configuration, with its region set to "us-east-1".

The other is an alternative AWS configuration with the alias "europeregion" and its region set to "eu-west-1".

  provider "aws" {
      region = "us-east-1"
  }
  provider "aws" {
      alias = "europeregion"
      region = "eu-west-1"
  }

Now you want to read an AWS instance using the alternative "europeregion" configuration and not the default configuration. You can achieve that by adding the provider meta-argument to the aws_instance data resource declaration:

data "aws_instance" "data_ec2_instance" {
    provider = aws.europeregion
    instance_id = "i-instance0123456789"
    filter {
        name = "instance-type"
        values = ["t2.micro"]
    }
    filter {
        name = "image-id"
        values = ["ami-123456789"]
    }
}

Here, the provider meta-argument has been used to specify that the aws_instance data resource should use this "europeregion" configuration instead of using the default configuration. In other words, the EC2 instance will be read in the "eu-west-1" region as defined in "europeregion".

The Lifecycle Meta-Argument

The lifecycle block in a data source supports only the precondition and postcondition blocks used for custom validation. It does not support the create_before_destroy, prevent_destroy, or ignore_changes arguments, which apply only to managed resources.
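
As a minimal sketch of such a custom validation, the data block below checks a property of the object it has just read. It assumes Terraform 1.2 or later (where precondition and postcondition blocks are available); the local name example and the AMI name pattern are illustrative assumptions.

```hcl
data "aws_ami" "example" {
    most_recent = true
    owners      = ["amazon"]

    filter {
        # Hypothetical name pattern used only for illustration.
        name   = "name"
        values = ["al2023-ami-*-x86_64"]
    }

    lifecycle {
        # Evaluated after the data source is read; self refers to this data resource.
        postcondition {
            condition     = self.architecture == "x86_64"
            error_message = "The selected AMI must use the x86_64 architecture."
        }
    }
}
```

If the condition evaluates to false, Terraform stops with the given error message instead of passing the unexpected AMI on to dependent resources.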