Monday, August 28, 2023

Introduction to IaC: Deploying Data infrastructure to Azure using Terraform (Part 3 - Azure SQL Database)

In Part 1, we explained the basics and prerequisites for working with Terraform.

In Part 2, we created our first Azure resource.

In this post, we will create an Azure SQL Database and configure it so we can access it right away.

Example 2: Deploy and configure an Azure SQL database

To create an Azure SQL database, we need at least the following:

  • Resource group (we already have it from the previous example)
  • Azure SQL Server
  • Azure SQL database
  • At least one firewall rule

We can obtain code snippets for each resource from the Terraform provider documentation we checked in the previous example: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs


We proceed to add each component to the main.tf file (do not delete the code we already have there).
At the end we should have something like this (adjust resource names and parameters to match what you want):



terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.69.0"
    }
  }
}

provider "azurerm" {
  # Configuration options
  features {}
}

resource "azurerm_resource_group" "MyRG" {
  name     = "RG_TF_Tests"
  location = "East US"
  tags = {
    environment = "Azure Resource tests"
  }
}


#Azure SQL server
resource "azurerm_mssql_server" "sqlserver" {
  name                = "eduardopivaral-tf"
  resource_group_name = azurerm_resource_group.MyRG.name
  location            = azurerm_resource_group.MyRG.location
  version             = "12.0"
  # Note that the user and password are in plain text here; for this we would use
  # variables, but that is out of scope for this post and will be covered in upcoming posts.
  administrator_login          = "Gato"
  administrator_login_password = "-n0meAcuerd0-"

  #we add our account as administrator of the instance
  azuread_administrator {
    login_username = "epivaral@studyyourdata.com"
    object_id      = "<check object ID for the account in Azure Portal>"
  }
}


# Azure SQL Database
resource "azurerm_mssql_database" "sqldb" {
  name                 = "demoData"
  server_id            = azurerm_mssql_server.sqlserver.id
  collation            = "SQL_Latin1_General_CP1_CI_AS"
  sku_name             = "Basic" # we use the Basic tier
  storage_account_type = "Local" # locally redundant storage
  tags = {
    description = "Part 1 - just the empty resource"
  }
}


# firewall rule to allow our machine to access the server
resource "azurerm_mssql_firewall_rule" "MyLaptopRule" {
  name             = "MyLaptopRule"
  server_id        = azurerm_mssql_server.sqlserver.id
  start_ip_address = "181.209.256.300" # This is not a valid IP; replace it with your own public IP or range before applying
  end_ip_address   = "181.209.256.300"
}
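Although variables are out of scope for this post, as a preview, a minimal sketch of moving the admin password out of the code could look like this (the variable name is just an example):

```hcl
# declare a sensitive variable instead of hardcoding the password
variable "sql_admin_password" {
  type      = string
  sensitive = true
}

# then, inside the azurerm_mssql_server resource block:
#   administrator_login_password = var.sql_admin_password
# the value can be supplied at runtime with -var, a .tfvars file,
# or the TF_VAR_sql_admin_password environment variable
```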

We execute terraform plan:

terraform plan

Notice how this time three resources will be added, and no action will be taken for the resource group.
We then execute terraform apply with auto-approve:

terraform apply -auto-approve

If everything is ok, we should see the process completing successfully:


To validate it, navigate to the Azure portal (or connect using SSMS) and confirm you can access the database:


Since we also included a firewall rule for our laptop, we should be able to access it without issues:
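As an alternative quick check from the command line, a connection test could look like this (a sketch assuming the sqlcmd utility is installed; replace the password placeholder with your own, and note that Azure SQL server names follow the <server-name>.database.windows.net pattern):

```shell
# hypothetical connectivity check against the demo database created above
sqlcmd -S eduardopivaral-tf.database.windows.net -d demoData -U Gato -P '<your-password>' -Q "SELECT @@VERSION;"
```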


Destroying our infrastructure

This is a demo and I do not want to incur additional costs, so I will bring down all the infrastructure. In a real-world scenario, you will probably remove individual resources instead of everything, but this is how we delete everything:

terraform destroy

Same as with apply, we can use the -auto-approve flag, but I do not recommend it for destroy operations.


If you want to destroy just one resource (as in a real-world scenario), use the -target='resource_type.resource_name' flag.
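Using the firewall rule from this post as an example, a targeted destroy might look like this (a sketch; the resource address must match the type and identifier in your own configuration):

```shell
# destroy only the firewall rule, leaving the server and database in place
terraform destroy -target='azurerm_mssql_firewall_rule.MyLaptopRule'
```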

You can verify that the resources have been removed by checking the Azure portal.

As you have noticed, this approach involves a lot of information to digest, and IaC is only used to provision the underlying infrastructure in an automated and repeatable way.

Creating database objects and populating tables with data is another part of the CI/CD pipeline; there are other tools for that, which we can discuss in future posts.

As a suggested exercise, you can try deploying different resources to other providers, for example an AWS S3 bucket (we will write another post for that as well).

Stay tuned for the next articles, where we will discuss sourcing, Terraform Cloud, and how to secure sensitive data.

Wednesday, August 23, 2023

Introduction to IaC: Deploying Data infrastructure to Azure using Terraform (Part 2 - our first Azure resource)

Continuing from Part 1, where we set up our environment, we can now build our very first example (do not worry if it seems too simple at this point; it is just to understand how everything works).

Example 1: Deploy an Azure Resource Group

The Azure Resource Group is arguably the most basic Azure resource, so we will verify our setup is working by deploying one.

First, create your project folder on your local machine. In my case I am using C:\Terraform\Terraform_AZ_example, but you can use any path you want.
Then, open that folder in VSCode:

cd C:\Terraform\Terraform_AZ_example
code .

Or use the GUI to open the folder:

 

Once opened, as a best practice I like to add this .gitignore file, even when working locally, so sensitive information and Terraform state are not uploaded if you later decide to push your source to GitHub:

# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version 
# control as they are data points which are potentially sensitive and subject 
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

Let us create a main.tf file:


The first step is to add the Azure provider. You can find the available providers at https://registry.terraform.io/
Click on Providers and then select Azure:


Click on Use Provider, and then copy the code snippet shown:


Copy it to your main.tf and, inside the provider block, add this line of code: features {} (it is required). Your code should look like this:


The next step is to add, at the end of the same script, a resource of type Resource Group, as this is the most basic one:

resource "azurerm_resource_group" "MyRG" {
  name     = "RG_TF_Tests"
  location = "East US"
}
 

In this case, MyRG is the Terraform identifier for the resource (we will reference the resource in the script using that name), and name is the name of the resource deployed to Azure.
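To illustrate how that identifier is used, here is a sketch of another resource referencing MyRG (the storage account itself is hypothetical and not part of this example):

```hcl
# hypothetical resource that reuses the resource group's name and location
# through the Terraform identifier azurerm_resource_group.MyRG
resource "azurerm_storage_account" "demo" {
  name                     = "storagetfdemo123" # must be globally unique
  resource_group_name      = azurerm_resource_group.MyRG.name
  location                 = azurerm_resource_group.MyRG.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```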

For any resource, we can check the provider documentation to find examples and possible parameters we can configure:


Once done, we are ready to deploy our first resource.
Following the Terraform lifecycle, we must first execute init. Open a new terminal in VSCode and run:

terraform init

This will download the required plugins into the .terraform folder and create some other files that are out of scope for now.

Next is to validate the syntax, so run:

terraform validate

If everything looks ok, we can now plan the deployment, so run:

terraform plan

We will see the proposed changes to our infrastructure (new resource group creation)

Note: it is possible to skip the terraform validate and terraform plan steps and jump from init to apply if you are confident in the changes you are about to perform, but I advise running those steps every time as a double check.

Once we are ok with the proposed plan, we can deploy it using

terraform apply

It will ask for confirmation before proceeding with the changes; type yes. (It is possible to skip this confirmation with the -auto-approve flag):

This step can take some time, since the process is done using API calls. After some waiting, we can confirm the changes were applied:

We now need to validate that the resource was properly created. To do that, log in to your Azure portal and go to Resource groups; if everything is ok, you will see the resource there:

What if we want to modify the resource? For example, let us add a tag to it. Modify the resource block by adding the tags block:

resource "azurerm_resource_group" "MyRG" {
  name     = "RG_TF_Tests"
  location = "East US"
  tags = {
    environment = "Modification tests"
  }
}

Save the main.tf file.

This time, we will skip directly to terraform apply using auto-approve:

terraform apply -auto-approve

We can see that the resource was just updated with the new tag.


Validate it on the Azure portal:

In the next example, we will create and configure an Azure SQL database inside this resource group.

Part 3 - Azure SQL Database



Tuesday, August 22, 2023

Introduction to IaC: Deploying Data infrastructure to Azure using Terraform (Part 1 - Intro and prerequisites)

Introduction

One integral part of modern Continuous Integration and Continuous Delivery (CI/CD) pipelines is the ability to create the underlying infrastructure as part of the pipeline in a consistent and repeatable manner.
Infrastructure as Code (IaC) offers this capability, allowing us to automate and deploy multi-cloud infrastructure as definition scripts.


This approach helps us to reduce time invested in provisioning the infrastructure, as well as reducing human errors or misconfigurations.  The infrastructure can also be redeployed to other environments knowing that the configuration will be the same each time.
Although Azure offers ARM templates and AWS offers CloudFormation, each of these works only with its respective cloud provider.


Which option do we have if we need an enterprise-grade solution that supports multi-cloud environments and can integrate with our CI/CD pipelines?


Terraform is a declarative, open-source IaC tool that can deploy to multiple cloud providers and offers great documentation and community support. It is one of the most widely used IaC tools in the world because of its reliability and ease of use.


It can be used locally on multiple operating systems or through Terraform Cloud, and it can be integrated into our CI/CD pipelines, for example with Azure DevOps.


Terraform scripts are written in HashiCorp Configuration Language (HCL) or JSON. For this article we will be using HCL.
Basic Terraform files have the .tf extension; usually we need at least one main.tf file to work.
The infrastructure status at any given point is stored in the Terraform state, which helps keep track of what should be created, updated, or destroyed.


The most common Terraform objects are:

  • Backend: where the Terraform state is stored; it can be standard or enhanced, and each type can be stored locally or remotely.
  • Providers: the targets where we want to deploy our infrastructure; Terraform uses each cloud provider’s API and RPC calls to communicate with it.
  • Modules: offer simplified ways to deploy common solutions that enclose multiple resources; they help reduce coding effort and enable code reuse, and can be developed by official cloud providers, Terraform, or any individual.
  • Resources: the objects to be deployed; each can have multiple parameters and configuration options. If an optional parameter is not provided, the defaults configured in the provider or module are used.

There are a lot more parts of Terraform files and configuration, but that is the basic information we need for now.
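As an illustration of the backend concept, this is a sketch of a remote backend that stores the state in Azure (the resource group, storage account, and container names are placeholders; the examples in this series use the default local backend):

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"        # placeholder names, not part of this example
    storage_account_name = "tfstateaccount"
    container_name       = "tfstate"
    key                  = "terraform.tfstate" # name of the state blob
  }
}
```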

Once you have your scripts ready, the usual Terraform lifecycle looks like this (in a very simplified way):
 

We will see how to use each command in our example, but for now, let me briefly explain each one:

  • terraform init: initializes our project; it downloads the required providers and modules, or updates them to the latest required version.
  • terraform validate: parses the script files and verifies the integrity of the code, highlighting syntax errors or undefined properties and values.
  • terraform plan: compares the desired configuration against the current Terraform state and determines which changes, if any, should be performed. It displays the proposed changes for review and can save them to a plan file.
  • terraform apply: deploys the required changes to the cloud provider and updates the state file with the latest changes. It also allows us to refresh the Terraform state with the latest changes from the cloud provider using the -refresh-only option.
  • terraform destroy: destroys the infrastructure partially or totally. If we run apply after a destroy, all resources will be created again.
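Put together, a first run of that lifecycle is just this command sequence (run from the folder containing your .tf files):

```shell
terraform init      # download providers and modules
terraform validate  # check the syntax
terraform plan      # preview the proposed changes
terraform apply     # deploy the changes (asks for confirmation)
terraform destroy   # tear everything down when no longer needed
```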


Terraform concepts and its lifecycle are a more complex topic, so it is not possible to cover everything in this article. I have provided just enough information so we can proceed with a simple Azure SQL deployment.

In this article we will use Terraform on our local machine to deploy common Azure data resources. We will start with a very basic script so you can understand the fundamentals of how IaC works with Terraform.

In future articles, we can use a VCS like GitHub and Terraform Cloud to implement versioning and a more secure workflow.

Prerequisites

For the most basic deployments we will need the following:
  • Download and install Terraform for your operating system.
    • Note: if you are using Windows, just add the terraform.exe path to the PATH environment variable.
  • Install the CLI for the provider you want to use; for this example, we will use the Azure CLI.
  • Authenticate to the Azure CLI so your credentials are stored locally: az login
  • If you do not have it already, install VSCode; we will use it as our IDE.
  • Install these two extensions, as they will make your life easier:
    • HashiCorp Terraform
    • HashiCorp HCL
 
Once you have the prerequisites, we are ready to work on our first deployment.
 
 
---------------
 
To make things clear, we will split this into multiple posts, so stay tuned for the rest of the parts.
I will update this post as well as soon as the other parts are released.
 
 
