Wednesday, December 27, 2023

Introduction to IaC: Deploying Data infrastructure to Azure using Terraform (Part 4 - Using GitHub)

 

In Part 1, we explained the basics and prerequisites for working with Terraform.

In Part 2, we created our first Azure resource.

In Part 3, we deployed our first solution (an Azure SQL database) using multiple resources.




In this post, we will put our Terraform project under version control with Git, using GitHub as the remote host.

Version control is outside the scope of this post, but if you want to learn more about it, you can check these excellent resources:

Introduction to version control with Git

Introduction to GitHub

Automate your workflow with GitHub Actions 

Manage the lifecycle of your projects on GitHub

 

OK, once you have a good idea of what version control systems (VCS) and GitHub are about, we can integrate them into our Terraform solution and use GitHub as an external repository for our IaC.

For simplicity, we will use the same folder and code from the previous post:

cd C:\Terraform\Terraform_AZ_example
code .


Make sure the .gitignore file is in place, since we are about to put the project under source control. It is good practice to include it even when you are not using Git. You can obtain the contents of the file from Part 2 of this series:



Once we have verified everything is in place, proceed to initialize Terraform with terraform init:


 

Then run terraform plan:

At this point we have everything we need to put the project under source control, and enough generated files (such as the .terraform folder) to validate that the .gitignore file works as expected.

Sourcing from VSCode using GUI

If you are using VSCode, publishing your code to GitHub is really easy. From the Source Control view, select the option Publish to GitHub.



After clicking the option, the repository name defaults to the folder/project name. You can change it as needed:


A private repository is the recommended option. Only use a public repository if you are sure you want to share your code with the rest of the world.

Note: At this stage, if you are not logged in to GitHub, a web page will prompt you to log in.

If everything is ok, you will see a progress message like this while the repo is being created:

Once done, you will see this message, and the repository will be available on the GitHub site:

If everything is ok, you should be able to see your project on GitHub. Files such as the .terraform folder and the state files should have been skipped, as specified in the .gitignore file:




Sourcing using command line

What if you want to do it manually, add the code to a repository someone else already created, or your company requires options other than the defaults?
You can do it from the command line as well.


Prerequisite:

Install Git on your computer. You can download it from the official Git website.

 

Once you have Git installed, the first step, if your code is not under source control yet, is to create a local repository by initializing it:

git init


Then we add all the files in the folder (except for the ones matched by the .gitignore file). The dot (.) indicates we want all the contents of the folder; you can replace it with individual file names if you prefer:

git add .

Once the files are added (staged), we need to commit them so they are recorded in the local repository. It is a best practice to use a descriptive message for each commit:

git commit -m "our first commit"

You can create a new repository from the GitHub website using the New Repository option (or use one already created):

 

For simplicity we will use an HTTPS connection. If you need to use SSH, remember to configure your SSH keys beforehand.


You will need this URL for the next step, as we need to add a remote origin to this repository:

git remote add origin https://github.com/Epivaral/Terraform_AZ_example.git

Note: if any issue occurs (for example a typo or a wrong remote repo), you can remove the remote origin with the command git remote remove origin

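The next step is to make sure the local branch is named main. The command below renames the current branch (which git init may have created as master), so that it matches the branch we will push to: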

git branch -M main

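The last step is to push the pending commits to the remote origin. The -u flag sets origin/main as the upstream branch, so later you can run git push without arguments: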

git push -u origin main



You can validate again by browsing the repo on GitHub and confirming the correct files are there:

Now you can work as usual on your Terraform files and implement your infrastructure, committing and pushing the changes as needed.

Remember that if you add a new file that needs to be tracked, use the git add <file> command before the commit/push.

In the next post we will learn how to integrate our newly created repository with Terraform Cloud, where we will be able to automatically deploy changes when commits are made.

Monday, August 28, 2023

Introduction to IaC: Deploying Data infrastructure to Azure using Terraform (Part 3 - Azure SQL Database)

In Part 1, we explained the basics and prerequisites for working with Terraform.

In Part 2, we created our first Azure resource.

In this post we will create an Azure SQL Database and configure it so we can access it right away.

Example 2: Deploy and configure an Azure SQL database

To create an Azure SQL database, we need at least the following:

  • Resource group (we already have it from the previous example)
  • Azure SQL Server
  • Azure SQL database
  • At least one firewall rule

We can obtain code snippets for each resource from the Terraform provider documentation we checked in the previous example: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs


We proceed to add each component to the main.tf file (do not delete the previous code we have there).
At the end we should have something like this (just check that resource names and parameters match what you want):



terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.69.0"
    }
  }
}

provider "azurerm" {
  # Configuration options
  features {}
}

resource "azurerm_resource_group" "MyRG" {
  name     = "RG_TF_Tests"
  location = "East US"
  tags = {
    environment = "Azure Resource tests"
  }
}


#Azure SQL server
resource "azurerm_mssql_server" "sqlserver" {
  name                = "eduardopivaral-tf"
  resource_group_name = azurerm_resource_group.MyRG.name
  location            = azurerm_resource_group.MyRG.location
  version             = "12.0"
  # Note that the user and password are in plain text here. In a real case we would use variables,
  # but that is out of scope for this post; we will discuss it in upcoming posts.
  administrator_login          = "Gato"
  administrator_login_password = "-n0meAcuerd0-"

  #we add our account as administrator of the instance
  azuread_administrator {
    login_username = "epivaral@studyyourdata.com"
    object_id      = "<check object ID for the account in Azure Portal>"
  }
}


# Azure SQL Database
resource "azurerm_mssql_database" "sqldb" {
  name                 = "demoData"
  server_id            = azurerm_mssql_server.sqlserver.id
  collation            = "SQL_Latin1_General_CP1_CI_AS"
  sku_name             = "Basic" #we use basic tier
  storage_account_type = "Local" #local redundancy storage
  tags = {
    description = "Part 1 - just the empty resource"
  }
}


#firewall rule to allow us access to it
resource "azurerm_mssql_firewall_rule" "MyLaptopRule" {
  name             = "MyLaptopRule"
  server_id        = azurerm_mssql_server.sqlserver.id
  start_ip_address = "181.209.256.300" #This IP does not exist, you should put your IP or range
  end_ip_address   = "181.209.256.300"
}
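
The plain-text credentials above could be moved into input variables. The following is only a minimal sketch of that idea; the variable names sql_admin_login and sql_admin_password are hypothetical and are not part of the original main.tf:

```hcl
# Hypothetical variable names, shown only to illustrate the idea.
variable "sql_admin_login" {
  description = "Administrator login for the Azure SQL server"
  type        = string
}

variable "sql_admin_password" {
  description = "Administrator password for the Azure SQL server"
  type        = string
  sensitive   = true # hides the value in plan and apply output
}

# Inside the azurerm_mssql_server.sqlserver block, the two literals would become:
#   administrator_login          = var.sql_admin_login
#   administrator_login_password = var.sql_admin_password
```

Terraform asks for both values at plan/apply time unless they are supplied some other way, for example through the TF_VAR_sql_admin_login and TF_VAR_sql_admin_password environment variables. Note that sensitive = true only hides the value in console output; it is still written to the state file, which is one more reason to keep the state out of Git.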

We run terraform plan:

terraform plan

Notice how this time three resources will be added, and no action will be taken for the resource group.
We then run terraform apply with the -auto-approve flag:

terraform apply -auto-approve

If everything is ok, we should see the process completing successfully:


To validate it, open the Azure portal (or SSMS) and check that you can access the database:


Since we also included a firewall rule for our laptop, we should be able to connect without issues:


Destroying our infrastructure

This is a demo and I do not want to incur additional costs, so I will bring down all the infrastructure. In a real-world scenario, you will probably remove individual resources instead of everything, but this is how we delete everything:

terraform destroy

Same as apply, we can use the -auto-approve flag, but I do not recommend it for destroy activities.


If you want to destroy just one resource (as in a real-world scenario), use the -target flag with the resource address in the form type.name, for example -target=azurerm_mssql_database.sqldb.

You can validate that the resources have been removed by checking in the Azure portal.

As you may have noticed, there is a lot of information to digest, and IaC only covers provisioning the underlying infrastructure in an automated and repeatable way.

Creating database objects and populating tables with data is another part of the CI/CD pipeline. There are other tools for that, which we can discuss in future posts.

As a suggested practice, you can try to deploy different resources to other providers, like an AWS S3 bucket (we will write another post about that as well).

Stay tuned for the next articles, where we will discuss source control, Terraform Cloud, and how to secure sensitive data.

Wednesday, August 23, 2023

Introduction to IaC: Deploying Data infrastructure to Azure using Terraform (Part 2 - our first Azure resource)

Continuing from Part 1, where we set up our environment, we can now build our very first example (do not worry if it seems too simple at this point; the goal is just to understand how it works).

Example 1: Deploy an Azure Resource Group

I think the Azure Resource Group is the most basic Azure resource, so we will verify our setup is ok by deploying one.

First, create your project folder on your local machine. In my case I am using C:\Terraform\Terraform_AZ_example, but you can use any path you want.
Then, open that folder in VSCode:

cd C:\Terraform\Terraform_AZ_example
code .

Or use the GUI to open the folder:

 

Once it is opened, I like, as a best practice, to add this .gitignore file even when working locally, so that sensitive information and the Terraform state are not uploaded if you later decide to push the project to GitHub:

# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version 
# control as they are data points which are potentially sensitive and subject 
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

 Let us create a main.tf file:


The first step is to add the Azure provider. You can find the providers at https://registry.terraform.io/
Click on Providers and then select Azure:


Click on Use Provider, and then select the code snippet shown:


Copy it to your main.tf and, inside the provider block, add the line features {}, which the azurerm provider requires. Your code should look like this:
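
For reference, here is a sketch of what main.tf should contain at this point, based on the provider configuration used in Part 3 of this series. The pinned version 3.69.0 comes from that post; the snippet you copy from the registry may show a newer one:

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.69.0"
    }
  }
}

provider "azurerm" {
  # The features block is required by the azurerm provider, even when empty
  features {}
}
```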


The next step is to add a resource of type Resource Group at the end of the same script, as this is the most basic resource:

resource "azurerm_resource_group" "MyRG" {
  name     = "RG_TF_Tests"
  location = "East US"
}
 

In this case MyRG is the Terraform identifier for the resource, which we will use to refer to it elsewhere in the script, and name is the name the deployed resource will have in Azure.
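
To illustrate how the identifier is used, here is a small sketch that could be appended to main.tf. The output names are hypothetical; the references follow the same azurerm_resource_group.MyRG.<attribute> pattern that Part 3 uses for the SQL server:

```hcl
# Hypothetical outputs that read attributes of the resource through its Terraform identifier
output "resource_group_name" {
  description = "Name of the resource group as deployed in Azure"
  value       = azurerm_resource_group.MyRG.name
}

output "resource_group_location" {
  description = "Azure region of the resource group"
  value       = azurerm_resource_group.MyRG.location
}
```

After an apply, Terraform prints these values at the end of the run.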

For any resource, we can check the provider documentation to find examples and possible parameters we can configure:


Once done, we are ready to deploy our first resource.
Following the Terraform lifecycle, we must first run init. To do so, open a new terminal in VSCode and run

terraform init

This will download the required plugins into the .terraform folder and create some other files that are out of scope for now.

Next, validate the syntax by running

terraform validate

Everything looks ok, so we now need to plan the deployment by running

terraform plan

We will see the proposed changes to our infrastructure (new resource group creation)

Note: it is possible to skip the terraform validate and terraform plan steps and jump from init to apply if you are confident about the changes you will perform, but I advise you to run those steps every time as a double-check.

Once we are ok with the proposed plan, we can deploy it using

terraform apply

 It will ask for our confirmation to proceed with the changes, type yes. (It is possible to skip this confirmation with the -auto-approve flag):

This step can take some time since the process is done using API calls. So, after some wait, we can confirm the changes were done:

We need to validate that the resource was properly created. To do that, log in to the Azure portal and go to Resource groups. If everything is ok, you will see the resource there:

What if we want to modify the resource? For example, let us add a tag to the resource. Modify the resource block adding the tags block:

resource "azurerm_resource_group" "MyRG" {
  name     = "RG_TF_Tests"
  location = "East US"
  tags = {
    environment = "Modification tests"
  }
}

Save the main.tf file.

This time, we will skip directly to terraform apply using auto-approve:

terraform apply -auto-approve

 We can see that the resource was just updated with the new tag.


Validate it on the Azure portal:

In the next example we will create and configure an Azure SQL database inside this resource group.

Part 3 - Azure SQL Database



Tuesday, August 22, 2023

Introduction to IaC: Deploying Data infrastructure to Azure using Terraform (Part 1 - Intro and prerequisites)

Introduction

One integral part of modern Continuous Integration and Continuous Delivery (CI/CD) pipelines is the ability to create the underlying infrastructure as part of the pipeline in a consistent and repeatable manner.
Infrastructure as Code (IaC) offers this capability, allowing us to define infrastructure in scripts and automate its deployment, even across multiple clouds.


This approach helps us reduce the time invested in provisioning the infrastructure, as well as human errors and misconfigurations. The infrastructure can also be redeployed to other environments knowing that the configuration will be the same each time.
Although Azure offers ARM templates and AWS offers CloudFormation, each of these works only with its own cloud provider.


What options do we have if we need an enterprise-grade solution that supports multi-cloud environments and can integrate with our CI/CD pipelines?


Terraform is a declarative IaC tool from HashiCorp (released as open source through version 1.5, and under the Business Source License for later versions) that can deploy to multiple cloud providers and offers great documentation and community support. It is one of the most widely used IaC tools in the world because of its reliability and ease of use.


It can be used locally on multiple operating systems or through Terraform Cloud, and it can be integrated into CI/CD pipelines such as Azure DevOps.


Terraform scripts are written in HashiCorp Configuration Language (HCL) or JSON. For this article we will be using HCL.
Basic Terraform files have the .tf extension, and we usually need at least one main.tf file to work.
The infrastructure status at any given point is stored in the Terraform state, which helps keep track of what should be created, updated or destroyed.


The most common objects of Terraform are:

Backend: Where the Terraform state is stored, can be standard or enhanced. Each type can be stored locally or remotely.
Providers: Plugins for the targets where we want to deploy our infrastructure. Terraform communicates with each provider plugin through RPC calls, and the plugin in turn calls the cloud provider’s API.
Modules: Simplified ways to deploy common solutions that group multiple resources. They reduce coding effort and offer code reusability, and can be developed by official cloud providers, HashiCorp, or any individual.
Resource: The object to be deployed. It can have multiple parameters and configuration options. If an optional parameter is not provided, the default configured in the provider or module is used.
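
As a rough sketch of where each of these objects appears in a configuration file, consider the standalone example below. The resource and module names are hypothetical, and the module block is commented out because it assumes a local module folder that does not exist in this series:

```hcl
terraform {
  # Backend: where the state is stored. "local" is what Terraform uses
  # when no backend is configured; it is written out here for illustration.
  backend "local" {
    path = "terraform.tfstate"
  }

  # Providers required by this configuration
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.69.0"
    }
  }
}

# Provider: configuration for the target platform
provider "azurerm" {
  features {}
}

# Resource: a single object to be deployed (hypothetical name)
resource "azurerm_resource_group" "example" {
  name     = "rg-example"
  location = "East US"
}

# Module: a reusable group of resources. Hypothetical local module;
# uncomment only if a module with this input exists at that path.
# module "database" {
#   source              = "./modules/database"
#   resource_group_name = azurerm_resource_group.example.name
# }
```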

There are a lot more parts of Terraform files and configuration, but that is the basic information we need for now.

Once you have your scripts ready, the usual Terraform lifecycle looks like this (in a very simplified way):
 

We will see how to use each command in our example, but for now, let me briefly explain each one:

terraform init: Initializes our project. It downloads the required providers and modules; with the -upgrade option, it updates them to the latest version allowed by the configuration.
terraform validate: Parses the script files and verifies the integrity of the code, highlighting syntax errors and undefined properties or values.
terraform plan: Compares the configuration with the current Terraform state and determines which changes, if any, should be performed. It displays the proposed changes for review, and they can be saved to a plan file with the -out option.
terraform apply: Deploys the required changes to the cloud provider and updates the state file accordingly. It also allows us to refresh the Terraform state with the latest changes from the cloud provider using the -refresh-only option.
terraform destroy: Destroys the infrastructure partially or totally. If we run the apply command after a destroy, all resources will be created again.


Terraform concepts and lifecycle are a complex topic, so it is not possible to cover everything in this article. I have provided just enough information so we can proceed with a simple Azure SQL deployment.

In this article we will use Terraform on our local machine to deploy common Azure data resources. We will start with a very basic script so you can understand the fundamentals of how IaC works with Terraform.

In upcoming articles, we will use a VCS like GitHub together with Terraform Cloud to implement versioning and a more secure workflow.

Prerequisites

For the most basic deployments we will need the following:
  • Install Terraform.
    • Note: if you are using Windows, just add the terraform.exe path to the PATH environment variable.
  • Install the CLI for the provider you want to use. For this example, we will use Azure CLI.
  • Authenticate to Azure CLI, so your credentials are stored locally: az login
  • If you do not have it already, install VSCode. We will use it as our IDE.
  • Install these 2 extensions, as they will make your life easier:
    • HashiCorp Terraform
    • HashiCorp HCL
 
Once you have the prerequisites, we are ready to work on our first deployment.
 
 
---------------
 
To keep things clear, we will split this into multiple posts, so stay tuned for the rest of the parts.
I will also update this post as soon as the other parts are released.
 
 
