Tuesday, April 28, 2026

How to Find Expensive Queries in Amazon Redshift

Slow-running queries can degrade your Redshift cluster’s performance and lead to increased costs. Identifying the most expensive queries is crucial to optimize resource usage and improve overall system efficiency.

Step-by-Step Guide

  1. Connect to your Redshift cluster.
    Use your preferred SQL client or the Redshift Query Editor to establish a connection with your cluster.

  2. Query the stl_query system table for the longest-running queries.
    The stl_query table logs completed queries along with their runtime metrics. Use the following SQL to retrieve the top 10 queries with the longest execution time over the last 24 hours:
    SELECT query, userid, label, starttime, endtime,
           DATEDIFF(seconds, starttime, endtime) AS elapsed_seconds
    FROM stl_query
    WHERE starttime >= GETDATE() - INTERVAL '1 day'
    ORDER BY elapsed_seconds DESC
    LIMIT 10;
        

  3. Retrieve the SQL text of the expensive queries.
    Use the stl_querytext table to get the text of the queries identified:
    SELECT text
    FROM stl_querytext
    WHERE query = <query_id>
    ORDER BY sequence;
        

    Replace <query_id> with the actual query ID from the previous step to analyze the query text for possible optimizations.


  4. Focus your optimization efforts.
    Look for joins on columns without suitable distribution or sort keys, large table scans, or missing filters, and consider rewriting or adjusting these queries for better performance; the examples below show two quick ways to dig deeper.
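
A quick way to confirm where a candidate query spends its effort is to run EXPLAIN on it. In the resulting plan, redistribution steps such as DS_BCAST_INNER or DS_DIST_BOTH usually point to joins that would benefit from better distribution keys. A hypothetical example (table and column names are placeholders, not from the steps above):

    -- substitute the tables and filters from the expensive query you identified
    EXPLAIN
    SELECT o.order_id, c.customer_name
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.order_date >= '2026-01-01';

If you prefer to pull the runtimes from step 2 and the SQL text from step 3 in a single pass, the two system tables can be joined. One possible sketch is below; note that LISTAGG output is capped at 64K characters, so very long statements may be truncated:

    -- reconstruct the full SQL text alongside the elapsed time
    SELECT q.query, q.userid,
           DATEDIFF(seconds, q.starttime, q.endtime) AS elapsed_seconds,
           LISTAGG(t.text) WITHIN GROUP (ORDER BY t.sequence) AS full_sql
    FROM stl_query q
    JOIN stl_querytext t ON t.query = q.query
    WHERE q.starttime >= GETDATE() - INTERVAL '1 day'
    GROUP BY q.query, q.userid, q.starttime, q.endtime
    ORDER BY elapsed_seconds DESC
    LIMIT 10;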

Conclusion

Tracking and analyzing expensive queries using Redshift’s system tables is a simple yet powerful way to maintain a healthy and efficient data warehouse environment. Regular monitoring helps reduce costs and speeds up analytics workflows.

For further reading, visit the official Amazon Redshift system tables documentation: Amazon Redshift System Tables

Saturday, April 20, 2024

Cloud Databases Made Simple: Creating Your First AWS RDS Instance


I recently started learning Amazon Web Services (AWS) full time, and I found that AWS's Relational Database Service (RDS) is really user-friendly.

RDS simplifies many of the complicated aspects of database setup, operation, and scaling. Now that I've gone through the process myself, I'm here to walk you through setting up your first AWS RDS instance. 

Let's break it down into simple steps so you can get your cloud database up and running in no time.

 

What is RDS?

AWS RDS (Relational Database Service) is a managed relational database service that lets you choose from multiple database engines (MySQL, PostgreSQL, MariaDB, SQL Server, Oracle, Db2).

Since it is managed, a few configuration options are all you need before your application can start using it.

It is highly scalable, so you do not need to know exactly how many resources to provision upfront; you can scale as your storage or compute requirements increase.

It offers automated backups, point-in-time recovery, and Multi-AZ deployments, so disaster recovery (DR) and high availability (HA) are ready to go.

It is highly secure: you can encrypt data at rest and in transit, with the proper VPC configuration you can isolate it at the network level, and with GuardDuty you also get threat detection.

Common use cases include web applications, e-commerce, blogs and news sites, and any other general-purpose application requiring a reliable, globally available, and scalable database.


Prerequisites

To use it, you will need an AWS account; if you don't have one, you can create one here.

Basic AWS knowledge is required, as you need to understand what IAM is and how a VPC works. Some prior database knowledge is also assumed, since we expect you to be familiar with the DB engine you are creating.

Once you have your account created, there are multiple ways to create your database: the web console, the AWS CLI, IaC, CI/CD pipelines, etc.
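
For reference, a comparable free-tier-style MySQL instance could be created from the AWS CLI with something like the sketch below (bash-style line continuations; the identifier, username, and password are placeholders, and you should never keep real credentials in your shell history):

aws rds create-db-instance \
    --db-instance-identifier my-first-db \
    --db-instance-class db.t3.micro \
    --engine mysql \
    --allocated-storage 20 \
    --storage-type gp2 \
    --master-username admin \
    --master-user-password 'ChangeMe123!' \
    --backup-retention-period 7 \
    --no-publicly-accessible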

For this post, we will use the AWS console; you can access it here.
 
Note: It is recommended to create an IAM user separate from the root account and to enable MFA. The root account should not be used for day-to-day work and implementations.


Which engine to choose?

RDS supports multiple database engines, so the choice depends on several factors: your needs, your knowledge, and your budget. If you are migrating from on premises, choosing the same engine makes the migration easier.

MySQL and PostgreSQL are open-source databases, and if needed you can later move to Amazon Aurora for a more scalable and faster database.

SQL Server, Oracle, and Db2 are proprietary databases, so they can be more expensive, but if you have previous experience with those engines, or regulatory constraints prevent you from using open-source technologies, you can opt for them.

MariaDB is an open-source database based on MySQL, so it can be a modern choice if your application will be developed from scratch.

Aurora is AWS's own engine for RDS and is offered in two compatibility modes, MySQL and PostgreSQL. Since it is optimized for AWS, it can perform and scale better than its community counterparts, and it offers unique features such as Global Database and zero-ETL integrations. One drawback (at the time of writing) is that it is not available in the free tier.

Since we are learning and do not want to incur high costs, for this example we will use MySQL or PostgreSQL (not Aurora).

 

Free tier

 RDS offers a free tier during the first 12 months after your account creation with the following restrictions:
  • 750 Hours of Amazon RDS Single-AZ db.t2.micro, db.t3.micro, and db.t4g.micro instances usage running MySQL, MariaDB, PostgreSQL
  • 20 GB of General Purpose (SSD) database storage
  • 20 GB of storage for database backups and DB Snapshots

Any additional usage will be charged at standard rates.

You can always forecast your costs using AWS pricing calculator.


Creating our first AWS RDS Instance


In the AWS console, in the search bar, type RDS, then select the RDS service.


 
On the new page, click on Create Database:
 

 

 

I will guide you through all the available options, so choose Standard create.
Then select the engine you want; for our example we will use MySQL.



 

Select the edition and the required version; in most cases you will select the latest version available.


 

For Template, we will select the Free Tier.

You can see that the deployment options are locked to Single DB instance; if you want to configure a highly available database, you need to select a Dev/Test or Production template.


 

For the settings, enter the DB instance identifier you want (note that a single instance can contain multiple databases).

Then select the options for the credentials. It is advisable to use AWS Secrets Manager, but since we do not want to incur additional costs, we will use a manually set password.



For Instance configuration, select the size of your instance based on the expected workload; note that for the free tier the available options are limited.

You can learn more about instance types here.



For the storage options, enter the expected values for your database size; notice that for the free tier we must select the gp2 storage type.

We have disabled storage autoscaling so we do not incur additional costs by mistake.

If you want to learn more about RDS storage configuration, you can check here.



In Connectivity, select the parameters you need if you require a connection from an EC2 instance, and choose the VPC you want to use.

I previously created a VPC for my database, but you can use the default one or any other you prefer.

Note that you cannot change your VPC later.



Choose the DB subnet group to use, or create a new one.

For Public access, I will select Yes so I can show you how to connect from my local machine, but it is highly advisable not to grant public access to your database; instead, connect through the VPC and security groups.

We will not use RDS Proxy, to keep our costs low. RDS Proxy is a highly available, resilient connection pool managed by AWS that sits between client applications and the database.



Configure the CA to use and the database port.

We will leave the default for our example.



We will not enable enhanced monitoring for our example.

Then select the database name you need, or you can leave it blank and create a new database later.

We will use the default parameter group and option group. The parameter group is where instance-level options are configured for each DB engine; those parameters are engine-specific.

We also enable Automated backups since we have 20 GB available on the free tier.



We configure some additional backup options as well as encryption (always recommended).



For log exports, in production environments I recommend selecting all of them; for non-production or test environments, it is up to you which logs to capture.

Note that CloudWatch charges you based on the storage needed to save the logs.



Once everything is configured, we will see the estimated costs based on our selections; you can go back at this point and change anything you need.

You can see that with the options we selected, we stay within the free tier.

Once you are happy with your configuration click on Create Database.



Provisioning your resource will take several minutes or more, depending on the configuration options you chose.

For large setups with clustering, replication, and large storage, it can take a considerable time to complete.



Once your instance is ready, you can access it under Databases. 

You will see it with status Available when it is ready to use.




Connecting to our instance

There are multiple ways to connect to the instance and start using it, but the easiest is to use the connection endpoint and port.

Since we configured our instance to be public, we can connect from my local machine.

You can obtain the endpoint and port in the Connectivity & security tab:
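
If you prefer the AWS CLI, the same details can be retrieved with a query like this (the instance identifier is a placeholder):

aws rds describe-db-instances --db-instance-identifier my-first-db --query "DBInstances[0].Endpoint"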



How you connect to your instance depends on the engine you chose; since I am using MySQL, I will use MySQL Workbench, which is free and you can download it from here.

Other options include Azure Data Studio or VSCode with the appropriate extensions.
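
If you prefer a terminal over a GUI client, the standard mysql command-line client can reach the same endpoint. A quick sketch (the host below is a placeholder for your instance endpoint, and you will be prompted for the password):

mysql -h my-first-db.abc123xyz.us-east-1.rds.amazonaws.com -P 3306 -u admin -p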

In MySQL Workbench, create a new connection:


For Connection Name, put a descriptive name.

Then enter the endpoint as the host name, along with the port you configured.

For the initial connection, use the username and password we configured earlier, but it is advisable to create another user right after.

Click on Test Connection; if you can reach the instance, you will be asked for the password.

NOTE: If you cannot reach your instance, please check your VPC setup.


 

If everything is ok, the connection test will be successful:



You can see the DB we created earlier. Now you are good to go; we can test creating a simple table:
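
A minimal smoke test might look like the following (this assumes you gave your initial database a name during setup; replace mydb with yours):

-- switch to the database created during setup
USE mydb;

-- create a throwaway table, insert a row, and read it back
CREATE TABLE test_table (
    id INT AUTO_INCREMENT PRIMARY KEY,
    note VARCHAR(100) NOT NULL
);

INSERT INTO test_table (note) VALUES ('hello from RDS');

SELECT * FROM test_table;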




We have validated our access is ok.

We are good to start working inside our new instance:



Dropping the instance

To make sure you don't leave the DB running and start seeing charges once the 12-month free tier ends, you can delete the database using the Delete option in the AWS console.

Please don't do this for production instances; this is just in case you are testing and experimenting with RDS.
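
If you prefer the terminal, the equivalent cleanup from the AWS CLI looks roughly like this (the identifier is a placeholder; skipping the final snapshot is acceptable for a throwaway test instance, but never for production):

aws rds delete-db-instance --db-instance-identifier my-first-db --skip-final-snapshot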




What to expect next?

That's it: your first AWS RDS instance is up and running! You've tackled the basics of setting up an RDS instance, and now you're ready to explore further.

As you continue your journey with AWS RDS, remember that learning is ongoing. Keep experimenting, asking questions, and seeking out resources to deepen your understanding.

Thank you for reading this long post, and please keep in touch; in the next posts we will explore monitoring, performance, and troubleshooting common issues.

 

Wednesday, December 27, 2023

Introduction to IaC: Deploying Data infrastructure to Azure using Terraform (Part 4 - Using Github)

 

In Part 1, we explained the basics and prerequisites for working with Terraform.

In Part 2, we created our first Azure resource.

In Part 3, we deployed our first solution (an Azure SQL database) using multiple resources.




In this post, we will integrate our Terraform project with the Git version control system, hosted on GitHub.

Version control is outside the scope of this post, but if you want to learn more about it, you can check these excellent resources:

Introduction to version control with Git

Introduction to GitHub

Automate your workflow with GitHub Actions 

Manage the lifecycle of your projects on GitHub

 

OK, once you have a good idea of what VCS and GitHub are about, we can integrate it into our Terraform solution and use GitHub as an external repository for our IaC.

For simplicity, we will use the same folder and code from the previous post:

cd C:\Terraform\Terraform_AZ_example
code .


Make sure the .gitignore file is in place, since we will source the project, and it is good practice to include it even when we are not using Git. You can obtain the contents of the file from this previous post:



Once we have verified everything is in place, proceed to initialize Terraform:


 

Then run a terraform plan:

At this point we have what we need to source the project, and we have enough files to validate that the .gitignore file is working.

Sourcing from VSCode using GUI

If you are using VSCode, sourcing your code to GitHub is really easy. From the VSCode GUI, just select the Publish to GitHub option under the Source Control menu.



After clicking the option, the default name for the repository will be the folder/project name, you can change it as needed:


The recommended option is a private repository; only use public repos if you are sure you want to share your code with the rest of the world.

Note: At this stage, if you are not logged in to GitHub, a web page will prompt you to log in.

If everything is ok, you will see a progress message like this while the repo is being created:

Once done, you will see this message and you will be able to see the repo on the GitHub site:

If everything is ok, you should be able to see your project on GitHub, and files such as the .terraform folder and the state files should be properly excluded from the repository:




Sourcing using command line

What if you want to do it manually, add the code to a repository someone else already created, or use options different from the defaults because of company requirements?
You can do it from the command line as well.


Prerequisite:

Install Git on your computer, you can obtain it from here.

 

Once you have Git installed, the first step, if your code is not sourced yet, is to create a local repository by initializing it:

git init


Then we add all the files in the folder (except for the ones specified in the .gitignore file). The dot (.) indicates we want all the contents of the folder; you can instead specify individual files if you want:

git add .

Once the files are added, we need to commit them so they are "checked in" to the repository. It is a best practice to use a descriptive message for each commit you execute:

git commit -m "our first commit"

You can create a new repository from the GitHub webpage using the New repository option (or use one already created):

 

For simplicity we will use an HTTPS connection; if you need to use SSH, remember to configure your connection keys beforehand.


You will need this URL for the next step, as we need to add it as the remote origin of our local repository:

git remote add origin https://github.com/Epivaral/Terraform_AZ_example.git

Note: if any issue occurs (for example a typo or a wrong remote repo), you can remove the remote origin with the command git remote remove origin

The next step is to rename the default branch to main, as we need at least one branch for the repo to work:

git branch -M main

The last step is to push the pending commits to the remote origin:

git push -u origin main



You can validate again by browsing the repo on GitHub and confirming the correct files are there:

Now you can work as usual on your Terraform files and commit/push the changes as needed.

Remember that if you add a new file that needs to be sourced, run git add <file> before the commit/push, as in the sketch below.
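
For day-to-day changes, the routine looks something like this (the file name and commit message below are just examples):

git add main.tf
git commit -m "add tags to the resource group"
git push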

Now you can implement your infrastructure as usual. 

In the next post we will learn how to use our recently sourced repository and integrate into Terraform Cloud, where we will be able to automatically deploy changes when the commits are made.

Monday, August 28, 2023

Introduction to IaC: Deploying Data infrastructure to Azure using Terraform (Part 3 - Azure SQL Database)

In Part 1, we explained the basics and prerequisites for working with Terraform.

In Part 2, we created our first Azure resource.

In this post we will create an Azure SQL database and configure it so we can access it right away.

Example 2: Deploy and configure an Azure SQL database

To create an Azure SQL database, we need at least the following:

  • Resource group (we already have it from the previous example)
  • Azure SQL Server
  • Azure SQL database
  • At least one firewall rule

We can obtain code snippets for each resource from the Terraform provider documentation we checked in the previous example: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs


We proceed to add each component to the main.tf file (do not delete the previous code already there).
At the end we should have something like this (adjust resource names and parameters to match what you want):



terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.69.0"
    }
  }
}

provider "azurerm" {
  # Configuration options
  features {}
}

resource "azurerm_resource_group" "MyRG" {
  name     = "RG_TF_Tests"
  location = "East US"
  tags = {
    environment = "Azure Resource tests"
  }
}


#Azure SQL server
resource "azurerm_mssql_server" "sqlserver" {
  name                = "eduardopivaral-tf"
  resource_group_name = azurerm_resource_group.MyRG.name
  location            = azurerm_resource_group.MyRG.location
  version             = "12.0"
  # Note that the user and password are in plain text here; in a real scenario you would
  # use variables or a secrets store, but that is out of scope for this post and will be
  # discussed in later posts.
  administrator_login          = "Gato"
  administrator_login_password = "-n0meAcuerd0-"

  #we add our account as administrator of the instance
  azuread_administrator {
    login_username = "epivaral@studyyourdata.com"
    object_id      = "<check object ID for the account in Azure Portal>"
  }
}


# Azure SQL Database
resource "azurerm_mssql_database" "sqldb" {
  name                 = "demoData"
  server_id            = azurerm_mssql_server.sqlserver.id
  collation            = "SQL_Latin1_General_CP1_CI_AS"
  sku_name             = "Basic" #we use basic tier
  storage_account_type = "Local" #local redundancy storage
  tags = {
    description = "Part 1 - just the empty resource"
  }
}


#firewall rule to allow us access to it
resource "azurerm_mssql_firewall_rule" "MyLaptopRule" {
  name             = "MyLaptopRule"
  server_id        = azurerm_mssql_server.sqlserver.id
  start_ip_address = "181.209.256.300" #This IP does not exist, you should put your IP or range
  end_ip_address   = "181.209.256.300"
}

We execute terraform plan:

terraform plan

Notice how this time three resources will be added, and no action will be taken for the resource group.
We execute terraform apply -auto-approve

terraform apply -auto-approve

If everything is ok, we should see the process completing successfully:


To validate it, navigate to the Azure portal (or use SSMS) and confirm that you can access the database:


Since we also included a firewall rule for our laptop, we should be able to access it without issues:
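
From a command line, a quick connectivity check with sqlcmd might look like this (the server FQDN follows the <server-name>.database.windows.net pattern, the login is the one defined in main.tf, and you will be prompted for the password):

sqlcmd -S eduardopivaral-tf.database.windows.net -d demoData -U Gato -Q "SELECT @@VERSION;"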


Destroying our infrastructure

This is a demo and I do not want to incur additional costs, so I will bring down all the infrastructure. In a real-world scenario you will probably remove individual resources instead of everything, but this is how we delete everything:

terraform destroy

Same as apply, we can use the -auto-approve flag, but I do not recommend it for destroy activities.


If you want to destroy just one resource (as in a real-world scenario), use the -target flag with the resource address.
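
For example, to remove only the database while keeping the server and resource group, a targeted destroy using the resource address from the main.tf above would look like this:

terraform destroy -target=azurerm_mssql_database.sqldb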

You can validate resources that have been removed by checking in Azure portal.

As you noticed, this approach has a lot of information to digest, and IaC is just used to provision the underlying infrastructure in an automated and repeatable way. 

Creating database objects and populating tables with data is another part of the CI/CD pipeline; there are other tools for that, which we can discuss in future posts.

As a suggested practice, you can try to deploy different resources to other providers, such as an AWS S3 bucket (we will write another post for that as well).

Stay in touch for next articles where we will discuss sourcing and Terraform Cloud, and how to secure sensitive data.

Wednesday, August 23, 2023

Introduction to IaC: Deploying Data infrastructure to Azure using Terraform (Part 2 - our first Azure resource)

Continuing from Part 1, where we set up our environment, we can now build our very first example (do not worry if it seems too simple at this point; it is just to understand how the workflow works).

Example 1: Deploy an Azure Resource Group

I think the Azure Resource Group is the most basic Azure resource, so we will verify our setup is ok by deploying one.

First, create your project folder on your local machine; in my case I am using C:\Terraform\Terraform_AZ_example, but you can use any path you want.
Then, open that folder in VSCode:

cd C:\Terraform\Terraform_AZ_example
code .

Or use the GUI to open the folder:

 

Once opened, as a best practice I like to add this .gitignore file even when working locally, so sensitive information and Terraform state are not uploaded if you later decide to source the project to GitHub:

# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version 
# control as they are data points which are potentially sensitive and subject 
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

 Let us create a main.tf file:


The first step is to add the Azure provider. You can find the providers at https://registry.terraform.io/
Click on Providers and then select Azure:


Click on Use Provider, and then select the code snippet shown:


Copy it to your main.tf and, inside the provider block, add this line of code: features {} (it is required). Your code should look like this:
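
Based on the same provider block used later in this series, the file at this point should look roughly like this (the provider version may differ depending on when you add it):

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.69.0"
    }
  }
}

provider "azurerm" {
  # Configuration options
  features {}
}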


The next step is to add, at the end of the same script, a resource of type resource group, as this is the most basic one:

resource "azurerm_resource_group" "MyRG" {
  name     = "RG_TF_Tests"
  location = "East US"
}
 

In this case, MyRG is the Terraform identifier for the resource (we will reference it in the script using that name), and name is the name of the resource as deployed to Azure.

For any resource, we can check the provider documentation to find examples and possible parameters we can configure:


Once done, we are ready to deploy our first resource.
Following the Terraform lifecycle, we must execute init first. To do this, open a new terminal in VSCode and run:

terraform init

This will download the required plugins into the .terraform folder and create some other files that are out of scope for now.

Next is to validate syntax, so run

terraform validate

If everything looks ok, we then need to plan the deployment, so run:

terraform plan

We will see the proposed changes to our infrastructure (new resource group creation)

Note: it is possible to skip the terraform validate and terraform plan steps and jump from init to apply if you are confident in the changes you will perform, but I advise you to run those steps every time as a double check.

Once we are ok with the proposed plan, we can deploy it using

terraform apply

 It will ask for our confirmation to proceed with the changes, type yes. (It is possible to skip this confirmation with the -auto-approve flag):

This step can take some time since the process is done using API calls. So, after some wait, we can confirm the changes were done:

We need to validate that the resource was properly created. To do that, log in to the Azure portal and go to Resource groups; if everything is ok, you will see the resource there:

What if we want to modify the resource? For example, let us add a tag to the resource. Modify the resource block adding the tags block:

 resource "azurerm_resource_group" "MyRG" {
  name     = "RG_TF_Tests"
  location = "East US"
  tags = {
    environment = "Modification tests"
  }
}

Save the main.tf file.

This time, we will skip directly to terraform apply using auto-approve:

terraform apply -auto-approve

 We can see that the resource was just updated with the new tag.


Validate it on the Azure portal:

On the next example we will create and configure an Azure SQL database inside this resource group.

Part 3 - Azure SQL Database


