AWS Automated EC2 Security Incident Response in Practice

Python hacker through AI “eyes”

Summary

The NIST Cybersecurity Framework (NIST CSF) is a set of guidelines developed to improve cybersecurity risk management and protect computer systems. Its core is organized around five functions: Identify, Protect, Detect, Respond, and Recover.

This article is about how to concretely mitigate and isolate a compromised EC2 instance and gather information about the attacker and the attack after it has succeeded. Working within the Respond function, we’ll use a practical use case to explore how to preserve evidence for future investigation and what we can do to stop the attack immediately.

Let’s suppose the cybersecurity team has discovered a flaw being actively exploited, data being exfiltrated, or important credentials from the system leaked on the web — any of which could trigger a request for immediate termination of the instance. To be clear, this use case is about a compromised system running in production that needs to be stopped to mitigate and avoid further damage to the company, and that decision was made beforehand, following company policies and procedures and criteria such as business loss. The decision to take down a production system is never simple, yet it sometimes has to be made on the fly, in the heat of the moment, as happened when zero-day exploits like Heartbleed were discovered and started being used in the wild; any critical security incident could justify terminating the instance. This workflow acts on a single instance and assumes only that instance was compromised, not other AWS services, the AWS account, or other systems. It also does not cover recovery: for a proper recovery the company could choose to back up the data inside the instance, before or even after compromise, and that step should be planned in advance.

Summarizing our actions: we’ll collect as much data as possible in an automated fashion and then terminate the instance. To do this we’ll use AWS cloud-native services such as AWS Lambda, S3 buckets, and Systems Manager (SSM). Because this is simpler to say than to do, we’ll break the work into the steps below. Authentication methods and concerns are out of scope here.

Workflow

  1. Acquire Compromised Instance Metadata
  2. Acquire Compromised Instance Memory image
  3. Take EBS Snapshots
  4. Terminate Instance

Prerequisites

We’ll rely on cloud-native tools as much as possible: Lambda to run functions, S3 buckets to store artifacts, and the Systems Manager (SSM) agent installed on the instances to carry out whatever is needed at the instance level. Beyond that, we’ll write our infrastructure as code in Terraform and use Python for the task-specific logic.
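Since everything downstream depends on SSM being able to reach the instance, it’s worth verifying beforehand that the agent is registered and online. Here is a minimal boto3 sketch; the region and instance ID are placeholders, not values from this article:

import boto3

# Hypothetical values for illustration only
REGION = "us-east-1"
INSTANCE_ID = "i-0123456789abcdef0"

ssm = boto3.client("ssm", region_name=REGION)

# List SSM-managed instances matching our instance ID and report their agent status
response = ssm.describe_instance_information(
    Filters=[{"Key": "InstanceIds", "Values": [INSTANCE_ID]}]
)

for info in response["InstanceInformationList"]:
    print(info["InstanceId"], info["PingStatus"], info.get("PlatformName", "unknown"))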

Acquiring EC2 Metadata

To acquire the EC2 metadata we’ll create a Lambda function running Python code. The file structure is simple: three Terraform files (a main file plus two for variables) and the Python “.py” file, all in the same directory. The variables pass the instance ID and the S3 bucket name into the code, and this layout is reused in every later step that involves Terraform.

import boto3
import json
import os
from datetime import datetime

ec2 = boto3.client('ec2')
s3 = boto3.client('s3')

def datetime_serializer(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")

def lambda_handler(event, context):
    instance_id = os.environ['INSTANCE_ID']
    bucket_name = os.environ['S3_BUCKET_NAME']

    response = ec2.describe_instances(InstanceIds=[instance_id])
    instance_metadata = response['Reservations'][0]['Instances'][0]

    # Convert datetime object to string using custom serializer
    instance_metadata['LaunchTime'] = datetime_serializer(instance_metadata['LaunchTime'])

    s3.put_object(Bucket=bucket_name, Key=f"{instance_id}.json", Body=json.dumps(instance_metadata, default=datetime_serializer))

    return {
        "statusCode": 200,
        "body": "Lambda function executed successfully!"
    }

In the Terraform code:

We’ll need a role, a role attachment, and a policy that permits describing the instance and uploading to the S3 bucket.

provider "aws" {
  region = "us-east-1"
}

resource "aws_lambda_function" "get_ec2_metadata" {
  filename      = "getec2metadata.zip"
  function_name = "getec2metadata"
  role          = aws_iam_role.lambda_exec.arn
  handler       = "getec2metadata.lambda_handler"
  runtime       = "python3.8"

  environment {
    variables = {
      INSTANCE_ID    = var.INSTANCE_ID
      S3_BUCKET_NAME = var.S3_BUCKET_NAME
    }
  }
}

resource "aws_iam_role" "lambda_exec" {
  name = "lambda-exec-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Sid    = ""
        Principal = {
          Service = "lambda.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_policy" "lambda_exec_policy" {
  name        = "lambda-exec-policy"
  description = "Policy for Lambda to access EC2 and S3"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action   = ["ec2:DescribeInstances"]
        Effect   = "Allow"
        Resource = "*"
      },
      {
        Action   = ["s3:PutObject"]
        Effect   = "Allow"
        Resource = "arn:aws:s3:::${var.S3_BUCKET_NAME}/*"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_exec_attachment" {
  policy_arn = aws_iam_policy.lambda_exec_policy.arn
  role       = aws_iam_role.lambda_exec.name
}

data "archive_file" "lambda_function" {
  type        = "zip"
  source_file = "getec2metadata.py"  # Change this to the name of your Python file
  output_path = "getec2metadata.zip" # Change this to the desired ZIP file name
}

Variables file:

variable "INSTANCE_ID" {
  description = "The EC2 instance ID"
}

variable "S3_BUCKET_NAME" {
  description = "The name of the S3 bucket"
}
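Once the Terraform above has been applied, a quick way to confirm the pieces work together is to invoke the function and read back the object it writes. This is only a sketch; the bucket name and instance ID below are placeholders that must match your deployed variables:

import boto3
import json

# Placeholders: use the values passed to the Terraform variables
BUCKET_NAME = "your-evidence-bucket"
INSTANCE_ID = "i-0123456789abcdef0"

lam = boto3.client("lambda")
s3 = boto3.client("s3")

# Invoke the metadata-collection function synchronously
result = lam.invoke(FunctionName="getec2metadata", InvocationType="RequestResponse")
print(json.loads(result["Payload"].read()))

# Read the stored metadata back from the bucket for a quick look
obj = s3.get_object(Bucket=BUCKET_NAME, Key=f"{INSTANCE_ID}.json")
metadata = json.loads(obj["Body"].read())
print(metadata["State"]["Name"], metadata["PrivateIpAddress"])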

Result Metadata

This is an example of the metadata the code returns. I replaced some values with “X” for security reasons.

{
  "AmiLaunchIndex": 0,
  "ImageId": "ami-XXXXXXXXXXXXXXXXX",
  "InstanceId": "i-XXXXXXXXXXXXXXXXX",
  "InstanceType": "t2.micro",
  "KeyName": "XXX_key",
  "LaunchTime": "2023-08-05T10:52:57+00:00",
  "Monitoring": {
    "State": "disabled"
  },
  "Placement": {
    "AvailabilityZone": "us-east-1a",
    "GroupName": "",
    "Tenancy": "default"
  },
  "PrivateDnsName": "ip-172-31-86-18.ec2.internal",
  "PrivateIpAddress": "172.31.86.18",
  "ProductCodes": [],
  "PublicDnsName": "ec2-100-26-195-175.compute-1.amazonaws.com",
  "PublicIpAddress": "100.26.195.175",
  "State": {
    "Code": 16,
    "Name": "running"
  },
  "StateTransitionReason": "",
  "SubnetId": "subnet-XXXXXXXX",
  "VpcId": "vpc-XXXXXXXX",
  "Architecture": "x86_64",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/xvda",
      "Ebs": {
        "AttachTime": "2023-08-02T20:28:14+00:00",
        "DeleteOnTermination": true,
        "Status": "attached",
        "VolumeId": "vol-04ffc6f5ca1fXXXXX"
      }
    }
  ],
  "ClientToken": "19ed11b7-9f57-4a36-9777-XXXXXXXXXXXX",
  "EbsOptimized": false,
  "EnaSupport": true,
  "Hypervisor": "xen",
  "NetworkInterfaces": [
    {
      "Association": {
        "IpOwnerId": "amazon",
        "PublicDnsName": "ec2-100-26-195-175.compute-1.amazonaws.com",
        "PublicIp": "100.26.195.175"
      },
      "Attachment": {
        "AttachTime": "2023-08-02T20:28:13+00:00",
        "AttachmentId": "eni-attach-0eed499f3XXXXXXX",
        "DeleteOnTermination": true,
        "DeviceIndex": 0,
        "Status": "attached",
        "NetworkCardIndex": 0
      },
      "Description": "",
      "Groups": [
        {
          "GroupName": "launch-wizard-2",
          "GroupId": "sg-01a724b2eXXXXXXXX"
        }
      ],
      "Ipv6Addresses": [],
      "MacAddress": "12:6f:78:ea:b4:f1",
      "NetworkInterfaceId": "eni-0def0af1eXXXXXXXX",
      "OwnerId": "XXXXXXXXXXXX",
      "PrivateDnsName": "ip-172-31-86-18.ec2.internal",
      "PrivateIpAddress": "172.31.86.18",
      "PrivateIpAddresses": [
        {
          "Association": {
            "IpOwnerId": "amazon",
            "PublicDnsName": "ec2-100-26-195-175.compute-1.amazonaws.com",
            "PublicIp": "100.26.195.175"
          },
          "Primary": true,
          "PrivateDnsName": "ip-172-31-86-18.ec2.internal",
          "PrivateIpAddress": "172.31.86.18"
        }
      ],
      "SourceDestCheck": true,
      "Status": "in-use",
      "SubnetId": "subnet-XXXXXXXX",
      "VpcId": "vpc-XXXXXXXX",
      "InterfaceType": "interface"
    }
  ],
  "RootDeviceName": "/dev/xvda",
  "RootDeviceType": "ebs",
  "SecurityGroups": [
    {
      "GroupName": "launch-wizard-2",
      "GroupId": "sg-01a724b2eXXXXXXXX"
    }
  ],
  "SourceDestCheck": true,
  "Tags": [
    {
      "Key": "Name",
      "Value": "myec2"
    }
  ],
  "VirtualizationType": "hvm",
  "CpuOptions": {
    "CoreCount": 1,
    "ThreadsPerCore": 1
  },
  "CapacityReservationSpecification": {
    "CapacityReservationPreference": "open"
  },
  "HibernationOptions": {
    "Configured": false
  },
  "MetadataOptions": {
    "State": "applied",
    "HttpTokens": "required",
    "HttpPutResponseHopLimit": 2,
    "HttpEndpoint": "enabled",
    "HttpProtocolIpv6": "disabled",
    "InstanceMetadataTags": "disabled"
  },
  "EnclaveOptions": {
    "Enabled": false
  },
  "BootMode": "uefi-preferred",
  "PlatformDetails": "Linux/UNIX",
  "UsageOperation": "RunInstances",
  "UsageOperationUpdateTime": "2023-08-02T20:28:13+00:00",
  "PrivateDnsNameOptions": {
    "HostnameType": "ip-name",
    "EnableResourceNameDnsARecord": true,
    "EnableResourceNameDnsAAAARecord": false
  },
  "MaintenanceOptions": {
    "AutoRecovery": "default"
  },
  "CurrentInstanceBootMode": "legacy-bios"
}

Acquiring Memory Image


This step is the most difficult and should be treated as optional. On a compromised system we have to consider that the attacker could kick us out mid-process or even lock us out of the machine entirely. Acquiring the RAM of a machine requires software that was installed and prepared beforehand, ideally automated at instance creation time. The tool depends on the operating system: LiME for Linux, MacMemoryReader for macOS, and FTK Imager for Windows. We’ll use LiME on an Amazon Linux 2023 instance.

Installing LiME:

sudo yum install kernel-devel kernel-headers -y
sudo yum install git -y
git clone https://github.com/504ensicsLabs/LiME.git
cd LiME/src
make

Installing SSM Agent:

sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm

Getting the Memory Image and sending to S3:

sudo insmod LiME/src/lime-6.1.41-63.114.amzn2023.x86_64.ko "path=./ramdata.mem format=raw" && aws s3 cp ramdata.mem s3://yournamebucket/folder/ramdata.mem

Terraform code:

In Terraform we’ll define the permissions needed, the encrypted bucket, the KMS key, the role, and the SSM document we’ll use to run the command remotely and get our RAM data delivered to the bucket. Let’s start with the instance profile, the role, the attachments, and the policy needed to write to the S3 bucket; the SSM document comes afterwards.

resource "aws_iam_instance_profile" "GetRamData_instance_profile" {
name = "GetRamData_InstanceProfile"
role = aws_iam_role.GetRamData_role.name
}

resource "aws_iam_role" "GetRamData_role" {
name = "GetRamData_Role"

assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Action = "sts:AssumeRole",
Effect = "Allow",
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
}

# iam s3 put policy
resource "aws_iam_policy" "s3_put_policy" {
name = "S3PutPolicy"
description = "Policy to permit S3 PutObject"


Attach the created instance profile to the EC2 instance:

aws ec2 associate-iam-instance-profile --instance-id <instance_id> --iam-instance-profile Name=<instance_profile_name>
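If you prefer to keep this step inside the automation as well, the same association can be done with boto3 instead of the AWS CLI. A hedged equivalent, where the instance ID is a placeholder and the profile name is the one created by the Terraform above:

import boto3

ec2 = boto3.client("ec2")

# Attach the instance profile created by Terraform to the compromised instance
ec2.associate_iam_instance_profile(
    IamInstanceProfile={"Name": "GetRamData_InstanceProfile"},
    InstanceId="i-0123456789abcdef0",  # placeholder
)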

S3 Bucket + KMS Key

# creates the kms key
resource "aws_kms_key" "RamData_key" {
  description             = "RamData KMS key"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

# creates the bucket
resource "aws_s3_bucket" "RamData_bucket" {
  bucket = var.S3_BUCKET_NAME
}

# creates the configuration needed for the encryption
resource "aws_s3_bucket_server_side_encryption_configuration" "RamData_S3_Encryption" {
  bucket = aws_s3_bucket.RamData_bucket.bucket

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.RamData_key.arn
      sse_algorithm     = "aws:kms"
    }
  }
}

# creates a folder inside the bucket
resource "aws_s3_object" "Ram_Data" {
  bucket = aws_s3_bucket.RamData_bucket.id
  key    = "RamData/"
}

SSM Document


# SSM Document for running the command on the instance
resource "aws_ssm_document" "ram2s3_ssm_document" {
  name          = "Ram2S3_SSM_Document"
  document_type = "Command"
  content = jsonencode({
    schemaVersion = "2.2",
    description   = "Command Document Take RAM and send to S3",
    mainSteps = [
      {
        action = "aws:runShellScript",
        name   = "Ram2S3_SSM_Document",
        inputs = {
          runCommand = [
            "sudo insmod /home/ec2-user/LiME/src/lime-6.1.41-63.114.amzn2023.x86_64.ko 'path=./ramdata.mem format=raw' && aws s3 cp ramdata.mem s3://limerambkt/LIMERAM/ramdata.mem"
          ]
        }
      }
    ],
    parameters = {
      Message = {
        type        = "String",
        description = "Example",
        default     = "None"
      }
    }
  })
}

# SSM association to instance and SSM document
resource "aws_ssm_association" "ram2s3_association" {
  name             = aws_ssm_document.ram2s3_ssm_document.name
  document_version = "$LATEST"
  targets {
    key    = "InstanceIds"
    values = [var.INSTANCE_ID]
  }
  depends_on = [aws_ssm_document.ram2s3_ssm_document]
}
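The association above runs the document when it is applied, but during a live incident you may want to trigger the same document on demand and watch its progress. Here is a hedged boto3 sketch; the instance ID is a placeholder, and the document name is the one defined above:

import boto3
import time

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder

ssm = boto3.client("ssm")

# Fire the memory-acquisition document against the compromised instance
cmd = ssm.send_command(
    DocumentName="Ram2S3_SSM_Document",
    InstanceIds=[INSTANCE_ID],
    Comment="Memory acquisition during incident response",
)
command_id = cmd["Command"]["CommandId"]

# Poll until the command reaches a terminal state
time.sleep(5)  # give the invocation a moment to register
while True:
    inv = ssm.get_command_invocation(CommandId=command_id, InstanceId=INSTANCE_ID)
    if inv["Status"] in ("Success", "Failed", "Cancelled", "TimedOut"):
        print("Memory acquisition finished with status:", inv["Status"])
        break
    time.sleep(10)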

RAM image uploaded to the S3 bucket

Excerpt of the captured RAM:

Can we extend into region below? %p + %x + %x + %x ?=? %p
No: considering a new region at %p of size %x
requested buffer size is too large../../grub-core/kern/buffer.cnew read is position beyond the end of the written data%s%s (%s) invalid argument../../grub-core/kern/corecmd.c(%s): Filesystem is %s.
one argument expected%s=%s
not an assignment[ENVVAR=VALUE]Set an environment variable.ENVVARRemove an environment variable.[ARG]List devices or files.MODULEinsmodInsert a module.%s,%sopening device %s
rootvariable `%s' isn't set../../grub-core/kern/device.c%s read failed
diskClosing `%s'...
Closing `%s' succeeded.
Opening `%s'...
sector sizes of %d bytes aren't supported yet../../grub-core/kern/disk.cno such partitionOpening `%s' failed.
Opening `%s' succeeded.
disk `%s' not foundattempt to read or write outside of partition../../grub-core/kern/disk_common.cattempt to read or write outside of disk `%s'Read out of range: sector 0x%llx (%s).
`%s' is already loaded../../grub-core/kern/dl.cwmodule at %p, size 0x%lx
ELF header smaller than expectedinvalid arch-independent ELF magicthis ELF file is not of the right typeELF sections outside corerelocating to %p
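Before moving on it’s worth confirming that the image really landed in the bucket, and recording its size and encryption for the incident notes. A small sketch, assuming the bucket and key used in the SSM document above:

import boto3

s3 = boto3.client("s3")

# Confirm the memory image exists and capture basic facts for the incident record
head = s3.head_object(Bucket="limerambkt", Key="LIMERAM/ramdata.mem")

print("Size (bytes):", head["ContentLength"])
print("Server-side encryption:", head.get("ServerSideEncryption"))
print("Last modified:", head["LastModified"])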

Take EBS Snapshots

To take the EBS snapshots we’ll use a Lambda function with some Python code. The Terraform code below defines the role for the Lambda function and the policies it needs.

# IAM Role for Lambda Function
resource "aws_iam_role" "lambda_role" {
  name = "lambda_role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRole",
        Effect = "Allow",
        Principal = {
          Service = "lambda.amazonaws.com"
        }
      }
    ]
  })
}

# Attach necessary policies to the IAM Role
resource "aws_iam_policy_attachment" "lambda_policy_attachment" {
  name       = "iamattachment"
  policy_arn = "arn:aws:iam::aws:policy/AWSLambda_FullAccess"
  roles      = [aws_iam_role.lambda_role.name]
}

resource "aws_iam_policy" "ec2_describe_policy" {
  name        = "EC2DescribePolicy"
  description = "Policy to describe EC2 instances"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = [
          "ec2:DescribeInstances",
          "ec2:CreateSnapshot",
          "ec2:DescribeVolumes",
          "ec2:CreateTags"
        ],
        Effect   = "Allow",
        Resource = "*"
      }
    ]
  })
}

resource "aws_iam_policy_attachment" "ec2_describe_attachment" {
  name       = "ec2describeattachment"
  policy_arn = aws_iam_policy.ec2_describe_policy.arn
  roles      = [aws_iam_role.lambda_role.name]
}

# Lambda Function
resource "aws_lambda_function" "ebs_snapshot_lambda" {
  filename         = "lambda_function.zip" # Update this with your Lambda deployment package
  function_name    = "ebsSnapshotLambda"
  role             = aws_iam_role.lambda_role.arn
  handler          = "takeebssnapshot.lambda_handler"
  source_code_hash = filebase64sha256("lambda_function.zip")
  runtime          = "python3.8"
  timeout          = 10

  environment {
    variables = {
      INSTANCE_ID   = var.INSTANCE_ID
      S3_BUCKET_ARN = var.S3_BUCKET_ARN
    }
  }
}

data "archive_file" "lambda_function" {
  type        = "zip"
  source_file = "takeebssnapshot.py"  # Change this to the name of your Python file
  output_path = "lambda_function.zip" # Change this to the desired ZIP file name
}

The Lambda function

import boto3
import os

def lambda_handler(event, context):
    # Get the EC2 instance ID from the environment variable
    instance_id = os.environ['INSTANCE_ID']

    # Create a connection to the EC2 service
    ec2 = boto3.client('ec2')

    # Get a list of EBS volumes attached to the specified instance
    response = ec2.describe_instances(InstanceIds=[instance_id])
    volumes = response['Reservations'][0]['Instances'][0]['BlockDeviceMappings']

    for volume in volumes:
        volume_id = volume['Ebs']['VolumeId']

        # Create a snapshot of the EBS volume
        snapshot = ec2.create_snapshot(VolumeId=volume_id)

        # Tag the snapshot with relevant information
        ec2.create_tags(
            Resources=[snapshot['SnapshotId']],
            Tags=[
                {'Key': 'Name', 'Value': f'Snapshot for Volume {volume_id}'},
                {'Key': 'InstanceID', 'Value': instance_id}
            ]
        )

    return "Snapshot creation process initiated."

Snapshot

Terminate Instance

The last step is to terminate the instance, and here too we’ll automate it with Lambda.

Terraform Code

# IAM Role for Lambda Function
resource "aws_iam_role" "lambda_role_termination" {
  name = "lambda_role_termination"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRole",
        Effect = "Allow",
        Principal = {
          Service = "lambda.amazonaws.com"
        }
      }
    ]
  })
}

# Attach necessary policies to the IAM Role for termination Lambda
resource "aws_iam_policy_attachment" "lambda_policy_attachment_termination" {
  name       = "iamattachment_termination"
  policy_arn = "arn:aws:iam::aws:policy/AWSLambda_FullAccess"
  roles      = [aws_iam_role.lambda_role_termination.name]
}

resource "aws_iam_policy" "ec2_termination_policy" {
  name        = "EC2TerminationPolicy"
  description = "Policy to terminate EC2 instances"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = [
          "ec2:TerminateInstances",
          "ec2:DescribeInstances",
          "ec2:DescribeInstanceAttribute"
        ],
        Effect   = "Allow",
        Resource = "*"
      }
    ]
  })
}

resource "aws_iam_policy_attachment" "ec2_termination_attachment" {
  name       = "ec2terminationattachment"
  policy_arn = aws_iam_policy.ec2_termination_policy.arn
  roles      = [aws_iam_role.lambda_role_termination.name]
}

# Lambda Function for Instance Termination
resource "aws_lambda_function" "instance_termination_lambda" {
  filename         = "termination_function.zip" # Update this with your Lambda deployment package
  function_name    = "instanceTerminationLambda"
  role             = aws_iam_role.lambda_role_termination.arn
  handler          = "termination.lambda_handler"
  source_code_hash = filebase64sha256("termination_function.zip")
  runtime          = "python3.8"
  timeout          = 10

  environment {
    variables = {
      INSTANCE_ID = var.INSTANCE_ID
    }
  }
}

data "archive_file" "termination_function" {
  type        = "zip"
  source_file = "termination.py"           # Change this to the name of your Python termination file
  output_path = "termination_function.zip" # Change this to the desired ZIP file name
}

Python Lambda Code

import boto3
import json
import os

def lambda_handler(event, context):
    # Read the target instance ID from the INSTANCE_ID environment variable
    instance_id = os.environ.get('INSTANCE_ID')

    if instance_id:
        # Create a connection to the EC2 service
        ec2 = boto3.client('ec2')

        # Terminate the instance
        ec2.terminate_instances(InstanceIds=[instance_id])

        return {
            "statusCode": 200,
            "body": json.dumps("Instance termination initiated successfully.")
        }
    else:
        return {
            "statusCode": 400,
            "body": json.dumps("Missing INSTANCE_ID environment variable.")
        }
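To confirm the isolation actually completed, a small check after invoking the function can wait for the terminated state. This is only a sketch; the instance ID is a placeholder:

import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder

ec2 = boto3.client("ec2")

# Block until the instance reaches the 'terminated' state, then print it
waiter = ec2.get_waiter("instance_terminated")
waiter.wait(InstanceIds=[INSTANCE_ID])

state = ec2.describe_instances(InstanceIds=[INSTANCE_ID])
print(state["Reservations"][0]["Instances"][0]["State"]["Name"])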

Instance Terminated

Final thoughts & Considerations

These practical steps can be orchestrated with AWS Step Functions or any other tool capable of automating the workflow. Some additional tasks are worth considering where applicable: detaching the instance from Auto Scaling groups and ELB target groups, removing the security groups attached to the instance, detaching or disabling the instance role, or even attaching an explicit deny policy to the EC2 role. Any other credentials linked to the compromised instance, such as workloads using Vault secrets, may also need to be rotated or disabled. When an instance is found to be compromised and an incident is opened, mitigating the attack and collecting information about it to prevent further losses must be the top priority for the security team and the company, working together toward the same goal.
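As a rough illustration of how the pieces could be chained without Step Functions, a small driver script could invoke each Lambda and the SSM document in order. This sketch reuses the function and document names from this article, but in a real workflow each step should be awaited and its result checked before the next one runs:

import boto3
import json

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder for the compromised instance

lam = boto3.client("lambda")
ssm = boto3.client("ssm")

# 1. Collect and store the instance metadata
lam.invoke(FunctionName="getec2metadata", InvocationType="RequestResponse")

# 2. Trigger memory acquisition (best effort: may fail on a hostile or unstable host)
ssm.send_command(DocumentName="Ram2S3_SSM_Document", InstanceIds=[INSTANCE_ID])

# 3. Snapshot the attached EBS volumes
lam.invoke(FunctionName="ebsSnapshotLambda", InvocationType="RequestResponse")

# 4. Terminate the compromised instance
response = lam.invoke(FunctionName="instanceTerminationLambda", InvocationType="RequestResponse")
print(json.loads(response["Payload"].read()))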

Thanks for reading :)
