AWS Glue Dev Endpoint Deleter

Developing AWS Glue scripts can add unnecessary charges to your AWS bill if you are not careful. This blog post shows one way to avoid some of that cost automatically, using AWS CloudFormation and AWS Lambda.

Background

A while ago, I had the opportunity to explore AWS Glue, a serverless extract, transform, and load (ETL) service from AWS. AWS Glue also offers optional developer endpoints: environments that you can connect an Apache Zeppelin notebook to in order to develop and test AWS Glue scripts interactively. Typically, you only pay for the compute resources consumed while your ETL job runs. In contrast to the rest of the AWS Glue family, however, a developer endpoint is billed by the DPU-hour for as long as it exists, regardless of whether you actively use it. The AWS Console points this out when you launch a dev endpoint:

Billing for a development endpoint is based on the Data Processing Unit (DPU) hours used during the entire time it remains in the READY state. To stop charges, delete the endpoint. To delete, choose the endpoint in the list, and then choose Action, Delete.

Is this something we should care about? Each dev endpoint gets 5 DPUs by default (the minimum is 2 DPUs), and DPUs are currently priced at $0.44 per DPU-hour (billed per second, with a 10-minute minimum). In other words, the default configuration costs $2.20 per hour or $52.80 per day, and more than $100 over a weekend if you forget to delete your endpoint before you leave for the weekend, which I did. For me, this was expensive enough to motivate an automatic dev endpoint deleter.
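The arithmetic behind these numbers can be sketched in a few lines of Node.js, the same language as the Lambda function below. This is a minimal sketch, assuming the $0.44 per DPU-hour price and the per-second billing with a 10-minute minimum quoted above; `endpointCostCents` and `formatDollars` are hypothetical helper names, and the 64-hour weekend (Friday 5PM to Monday 9AM) is an assumption.

```javascript
'use strict';

// Price quoted above: $0.44 per DPU-hour, expressed in cents so that the
// whole-hour results below are exact integers.
const CENTS_PER_DPU_HOUR = 44;
// Billing is per second, with a 10-minute (600-second) minimum.
const MINIMUM_BILLED_SECONDS = 600;
const SECONDS_PER_HOUR = 3600;

// Cost in cents of keeping an endpoint with `dpus` DPUs in the READY state
// for `seconds` seconds (hypothetical helper, not part of any AWS SDK).
function endpointCostCents(dpus, seconds, centsPerDpuHour = CENTS_PER_DPU_HOUR) {
  const billedSeconds = Math.max(seconds, MINIMUM_BILLED_SECONDS);
  return (dpus * centsPerDpuHour * billedSeconds) / SECONDS_PER_HOUR;
}

// Format a cent amount as a dollar string, e.g. 220 -> "$2.20".
function formatDollars(cents) {
  return `$${(cents / 100).toFixed(2)}`;
}

const DEFAULT_DPUS = 5;
console.log('Per hour:', formatDollars(endpointCostCents(DEFAULT_DPUS, SECONDS_PER_HOUR)));
console.log('Per day:', formatDollars(endpointCostCents(DEFAULT_DPUS, 24 * SECONDS_PER_HOUR)));
// Assumed weekend: Friday 5PM to Monday 9AM, i.e. 64 hours.
console.log('Weekend:', formatDollars(endpointCostCents(DEFAULT_DPUS, 64 * SECONDS_PER_HOUR)));
```

Run with `node`, this prints $2.20, $52.80 and $140.80 respectively. Working in cents keeps the whole-hour results exact instead of relying on floating-point dollar amounts.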

Outline

The solution consists of the following components:

  • A Lambda function that lists all Glue developer endpoints and subsequently deletes them.
  • A CloudWatch Events rule that triggers the Lambda function on a schedule (typically at the end of each workday).
  • A Lambda Permission that allows CloudWatch Events to invoke the Lambda function.
  • An IAM Role that allows the Lambda function to get and delete the Glue developer endpoints.
  • A CloudFormation template that declares all of the resources above, so that the stack can be reused across AWS accounts and regions.

Solution

The entire solution is presented in the CloudFormation template below. Because the Lambda source code is inlined in the template, a single file holds both the infrastructure and the application logic. I have chosen to declare the cron expression as a parameter, and I schedule the CloudWatch Events rule to trigger the Lambda function when I leave the office, typically at 5PM. Since the cron expression is evaluated in UTC, the local trigger time depends on daylight saving time: a cron expression of 0 16 * * ? * translates to 5PM CET (Central European Time) in the winter and 6PM CEST (Central European Summer Time) in the summer.
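To check which local time a cron expression corresponds to on a given day, a small Node.js script can convert the UTC trigger time into a time zone. This is a minimal sketch, assuming the minute and hour fields are plain numbers (as in the default 0 16 * * ? *) and a Node.js build with ICU time zone data; `localTriggerTime` is a hypothetical name, and Europe/Stockholm is only an example CET/CEST zone.

```javascript
'use strict';

// Local wall-clock time at which an AWS cron expression with fixed numeric
// minute and hour fields fires on a given calendar day.
// AWS cron expressions have six fields:
//   minutes hours day-of-month month day-of-week year
// and are evaluated in UTC.
function localTriggerTime(cronExpression, year, month, day, timeZone) {
  const fields = cronExpression.trim().split(/\s+/);
  if (fields.length !== 6) {
    throw new Error(`Expected 6 cron fields, got ${fields.length}`);
  }
  const minute = Number(fields[0]);
  const hour = Number(fields[1]);
  if (!Number.isInteger(minute) || !Number.isInteger(hour)) {
    throw new Error('This sketch only handles fixed minute and hour values');
  }
  // Build the UTC instant for that day (month is 1-based in this function).
  const fireTime = new Date(Date.UTC(year, month - 1, day, hour, minute));
  // Render the instant in the target time zone and pick out hour and minute.
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hour: 'numeric',
    minute: 'numeric',
    hour12: false,
  }).formatToParts(fireTime);
  const part = type => Number(parts.find(p => p.type === type).value);
  return { hour: part('hour'), minute: part('minute') };
}

// Europe/Stockholm is used here as an example CET/CEST time zone.
console.log(localTriggerTime('0 16 * * ? *', 2019, 1, 15, 'Europe/Stockholm'));
// { hour: 17, minute: 0 }  (CET, UTC+1)
console.log(localTriggerTime('0 16 * * ? *', 2019, 7, 15, 'Europe/Stockholm'));
// { hour: 18, minute: 0 }  (CEST, UTC+2)
```

Expressions with wildcards, ranges or lists in the minute or hour fields are rejected rather than guessed at, which keeps the sketch small.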

AWSTemplateFormatVersion: 2010-09-09
Description: Stack that deletes all Glue Developer Endpoints in a region

Parameters:

  CronExpression:
    Type: String
    Description: The cron expression for triggering the Glue Dev endpoint deletion (in UTC)
    Default: 0 16 * * ? *
    ConstraintDescription: Must be a valid cron expression


Resources:

  DeleteGlueEndpointsLambda:
    Type: AWS::Lambda::Function
    Properties:
      Description: A Lambda function that gets and deletes the AWS Glue Dev endpoints
      Handler: index.handler
      Code:
        ZipFile: |
          'use strict';

          const AWS = require('aws-sdk');
          const glue = new AWS.Glue({apiVersion: '2017-03-31'});

          function deleteDevEndpoints(endpointNames) {
            console.info('Deleting Glue DevEndpoints:', JSON.stringify(endpointNames));
            const promises = endpointNames
              .map(params => {
                return glue.deleteDevEndpoint(params).promise()
                  .then(data => {
                    console.info('Deleted:', JSON.stringify(params));
                    return data;
                  });
              });
            return Promise.all(promises);
          }

          function extractEndPointNames(data) {
            return data.DevEndpoints
              .map(({ EndpointName }) => ({ EndpointName }));
          }

          exports.handler = (event, context, callback) => {
            console.info('Event:', JSON.stringify(event));
            glue.getDevEndpoints().promise()
              .then(extractEndPointNames)
              .then(deleteDevEndpoints)
              .then(data => {
                callback(null, data);
              })
              .catch(callback);
          };
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: nodejs6.10

  TriggerRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Trigger for the DeleteGlueEndpointsLambda
      ScheduleExpression: !Sub cron(${CronExpression})
      Targets:
      - Arn: !GetAtt DeleteGlueEndpointsLambda.Arn
        Id: TriggerId

  InvokeLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt DeleteGlueEndpointsLambda.Arn
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt TriggerRule.Arn

  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: GlueDeleteEndpointPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - glue:GetDevEndpoint
                  - glue:GetDevEndpoints
                  - glue:DeleteDevEndpoint
                Resource:
                  - '*'
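The inline handler calls getDevEndpoints once and deletes whatever that single response contains. The GetDevEndpoints API returns its results in pages, with a NextToken when more are available, so an account with many endpoints might not be fully cleaned up in one run. Below is a minimal sketch of a paginating variant, assuming the AWS SDK for JavaScript v2 client shape used in the template (calls returning an object with `.promise()`); `listAllEndpointNames`, `deleteAllDevEndpoints` and the in-memory fake client are hypothetical.

```javascript
'use strict';

// Collect the names of all Glue dev endpoints, following NextToken across
// result pages. `glue` is expected to behave like an AWS SDK for JavaScript v2
// Glue client, whose calls return an object with a .promise() method.
function listAllEndpointNames(glue, nextToken, names = []) {
  const params = nextToken ? { NextToken: nextToken } : {};
  return glue.getDevEndpoints(params).promise().then(page => {
    (page.DevEndpoints || []).forEach(({ EndpointName }) => names.push(EndpointName));
    // Keep paging until the response no longer carries a NextToken.
    return page.NextToken
      ? listAllEndpointNames(glue, page.NextToken, names)
      : names;
  });
}

// Delete every dev endpoint found and resolve to the list of deleted names.
function deleteAllDevEndpoints(glue) {
  return listAllEndpointNames(glue).then(names => {
    console.info('Deleting Glue DevEndpoints:', JSON.stringify(names));
    const deletions = names.map(EndpointName =>
      glue.deleteDevEndpoint({ EndpointName }).promise());
    return Promise.all(deletions).then(() => names);
  });
}

// Usage with a hypothetical in-memory stand-in for the Glue client that
// returns two pages of results.
const fakePages = {
  '': { DevEndpoints: [{ EndpointName: 'team-a' }, { EndpointName: 'team-b' }], NextToken: 'page-2' },
  'page-2': { DevEndpoints: [{ EndpointName: 'team-c' }] },
};
const fakeGlue = {
  getDevEndpoints: params => ({ promise: () => Promise.resolve(fakePages[params.NextToken || '']) }),
  deleteDevEndpoint: () => ({ promise: () => Promise.resolve({}) }),
};
deleteAllDevEndpoints(fakeGlue).then(names => console.log('Deleted:', names));
```

In the inline Lambda code, the handler's getDevEndpoints, extractEndPointNames and deleteDevEndpoints chain could then be replaced by a call to deleteAllDevEndpoints(glue). The sketch uses promise chains rather than async/await so that it stays compatible with the nodejs6.10 runtime declared in the template.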

Resources

The AWS documentation is comprehensive, yet it can be hard to navigate. Here are some links that you may find useful:

Mattias Severson

Mattias is a senior software engineer specialized in backend architecture and development, with experience in cloud-based applications and scalable solutions. He is a clean code proponent who appreciates Agile methodologies and pragmatic Test Driven Development. Mattias has experience from many different environments, ranging from large international projects that last for years to solo, single-day jobs. He is open-minded and curious about new technologies. Mattias believes in continuous improvement on a personal level as well as in the projects that he is working on. Additionally, Mattias is a frequent speaker at user groups, companies and conferences.

Comments

  1. Jaz

    Awesome stuff! I was working on a similar solution and was surprised that, after hours of research, I hadn't found your blog before implementing it. My approach is fairly similar: we have developer teams in multiple lines of business (LOBs) in our enterprise, and they each use their own "team endpoints." I split the solution into 2 methods:

    1. Automated Solution
    a. CloudWatch Event triggers Lambda function at scheduled intervals to create an Environment Snapshot of Active Endpoints, stored in S3
    b. Lambda Permission allows CW Event invocation to Lambda above
    c. The Lambda execution role has permission to delete endpoints; the function deletes all relevant endpoints from a list we specify
    d. A CloudWatch Event triggers Glue endpoint creation. The function compares the snapshot against the currently active endpoints (some we keep active), then creates the endpoints from the snapshot that do not exist, so they are active at the start of the business day (same public keys, DPUs, dependency files, etc.)

    2. Hybrid Solution
    a. CloudWatch Event triggers Lambda function at scheduled intervals to create an Environment Snapshot of Active Endpoints, stored in S3
    b. Lambda Permission allows CW Event invocation to Lambda above
    c. The Lambda execution role has permission to delete endpoints; the function deletes all relevant endpoints from a list we specify
    d. Developers can invoke a Lambda function to create an endpoint. The function validates the JSON payload containing the endpoint variable; if it is valid, the function verifies that the endpoint has been deleted and starts a new one with the snapshot details.
    Although it adds a bit of complexity, it works similarly to yours. I've kept the cloud-native approach and ensured that we get the same Glue artifacts every day, unchanged. Glad to see someone else is working on a solution too.

    Best,
    Jaz
    https://www.linkedin.com/in/jazark/

  2. Rob F

    Thanks for this: I just ran into this scenario when, after getting distracted, I accidentally left an endpoint running overnight. I implemented this, but had a couple of issues with the JS code, so I rewrote it in Python. It's not the slickest code, but it works for us.

    import json
    import boto3

    client = boto3.client('glue')

    def return_endpoints():
        response = client.list_dev_endpoints(
            MaxResults=123
        )
        endpoints = []
        for endpoint in response['DevEndpointNames']:
            endpoints.append(endpoint)
        print(f"Found {len(endpoints)} endpoints.")
        return endpoints

    def kill_endpoints(event, context):
        try:
            endpoints = return_endpoints()
            kills = 0
            for endpoint in endpoints:
                client.delete_dev_endpoint(EndpointName=endpoint)
                print(f"Killed endpoint: {endpoint}")
                kills += 1
            return {
                'statusCode': 200,
                'body': json.dumps(f"Successfully killed {kills} endpoints out of total {len(endpoints)} endpoints.")
            }
        except Exception as e:
            print(e)

    Hopefully that can serve as a starting point for anyone who stumbles across this but wants a Python solution :)
