Development of AWS Glue scripts can potentially add unnecessary expenses to your invoice if you are not careful. This blog post shows one way to avoid some of the cost in an automated fashion by using AWS CloudFormation and AWS Lambda.
Background
A while ago, I had the opportunity to explore AWS Glue, a serverless extract, transform and load (ETL) service from AWS. The AWS Glue service offering also includes an optional developer endpoint, a hosted Apache Zeppelin notebook, that facilitates the development and testing of AWS Glue scripts in an interactive manner. Typically, you only pay for the compute resources consumed while running your ETL job. However, in contrast to the rest of the AWS Glue family, usage of developer endpoints is charged hourly, regardless of whether you actively use them or not. You can see this when launching a dev endpoint from the AWS Console:
Billing for a development endpoint is based on the Data Processing Unit (DPU) hours used during the entire time it remains in the READY state. To stop charges, delete the endpoint. To delete, choose the endpoint in the list, and then choose Action, Delete.
Is this something that we should care about? Each dev endpoint gets 5 DPUs by default (with a minimum of 2 DPUs), and they are currently priced at $0.44 per DPU-hour (billed per second, 10 minutes minimum). Put differently, you pay $2.20 per hour or $52.80 per day for the default configuration, or more than $100 if you forget to delete your endpoint before you go home for the weekend (which I did). For me, this was expensive enough to motivate an automatic dev endpoint deleter.
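The arithmetic behind those figures can be reproduced with a few lines of Python. This is only a sketch using the price and DPU count quoted above (current AWS pricing may differ), and the 64-hour weekend, Friday 17:00 to Monday 09:00, is an assumption made for illustration.

```python
# Figures quoted in the post; actual AWS pricing may have changed since.
PRICE_PER_DPU_HOUR = 0.44  # USD
DEFAULT_DPUS = 5


def endpoint_cost(hours, dpus=DEFAULT_DPUS, price=PRICE_PER_DPU_HOUR):
    """Return the cost in USD of keeping a dev endpoint READY for `hours`."""
    return round(hours * dpus * price, 2)


hourly = endpoint_cost(1)
daily = endpoint_cost(24)
# Assumed weekend: Friday 17:00 to Monday 09:00 = 7 + 24 + 24 + 9 = 64 hours.
weekend = endpoint_cost(64)

print(f"per hour: ${hourly:.2f}")
print(f"per day:  ${daily:.2f}")
print(f"weekend:  ${weekend:.2f}")
```

Under these assumptions a forgotten endpoint costs roughly $140 over the weekend, which matches the "more than $100" mentioned above.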
Outline
The solution consists of the following components:
- A Lambda function that lists all Glue developer endpoints and subsequently deletes them.
- A CloudWatch Event that triggers the Lambda function at scheduled intervals (typically at the end of each workday).
- A Lambda Permission that allows the CloudWatch Event to invoke the Lambda function.
- An IAM Role that allows the Lambda function to get and delete the Glue developer endpoints.
- A CloudFormation template that comprises all resources. Thus, the stack can be re-used across AWS accounts and AWS regions.
Solution
The entire solution is presented in the CloudFormation template below. By inlining the Lambda source code into the template, a single file is enough for both the infrastructure and the application logic. I have chosen to declare the cron expression as a parameter. I have scheduled the CloudWatch Event to trigger the Lambda when I leave the office, typically at 5PM. Since the cron expression is given in UTC, the actual local time depends on daylight saving time. A cron expression of 0 16 * * ? * translates to 5PM CET (Central European Time) in the winter and 6PM CEST (Central European Summer Time) in the summer.
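The shift between winter and summer can be checked with a short Python sketch. The fixed UTC+1 and UTC+2 offsets are a simplification chosen for illustration (a real conversion would use a time zone database), and the helper name local_trigger_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for illustration only; they ignore the actual switch dates.
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")


def local_trigger_time(utc_hour, utc_minute, tz):
    """Return the local wall-clock time (HH:MM) of a daily UTC schedule."""
    # The date is arbitrary; only the time of day matters here.
    utc_time = datetime(2024, 1, 15, utc_hour, utc_minute, tzinfo=timezone.utc)
    return utc_time.astimezone(tz).strftime("%H:%M")


# The first two cron fields are minutes and hours, so "0 16" means 16:00 UTC.
winter = local_trigger_time(16, 0, CET)
summer = local_trigger_time(16, 0, CEST)
print("winter:", winter)
print("summer:", summer)
```

If the trigger should fire at the same local time all year, the CronExpression parameter has to be adjusted when daylight saving time changes.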
AWSTemplateFormatVersion: 2010-09-09
Description: Stack that deletes all Glue Developer Endpoints in a region
Parameters:
  CronExpression:
    Type: String
    Description: The cron expression for triggering the Glue Dev endpoint deletion (in UTC)
    Default: 0 16 * * ? *
    ConstraintDescription: Must be a valid cron expression
Resources:
  DeleteGlueEndpointsLambda:
    Type: AWS::Lambda::Function
    Properties:
      Description: A Lambda function that gets and deletes the AWS Glue Dev endpoints
      Handler: index.handler
      Code:
        ZipFile: |
          'use strict';
          const AWS = require('aws-sdk');
          const glue = new AWS.Glue({apiVersion: '2017-03-31'});
          function deleteDevEndpoints(endpointNames) {
            console.info('Deleting Glue DevEndpoints:', JSON.stringify(endpointNames));
            const promises = endpointNames
              .map(params => {
                return glue.deleteDevEndpoint(params).promise()
                  .then(data => {
                    console.info('Deleted:', JSON.stringify(params));
                    return data;
                  });
              });
            return Promise.all(promises);
          }
          function extractEndPointNames(data) {
            return data.DevEndpoints
              .map(({ EndpointName }) => ({ EndpointName }));
          }
          exports.handler = (event, context, callback) => {
            console.info('Event:', JSON.stringify(event));
            glue.getDevEndpoints().promise()
              .then(extractEndPointNames)
              .then(deleteDevEndpoints)
              .then(data => {
                callback(null, data);
              })
              .catch(callback);
          };
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: nodejs6.10
  TriggerRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Trigger for the DeleteGlueEndpointsLambda
      ScheduleExpression: !Sub cron(${CronExpression})
      Targets:
        - Arn: !GetAtt DeleteGlueEndpointsLambda.Arn
          Id: TriggerId
  InvokeLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt DeleteGlueEndpointsLambda.Arn
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt TriggerRule.Arn
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: GlueDeleteEndpointPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - glue:GetDevEndpoint
                  - glue:GetDevEndpoints
                  - glue:DeleteDevEndpoint
                Resource:
                  - '*'
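One detail worth keeping in mind: the inlined handler calls getDevEndpoints once, and that API can return its results in pages with a NextToken. In an account with many endpoints, a single call may therefore miss some of them. Below is a minimal Python sketch of the same list-then-delete logic that follows NextToken. It is an alternative illustration, not the code deployed by the template above. The helper names are hypothetical, and the Glue client is passed in as an argument so the logic can be exercised without AWS access.

```python
def list_all_endpoint_names(glue):
    """Collect endpoint names across all pages of get_dev_endpoints."""
    names = []
    kwargs = {}
    while True:
        page = glue.get_dev_endpoints(**kwargs)
        names.extend(e["EndpointName"] for e in page.get("DevEndpoints", []))
        token = page.get("NextToken")
        if not token:
            return names
        # Ask for the next page on the following iteration.
        kwargs = {"NextToken": token}


def delete_all_endpoints(glue):
    """Delete every dev endpoint in the client's region; return their names."""
    names = list_all_endpoint_names(glue)
    for name in names:
        glue.delete_dev_endpoint(EndpointName=name)
    return names


def handler(event, context):
    # boto3 is imported lazily so the helpers above stay testable offline.
    import boto3
    return {"deleted": delete_all_endpoints(boto3.client("glue"))}
```

Deploying this variant would also mean changing the Runtime and Handler properties to match a Python function.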
Resources
The AWS documentation is comprehensive, yet it can be hard to navigate. Here are some links that you may find useful:
Awesome stuff! I was working on a similar solution and am surprised that, with my hours of researching, I didn’t find your blog prior to implementation. I approached it fairly similarly; we have developer teams in multiple LOBs for our enterprise. They all use their own respective “team endpoints.” I split the solution into 2 methods:
1. Automated Solution
a. CloudWatch Event triggers Lambda function at scheduled intervals to create an Environment Snapshot of Active Endpoints, stored in S3
b. Lambda Permission allows CW Event invocation to Lambda above
c. Lambda Execution role permission to Delete EndPoints, Deletes all relevant endpoints we specify from a list
d. CloudWatch Event triggers Glue Endpoint creation. Function compares Snapshot against current active endpoints (some we keep active). Lambda Function then creates EndPoints from Snapshot that do not exist. EndPoints active at the start of business day (same PubKeys, DPUs, dependency files etc.)
2. Hybrid Solution
a. CloudWatch Event triggers Lambda function at scheduled intervals to create an Environment Snapshot of Active Endpoints, stored in S3
b. Lambda Permission allows CW Event invocation to Lambda above
c. Lambda Execution role permission to Delete EndPoints, Deletes all relevant endpoints we specify from a list
d. Developers can launch Lambda function to createEndpoint. Lambda function validates JSON payload as endpoint variable. Correct endpoint variable will validate the Endpoint is deleted, and start a new one with the snapshot details.
Although it’s a bit more complex, it functions similarly to yours. I’ve kept the cloud native approach and ensured we have the same Glue Artifacts each day without change. Glad to see there’s someone else working on a solution too.
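The comparison in step 1d can be sketched roughly as follows. The snapshot layout is an assumption: a list of dictionaries keyed like Glue's CreateDevEndpoint parameters, as it might be loaded from S3. The function name endpoints_to_recreate is hypothetical.

```python
def endpoints_to_recreate(snapshot, active_names):
    """Return snapshot entries whose endpoint is not currently active."""
    active = set(active_names)
    return [entry for entry in snapshot if entry["EndpointName"] not in active]


# Hypothetical snapshot contents, for illustration only.
snapshot = [
    {"EndpointName": "team-a", "NumberOfNodes": 5},
    {"EndpointName": "team-b", "NumberOfNodes": 2},
]

# "team-b" was kept running overnight, so only "team-a" needs to be recreated.
missing = endpoints_to_recreate(snapshot, ["team-b"])
print([entry["EndpointName"] for entry in missing])
```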
Best,
Jaz
https://www.linkedin.com/in/jazark/
@Jaz: Thanks for sharing!
/Mattias
Thanks for this – a scenario I just bumped into when, after getting distracted, I accidentally left an endpoint running overnight. I implemented this, but had a couple of issues with the js code, so I rewrote it in python. It’s not the slickest code but it works for us.
import json
import boto3

client = boto3.client('glue')


def return_endpoints():
    # Only the first page of results (up to MaxResults names) is fetched.
    response = client.list_dev_endpoints(
        MaxResults=123
    )
    endpoints = []
    for endpoint in response['DevEndpointNames']:
        endpoints.append(endpoint)
    print(f"Found {len(endpoints)} endpoints.")
    return endpoints


def kill_endpoints(event, context):
    try:
        endpoints = return_endpoints()
        kills = 0
        for endpoint in endpoints:
            client.delete_dev_endpoint(EndpointName=endpoint)
            print(f"Killed endpoint: {endpoint}")
            kills += 1
        return {
            'statusCode': 200,
            'body': json.dumps(f"Successfully killed {kills} endpoints out of total {len(endpoints)} endpoints.")
        }
    except Exception as e:
        # Errors are only logged; the handler then returns None.
        print(e)
Hopefully that can serve as a starting off point for anyone that stumbles across this but wants a python solution :)
@Rob F: Thanks for sharing!
// Mattias