Deploy a Machine Learning Service Quickly on AWS Fargate using AWS Copilot CLI and Streamlit 🚀

Dec 28, 2022

Authors: Charles Frenzel, Apeksha Agnihotri, Max Quinn

Overview

In this article, we’ll show you how to rapidly deploy a containerized Machine Learning service using the AWS Copilot CLI and Streamlit. You will learn how to get a text summarization service up and running with command line tooling, Python, and AWS Fargate, which can scale up to thousands of transactions if needed*. To get the application running as quickly as possible, you will use the AWS Copilot CLI to abstract the application infrastructure layer and Streamlit to abstract the user interface (UI). This lets you serve inference results in a robust way and removes the heavy lifting typically needed for frontend or backend development.

Image by Authors

*Note: The primary focus of this tutorial is on both speed and abstraction for ML engineering and not on the principles behind abstractive text summarization transformers.

Prerequisites

To get started, you will first need to install the required packages on your local machine or on an Amazon EC2 instance. To learn more, read Getting Started with Amazon EC2. If you are using your local machine, credentials must first be configured, as explained in the Set up AWS Credentials and Region for Development documentation. Additionally, make sure you have Docker and the AWS Command Line Interface (AWS CLI) already installed. This tutorial assumes that you have an environment with the necessary AWS Identity and Access Management (IAM) permissions. Finally, install the AWS Copilot CLI; releases currently support Linux and macOS systems. For more information, see Installing the AWS Copilot CLI.
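For example, on Linux you can download the release binary directly (the commands below mirror the official installation docs at the time of writing; on macOS, brew install aws/tap/copilot-cli works as well):

curl -Lo copilot https://github.com/aws/copilot-cli/releases/latest/download/copilot-linux
chmod +x copilot
sudo mv copilot /usr/local/bin/copilot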

To confirm that the tooling is installed, you can run:

aws --version
docker --version
copilot --version

Getting Started

After the proper tooling and permissions are in place, clone the repository from GitHub, and you are ready to deploy the application.

git clone https://github.com/aws-samples/aws-copilot-cli-streamlit

Deployment

Copilot abstracts the creation of cloud components like VPC subnets, load balancers, deployment pipelines, and durable storage for your application’s stateful data. It also builds a Docker image from the Dockerfile and deploys it to Amazon ECS, pushing the image to Amazon ECR along the way. See the docs for details.

To begin your deployment, follow the steps below:

1. First, set up the application and environment details. Execute the commands below one by one from the application home directory. Copilot will prompt you through the setup of your application.

copilot init
copilot env init

After these commands, you will be prompted for more details. Enter the following values at the prompts (a non-interactive equivalent is sketched after the list):

Application Name: text-summarizer
Workload Type: Load Balanced Web Service
Service Name: text-summarizer-service
Dockerfile: app/Dockerfile
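If you prefer to skip the interactive prompts, the same values can be passed as flags (a non-interactive sketch; verify the flag names against your installed Copilot version):

copilot init --app text-summarizer \
    --name text-summarizer-service \
    --type "Load Balanced Web Service" \
    --dockerfile app/Dockerfile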

Image by Authors

For the environment setup, enter the following values at the prompts (again, a non-interactive equivalent follows the list):

Would you like to deploy a test environment? : N
Environment Name: demo
Environment Configuration: Yes, use default
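The non-interactive form would look like this (flag names per recent Copilot releases; --profile names your local AWS credentials profile):

copilot env init --name demo --profile default --default-config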

Image by Authors

This step creates a copilot directory inside the application home directory <app_home_dir>, where you can configure your environment and app service.

2. Now, configure the application resource in manifest.yml.

This file allows you to customize your app and environment resource configurations. Since our application uses a pre-trained transformer model, the image will be large; you will need at least 4GB of disk space to build it. To allocate enough resources, edit

<app_home_dir>/copilot/text-summarizer-service/manifest.yml

and set the parameters below to update the default settings.

cpu: 4096 # Number of CPU units for the task.
memory: 8192 # Amount of memory in MiB used by the task.
platform: linux/x86_64 # See https://aws.github.io/copilot-cli/docs/manifest/lb-web-service/#platform
count: 1 # Number of tasks that should be running in your service.
exec: true # Enable running commands in your container.

For more details about configuration using the manifest, refer to this document.

3. Next, deploy the infrastructure. Run

copilot env deploy --name demo

Copilot then deploys the ECS cluster, referring to the configuration in the <app_home_dir>/copilot/demo/manifest.yml file.

4. Lastly, we deploy our application by executing

copilot deploy

This will first create the Docker image from the Dockerfile provided in the app, push it to Amazon ECR, and then create an ECS service in the cluster created in step 3. Once the application is deployed, you will see a public URL that you can use to access the application from your browser.
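If you need the URL again later, you can print the service details, including its route, with the built-in svc show command:

copilot svc show --name text-summarizer-service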

Image by Authors

Under the Hood: Application and Deployment

Image by Authors

Step 1: Command Line Deployment

After building the application, a developer uses the AWS Copilot CLI to issue commands that kick off the stand-up process.

The AWS Copilot CLI is an open-source command line interface that makes it easy for developers to build, release, and operate production-ready containerized applications on AWS App Runner, Amazon ECS, and AWS Fargate. The AWS Copilot CLI automates the creation of backend services and the application deployment. It is worth noting that it is possible to deploy a continuous integration / continuous deployment (CI/CD) pipeline through the CLI, but this is out of scope for this pattern.

Step 2: Application Infrastructure

AWS Copilot reads a manifest.yml file as input and transforms it into low-level infrastructure as code, such as AWS CloudFormation templates, to deploy your application.

There are a few things going on at this step:

  1. Copilot uses the secure AWS CloudFormation API to make calls to various services.
  2. It provisions an Amazon ECS service on AWS Fargate with an Application Load Balancer in front of it.
  3. It builds and pushes a container image to an Amazon ECR repository and spins up the containers as an AWS Fargate task with your ML service inside.
  4. In addition, it creates an Amazon S3 bucket for uploading local artifacts, such as environment files, and an AWS KMS key to encrypt the bucket’s contents.

A step out of scope for this blog but worth mentioning: setting up a CI/CD pipeline in the AWS Copilot CLI is just another command, which allows for automated release and deployment, as sketched below.
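The pipeline setup is along these lines (command names per recent Copilot releases; older releases used copilot pipeline update instead of deploy):

copilot pipeline init
copilot pipeline deploy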

Another point is that, by default, logs go directly to Amazon CloudWatch Logs.
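For example, you can tail the service logs from your terminal; the --follow flag streams new log events as they arrive:

copilot svc logs --name text-summarizer-service --follow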

Step 3: Load Balanced Web Application

AWS Fargate is a serverless, pay-as-you-go compute engine that lets you focus on building applications without managing servers.

Amazon ECS on AWS Fargate serves the web application as a load-balanced service, with a large container for machine learning inference. In addition to the load balancer at the front, Amazon ECS services can provide autoscaling by integrating with Application Auto Scaling, which allows the application to scale out horizontally based on demand.
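As an illustrative example (the numbers are placeholders, not recommendations), horizontal scaling can be enabled in the service manifest by replacing the fixed count with a range and a target metric:

count:
  range: 1-4          # run between 1 and 4 tasks
  cpu_percentage: 70  # scale out when average CPU utilization exceeds 70%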

This means your simple ML application has scalability and availability all baked in. Pretty sweet!

Step 4: Model Serving

The model serving piece of the application combines the Hugging Face (HF) Transformers library and Streamlit. The Streamlit app is a simple text input interface: text chunks are sent to the Transformers text summarization model for inference, and the model returns a short summary based on the selected parameters. By default, the application produces summaries of 10 to 50 words.

HF Transformers provides a pool of pre-trained models for tasks across vision, text, and audio. It provides APIs to download and experiment with the pre-trained models, and you can even fine-tune them. HF hosts a variety of text summarization models (abstractive, in this case), but for our purposes this tutorial uses the lightweight DistilBART model.

Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. Streamlit acts as the user interface (UI) for returning ML inference results, automating the front-end layer of the application.
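To make this concrete, below is a minimal sketch of what the serving code can look like; the file name, model ID, widget labels, and the caching decorator (available in recent Streamlit versions) are illustrative and may differ from the sample repository:

# app.py -- minimal sketch, not the sample repository's exact code
import streamlit as st
from transformers import pipeline

@st.cache_resource  # cache the loaded model across Streamlit reruns
def load_summarizer():
    # DistilBART: a lightweight abstractive summarization model from the HF Hub
    return pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

summarizer = load_summarizer()

st.title("Text Summarizer")
text = st.text_area("Paste text to summarize")
if st.button("Summarize") and text:
    # min_length/max_length are token bounds approximating the 10-50 word default
    result = summarizer(text, min_length=10, max_length=50)
    st.write(result[0]["summary_text"])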

Under the hood, Streamlit uses Python’s Tornado framework, which relies on non-blocking network I/O and can scale to tens of thousands of open connections.

Conclusion

In this blog post we showed you how to quickly stand up an ML service using the AWS Copilot CLI and Streamlit. This setup is a great starting point if you don’t want to get bogged down in UI design or infrastructure as code: auto scaling on AWS Fargate, together with Streamlit’s use of Tornado, is more than enough to support a mid-tier level of usage, and the setup can scale to more advanced configurations by modifying the manifest files. Lastly, because the steps presented here are ML agnostic, this tooling combination can ship most ML use cases, minimizing overall development time.

References

Set up a text summarization project with Hugging Face Transformers: Part 2

How to Build A Machine Learning Demo in 2022

Deploying a Simple Streamlit app using Docker
