⚙️ How to launch a Ray Serve API on a remote cluster (AWS)

This tutorial will provide a step-by-step guide on how to deploy a Ray Serve application on a cloud cluster. By the end of the tutorial, you will have a minimal Ray Serve API application running on a remote cluster on AWS.

Ray.io is quite new. Even though it is great software, I found the documentation a bit confusing, and it took me longer than expected to deploy Ray Serve on a cloud cluster. This guide will hopefully teach you something and spare you some precious time.

I am writing this guide as I would have loved to find it: a step-by-step tutorial to follow along. This tutorial assumes familiarity with basic concepts such as SSH and the command line.

This guide was written for Ray 2.2.0, the stable version at the time of writing (January 2023).



Ray is a distributed compute framework written in Python that lets you parallelize and scale out workloads across a cluster of machines.

Ray abstracts away the complexities of distributed systems, allowing the programmer to focus on the logic of the applications. Under the hood, it uses a distributed actor model to execute tasks and share data between nodes. Actors are lightweight concurrent units of computation that can be created and scheduled on any node in the cluster. They can run in parallel, communicate with each other asynchronously, and share data using a distributed in-memory store called the object store.
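The actor idea itself can be sketched without Ray at all. Below is a toy, framework-free illustration (the `CounterActor` class and all names are made up for this example; Ray's real actors additionally run on remote nodes and share data through the object store):

```python
import asyncio

# A toy sketch of the actor model (not Ray's implementation): an actor owns
# private state and processes messages from its mailbox one at a time, so no
# locks are needed around the state.
class CounterActor:
    def __init__(self) -> None:
        self._mailbox: asyncio.Queue = asyncio.Queue()
        self._count = 0  # private state, only touched by the actor itself

    async def send(self, message: str) -> None:
        await self._mailbox.put(message)

    async def run(self, n_messages: int) -> int:
        # Drain the mailbox sequentially, like an actor's event loop.
        for _ in range(n_messages):
            if await self._mailbox.get() == "increment":
                self._count += 1
        return self._count

async def main() -> int:
    actor = CounterActor()
    # Senders run concurrently; the actor still handles messages one by one.
    await asyncio.gather(*(actor.send("increment") for _ in range(5)))
    return await actor.run(5)

result = asyncio.run(main())
print(result)  # 5
```

In Ray the same pattern is expressed by decorating a class with @ray.remote; the framework then takes care of scheduling the actor on a node and routing method calls to it.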

Ray Serve

Ray is composed of many libraries (Ray Core, Ray AIR, Ray Datasets, …) including Ray Serve, a scalable model-serving library for building online inference APIs. Ray Serve is a toolkit that lets you serve various types of models and business logic, regardless of the framework they were built with. This includes deep learning models built with frameworks such as PyTorch, TensorFlow, and Keras, as well as Scikit-Learn models and arbitrary Python code.

Launch Ray Serve on a local cluster

One of the most compelling aspects of Ray is that it enables seamless scaling of workloads from a laptop to a large cluster. We will start by deploying a Ray Serve application on a local cluster using the Ray CLI, and later deploy it on the cloud.

⚠️ We will make use of two different CLIs: the Ray CLI to manage the clusters and the Serve CLI to manage the deployment of the Ray Serve application.


pip install "ray[serve]"

Pro-tip: If you are using an Apple computer with an Apple Silicon chip, install grpcio with conda before running pip install (conda install grpcio).

Local cluster

Start the local cluster.

ray start --head

By default, ray start starts a Ray cluster with the head node listening on port 6379. We can access the Ray Dashboard at http://127.0.0.1:8265.

⚠️ Do not confuse the Ray Dashboard port (default 8265) used to access the Ray Dashboard with the Ray dashboard agent’s port (default 52365) used by the Ray Serve CLI to access the cluster.

Server API

We write a simple API server in api.py. It is composed of a single endpoint that returns `hello world`. You can refer to the Ray Serve documentation to learn how to serve more complex models and how to compose deployments.

from starlette.requests import Request
from ray import serve

@serve.deployment(num_replicas=1, ray_actor_options={"num_cpus": 0.2, "num_gpus": 0})
class HelloWorld:
    async def __call__(self, http_request: Request) -> str:
        return "hello world"

helloworld = HelloWorld.bind()

Deploy the API

To deploy our Serve application we use the serve deploy command from the Serve CLI.

First, we define a Serve config file that we call server_config.yaml. Here we specify the Python import path of the API server as well as the host and port of our inference application.

import_path: api:helloworld

runtime_env: {}

host: 0.0.0.0
port: 8000
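The import_path value api:helloworld means “the helloworld object inside the api module”. A rough sketch of how such a module:attribute string resolves (`load_from_import_path` is a hypothetical helper for illustration, not Ray's actual loader):

```python
import importlib

def load_from_import_path(path: str):
    # Split "module:attribute" (e.g. "api:helloworld") and import it.
    module_name, _, attribute = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)

# Demonstrate with a stdlib module instead of api.py: "math:pi" -> math.pi.
pi = load_from_import_path("math:pi")
print(pi)
```

This is why api.py must be importable from the directory where the Serve application is deployed.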

We start the server by calling serve deploy against the previously started cluster. Note that the address uses the Ray dashboard agent’s port.

serve deploy server_config.yaml -a http://127.0.0.1:52365

Note: we can use the serve status and serve config commands to check on the deployed application.

  • serve status to see the deployment status of the API

  • serve config to see the current config file (in our case, this will be the same content as server_config.yaml).

Test

It is time to test our freshly created API. Create a small client, test.py, that sends a GET request to the application:

import requests

response = requests.get("http://localhost:8000/")
print(response.text)

Run it:

python test.py

The last command should display “hello world” on screen. Congratulations, you just created a Ray Serve application on a local cluster 🎉

Stop the local ray cluster

Time to stop the local cluster and move on to the next part.

ray stop

Launch Ray Serve on a remote cluster (AWS)

We will now follow similar steps to launch the cluster on the cloud.

Setting up AWS

In order to take advantage of Ray's cluster management feature, we must first give it the appropriate permissions. First, we will create an AWS user with the necessary access.

  • Open AWS and open the “IAM” settings.

  • Click on “Add users”

  • Set a username

  • Select “Access key - Programmatic access” and click next

  • Click the tab “Attach existing policies directly” and select the following policies:

    1. AmazonEC2FullAccess

    2. IAMFullAccess

    3. AmazonS3FullAccess

  • Keep clicking “next” and create the new user

  • Save the aws_access_key_id and aws_secret_access_key keys under `~/.aws/credentials`. Do not share these keys with anyone!

[default]
aws_access_key_id = ...
aws_secret_access_key = ...

Cluster config

To specify the settings for the cluster, we will create a config file called cluster.yaml. Refer to the Cluster YAML Configuration Options page for more information about the available settings.

cluster_name: t2small
max_workers: 0 # default to 2

provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2a
  cache_stopped_nodes: False

docker:
  image: "rayproject/ray:latest-cpu"
  container_name: "ray_container"
  run_options:
    - --ulimit nofile=65536:65536

auth:
  ssh_user: ubuntu

available_node_types:
  ray.head.default:
    node_config:
      InstanceType: t2.small
    resources: {}
  ray.worker.default:
    node_config:
      InstanceType: t2.small
      InstanceMarketOptions:
        MarketType: spot
    resources: {}
    min_workers: 0
    max_workers: 0

head_node_type: ray.head.default

These settings configure the head node to run on a t2.small instance, which is one of the least expensive options. While this configuration is suitable for testing, it may not be sufficient for handling larger workloads with high demand.
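If the t2.small setup later proves too small, the same file can be scaled up; an illustrative sketch (the instance type and worker counts below are arbitrary examples, not recommendations):

```yaml
# Illustrative overrides for cluster.yaml -- values are examples only.
max_workers: 4                 # allow the autoscaler up to 4 workers
available_node_types:
  ray.worker.default:
    node_config:
      InstanceType: m5.large   # a larger instance than t2.small
    min_workers: 1             # keep one worker always running
    max_workers: 4
```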

Install boto3 (AWS SDK for Python)

Ray requires boto3, the AWS SDK for Python, to manage the cluster. Install it:

pip install boto3

Launch the cluster on AWS

We use the ray up command to launch the Ray cluster. This will take about 10 minutes. Go grab a coffee ☕.

ray up cluster.yaml

Start the dashboard

Once the cluster is operational, we can start the Ray dashboard again, this time connecting to the remote cluster.

ray dashboard cluster.yaml

We can view the dashboard at http://localhost:8265.


⚠️ During my testing of Ray, I found the process of forwarding the Ray dashboard agent's port to be somewhat confusing.

To deploy the Serve application, we must first use the ray attach command (or plain SSH port forwarding with ssh -L) to forward the agent's port to our laptop.

ray attach cluster.yaml -p 52365

Send api.py to the cluster:

ray rsync_up cluster.yaml api.py /home/ray/api.py -v

We can use the same command as before to deploy our Ray Serve application on the cloud cluster; thanks to the port forwarding, the address now points to the cloud cluster we just started.

serve deploy server_config.yaml -a http://127.0.0.1:52365

Time to test.

Open port 8000:

ray attach cluster.yaml -p 8000

Test it:

python test.py
>>> "hello world"

Congratulations, you have successfully deployed a Ray Serve application on an AWS cloud cluster!

I hope this guide was helpful. If you have any questions or suggestions for improvement, don't hesitate to reach out.

Have fun working with Ray!


  • Ray’s documentation

  • Boto3’s documentation

© Copyright 2023, Jonathan Besomi