Cortex is an open source platform that takes machine learning models — trained with nearly any framework — and turns them into production web APIs in one command.
Quickstart
Below, we’ll walk through how to use Cortex to deploy OpenAI’s GPT-2 model as a service on AWS. You’ll need to install Cortex on your AWS account before getting started.
Step 1: Configure your deployment
Define a deployment and an api resource. A deployment specifies a set of APIs that are deployed together. An API makes a model available as a web service that can serve real-time predictions. The configuration below will download the model from the cortex-examples S3 bucket. You can run the code that generated the model here.
# cortex.yaml

- kind: deployment
  name: text

- kind: api
  name: generator
  model: s3://cortex-examples/text-generator/gpt-2/124M
  request_handler: handler.py
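To make the relationship between the two resources concrete, here is a minimal Python sketch that writes the configuration out as plain data and derives the URL path of each API. The `endpoint_paths` helper is hypothetical and not part of Cortex; the `/<deployment name>/<api name>` pattern is an assumption inferred from the `/text/generator` URL that appears later in this walkthrough.

```python
# The resources from cortex.yaml, written out as plain Python data.
config = [
    {"kind": "deployment", "name": "text"},
    {
        "kind": "api",
        "name": "generator",
        "model": "s3://cortex-examples/text-generator/gpt-2/124M",
        "request_handler": "handler.py",
    },
]


def endpoint_paths(resources):
    """Hypothetical helper: return one URL path per API in the deployment.

    Assumes the path is /<deployment name>/<api name>, which matches the
    URL shown in this walkthrough but is not confirmed as a general rule.
    """
    deployments = [r["name"] for r in resources if r["kind"] == "deployment"]
    if len(deployments) != 1:
        raise ValueError("expected exactly one deployment resource")
    return [
        f"/{deployments[0]}/{r['name']}"
        for r in resources
        if r["kind"] == "api"
    ]


print(endpoint_paths(config))  # ['/text/generator']
```

Adding a second api resource to the same list would yield a second path under the same deployment name, which is what "deployed together" amounts to in this sketch.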
Step 2: Add request handling
The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also decode the inference output. This can be implemented in a request handler file using the pre_inference and post_inference functions:
# handler.py

from encoder import get_encoder

encoder = get_encoder()


def pre_inference(sample, metadata):
    # Encode the incoming text into the token IDs the model expects.
    context = encoder.encode(sample["text"])
    return {"context": [context]}


def post_inference(prediction, metadata):
    # Decode the generated token IDs back into natural language.
    response = prediction["sample"]
    return encoder.decode(response)
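The two handler functions can be exercised locally before deploying. The sketch below is illustrative only: `ToyEncoder` is a hypothetical stand-in for the real GPT-2 encoder (it assigns one integer ID per whitespace-separated word), and the "prediction" is faked by echoing the input back, so it shows the shape of the data passing through the handler rather than real model behavior.

```python
class ToyEncoder:
    """Hypothetical stand-in for the GPT-2 encoder: one integer ID per word."""

    def __init__(self):
        self.word_to_id = {}
        self.id_to_word = {}

    def encode(self, text):
        ids = []
        for word in text.split():
            if word not in self.word_to_id:
                new_id = len(self.word_to_id)
                self.word_to_id[word] = new_id
                self.id_to_word[new_id] = word
            ids.append(self.word_to_id[word])
        return ids

    def decode(self, ids):
        return " ".join(self.id_to_word[i] for i in ids)


encoder = ToyEncoder()


def pre_inference(sample, metadata):
    # Same logic as handler.py: text in, batched token IDs out.
    context = encoder.encode(sample["text"])
    return {"context": [context]}


def post_inference(prediction, metadata):
    # Same logic as handler.py: token IDs in, text out.
    response = prediction["sample"]
    return encoder.decode(response)


model_input = pre_inference({"text": "machine learning"}, metadata=None)
print(model_input)  # {'context': [[0, 1]]}

# Pretend the model simply echoed the context back as its sample.
fake_prediction = {"sample": model_input["context"][0]}
print(post_inference(fake_prediction, metadata=None))  # machine learning
```

Note that `pre_inference` wraps the encoded tokens in an extra list, so the model receives a batch containing a single sequence.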
Step 3: Deploy to AWS
Deploying to AWS is as simple as running cortex deploy from your CLI. cortex deploy takes the declarative configuration from cortex.yaml and creates it on the cluster. Behind the scenes, Cortex containerizes the model, makes it servable using TensorFlow Serving, exposes the endpoint with a load balancer, and orchestrates the workload on Kubernetes.
$ cortex deploy

deployment started
You can track the status of a deployment using cortex get. The output below indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
$ cortex get generator --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            123ms

url: http://***.amazonaws.com/text/generator
Step 4: Serve real-time predictions
Once you have your endpoint, you can make requests:
$ curl http://***.amazonaws.com/text/generator \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "machine learning"}'

Machine learning, with more than one thousand researchers around the world today, are looking to create computer-driven machine learning algorithms that can also be applied to human and social problems, such as education, health care, employment, medicine, politics, or the environment ...
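The same request can be issued from application code. The following is a minimal sketch using only the Python standard library; `build_request` and `generate` are hypothetical helper names, the localhost URL is a placeholder for the endpoint printed by cortex get, and the sketch assumes the response body is UTF-8 text, as the curl output above suggests.

```python
import json
import urllib.request


def build_request(endpoint, text):
    """Build the same POST request as the curl command above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(endpoint, text, timeout=30):
    """Send the request and return the response body as a string."""
    request = build_request(endpoint, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder URL: substitute the one reported for your deployment.
request = build_request("http://localhost:8080/text/generator", "machine learning")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the method, headers, and body without a live endpoint; calling `generate` with a real URL performs the actual round trip.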
Any questions? Chat with us.
More examples
- Sentiment analysis with BERT
- Image classification with Inception v3 and AlexNet
Key features
- Autoscaling: Cortex automatically scales APIs to handle production workloads.
- Multi framework: Cortex supports TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, and more.
- CPU / GPU support: Cortex can run inference on CPU or GPU infrastructure.
- Rolling updates: Cortex updates deployed APIs without any downtime.
- Log streaming: Cortex streams logs from deployed models to your CLI.
- Prediction monitoring: Cortex monitors network metrics and tracks predictions.
- Minimal declarative configuration: Deployments are defined in a single cortex.yaml file.