SkyPilot
SkyPilot is a framework for running AI workloads on the cloud of your choice (AWS, Azure, GCP, etc.). It abstracts away the complexity of finding available GPU resources across clouds and zones, syncing data between storage systems, and managing the execution of distributed workloads.
Setup
First install SkyPilot and check that your cloud credentials are properly set:
pip install skypilot
sky check
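If sky check reports missing cloud dependencies, you can install SkyPilot with the optional extras for your cloud and re-run the check (AWS shown here; substitute gcp, azure, or all as needed):
# Install cloud-specific dependencies, then verify credentials again
pip install "skypilot[aws]"
sky check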
Launch a deployment
Create a YAML configuration file called lorax.yaml:
resources:
  cloud: aws
  accelerators: A10G:1
  memory: 32+
  ports:
    - 8080

envs:
  MODEL_ID: mistralai/Mistral-7B-Instruct-v0.1

run: |
  docker run --gpus all --shm-size 1g -p 8080:80 -v ~/data:/data \
    ghcr.io/predibase/lorax:main \
    --model-id $MODEL_ID
In the above example, we're asking SkyPilot to provision an AWS instance with one NVIDIA A10G GPU and at least 32 GB of RAM. Once the node is provisioned, SkyPilot will launch the LoRAX server using our latest pre-built Docker image.
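If you don't need to pin the deployment to AWS, one variation (a sketch; the envs and run sections stay the same) is to drop the cloud field and let SkyPilot pick the cheapest enabled cloud and region that offers the requested GPU:
resources:
  accelerators: A10G:1
  memory: 32+
  ports:
    - 8080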
Let's launch our LoRAX job:
sky launch -c lorax-cluster lorax.yaml
By default, this config will deploy Mistral-7B-Instruct, but this can be overridden by running sky launch with the argument --env MODEL_ID=<my_model>.
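For example, to serve the base (non-instruct) Mistral 7B model instead (any Hugging Face model ID supported by LoRAX works here):
sky launch -c lorax-cluster lorax.yaml --env MODEL_ID=mistralai/Mistral-7B-v0.1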
Warning
This config will launch the instance on a public IP. It's highly recommended to secure the instance within a private subnet. See the Advanced Configurations section of the SkyPilot docs for options to run within a VPC and set up private IPs.
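As a rough sketch (key names follow SkyPilot's advanced config reference; the VPC name and jump host below are placeholders, so verify against your SkyPilot version), a ~/.sky/config.yaml that keeps the instance inside an existing VPC on private IPs might look like:
aws:
  vpc_name: my-private-vpc        # existing VPC to launch into (placeholder)
  use_internal_ips: true          # assign only private IPs to the instance
  ssh_proxy_command: ssh -W %h:%p ubuntu@my-jump-host   # reach the node through a jump host (placeholder)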
Prompt LoRAX
In a separate window, obtain the IP address of the newly created instance:
sky status --ip lorax-cluster
Now we can prompt the LoRAX deployment as usual:
IP=$(sky status --ip lorax-cluster)
curl http://$IP:8080/generate \
-X POST \
-d '{"inputs": "[INST] Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? [/INST]", "parameters": {"max_new_tokens": 64, "adapter_id": "vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k"}}' \
-H 'Content-Type: application/json'
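The adapter_id parameter is optional: omit it to prompt the base model through the same endpoint (reusing the IP variable from above):
curl http://$IP:8080/generate \
    -X POST \
    -d '{"inputs": "[INST] What is the capital of France? [/INST]", "parameters": {"max_new_tokens": 64}}' \
    -H 'Content-Type: application/json'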
Stop the deployment
Stopping the deployment will shut down the instance, but keep the storage volume:
sky stop lorax-cluster
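Alternatively, let SkyPilot stop the instance automatically once it has been idle for a while:
sky autostop lorax-cluster -i 60   # stop after 60 minutes of idleness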
Because we set docker run ... -v ~/data:/data in our config above, any model weights or adapters we downloaded will be persisted and reused the next time we run sky launch. The LoRAX Docker image will also be cached, meaning mutable tags like main or latest won't be updated on restart unless you add a docker pull step to your run configuration.
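To bring the deployment back up later, re-run the launch against the same cluster name; SkyPilot will restart the stopped instance (reusing its volume) rather than provisioning a new one:
sky launch -c lorax-cluster lorax.yaml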
Delete the deployment
To completely delete the deployment, including the storage volume:
sky down lorax-cluster
The next time you run sky launch, the deployment will be recreated from scratch.