# Base Models
## Supported Architectures
- 🦙 Llama
- CodeLlama
- 🌬️ Mistral
- Zephyr
- 🔄 Mixtral
- 💎 Gemma
- Gemma2
- 🏛️ Phi-3 / Phi-2
- 🔮 Qwen2 / Qwen
- 🗣️ Command-R
- 🧱 DBRX
- 🤖 GPT2
- 🔆 Solar
- 🌸 Bloom
Other architectures are supported on a best-effort basis, but do not support dynamic adapter loading.
## Selecting a Base Model
When selecting a base model, check the HuggingFace Hub for models that match one of the supported architectures above.
Usage:

```shell
lorax-launcher --model-id mistralai/Mistral-7B-v0.1 ...
```
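Once the server is running, you can send generation requests to it over HTTP. A minimal sketch, assuming the server is reachable locally on port 8080 (as in the Docker example below) and using the `/generate` endpoint:

```shell
# Prompt the base model (no adapter specified) over the REST API.
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "Why is the sky blue?", "parameters": {"max_new_tokens": 64}}'
```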
## Private Models
You can access private base models from HuggingFace by setting the `HUGGING_FACE_HUB_TOKEN`
environment variable:

```shell
export HUGGING_FACE_HUB_TOKEN=<YOUR READ TOKEN>
```
Using Docker:

```shell
docker run --gpus all \
    --shm-size 1g \
    -p 8080:80 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    ghcr.io/predibase/lorax:main \
    --model-id $MODEL_ID
```
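Here `$MODEL_ID` is the HuggingFace Hub name of the base model to serve, for example:

```shell
# Any supported base model from the Hub works here.
export MODEL_ID=mistralai/Mistral-7B-v0.1
```

Passing the token as an environment variable at run time keeps it out of the image itself.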
## Quantization
LoRAX supports loading the base model with quantization to reduce memory overhead, while loading adapters in full (fp32) or half precision (fp16, bf16), similar to the approach described in QLoRA.
See Quantization for details on the various quantization strategies provided by LoRAX.
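For example, a minimal sketch of launching with the base model quantized to 4-bit via bitsandbytes (`bitsandbytes-nf4` is assumed here as the flag value; see the Quantization page for the full list of supported strategies):

```shell
# Quantize the base model weights to 4-bit NF4 to reduce GPU memory;
# adapter weights are still loaded in full or half precision.
lorax-launcher --model-id mistralai/Mistral-7B-v0.1 --quantize bitsandbytes-nf4
```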