# Base Models
## Supported Architectures
- 🦙 Llama
- CodeLlama
- 🌬️ Mistral
- Zephyr
- 🔄 Mixtral
- 💎 Gemma
- Gemma2
- 🏛️ Phi-3 / Phi-2
- 🔮 Qwen2 / Qwen
- 🗣️ Command-R
- 🧱 DBRX
- 🤖 GPT2
- 🔆 Solar
- 🌸 Bloom
Other architectures are supported on a best-effort basis, but do not support dynamic adapter loading.
## Selecting a Base Model
When selecting a base model, check the HuggingFace Hub for models that match one of the supported architectures above.
Usage:

```shell
lorax-launcher --model-id mistralai/Mistral-7B-v0.1 ...
```
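Once the server is running, you can send generation requests to it over HTTP. A minimal sketch, assuming the server is reachable locally on port 8080 (as in the Docker example below) and using the `/generate` endpoint:

```shell
# Prompt the base model (no adapter specified) over the REST API.
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "Why is the sky blue?", "parameters": {"max_new_tokens": 64}}'
```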
## Private Models
You can access private base models from HuggingFace by setting the `HUGGING_FACE_HUB_TOKEN`
environment variable:

```shell
export HUGGING_FACE_HUB_TOKEN=<YOUR READ TOKEN>
```
Using Docker:

```shell
docker run --gpus all \
    --shm-size 1g \
    -p 8080:80 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    ghcr.io/predibase/lorax:main \
    --model-id $MODEL_ID
```
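Here `$MODEL_ID` is the HuggingFace Hub name of the base model to serve, for example:

```shell
# Any supported base model from the Hub works here.
export MODEL_ID=mistralai/Mistral-7B-v0.1
```

Passing the token as an environment variable at run time keeps it out of the image itself.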
## Quantization
LoRAX supports loading the base model with quantization to reduce memory overhead, while loading adapters in full (fp32) or half precision (fp16, bf16), similar to the approach described in QLoRA.
See Quantization for details on the various quantization strategies provided by LoRAX.
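For example, a minimal sketch of launching with the base model quantized to 4-bit via bitsandbytes (`bitsandbytes-nf4` is assumed here as the flag value; see the Quantization page for the full list of supported strategies):

```shell
# Quantize the base model weights to 4-bit NF4 to reduce GPU memory;
# adapter weights are still loaded in full or half precision.
lorax-launcher --model-id mistralai/Mistral-7B-v0.1 --quantize bitsandbytes-nf4
```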