# LoRA
Low-Rank Adaptation (LoRA) is a popular adapter method for fine-tuning models to improve response quality.
LoRAX supports LoRA adapters trained using frameworks like PEFT and Ludwig.
## How it works
```mermaid
graph BT
  I{{X}} --> W;
  I --> A[/LoRA A\];
  A --> B[\LoRA B/];
  W --> P((+));
  B --> P;
  P --> O{{Y}}
```
LoRA works by targeting specific layers of the base model and inserting a pair of low-rank weight matrices, LoRA A and LoRA B, alongside each targeted base model weight matrix W. The input X is passed through both the original weights and the LoRA weights, and the resulting activations are summed to produce the final layer output Y.
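The sketch below illustrates this computation for a single linear layer. It is a minimal illustration only: the dimensions, rank, and `alpha / r` scaling are example values following the original LoRA formulation, not LoRAX internals.

```python
# Minimal sketch of the LoRA forward pass for one linear layer.
# W is frozen; only A and B are trained. Names and values here are
# illustrative and do not correspond to LoRAX internals.
import torch

d_in, d_out, r, alpha = 4096, 4096, 16, 32

W = torch.randn(d_out, d_in)        # frozen base model weight
A = torch.randn(r, d_in) * 0.01     # LoRA A: projects the input down to rank r
B = torch.zeros(d_out, r)           # LoRA B: projects back up (zero-initialized)

X = torch.randn(1, d_in)            # input activations

base_out = X @ W.T                          # path through the original weights
lora_out = (X @ A.T) @ B.T * (alpha / r)    # path through the low-rank pair
Y = base_out + lora_out                     # summed to form the layer output
```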
## Usage
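At inference time, a LoRA adapter is applied per request by passing its `adapter_id` to a running LoRAX server. The sketch below uses the Python client; the server address and adapter repository name are placeholders.

```python
# Minimal sketch, assuming a LoRAX server is already running locally and
# "my-org/my-lora-adapter" is a placeholder Hugging Face adapter repo.
from lorax import Client

client = Client("http://127.0.0.1:8080")
prompt = "[INST] What is low-rank adaptation? [/INST]"

# Generate with the base model only
print(client.generate(prompt, max_new_tokens=64).generated_text)

# Generate with the LoRA adapter applied
print(
    client.generate(
        prompt,
        max_new_tokens=64,
        adapter_id="my-org/my-lora-adapter",
    ).generated_text
)
```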
### Supported Target Modules
When training a LoRA adapter, you can specify which layers (or "modules") of the base model you wish to target for adaptation. Typically these are the projection layers in the attention blocks (`q_proj` and `v_proj`, sometimes `k_proj` and `o_proj` as well for LLaMA-like models), but in practice any linear layer can usually be targeted.
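For example, with the PEFT library the target modules are chosen via `LoraConfig`. The sketch below is illustrative: the base model name is a placeholder and the hyperparameter values are examples, not recommendations.

```python
# Minimal sketch of selecting target modules with PEFT's LoraConfig.
# The base model name is a placeholder; hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the LoRA A/B weights are trainable
```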
Here is a list of supported target modules for each architecture in LoRAX. Note that in cases where your adapter contains target modules that LoRAX does not support, LoRAX will ignore those layers and emit a warning on the backend.
#### Llama

`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `lm_head`

#### Mistral

`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `lm_head`

#### Mixtral

`q_proj`, `k_proj`, `v_proj`, `o_proj`, `lm_head`

#### Gemma

`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`

#### Phi-3

`qkv_proj`, `o_proj`, `gate_up_proj`, `down_proj`, `lm_head`

#### Phi-2

`q_proj`, `k_proj`, `v_proj`, `dense`, `fc1`, `fc2`, `lm_head`

#### Qwen2

`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `lm_head`

#### Qwen

`c_attn`, `c_proj`, `w1`, `w2`, `lm_head`

#### Command-R

`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `lm_head`

#### DBRX

`Wqkv`, `out_proj`, `lm_head`

#### GPT2

`c_attn`, `c_proj`, `c_fc`

#### Bloom

`query_key_value`, `dense`, `dense_h_to_4h`, `dense_4h_to_h`, `lm_head`
## How to train
LoRA is a very popular fine-tuning method for LLMs, and as such there are a number of ways to create LoRA adapters from your data, including the following (non-exhaustive) options.