LoRA

Low Rank Adaptation (LoRA) is a popular adapter method for fine-tuning models to improve response quality.

LoRAX supports LoRA adapters trained using frameworks like PEFT and Ludwig.

How it works

```mermaid
graph BT
  I{{X}} --> W;
  I --> A[/LoRA A\];
  A --> B[\LoRA B/];
  W --> P((+));
  B --> P;
  P --> O{{Y}}
```

LoRA works by targeting specific layers of the base model and inserting a new pair of low-rank weight matrices, LoRA A and LoRA B, alongside each base model parameter W. The input X is passed through both the original weights and the LoRA weights, and the activations are then summed together to produce the final layer output Y.
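
In other words, the layer computes Y = W·X + (alpha/r)·B·A·X, where r is the adapter rank and alpha a scaling hyperparameter. Below is a minimal PyTorch sketch of this idea; it is illustrative only, not LoRAX's actual implementation, and the class name and the r/alpha defaults are assumptions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update (LoRA B @ LoRA A)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # W stays frozen; only A and B are trained
        self.lora_a = nn.Linear(base.in_features, r, bias=False)   # project down to rank r
        self.lora_b = nn.Linear(r, base.out_features, bias=False)  # project back up
        nn.init.zeros_(self.lora_b.weight)  # update starts at zero, so Y == X @ W.T at init
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # X flows through both the frozen path and the LoRA path; the sum is Y.
        return self.base(x) + self.lora_b(self.lora_a(x)) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096))
y = layer(torch.randn(2, 4096))  # shape: (2, 4096)
```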

Usage
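
At inference time, a trained adapter is applied on a per-request basis. Here is a minimal sketch using the lorax-client Python package, assuming a LoRAX server running locally on port 8080; the adapter ID is a placeholder for a real adapter repository:

```python
from lorax import Client  # pip install lorax-client

client = Client("http://127.0.0.1:8080")

# "my-org/my-adapter" is a placeholder, not a real adapter.
response = client.generate(
    "[INST] What is low rank adaptation? [/INST]",
    adapter_id="my-org/my-adapter",
    max_new_tokens=64,
)
print(response.generated_text)
```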

Supported Target Modules

When training a LoRA adapter, you can specify which of these layers (or "modules") you wish to target for adaptation. Typically these are the projection layers in the attention blocks (q and v, sometimes k and o as well for Llama-like models), but they can generally be any linear layer.
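
For example, when training with PEFT, the target modules are listed in the LoraConfig. Here is a sketch targeting the attention projections of a Llama-style model; the r, alpha, and dropout values are common choices, not requirements:

```python
from peft import LoraConfig

config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```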

Here is the list of supported target modules for each architecture in LoRAX. Note that if your adapter contains target modules that LoRAX does not support, LoRAX will ignore those layers and log a warning on the backend.

Llama

  • q_proj
  • k_proj
  • v_proj
  • o_proj
  • gate_proj
  • up_proj
  • down_proj
  • lm_head

Mistral

  • q_proj
  • k_proj
  • v_proj
  • o_proj
  • gate_proj
  • up_proj
  • down_proj
  • lm_head

Mixtral

  • q_proj
  • k_proj
  • v_proj
  • o_proj
  • lm_head

Gemma

  • q_proj
  • k_proj
  • v_proj
  • o_proj
  • gate_proj
  • up_proj
  • down_proj

Phi-3

  • qkv_proj
  • o_proj
  • gate_up_proj
  • down_proj
  • lm_head

Phi-2

  • q_proj
  • k_proj
  • v_proj
  • dense
  • fc1
  • fc2
  • lm_head

Qwen2

  • q_proj
  • k_proj
  • v_proj
  • o_proj
  • gate_proj
  • up_proj
  • down_proj
  • lm_head

Qwen

  • c_attn
  • c_proj
  • w1
  • w2
  • lm_head

Command-R

  • q_proj
  • k_proj
  • v_proj
  • o_proj
  • gate_proj
  • up_proj
  • down_proj
  • lm_head

DBRX

  • Wqkv
  • out_proj
  • lm_head

GPT2

  • c_attn
  • c_proj
  • c_fc

Bloom

  • query_key_value
  • dense
  • dense_h_to_4h
  • dense_4h_to_h
  • lm_head

How to train

LoRA is a very popular fine-tuning method for LLMs, and as such there are a number of tools for training LoRA adapters from your data, including the following (non-exhaustive) options.
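
As one example, the PEFT workflow wraps a base model with a LoraConfig, trains it, and saves only the adapter weights. A minimal sketch, where the base model name and output path are placeholders:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the LoRA weights are trainable

# ... train with your preferred loop or the Hugging Face Trainer ...

model.save_pretrained("my-lora-adapter")  # saves the adapter weights only
```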

Open Source

  • PEFT
  • Ludwig

Commercial