Fine-tuning a Qwen3 Model with LLaMA-Factory

1. Testing the original model

Start the environment

nerdctl run -it \
        --gpus all \
        --ipc=host \
        -p 8000:8000 \
        --ulimit memlock=-1 \
        --ulimit stack=67108864 \
        --name vllm \
        --volume /data/models:/data/models \
        --entrypoint /bin/bash \
        vllm/vllm-openai:v0.10.1.1

All vLLM tests below run inside this container.

Start the service

export CUDA_VISIBLE_DEVICES=0
python3 -m vllm.entrypoints.openai.api_server \
  --model /data/models/Qwen3-0.6B \
  --served-model-name /data/models/Qwen3-0.6B \
  --host 0.0.0.0 \
  --port 8000

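Test (the prompt asks the model to introduce itself; the /no_think suffix turns off Qwen3's thinking mode for this request)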

curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/data/models/Qwen3-0.6B",
    "messages": [
      {"role": "user", "content": "介绍一下你自己/no_think"}
    ]
  }'

{
  "id": "chatcmpl-e831f557a8384a9b833e8033f7ac8ffa",
  "object": "chat.completion",
  "created": 1758524426,
  "model": "/data/models/Qwen3-0.6B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\n\n</think>\n\n您好！我是小明，来自中国。很高兴和您交流。如果您有任何问题或需要帮助，请随时告诉我！",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 13,
    "total_tokens": 42,
    "completion_tokens": 29,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}
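The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on 127.0.0.1:8000. The helper names build_payload, strip_think, and chat are illustrative and not part of vLLM; strip_think removes the empty <think> block that appears at the start of the content field in the response above.

```python
import json
import re
import urllib.request

# Same endpoint as the curl call above.
API_URL = "http://127.0.0.1:8000/v1/chat/completions"


def build_payload(model: str, prompt: str, thinking: bool = False) -> dict:
    """Build the request body; append /no_think unless thinking is wanted."""
    content = prompt if thinking else prompt + "/no_think"
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def strip_think(text: str) -> str:
    """Drop a leading <think>...</think> block (empty or not) and trim whitespace."""
    return re.sub(r"^\s*<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()


def chat(model: str, prompt: str) -> str:
    """POST the request and return the cleaned answer (requires a running server)."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return strip_think(data["choices"][0]["message"]["content"])
```

With the service running, chat("/data/models/Qwen3-0.6B", "介绍一下你自己") should return the answer text without the <think> wrapper.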

2. Data formats

The two common fine-tuning data formats are alpaca and sharegpt.

2.1 alpaca

Best suited to single-turn Q&A.

Single-turn Q&A

{
  "instruction": "介绍一下你自己",
  "input": "",
  "output": "我是 Ops Copilot，一个面向运维领域的智能助手，可以帮助你解决各种运维相关的问题。"
}

Multi-turn conversation (earlier turns go in history)

[
  {
    "instruction": "你能帮我监控 Kubernetes 吗？",
    "input": "",
    "output": "当然可以，我可以通过 Prometheus 采集 Kubernetes 的监控指标，并提供告警与可视化。",
    "system": "你是 Ops Copilot，一个面向运维领域的智能助手。",
    "history": [
      [
        "你好",
        "你好，我是 Ops Copilot，一个面向运维领域的智能助手，可以帮助你解决各种运维相关的问题。"
      ]
    ]
  }
]
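Before training, it can help to check that every record has the expected shape. The sketch below is one possible validator for the alpaca fields shown above. It assumes instruction and output must be non-empty strings and that each history entry is a [user, assistant] pair; the function name validate_alpaca is hypothetical and not part of LLaMA-Factory.

```python
def validate_alpaca(record: dict) -> list[str]:
    """Return a list of problems found in one alpaca-style record (empty if none)."""
    problems = []
    # Assumption: these two fields are required for supervised fine-tuning.
    for key in ("instruction", "output"):
        if not isinstance(record.get(key), str) or not record[key].strip():
            problems.append(f"missing or empty '{key}'")
    # Optional fields must still be strings when present.
    for key in ("input", "system"):
        if key in record and not isinstance(record[key], str):
            problems.append(f"'{key}' must be a string")
    # Each earlier turn is a two-element [user, assistant] list.
    for i, turn in enumerate(record.get("history", [])):
        is_pair = isinstance(turn, (list, tuple)) and len(turn) == 2
        if not (is_pair and all(isinstance(t, str) for t in turn)):
            problems.append(f"history[{i}] must be a [user, assistant] pair")
    return problems
```

An empty list means the record matches the structure used in the examples above.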

2.2 sharegpt

Best suited to multi-turn conversations.

Basic structure

{
  "conversations": [
    {
      "from": "system",
      "value": "你是一个专业的Python编程助手，请提供清晰、准确的代码示例。"
    },
    {
      "from": "human",
      "value": "如何用Python读取CSV文件？"
    },
    {
      "from": "gpt",
      "value": "可以使用pandas库来读取CSV文件：\n\n```python\nimport pandas as pd\ndf = pd.read_csv('file.csv')\nprint(df.head())\n```\n\n这样就能轻松加载和查看CSV数据了。"
    }
  ]
}
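The two formats carry the same information, so an alpaca record can be rewritten as a sharegpt conversation. The sketch below shows one way to do it, following the structure of the example above (system prompt as the first entry, then alternating human and gpt turns). The function name alpaca_to_sharegpt is hypothetical, and joining instruction and input with a newline is an assumption about how the user turn should read.

```python
def alpaca_to_sharegpt(record: dict) -> dict:
    """Convert one alpaca-style record into a sharegpt-style conversation."""
    conversations = []
    if record.get("system"):
        conversations.append({"from": "system", "value": record["system"]})
    # Earlier turns come first, in order.
    for user_turn, assistant_turn in record.get("history", []):
        conversations.append({"from": "human", "value": user_turn})
        conversations.append({"from": "gpt", "value": assistant_turn})
    # Assumption: instruction and input together form the final user turn.
    prompt = record["instruction"]
    if record.get("input"):
        prompt = f"{prompt}\n{record['input']}"
    conversations.append({"from": "human", "value": prompt})
    conversations.append({"from": "gpt", "value": record["output"]})
    return {"conversations": conversations}
```

As far as I know, a sharegpt file also needs "formatting": "sharegpt" in its dataset_info.json entry when used with LLaMA-Factory; check the project's data README before relying on that.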

3. Fine-tuning the model

Start the environment

nerdctl run -it \
        --gpus all \
        --ipc=host \
        --ulimit memlock=-1 \
        --ulimit stack=67108864 \
        --name llamafactory \
        --volume /data/models:/data/models \
        --entrypoint /bin/bash \
        hiyouga/llamafactory:0.9.4

All LLaMA-Factory steps below run inside this container.

Prepare the data

mkdir -p /data/models/dataset
echo '[
  {
    "instruction": "介绍一下你自己",
    "input": "",
    "output": "我是 Ops Copilot，一个面向运维领域的智能助手，可以帮助你解决各种运维相关的问题。"
  }
]' > /data/models/dataset/alpaca_test.json

Register the dataset

echo '{
  "alpaca_test.json": {
    "file_name": "alpaca_test.json"
  }
}' > /data/models/dataset/dataset_info.json
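For larger datasets, writing the files from a script is less error-prone than echo. The sketch below writes the records and the dataset_info.json entry in one step. It is a hedged example: the function name register_dataset is hypothetical, and unlike the echo command above it merges the new entry into an existing dataset_info.json rather than overwriting the file.

```python
import json
from pathlib import Path


def register_dataset(dataset_dir: str, key: str, file_name: str, records: list) -> Path:
    """Write records to file_name and register them under key in dataset_info.json."""
    root = Path(dataset_dir)
    root.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing
    data_path = root / file_name
    # ensure_ascii=False keeps Chinese text readable in the file.
    data_path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    info_path = root / "dataset_info.json"
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info[key] = {"file_name": file_name}
    info_path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")
    return info_path
```

Calling register_dataset("/data/models/dataset", "alpaca_test.json", "alpaca_test.json", records) reproduces the two files created by the shell commands above.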

Run fine-tuning

export CUDA_VISIBLE_DEVICES=0
llamafactory-cli train \
    --do_train \
    --stage sft \
    --model_name_or_path /data/models/Qwen3-0.6B \
    --dataset alpaca_test.json \
    --dataset_dir /data/models/dataset \
    --template qwen3 \
    --finetuning_type lora \
    --output_dir /data/models/Qwen3-0.6B-lora-sft \
    --per_device_train_batch_size 1 \
    --max_steps 20 \
    --learning_rate 1e-4 \
    --logging_steps 1 \
    --save_steps 10 \
    --save_total_limit 1 \
    --overwrite_output_dir \
    --warmup_ratio 0

***** train metrics *****
  epoch                    =       20.0
  total_flos               =     1991GF
  train_loss               =     1.0216
  train_runtime            = 0:00:05.97
  train_samples_per_second =      3.345
  train_steps_per_second   =      3.345
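The reported epoch value of 20.0 follows from the settings: the dataset has one sample, the batch size is 1, and training runs for 20 steps, so each step is a full pass over the data. The sketch below reproduces that arithmetic. It assumes gradient accumulation of 1 and a single GPU, and the function name epochs_for_steps is illustrative only; the trainer's own rounding may differ for datasets that do not divide evenly.

```python
import math


def epochs_for_steps(max_steps: int, num_samples: int, per_device_batch_size: int,
                     grad_accum: int = 1, num_devices: int = 1) -> float:
    """Estimate how many passes over the data a fixed number of steps amounts to."""
    samples_per_step = per_device_batch_size * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_samples / samples_per_step)
    return max_steps / steps_per_epoch


# Settings from the run above: 20 steps, 1 sample, batch size 1.
print(epochs_for_steps(max_steps=20, num_samples=1, per_device_batch_size=1))  # 20.0
```

Seeing the same sample 20 times explains why the model reproduces the training answer almost verbatim later on.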

Inspect the LoRA weights

ls /data/models/Qwen3-0.6B-lora-sft

README.md                  all_results.json     special_tokens_map.json  trainer_log.jsonl
adapter_config.json        chat_template.jinja  tokenizer.json           trainer_state.json
adapter_model.safetensors  checkpoint-1         tokenizer_config.json    training_args.bin
added_tokens.json          merges.txt           train_results.json       vocab.json

Merge the weights

export CUDA_VISIBLE_DEVICES=0
llamafactory-cli export \
    --model_name_or_path /data/models/Qwen3-0.6B \
    --adapter_name_or_path /data/models/Qwen3-0.6B-lora-sft \
    --export_dir /data/models/Qwen3-0.6B-lora-merged \
    --template qwen3 \
    --export_size 2 \
    --finetuning_type lora

Inspect the merged model

ls /data/models/Qwen3-0.6B-lora-merged/

Modelfile            config.json             model.safetensors        tokenizer_config.json
added_tokens.json    generation_config.json  special_tokens_map.json  vocab.json
chat_template.jinja  merges.txt              tokenizer.json

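The merged model is the same size as the original model: merging adds the LoRA update into the existing weight matrices instead of adding new parameters.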
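Why the size stays the same can be seen from the LoRA merge formula, W' = W + (alpha / r) * B * A, where B and A are the two small low-rank matrices. The sketch below applies it to a toy 2 x 3 weight using plain Python lists. It illustrates the general LoRA technique only, not LLaMA-Factory's export code, and the names matmul and merge_lora are hypothetical.

```python
def matmul(a: list, b: list) -> list:
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w: list, lora_a: list, lora_b: list, alpha: float, r: int) -> list:
    """Return W + (alpha / r) * B @ A; the result has the same shape as W."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


w = [[1, 0, 0], [0, 1, 0]]   # base weight, 2 x 3
lora_b = [[1], [2]]          # 2 x r with r = 1
lora_a = [[1, 1, 1]]         # r x 3
merged = merge_lora(w, lora_a, lora_b, alpha=2, r=1)
print(merged)  # [[3.0, 2.0, 2.0], [4.0, 5.0, 4.0]]
```

The merged matrix is still 2 x 3, so the parameter count of the layer does not change.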

4. Testing the fine-tuned model

4.1 Merged LoRA model

Start the service

export CUDA_VISIBLE_DEVICES=0
python3 -m vllm.entrypoints.openai.api_server \
  --model /data/models/Qwen3-0.6B-lora-merged \
  --served-model-name /data/models/Qwen3-0.6B \
  --host 0.0.0.0 \
  --port 8000

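A merged model avoids the extra LoRA computation at inference time, and as a single set of weights it is easier to manage and deploy.

Test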

curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/data/models/Qwen3-0.6B",
    "messages": [
      {"role": "user", "content": "介绍一下你自己/no_think"}
    ]
  }'

{
  "id": "chatcmpl-9da628e2e9ed466e82b2c5ff2ea295ad",
  "object": "chat.completion",
  "created": 1758596914,
  "model": "/data/models/Qwen3-0.6B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\n\n</think>\n\n我是 Ops Copilot，一个面向运维领域的智能助手，可以帮助你解决各种运维相关的问题。",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 13,
    "total_tokens": 38,
    "completion_tokens": 25,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}

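The model now introduces itself as Ops Copilot, which shows that the data we constructed has been trained into the model.

4.2 Loading the LoRA adapter separately

Start the service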

export CUDA_VISIBLE_DEVICES=0
python3 -m vllm.entrypoints.openai.api_server \
  --model /data/models/Qwen3-0.6B \
  --served-model-name /data/models/Qwen3-0.6B \
  --enable-lora \
  --lora-modules ops-lora=/data/models/Qwen3-0.6B-lora-sft \
  --host 0.0.0.0 \
  --port 8000

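Loading the LoRA adapter separately is more flexible and makes the adapter easier to share. Several LoRA adapters can also be loaded at the same time.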
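To serve several adapters, --lora-modules takes multiple name=path pairs separated by spaces. The sketch below builds such a command line from a dictionary. The helper vllm_lora_args is hypothetical, and the second adapter (name and path) is made up purely for illustration; only ops-lora exists in this walkthrough.

```python
import shlex


def vllm_lora_args(base_model: str, adapters: dict, host: str = "0.0.0.0",
                   port: int = 8000) -> list:
    """Build the argv for the vLLM OpenAI server with zero or more LoRA adapters."""
    args = ["python3", "-m", "vllm.entrypoints.openai.api_server",
            "--model", base_model, "--host", host, "--port", str(port)]
    if adapters:
        args += ["--enable-lora", "--lora-modules"]
        # Each adapter becomes one name=path token after --lora-modules.
        args += [f"{name}={path}" for name, path in adapters.items()]
    return args


adapters = {
    "ops-lora": "/data/models/Qwen3-0.6B-lora-sft",
    "other-lora": "/data/models/other-lora",  # hypothetical second adapter
}
print(shlex.join(vllm_lora_args("/data/models/Qwen3-0.6B", adapters)))
```

Each adapter is then selected per request through the model field, using its name.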

Test

Note that the model name in the request is the LoRA module name, ops-lora.

curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ops-lora",
    "messages": [
      {"role": "user", "content": "介绍一下你自己/no_think"}
    ]
  }'

{"id":"chatcmpl-8ad998bdcbc3447892ba0228d7b5f55d","object":"chat.completion","created":1758597178,"model":"ops-lora","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\n\n</think>\n\n我是 Ops Copilot，一个面向运维领域的智能助手，可以帮助你解决各种运维相关的问题。","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":13,"total_tokens":38,"completion_tokens":25,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}

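The merged model and the separately loaded LoRA adapter produce the same output.

5. Summary

This post records the process of fine-tuning a Qwen3 model with LLaMA-Factory and testing the result. I did not study many of the parameters in depth; the goal was only to walk through the fine-tuning workflow once, in preparation for fine-tuning the Ops Copilot model.

The toolchain around large models keeps maturing, and these operations keep getting simpler. LLaMA-Factory also provides a Web UI, which makes fine-tuning convenient.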