Debugging the fedllm code

OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning

Repository: https://github.com/rui-ye/OpenFedLLM

Run command (SFT, main_sft.py)

CUDA_VISIBLE_DEVICES=1 python main_sft.py \
--model_name_or_path "dataroot/models/NousResearch/Llama-2-7b-hf" \
--dataset_name "lucasmccabe-lmi/CodeAlpaca-20k" \
--dataset_sample 20000 \
--fed_alg "fedavg" \
--num_clients 20 \
--sample_clients 2 \
--max_steps 10 \
--num_rounds 200 \
--batch_size 16 \
--gradient_accumulation_steps 1 \
--seq_length 512 \
--peft_lora_r 32 \
--peft_lora_alpha 64 \
--use_peft \
--load_in_8bit \
--output_dir "./output" \
--template "alpaca"

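Two arguments were changed here: --model_name_or_path now points to "dataroot/models/NousResearch/Llama-2-7b-hf" and --dataset_name to "lucasmccabe-lmi/CodeAlpaca-20k". Both refer to a model and a dataset that had already been downloaded locally.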
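With --fed_alg "fedavg", --num_clients 20 and --sample_clients 2, each of the 200 rounds (--num_rounds) picks 2 of the 20 clients, lets each of them train locally for a few steps (--max_steps, as the flag name suggests), and then averages their updated weights on the server. The sketch below illustrates that loop in plain Python. It is a generic FedAvg illustration, not OpenFedLLM's implementation: the function names, the list-of-floats stand-in for weights, and the weighting by local sample count (the standard FedAvg rule) are assumptions. With --use_peft the averaged weights would presumably be the LoRA adapter tensors.

```python
import random
from typing import Dict, List

# Hypothetical stand-in for model weights: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def sample_clients(num_clients: int, per_round: int, rng: random.Random) -> List[int]:
    """Choose which client IDs train in one round (uniformly, without replacement)."""
    return sorted(rng.sample(range(num_clients), per_round))


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client weights, each weighted by its number of local training examples."""
    total = sum(client_sizes)
    averaged: Weights = {}
    for name, first in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(first))
        ]
    return averaged


if __name__ == "__main__":
    rng = random.Random(0)
    for round_idx in range(3):
        # --num_clients 20 --sample_clients 2, as in the command above
        print(f"round {round_idx}: clients {sample_clients(20, 2, rng)}")
    # Two sampled clients with 100 and 300 local examples (made-up numbers)
    print(fedavg([{"lora_A": [1.0, 2.0]}, {"lora_A": [3.0, 6.0]}], [100, 300]))
    # {'lora_A': [2.5, 5.0]}
```

Weighting by sample count means a client holding three times as much data pulls the average three times as hard; with an even data split the result reduces to a plain mean.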

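Because the dataset is already stored locally, utils/process_dataset.py also has to be changed so that it loads the dataset from disk instead of downloading it: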

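```python
from datasets import load_dataset, load_from_disk  # load_from_disk must be imported as well
# dataset = load_dataset(dataset_name, split="train")
dataset = load_from_disk(dataset_name)
```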
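Instead of switching this line back and forth, one option is a small helper that calls load_from_disk only when dataset_name points to a directory written by save_to_disk and falls back to load_dataset otherwise. This is a sketch assuming the Hugging Face datasets library. The helper names are hypothetical (not part of OpenFedLLM), and detecting a saved copy by its dataset_dict.json or state.json file is an assumption about the save_to_disk layout. It also covers a pitfall of the two-line change: if the local copy was saved from a DatasetDict, load_from_disk returns every split, so "train" has to be selected explicitly.

```python
import os


def is_local_dataset_dir(path: str) -> bool:
    """True if `path` looks like a directory written by Dataset/DatasetDict.save_to_disk."""
    return os.path.isdir(path) and (
        os.path.isfile(os.path.join(path, "dataset_dict.json"))  # DatasetDict layout
        or os.path.isfile(os.path.join(path, "state.json"))  # single Dataset layout
    )


def load_train_split(dataset_name: str):
    """Return the training split from a local save_to_disk copy or a Hub dataset ID."""
    # Imported lazily so the path check above works without `datasets` installed.
    from datasets import DatasetDict, load_dataset, load_from_disk

    if is_local_dataset_dir(dataset_name):
        dataset = load_from_disk(dataset_name)
        # A saved DatasetDict comes back with every split, so select "train" explicitly.
        if isinstance(dataset, DatasetDict):
            dataset = dataset["train"]
        return dataset
    # Hub IDs (and local folders of raw data files) still go through load_dataset.
    return load_dataset(dataset_name, split="train")


# Hypothetical usage inside utils/process_dataset.py:
# dataset = load_train_split(dataset_name)
```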

Run command for DPO (main_dpo.py):
CUDA_VISIBLE_DEVICES=$gpu python main_dpo.py \
--model_name_or_path ehartford/Wizard-Vicuna-7B-Uncensored \
--dataset_name Anthropic/hh-rlhf \
--dataset_sample 20000 \
--fed_alg "fedavg" \
--num_clients 5 \
--sample_clients 2 \
--learning_rate 5e-4 \
--max_steps 10 \
--num_rounds 200 \
--batch_size 16 \
--gradient_accumulation_steps 1 \
--seq_length 512 \
--use_peft \
--load_in_8bit \
--output_dir ./output \
--template "vicuna_v1.1"
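A quick budget check helps when choosing these values. Reading --max_steps as optimizer steps per sampled client per round and assuming the 20000 sampled examples are split evenly across the 5 clients (both are assumptions about how main_dpo.py uses the flags), each sampled client processes 16 × 1 × 10 = 160 examples per round, a given client is picked in about 200 × 2 / 5 = 80 rounds, and it therefore makes roughly 80 × 160 / 4000 = 3.2 passes over its local data. The sketch below just encodes that arithmetic; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FedRunConfig:
    """Subset of the DPO command's flags (defaults copied from the command above)."""
    dataset_sample: int = 20000
    num_clients: int = 5
    sample_clients: int = 2
    max_steps: int = 10
    num_rounds: int = 200
    batch_size: int = 16
    gradient_accumulation_steps: int = 1


def budget(cfg: FedRunConfig) -> dict:
    # Examples one sampled client processes per round (assumes max_steps = steps per round).
    per_client_round = cfg.batch_size * cfg.gradient_accumulation_steps * cfg.max_steps
    # Assumes the sampled dataset is split evenly across clients.
    per_client_data = cfg.dataset_sample // cfg.num_clients
    # Expected number of rounds in which a given client is selected.
    rounds_per_client = cfg.num_rounds * cfg.sample_clients / cfg.num_clients
    return {
        "examples_per_client_per_round": per_client_round,
        "examples_per_round": per_client_round * cfg.sample_clients,
        "examples_per_client": per_client_data,
        "expected_rounds_per_client": rounds_per_client,
        "expected_epochs_per_client": rounds_per_client * per_client_round / per_client_data,
    }


if __name__ == "__main__":
    for key, value in budget(FedRunConfig()).items():
        print(f"{key}: {value}")
```

Passing a modified config, e.g. FedRunConfig(num_clients=20), shows how the local passes change when the same data is spread over more clients.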

watch -n 1 nvidia-smi   # refresh the GPU status every second
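watch is the interactive way to monitor GPUs; for logging from a script, nvidia-smi can print just the memory columns as CSV with --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits. A minimal sketch follows; the function names and the sample numbers are hypothetical. Note that nvidia-smi uses its own GPU numbering regardless of CUDA_VISIBLE_DEVICES, so the run above usually appears as GPU 1; the two numberings are only guaranteed to agree when CUDA_DEVICE_ORDER=PCI_BUS_ID is set.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int, int]]:
    """Parse 'index, memory.used, memory.total' lines (values in MiB) into tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def query_gpu_memory() -> List[Tuple[int, int, int]]:
    """Run nvidia-smi once; requires an NVIDIA driver on the machine."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)


if __name__ == "__main__":
    sample = "0, 1024, 24576\n1, 20480, 24576\n"  # made-up output for illustration
    for idx, used, total in parse_gpu_memory(sample):
        print(f"GPU {idx}: {used}/{total} MiB ({100 * used / total:.0f}%)")
```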

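screen -S jcx          # create a new screen session named jcx
screen -ls             # list sessions and their PIDs (here the PID was 9889)
screen -r 9889         # reattach to that session later
lsof -i:5000           # show which process is using port 5000
screen -D -r jcx       # reattach jcx, detaching it from any other terminal first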
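When several experiments run in separate screen sessions, the PID needed for screen -r can also be read from screen -ls programmatically. A sketch, assuming the usual "PID.name ... (State)" line format; the exact output differs slightly between screen versions (some print a date column), and the helper names are hypothetical.

```python
import re
import subprocess
from typing import List, NamedTuple


class ScreenSession(NamedTuple):
    pid: int
    name: str
    state: str  # e.g. "Detached" or "Attached"


# Session lines look like "\t9889.jcx\t(Detached)"; the last parenthesized field is the state.
_SESSION_RE = re.compile(r"^\s*(\d+)\.(\S+)\s.*\(([^()]*)\)\s*$")


def parse_screen_ls(output: str) -> List[ScreenSession]:
    """Extract (pid, name, state) for every session line in `screen -ls` output."""
    sessions = []
    for line in output.splitlines():
        match = _SESSION_RE.match(line)
        if match:
            pid, name, state = match.groups()
            sessions.append(ScreenSession(int(pid), name, state))
    return sessions


def list_screen_sessions() -> List[ScreenSession]:
    # `screen -ls` may exit non-zero even when it lists sessions, so no check=True here.
    result = subprocess.run(["screen", "-ls"], capture_output=True, text=True)
    return parse_screen_ls(result.stdout)


if __name__ == "__main__":
    sample = "There is a screen on:\n\t9889.jcx\t(Detached)\n1 Socket in /run/screen/S-user.\n"
    for session in parse_screen_ls(sample):
        print(f"screen -r {session.pid}   # {session.name}, {session.state}")
```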

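free                   # show free memory
free -m                # the same, in MiB
top                    # live process list; state column: S = sleeping, R = running, T = stopped/traced, Z = zombie
htop                   # interactive view of memory and CPU usage
ps u                   # list your processes in user-oriented format
kill -9 <PID>          # force-kill a process with SIGKILL
ps aux | grep pheimg   # find processes whose command line contains pheimg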
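ps aux | grep pheimg followed by kill -9 can be scripted the same way. The sketch below parses ps aux output (columns USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND), maps the first letter of STAT to the state codes that top also shows, and sends SIGKILL to matching processes. The function names are hypothetical; it assumes a Linux/procps ps, skips its own PID (unlike the grep pipeline, it never matches a grep process), and, like kill -9, gives the target no chance to clean up.

```python
import os
import signal
import subprocess
from typing import List, NamedTuple


class Proc(NamedTuple):
    user: str
    pid: int
    mem_percent: float
    stat: str
    command: str


# First letter of the STAT column, as also shown in top's S column.
STATE_NAMES = {"R": "running", "S": "sleeping", "D": "uninterruptible sleep",
               "T": "stopped", "t": "traced", "Z": "zombie"}


def describe_state(stat: str) -> str:
    return STATE_NAMES.get(stat[:1], "other")


def parse_ps_aux(output: str, pattern: str) -> List[Proc]:
    """Return `ps aux` rows whose COMMAND contains `pattern` (like `| grep pattern`)."""
    procs = []
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # COMMAND may contain spaces, so split at most 10 times
        if len(fields) < 11 or pattern not in fields[10]:
            continue
        procs.append(Proc(fields[0], int(fields[1]), float(fields[3]), fields[7], fields[10]))
    return procs


def kill_matching(pattern: str) -> List[int]:
    """Send SIGKILL (kill -9) to every matching process except this script itself."""
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True).stdout
    killed = []
    for proc in parse_ps_aux(out, pattern):
        if proc.pid == os.getpid():
            continue
        try:
            os.kill(proc.pid, signal.SIGKILL)
            killed.append(proc.pid)
        except ProcessLookupError:
            pass  # the process already exited between `ps` and `kill`
    return killed
```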


Author: 杨桃非桃
Published: February 17, 2025
