OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning
Repository: https://github.com/rui-ye/OpenFedLLM
- Run command (SFT)
```bash
CUDA_VISIBLE_DEVICES=1 python main_sft.py \
  --model_name_or_path "dataroot/models/NousResearch/Llama-2-7b-hf" \
  --dataset_name "lucasmccabe-lmi/CodeAlpaca-20k" \
  --dataset_sample 20000 \
  --fed_alg "fedavg" \
  --num_clients 20 \
  --sample_clients 2 \
  --max_steps 10 \
  --num_rounds 200 \
  --batch_size 16 \
  --gradient_accumulation_steps 1 \
  --seq_length 512 \
  --peft_lora_r 32 \
  --peft_lora_alpha 64 \
  --use_peft \
  --load_in_8bit \
  --output_dir "./output" \
  --template "alpaca"
```

Here `--model_name_or_path "dataroot/models/NousResearch/Llama-2-7b-hf"` and `--dataset_name "lucasmccabe-lmi/CodeAlpaca-20k"` have been changed to point to the already-downloaded local model and dataset.
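The local model directory can be prepared ahead of time. A minimal sketch, assuming `huggingface_hub` is installed and that the target directory matches the `--model_name_or_path` value used above:

```python
# Sketch: download the base model once so --model_name_or_path can point to a local path.
# The target directory mirrors the value used in the command above (assumption).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="NousResearch/Llama-2-7b-hf",
    local_dir="dataroot/models/NousResearch/Llama-2-7b-hf",
)
```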
Because the dataset is also already stored locally, the dataset-loading code in utils/process_dataset.py has to be modified as well:
```python
dataset = load_from_disk(dataset_name)  # use load_from_disk instead of load_dataset for a locally saved dataset
```
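For `load_from_disk` to find the data, the dataset first has to be saved in the `datasets` on-disk format. A minimal sketch, assuming the save directory is the same string that is later passed as `--dataset_name`:

```python
# Sketch: fetch the dataset once and save it locally so load_from_disk(dataset_name) works.
# The save path is an assumption; it should match the --dataset_name argument above.
from datasets import load_dataset

dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k")
dataset.save_to_disk("lucasmccabe-lmi/CodeAlpaca-20k")
```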
- Run command (DPO)

```bash
CUDA_VISIBLE_DEVICES=$gpu python main_dpo.py \
  --model_name_or_path ehartford/Wizard-Vicuna-7B-Uncensored \
  --dataset_name Anthropic/hh-rlhf \
  --dataset_sample 20000 \
  --fed_alg "fedavg" \
  --num_clients 5 \
  --sample_clients 2 \
  --learning_rate 5e-4 \
  --max_steps 10 \
  --num_rounds 200 \
  --batch_size 16 \
  --gradient_accumulation_steps 1 \
  --seq_length 512 \
  --use_peft \
  --load_in_8bit \
  --output_dir ./output \
  --template "vicuna_v1.1"
```
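Both commands use `--fed_alg "fedavg"`. Purely as an illustration (this is not OpenFedLLM's actual source), the aggregation step of FedAvg each round looks roughly like the sketch below: the `--sample_clients` clients selected for that round train locally for `--max_steps`, and the server then takes a weighted average of their (LoRA) parameters.

```python
# Illustrative FedAvg aggregation sketch (assumption: not taken from the OpenFedLLM code).
import torch

def fedavg_aggregate(client_states, client_weights):
    """Weighted average of client parameter dicts, e.g. LoRA adapter weights."""
    total = float(sum(client_weights))
    averaged = {}
    for name in client_states[0]:
        averaged[name] = sum(
            (w / total) * state[name].float()
            for state, w in zip(client_states, client_weights)
        )
    return averaged
```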
- Handy monitoring and process-management commands

```bash
watch -n 1 nvidia-smi     # refresh GPU usage every second
screen -S jcx             # create a new screen session named jcx
screen -ls                # list sessions and their PIDs (e.g. 9889)
screen -r 9889            # reattach to that session next time
screen -D -r jcx          # detach the session elsewhere and reattach it here
lsof -i:5000              # see which process is using a given port (5000 here)
free                      # show free memory
free -m                   # show memory in MB
top                       # process states: S sleeping, R running, T traced/stopped, Z zombie
htop                      # interactive CPU/memory viewer
ps u                      # list processes for the current user
ps aux | grep pheimg      # find a process by name
kill -9 <PID>             # force-kill a process by its PID
```