How to use Fatnaoui/dpo with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then apply the DPO adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained("aubmindlab/aragpt2-base")
model = PeftModel.from_pretrained(base_model, "Fatnaoui/dpo")
```
# dpo
This model is a fine-tuned version of aubmindlab/aragpt2-base on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.4430
- Rewards/chosen: 5.6454
- Rewards/rejected: 3.4725
- Rewards/accuracies: 0.8448
- Rewards/margins: 2.1729
- Logps/rejected: -779.9817
- Logps/chosen: -1153.5770
- Logits/rejected: -3.0501
- Logits/chosen: -3.3050
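In TRL's DPO metrics, `Rewards/margins` is the mean gap between the chosen and rejected rewards, which the final evaluation numbers above are consistent with:

```python
# Rewards/margins = Rewards/chosen - Rewards/rejected
chosen = 5.6454
rejected = 3.4725
margin = round(chosen - rejected, 4)
print(margin)  # → 2.1729, matching the reported Rewards/margins
```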
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 200
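As a sketch of how these values combine: the effective batch size is the per-device batch size times the gradient accumulation steps, and a linear scheduler warms the learning rate up over `warmup_steps` and then decays it linearly to zero (assuming the scheduler behaves like `transformers.get_linear_schedule_with_warmup`):

```python
learning_rate = 1e-4
train_batch_size = 2
gradient_accumulation_steps = 4
warmup_steps = 2
training_steps = 200

# Effective batch size per optimizer step (single device)
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8

def lr_at(step: int) -> float:
    """Linear warmup to `learning_rate`, then linear decay to zero at `training_steps`."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate * max(0.0, (training_steps - step) / (training_steps - warmup_steps))
```

For example, `lr_at(1)` is half the peak rate, `lr_at(2)` is the full 1e-4, and `lr_at(200)` is 0.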
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6408 | 0.0769 | 10 | 0.5783 | 0.8112 | 0.4760 | 0.8448 | 0.3351 | -809.9464 | -1201.9193 | -3.1723 | -3.4517 |
| 0.4779 | 0.1538 | 20 | 0.5212 | 2.0793 | 1.2294 | 0.8276 | 0.8499 | -802.4123 | -1189.2380 | -3.1494 | -3.4178 |
| 0.4764 | 0.2308 | 30 | 0.5064 | 2.9292 | 1.7859 | 0.8362 | 1.1433 | -796.8478 | -1180.7388 | -3.1253 | -3.3872 |
| 0.4429 | 0.3077 | 40 | 0.4839 | 3.3147 | 2.0341 | 0.8276 | 1.2806 | -794.3660 | -1176.8838 | -3.1091 | -3.3693 |
| 0.4766 | 0.3846 | 50 | 0.5141 | 3.4676 | 2.1403 | 0.8190 | 1.3274 | -793.3040 | -1175.3546 | -3.0906 | -3.3490 |
| 0.4798 | 0.4615 | 60 | 0.5002 | 3.5407 | 2.1864 | 0.8276 | 1.3543 | -792.8427 | -1174.6235 | -3.0892 | -3.3485 |
| 0.4054 | 0.5385 | 70 | 0.4733 | 3.5696 | 2.2000 | 0.8362 | 1.3696 | -792.7064 | -1174.3348 | -3.0960 | -3.3586 |
| 0.377 | 0.6154 | 80 | 0.4556 | 3.9933 | 2.4739 | 0.8448 | 1.5194 | -789.9678 | -1170.0979 | -3.0895 | -3.3516 |
| 0.4159 | 0.6923 | 90 | 0.4460 | 4.4103 | 2.7327 | 0.8362 | 1.6777 | -787.3801 | -1165.9279 | -3.0808 | -3.3423 |
| 0.3655 | 0.7692 | 100 | 0.4507 | 4.7961 | 2.9496 | 0.8448 | 1.8465 | -785.2107 | -1162.0699 | -3.0704 | -3.3290 |
| 0.335 | 0.8462 | 110 | 0.4592 | 5.0963 | 3.1378 | 0.8534 | 1.9585 | -783.3284 | -1159.0679 | -3.0658 | -3.3242 |
| 0.3374 | 0.9231 | 120 | 0.4784 | 5.4616 | 3.3750 | 0.8534 | 2.0866 | -780.9568 | -1155.4149 | -3.0582 | -3.3136 |
| 0.2969 | 1.0 | 130 | 0.4803 | 5.5532 | 3.4306 | 0.8534 | 2.1226 | -780.4006 | -1154.4990 | -3.0565 | -3.3120 |
| 0.2832 | 1.0769 | 140 | 0.4859 | 5.6912 | 3.5236 | 0.8448 | 2.1675 | -779.4703 | -1153.1194 | -3.0532 | -3.3079 |
| 0.3746 | 1.1538 | 150 | 0.4890 | 5.8066 | 3.5976 | 0.8448 | 2.2090 | -778.7309 | -1151.9652 | -3.0512 | -3.3061 |
| 0.386 | 1.2308 | 160 | 0.4675 | 5.7611 | 3.5620 | 0.8448 | 2.1990 | -779.0862 | -1152.4202 | -3.0508 | -3.3057 |
| 0.2852 | 1.3077 | 170 | 0.4615 | 5.7631 | 3.5564 | 0.8448 | 2.2067 | -779.1427 | -1152.4001 | -3.0499 | -3.3041 |
| 0.3886 | 1.3846 | 180 | 0.4501 | 5.6984 | 3.5097 | 0.8448 | 2.1887 | -779.6100 | -1153.0469 | -3.0502 | -3.3049 |
| 0.368 | 1.4615 | 190 | 0.4441 | 5.6548 | 3.4791 | 0.8448 | 2.1757 | -779.9158 | -1153.4833 | -3.0502 | -3.3049 |
| 0.318 | 1.5385 | 200 | 0.4430 | 5.6454 | 3.4725 | 0.8448 | 2.1729 | -779.9817 | -1153.5770 | -3.0501 | -3.3050 |
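The trend in the table (loss falling as the reward margin grows) follows from the per-pair DPO objective, which is the negative log-sigmoid of the reward margin. A minimal sketch (note the reported validation loss is an average over per-pair losses, so it will not equal the loss of the aggregate margin):

```python
import math

def dpo_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Per-pair DPO loss: -log sigmoid(margin). The rewards already carry the beta scaling."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A wider margin (step 200) gives a lower per-pair loss than a narrow one (step 10)
late = dpo_pair_loss(5.6454, 3.4725)
early = dpo_pair_loss(0.8112, 0.4760)
```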
### Framework versions
- PEFT 0.14.0
- Transformers 4.45.2
- Pytorch 2.3.1+cu121
- Datasets 3.2.0
- Tokenizers 0.20.3