arxiv:2206.06614
Luckeciano Carvalho Melo
luckeciano
AI & ML interests
Reinforcement Learning
Organizations
models
1,128
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-10-HessianMaskToken-0.0-LR-7.5e-7_2916
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-9-HessianMaskToken-0.0-LR-7.5e-7_9573
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-8-HessianMaskToken-0.0-LR-7.5e-7_8245
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-7-HessianMaskToken-0.0-LR-7.5e-7_3803
luckeciano/Llama-3.1-8B-Instruct-CAPO-FisherMaskToken-1e-4-5e-7-HessianMaskToken-0.005-LR-7.5e-7_9528
luckeciano/Llama-3.1-8B-Instruct-CAPO-FisherMaskToken-1e-4-1e-6-HessianMaskToken-0.005-LR-7.5e-7_1755
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-6-HessianMaskToken-0.0-LR-7.5e-7_5828
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-5-HessianMaskToken-0.0-LR-7.5e-7_7105
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-5-HessianMaskToken-0.01-LR-7.5e-7_8346
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.005-LR-7.5e-7_8590
datasets
19
luckeciano/pku-llama3.1-8b-dataset-test-generations • 4.7M • 9
luckeciano/pku-llama3.1-8b-dataset-train-generations • 1.36M
luckeciano/pku-alpaca3.1-8b-eval-gt-rewards • 4.7k • 3
luckeciano/pku-alpaca3.1-8b-gt-rewards • 6.05M
luckeciano/pku-llama3.1-8b-answers-features-test • 4.42M • 11
luckeciano/pku-llama3.1-8b-answers-features-train • 1.28M • 30
luckeciano/pku-llama3.1-8b-dataset-features-gt-reward-modeling
luckeciano/pku-llama3.1-8b-dataset-features • 18.3k • 48
luckeciano/PKU-SafeRLHF-Shifts • 18.3k • 4
luckeciano/mistral8x22b-reddit-post-features • 92.9k • 100