AI & ML interests
Reinforcement Learning
Organizations
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-10-HessianMaskToken-0.0-LR-7.5e-7_2916
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-9-HessianMaskToken-0.0-LR-7.5e-7_9573
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-8-HessianMaskToken-0.0-LR-7.5e-7_8245
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-7-HessianMaskToken-0.0-LR-7.5e-7_3803
luckeciano/Llama-3.1-8B-Instruct-CAPO-FisherMaskToken-1e-4-5e-7-HessianMaskToken-0.005-LR-7.5e-7_9528
luckeciano/Llama-3.1-8B-Instruct-CAPO-FisherMaskToken-1e-4-1e-6-HessianMaskToken-0.005-LR-7.5e-7_1755
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-6-HessianMaskToken-0.0-LR-7.5e-7_5828
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-5-HessianMaskToken-0.0-LR-7.5e-7_7105
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-5-HessianMaskToken-0.01-LR-7.5e-7_8346
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.005-LR-7.5e-7_8590
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.001-LR-7.5e-7_4015
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.01-LR-7.5e-7_8665
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.001-LR-1e-6_2063
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.001-LR-1e-6_3752
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.005-LR-1e-6_2255
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.005-LR-1e-6_2783
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.005-LR-1e-6_1050
luckeciano/Qwen-2.5-7B-GRPO-Base-v2-LogShifts-Eval_1355
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-5-HessianMaskToken-0.01-LR-1e-6_4302
luckeciano/Qwen-2.5-7B-GRPO-Base-v2-LogShifts-Eval_5830
luckeciano/Qwen-2.5-7B-GRPO-Adam-FisherMaskToken-1e-4-HessianMaskToken-0.01-v2-LogShiftsEval_5031
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.01-LR-1e-6_3906
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_9100
luckeciano/Qwen-2.5-7B-GRPO-Adam-FisherMaskToken-1e-4-HessianMaskToken-0.01-v2-LogShiftsEval_6545
luckeciano/Qwen-2.5-7B-GRPO-Adam-FisherMaskToken-1e-4-HessianMaskToken-0.01-v2-LogShiftsEval_2326
luckeciano/Qwen-2.5-7B-GRPO-Adam-FisherMaskToken-1e-4-HessianMaskToken-0.01-v2-LogShiftsEval_8011
luckeciano/Qwen-2.5-7B-GRPO-Adam-FisherMaskToken-1e-4-HessianMaskToken-0.01-v2-LogShiftsEval_4045
luckeciano/Qwen-2.5-7B-GRPO-Adam-FisherMaskToken-1e-4-HessianMaskToken-0.01-v2-LogShiftsEval_6925
luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-LR-7.5e-7-v2_7814
luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-LR-5e-7-v2_2010