Shashwat Goel

shash42

https://www.shash42.github.io

AI & ML interests

Science of Deep Learning, Safe AI

Recent Activity

liked a model about 10 hours ago

nikhilchandak/OpenForecaster-8B

Question Answering • 8B • Updated about 24 hours ago • 4 • 4

liked a dataset about 10 hours ago

nikhilchandak/OpenForesight

Viewer • Updated 4 days ago • 52.7k • 32 • 1

upvoted a paper about 16 hours ago

Scaling Open-Ended Reasoning to Predict the Future

Paper • 2512.25070 • Published 1 day ago • 11

submitted a paper to Daily Papers about 16 hours ago

Scaling Open-Ended Reasoning to Predict the Future

Paper • 2512.25070 • Published 1 day ago • 11

upvoted a paper 3 days ago

Training AI Co-Scientists Using Rubric Rewards

Paper • 2512.23707 • Published 3 days ago • 15

submitted a paper to Daily Papers 3 days ago

Training AI Co-Scientists Using Rubric Rewards

Paper • 2512.23707 • Published 3 days ago • 15

upvoted a paper 3 months ago

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

Paper • 2509.14234 • Published Sep 17, 2025 • 5

commented on a paper 4 months ago

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Paper • 2509.09677 • Published Sep 11, 2025 • 34

upvoted a paper 4 months ago

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Paper • 2509.09677 • Published Sep 11, 2025 • 34

liked a dataset 4 months ago

arvindh75/Long-Horizon-Execution

Viewer • Updated Sep 16, 2025 • 100 • 178 • 13

New activity in ByteDance-Seed/Seed-OSS-36B-Instruct 4 months ago

Official vllm support

👀 2

#1 opened 4 months ago by shash42

upvoted a collection 6 months ago

answer-matching

Collection

Free-form datasets, human annotations, and sample-level model outputs for "Answer Matching Outperforms Multiple Choice for Language Model Evaluation" • 2 items • Updated Jul 3, 2025 • 2

commented on a paper 6 months ago

Answer Matching Outperforms Multiple Choice for Language Model Evaluation

Paper • 2507.02856 • Published Jul 3, 2025 • 8

upvoted a paper 7 months ago

Pitfalls in Evaluating Language Model Forecasters

Paper • 2506.00723 • Published May 31, 2025 • 3

commented on a paper 7 months ago

Pitfalls in Evaluating Language Model Forecasters

Paper • 2506.00723 • Published May 31, 2025 • 3

updated a dataset 8 months ago

shash42/GPQA-Diamond-Verify

Viewer • Updated May 9, 2025 • 792 • 18

published a dataset 8 months ago

shash42/GPQA-Diamond-Verify

Viewer • Updated May 9, 2025 • 792 • 18

updated a dataset 8 months ago

shash42/MATH-Verify

Viewer • Updated May 9, 2025 • 19.7k • 21

published a dataset 8 months ago

shash42/MATH-Verify

Viewer • Updated May 9, 2025 • 19.7k • 21

updated a dataset 8 months ago

shash42/MMLU-Pro-Verify

Viewer • Updated May 9, 2025 • 114k • 13
