Run your AI agent on test questions and submit answers
Run agent evaluation and submit answers for scoring
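The flow described above, running the agent over each test question and collecting the answers into one submission, might look roughly like the sketch below. Everything named here is an assumption for illustration: the `task_id`, `question`, and `answer` field names, the `username` field, and both helper functions are hypothetical, and the actual scoring service may expect a different payload. The sketch stops at building the payload and does not send it anywhere.

```python
from typing import Any, Callable, Dict, List


def run_agent_on_questions(
    agent: Callable[[str], Any],
    questions: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Call the agent once per question and collect one answer record each.

    Assumes each question is a dict with hypothetical keys
    "task_id" and "question".
    """
    answers = []
    for item in questions:
        try:
            result = agent(item["question"])
        except Exception as exc:
            # Record the failure instead of aborting the whole run,
            # so every question still gets an entry.
            result = f"AGENT ERROR: {exc}"
        answers.append({"task_id": item["task_id"], "answer": str(result)})
    return answers


def build_submission(username: str, answers: List[Dict[str, str]]) -> Dict[str, Any]:
    """Bundle the answers into a single payload (hypothetical shape)."""
    return {"username": username, "answers": answers}
```

Catching exceptions per question is a design choice in this sketch: one failing question then costs a single answer instead of the whole evaluation run.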