TeddyYao/grok4-gpqa-eval / README.md

	---
	title: Grok-4 GPQA Evaluation
	emoji: 🧠
	colorFrom: blue
	colorTo: green
	sdk: gradio
	sdk_version: "4.31.0"
	app_file: run_hf_space.py
	pinned: false
	---

	# Grok-4 GPQA Evaluation Dashboard

	Real-time evaluation of the Grok-4 model on the GPQA benchmark.

	## ⚙️ Configuration Required

	Please set these secrets in your Space settings:
	- GROK_API_KEY: Your Grok API key from x.ai
	- HF_TOKEN: Your Hugging Face token (for GPQA dataset access)
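
Hugging Face Spaces expose secrets to the running app as environment variables. How `run_hf_space.py` actually reads them is not shown in this README, so the following is only a minimal sketch of a startup check; the helper name `load_secrets` is hypothetical.

```python
import os


def load_secrets(env=None):
    """Return the two secrets named above, or raise if any is missing.

    `load_secrets` is a hypothetical helper, not part of this Space's code.
    Pass a plain dict as `env` to check values without touching os.environ.
    """
    env = os.environ if env is None else env
    required = ("GROK_API_KEY", "HF_TOKEN")
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Space secrets: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Failing early with the names of the missing secrets tends to be easier to debug than an authentication error surfacing in the middle of an evaluation run.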

	## 📊 Features

	- Real-time progress tracking
	- Accuracy metrics and performance stats
	- Detailed results export
	- Support for the full GPQA dataset (448 questions)
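
As an illustration of the accuracy and export features, here is a minimal sketch of how per-question results could be scored and written to JSON. The record layout (`QuestionResult`), the function names, and the output format are all assumptions made for this example; the dashboard's real data structures are not documented here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QuestionResult:
    # Hypothetical record layout, assumed for illustration only.
    question_id: str
    predicted: str  # answer letter chosen by the model, e.g. "B"
    correct: str    # reference answer letter


def accuracy(results):
    """Fraction of results whose prediction matches the reference answer."""
    if not results:
        return 0.0
    hits = sum(r.predicted == r.correct for r in results)
    return hits / len(results)


def export_results(results, path):
    """Write a summary plus per-question details to `path` as JSON."""
    payload = {
        "total": len(results),
        "accuracy": accuracy(results),
        "results": [asdict(r) for r in results],
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)
    return payload
```

Calling `accuracy` on the partial result list after each question would give a running figure suitable for a progress display, while `export_results` covers the detailed export once the run finishes.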

	## 🚀 Getting Started

	1. Set the required secrets in Space settings
	2. Make sure the Hugging Face account behind your HF_TOKEN has access to the GPQA dataset
	3. The evaluation will start automatically
	4. Monitor progress in the dashboard