---
title: Grok-4 GPQA Evaluation
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.31.0"
app_file: run_hf_space.py
pinned: false
---

# Grok-4 GPQA Evaluation Dashboard

Real-time evaluation of the Grok-4 model on the GPQA benchmark.

## Configuration Required

Set these secrets in your Space settings:

- **GROK_API_KEY**: Your Grok API key from x.ai
- **HF_TOKEN**: Your Hugging Face token (for GPQA dataset access)
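
Hugging Face Spaces expose secrets to the running app as environment variables, so a startup check along the following lines can report a missing secret before the evaluation begins. This is a minimal sketch, not the code in `run_hf_space.py`; the helper name `missing_secrets` is hypothetical.

```python
import os

# Secret names taken from the list above.
REQUIRED_SECRETS = ("GROK_API_KEY", "HF_TOKEN")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or blank.

    `env` defaults to the process environment; pass a dict to test the check.
    (Hypothetical helper, not part of the Space's actual code.)
    """
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


# Example with an explicit mapping: only HF_TOKEN is missing here.
print(missing_secrets({"GROK_API_KEY": "xai-example"}))  # ['HF_TOKEN']
```

Calling `missing_secrets()` with no argument checks the real environment; an empty list means both secrets are set.
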

## Features

- Real-time progress tracking
- Accuracy metrics and performance stats
- Detailed results export
- Support for the full GPQA dataset (448 questions)
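
As an illustration of the accuracy metric, the sketch below scores a list of per-question results. GPQA questions are multiple choice, so each answer is compared as a letter. The `QuestionResult` record and the `summarize` function are assumptions made for this example; the dashboard's actual result format may differ.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """One graded question (hypothetical record shape)."""
    question_id: str
    predicted: str  # answer letter returned by the model, e.g. "B"
    expected: str   # correct answer letter


def summarize(results):
    """Return total, correct count, and accuracy for a list of results."""
    total = len(results)
    correct = sum(
        r.predicted.strip().upper() == r.expected.strip().upper()
        for r in results
    )
    # Avoid division by zero before any question has been graded.
    accuracy = correct / total if total else 0.0
    return {"total": total, "correct": correct, "accuracy": accuracy}


results = [
    QuestionResult("q1", "A", "A"),
    QuestionResult("q2", "c", "C"),
    QuestionResult("q3", "B", "D"),
]
stats = summarize(results)
print(f"{stats['correct']}/{stats['total']} correct")  # 2/3 correct
```

Recomputing the summary after each graded question is one simple way to keep a running accuracy figure up to date.
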

## Getting Started

1. Set the required secrets in the Space settings.
2. Make sure your Hugging Face account has access to the GPQA dataset.
3. The evaluation will start automatically.
4. Monitor progress in the dashboard.