Artificial Analysis

Speech Reasoning Benchmarking Overview

Our Speech Reasoning benchmark evaluates the ability of models that support native audio input and output, referred to as "native audio models", to answer reasoning-based questions.

Native audio models are provided with an input audio file and are expected to generate an output audio file that contains the answer to the question included in the input audio file. No additional information is provided to the native audio model.

The output audio file from the native audio model is transcribed, forming a "candidate answer". This candidate answer is then provided to an automatic evaluation system that leverages an AI model as a judge. The judge model is provided with the candidate answer, official answer and original question as context and is prompted to label the candidate answer as correct or incorrect.
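
The flow described above (audio in, audio out, transcription, judge verdict) can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual implementation: the `Item` type, the `model`, `transcribe` and `judge` callables, the prompt wording, and the use of plain accuracy as the final score are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Item:
    """One benchmark question (hypothetical structure)."""
    question: str         # original question, as text
    official_answer: str  # reference answer
    input_audio: bytes    # spoken form of the question


def build_judge_prompt(question: str, official_answer: str, candidate_answer: str) -> str:
    # Hypothetical prompt wording; the real judge prompt is not given in the text.
    return (
        "Label the candidate answer as CORRECT or INCORRECT.\n"
        f"Question: {question}\n"
        f"Official answer: {official_answer}\n"
        f"Candidate answer: {candidate_answer}\n"
    )


def evaluate(
    items: Iterable[Item],
    model: Callable[[bytes], bytes],     # native audio model: audio in, audio out
    transcribe: Callable[[bytes], str],  # speech-to-text over the model's output
    judge: Callable[[str], str],         # judge model: prompt in, label out
) -> float:
    """Return the fraction of items the judge labels as correct."""
    correct = 0
    total = 0
    for item in items:
        # The model receives only the input audio, nothing else.
        output_audio = model(item.input_audio)
        # The transcript of the output audio is the candidate answer.
        candidate = transcribe(output_audio)
        prompt = build_judge_prompt(item.question, item.official_answer, candidate)
        # Exact match on the label, so "INCORRECT" is never counted as correct.
        verdict = judge(prompt).strip().upper()
        correct += verdict == "CORRECT"
        total += 1
    return correct / total if total else 0.0
```

Passing the three stages in as callables keeps the sketch independent of any particular speech model, transcription service, or judge model.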

Evaluation is performed on the Artificial Analysis Big Bench Audio dataset. More information about this dataset can be found here.
