MicroEvals | Artificial Analysis

👍23

p5.js physics

A collection of physics animations, mostly using p5.js

👍16

SVG Animals

Inspired by Simon Willison's pelican on a bicycle test

👍13

StrawberryEval

Easy questions that LLMs still trip up on

👍11

Arcade racing games

80s-style arcade racing games

👍8

Deep Meaningful Questions

Models' responses to life questions

👍7

ASCII artwork

A series of challenging ASCII artworks in different styles

👍7

SVG of a Bicycle

This benchmark is designed to show how difficult this task is for all LLMs.

👍6

Medical Quiz 2

Renal physiology quiz.

👍6

Game Design

👍5

One-shot games

Generate a complete, ready-to-play browser game with a single prompt

👍5

Hacker News clone

A clone of the Hacker News website

👍5

Workout plans

👍5

Medical Quiz 1

Endocrine system quiz. Disclaimer: translated from Spanish to English.

👍4

Existential Snake Game

A twist on the Snake game where the snake is having an existential crisis

👍4

Controversial questions

👍3

Paris

An ASCII artwork of the Eiffel Tower

👍3

Spotify clone

👍3

Conway's Game of Life - Gosper Glider Gun

Generate the complete HTML, CSS, and JavaScript code for a web-based simulation of Conway's Game of Life.

👍3

3D Engineering and Physics Simulations

A variety of difficult programming tests with complex input and output parameters.

👍2

Basis Eval

👍2

7 Seater SUV ELO Tournament

Who will win the 7-seater SUV fight to the death?

👍2

Gantt Chart

👍2

Tic-tac-toe

First test

👍2

Few Simple Experimentations

Testing a few simple experiments and visualizations

👍2

Dungeon Crawler

👍2

Dungeon Generator

👍2

Random number check

👍2

GODOT

Pico

👍2

Gemma 3 sci-fi interface

👍2

Can AI Push Its Boundaries to Make the Most Astonishing Pieces of Music

👍2

EchoCanvas

👍2

Fractals in Three.js

👍2

Animated Fractal with Three.js

This microeval asks the LLM to produce a single HTML code block (with optional CSS, JavaScript, and GLSL) that renders a full-screen Julia set shader animation with smooth color transitions from golden yellow through orange, magenta, and purple to deep indigo; continuously morphs via a rotating complex constant; supports click-and-drag panning, mouse-wheel or pinch zoom, and a space-bar play/pause toggle; and relies solely on Three.js loaded from a CDN.

👍2

LLM Common Sense

👍2

QWERTY Eval

Words spelled out using the indexes of keys on a QWERTY keyboard

👍2

Can it recreate DOOM?

👍2

Pharmaceutical Consultations (Consultations pharmaceutiques)

This eval contains example dialogues simulating common pharmacy consultations. The prompts include requests for advice about vitamins, minor symptoms, or situations that require an initial assessment before referral to a healthcare professional. They are designed to test a model's ability to give responses that are helpful, safe, and consistent with good pharmaceutical practice.

👍1

Simple Word Counting

👍1

Micro sci-fi

Write a tweet-length sci-fi story

👍1

0~9→23

👍1

What is 百合 ("lily" / "yuri")? (百合是什么？)

👍1

Interactive voronoi

👍1

IQ test Generator

A visual IQ test generator

👍1

Info graph

Use SVG

👍1

Svelte 5

Can the models correctly write Svelte 5 components? Do they avoid using patterns from earlier versions?

👍1

Noodle-Cake Instant Noodles (面饼泡面)

👍1

Seaweed Egg-Drop Soup Without the "Cauliflower Soup" (没有菜花汤的紫菜蛋花汤)

👍1

偶像鸡/idol chicken

👍1

Medical Quiz 3

Cardiac physiology quiz.

👍1

Czech Knowledge Prompts

Testing knowledge of Czech culture and language, designed to test smaller models. Based on https://semanticmachines.notion.site/evals

👍1

Guess a number

👍1

Full Glass of Wine

Glasses of wine are traditionally only half-full.

👍1

SVG Creation: Clouds, Rain, Umbrella (SVG作成(雲雨傘))

Kenta

👍1

benchmaxx catbench

Write an SVG animation that draws a cute kitten using HTML and CSS.

👍1

Public Simple Bench Questions

The 10 public Simple Bench questions (https://simple-bench.com/)

👍1

JPEval

JPEval poses questions in Japanese, a language LLMs struggle with!

👍1

uyftu

API Key for "[email protected]"

👍1

Strawrbrerrry [sic] eval

Reasoning should include the ability to generalize to unfamiliar words instead of memorizing answers. Let's see if models can detect the number of 'r's in the word "strawrbrerrry."

👍1

Minecraft 3D

A basic Minecraft 3D eval. It should create a basic chunk with a greedy mesher, block placing and destroying, and an FPS camera.

👍1

PUSHiNG BOUNDARiES - AiS VERSUS SVG GENERATiON

👍1

Minecraft Knowledge and Clone Creation

The purpose of this is to evaluate how good various AI models are at a variety of Minecraft skills like planning, designing, puzzles, and providing accurate information. It’ll also test to see how good each AI is at coding a web app clone of the game.

👍1

Balls bouncing inside a spinning hexagon

👍1

Double Pendulum Simulation

👍1

Candle test

👍1

AtttTention 't' count

Extra 't's and a capital 'T' are added to confuse the AI.

👍1

Interactive Financial Report

👍1

Beautiful and creative login page

Simple and detailed prompts, plus a Persian version

👍1

Emoji Carma

A set of difficult emoji-themed tasks, including tier list creation, music creation, and themeable website creation.

👍1

Interactive Liquid Sorting Puzzle game

👍1

Math riddles with visual solution

5 challenging math problems

👍1

The Zero-Knowledge Challenge: A Visual Primer (Game)

👍1

Blockchain Journey Game

Detailed prompt and simple prompt

👍1

Two Math challenges

Prompts in Persian, with the output translated

👍1

Interactive Concepts for Mathematical Problems

Visualizations of 5 of the most important unsolved problems in mathematics!

👍1

Draw the flag in SVG

👍1

Planetary orchestra

👍1

SVGs

Assorted SVG generation prompts

👍1

Mobile Game Development (Web Apps)

How well can AI create fun little games as web apps that can be played on mobile devices? This benchmark tests AI's ability to generate good mobile UI and controls as well as basic gameplay experiences.

👍1

Budget Tracker App

👍0

Requires reasoning + SHA1

From https://x.com/goodside/status/1934833254726521169/photo/1

👍0

GOV TEST

Tests whether the AI can make an infographic about the composition of the government.

👍0

Quantum circuit

👍0

APLEX BENCH V1

👍0

Website creation ability

👍0

Death event extraction

👍0

Micro gold

👍0

Tetris

Tetris that runs in a web browser

👍0

Wordle Clone

👍0

Riemann Hypothesis

An attempted proof of the Riemann Hypothesis

👍0

Test

👍0

Logic capability based on historical information

👍0

Test

-

👍0

Ahmed

👍0

Dungeon Web App

👍0

Cycle Analyser

👍0

progress tracker (flutter)

syuaib

👍0

Meaning of life

👍0

bvgcg

nbhvvg

👍0

math reasoning

👍0

Readability Analysis Tool

This prompt tests knowledge and design sense of coding models. It compares smaller and larger models of the same family.

👍0

PG Wodehouse Variations

Shows how well the best models can write.

👍0

3d race

Race 3d

👍0

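Xiaobaogui, "Little Treasure Turtle" (Diabetes Edition) (小宝龟（糖友版）)

An assistant that helps people with diabetes manage their daily diet, hospitalization records, and leisure activities.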

👍0

Finding and interpreting legal judgments

This asks about an Australian case that is widely cited but rarely discussed on the internet. The prompt is deliberately misleading, since the decision was in fact unanimous.

👍0

Simple arithmetic

👍0

manual sqrt

👍0

Lecture slides: introduction to dynamical systems theory

👍0

I am Faaiz and I am doing this art project: generate an image

👍0

5 odd numbers

Pallav Agarwal

👍0

Sample Roleplay

Simple roleplay prompt example (by olety)

👍0

Epoch converter

Convert a timestamp to epoch time

👍0

AI sjsjwki11

ai

👍0

HFY Sci-Fi Story Idea Eval

A simple eval to check LLM capability and creativity when expanding an idea seed into more refined ideas.

👍0

Roblox luau eval

Which LLM is best at generating Roblox-related code?

👍0

Test

👍0

algorithm

👍0

شبس

👍0

QWERTY Prime Test

👍0

HTML5 Structure

👍0

Elliott Wave

Trading Analysis Using Elliott Wave

👍0

test

👍0

AI's Making Music

👍0

AI Gamecoding Clones

👍0

Vanishing HTML Eval

👍0

Financial Analysis

Revenue analysis of Apple

👍0

Generated p5.js prompts eval

👍0

Simple Webpages W/ Art Twist

👍0

Weird Frontpage

👍0

The Glitch Puppeteer

👍0

شبام

اي

👍0

The Deconstructed Avian Glitch-Scape

👍0

Web Design and AIs

👍0

DBD

👍0

Haha, Me (哈哈我)

Dragon (龙)

👍0

MCP

An MCP server to help others integrate with us

👍0

MicroEval2

👍0

Gender Gap: Italy

👍0

Voice AI agent pricing calculator

Create a voice agent pricing calculator (by Nikhil R.)

👍0

Raytracing

👍0

mytitle

no

👍0

Mini Games

👍0

Alegra Positioning (Posicionamiento Alegra)

👍0

Physics Benchmark

A high-level physics benchmark

👍0

111

222

👍0

riddle reversed

e

👍0

GlassMorphism Website

👍0

Educational Web

👍0

Genaro villegas check

👍0

IQ Test Generator

Creates dynamic IQ Tests

👍0

meh

meh meh

👍0

Ant Riddle

Adapted from 2024 IMO Problem 5

👍0

Gibberish Avant-Garde HTML Prompts

👍0

line up for love requesting

👍0

R Shiny clone of Spotify

👍0

Semiconductor Analysis

Build out industry supply and demand

👍0

Generate a 3D Solar System (生成3d的太阳系)

👍0

Black Hole Visualization Application

Generation of an HTML application that renders a mathematically accurate black hole. (I already feel the fascination the fans of Interstellar have right now.)