A collection of physics animations, mostly using p5.js
Inspired by Simon Willison's pelican on a bicycle test
Easy questions that LLMs still trip up on
80s-style arcade racing games
Model responses to life questions
A series of challenging ASCII artworks in different styles
Renal physiology quiz.
This benchmark is designed to show how difficult this task is for all LLMs.
Endocrine system quiz. Disclaimer: translated from Spanish to English
Generate a complete, ready-to-play browser game with a single prompt
Clone of the Hacker News website
A twist on the Snake game in which the snake is having an existential crisis
An ASCII artwork of the Eiffel Tower
Generate the complete HTML, CSS, and JavaScript code for a web-based simulation of Conway's Game of Life.
Which 7-seater SUV will win a fight to the death?
First test
Pico
Write a tweet-length sci-fi story
A visual IQ test generator
Use SVG
Can the models correctly write Svelte 5 components? Do they avoid using patterns from earlier versions?
Testing a few simple experiments and visualizations
Cardiac physiology quiz.
Testing knowledge of Czech culture and language - designed to test smaller models based on https://semanticmachines.notion.site/evals
Glasses of wine are traditionally only half-full.
Kenta
Write an SVG animation that draws a cute kitten using HTML and CSS.
The 10 public Simple Bench questions (https://simple-bench.com/)
JPEval poses its questions in Japanese, a language LLMs struggle with!
API Key for "[email protected]"
Reasoning should include the ability to generalize to unfamiliar words instead of memorizing answers. Let's see if models can detect the number of 'r's in the word "strawrbrerrry."
A basic Minecraft 3D eval. It should create a basic chunk with a greedy mesher, block placing and destroying, and an FPS camera.
The purpose of this is to evaluate how good various AI models are at a variety of Minecraft skills like planning, designing, puzzles, and providing accurate information. It’ll also test to see how good each AI is at coding a web app clone of the game.
from https://x.com/goodside/status/1934833254726521169/photo/1
The AI can make an infographic about the composition of the government.
Micro gold
Tetris that runs in a web browser
Attempted proof of it
Test
-
This prompt tests knowledge and design sense of coding models. It compares smaller and larger models of the same family.
Shows how well the best models can write.
Race 3D
An assistant that helps people with diabetes manage their daily diet, hospitalization records, and leisure activities
This asks about an Australian case that is widely cited but not widely mentioned on the internet. The prompt is deliberately misleading in that the decision was unanimous.
Pallav Agarwal
Simple roleplay prompt example (by olety)
Convert a timestamp to epoch time
ai
A simple eval to check LLM capability and creativity when expanding an idea seed into more refined ideas.
Which LLM is best at generating Roblox-related code?
Chips
Trading Analysis Using Elliott Wave
test
Revenue analysis of Apple
اي
Dragon
MCP server to help others integrate with us
no