Reasoning should include the ability to generalize to unfamiliar words instead of memorizing answers. Let's see if models can count the number of 'r's in the made-up word "strawrbrerrry".
Prompt
How many letters 'r' are in the word "strawrbrerrry"?
Answer guidance
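We're hoping for 6. Spelled out, the word is s-t-r-a-w-r-b-r-e-r-r-r-y, with an 'r' at positions 3, 6, 8, 10, 11, and 12.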
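As a sanity check on the expected answer, here is a minimal Python sketch that counts the letter directly. The variable names (`word`, `target`, `count`, `positions`) are illustrative and not part of the original prompt; the word is copied from the prompt without the surrounding punctuation.

```python
# Word from the prompt, without the quotation marks or trailing punctuation.
word = "strawrbrerrry"
target = "r"

# str.count returns the number of non-overlapping occurrences of the substring.
count = word.count(target)

# 1-based positions of each occurrence, useful for checking a model's reasoning.
positions = [i for i, ch in enumerate(word, start=1) if ch == target]

print(count, positions)  # prints: 6 [3, 6, 8, 10, 11, 12]
```

A grader could compare a model's final number against `count`, though how the answer is extracted from free-form output is left open here.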