Really cool idea, but the site seems a bit biased for the chinese models, or is otherwise set up weird. I’m not able to reproduce how consistently bad the others are in web dev arena, which generally accepted as the gold standard for testing AI web dev ability.
Each model is allowed 2000 tokens to generate its clock. Here is its prompt:
Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.
There’s a couple differences. It’s giving it the current time as part of the prompt, which is interesting. The other difference is that it’s asking it to make it responsive. But even when I use that exact prompt (inserting the time obv), it works fine on claude, openai, and gemini.
So there’s definitely an issue specific to this page somewhere. Maybe it’s not iframing them? I’m on mobile so I can’t check.
Really cool idea, but the site seems a bit biased for the chinese models, or is otherwise set up weird. I’m not able to reproduce how consistently bad the others are in web dev arena, which generally accepted as the gold standard for testing AI web dev ability.
are you using the same prompt?
There’s a couple differences. It’s giving it the current time as part of the prompt, which is interesting. The other difference is that it’s asking it to make it responsive. But even when I use that exact prompt (inserting the time obv), it works fine on claude, openai, and gemini.
So there’s definitely an issue specific to this page somewhere. Maybe it’s not iframing them? I’m on mobile so I can’t check.