Do You Find the LM Arena Leaderboard Useful?


"AI" benchmarks can be gamed with some effort. I have seen cases where simply changing the order of the answers in a multiple-choice question (MCQ) resulted in drastically different "AI" performance. And when there is a massive benchmark gap between two models, the weaker one is usually either an older model or a model of a different size. In such cases, users don't even need a benchmark to figure out which one is better.
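To make the order effect concrete, here is a minimal sketch of one way to probe it: ask the same multiple-choice question once for every ordering of its options and check whether the model's correctness changes. The `model` callable interface and the `always_first` stand-in are hypothetical and exist only for illustration; a real test would call an actual model instead.

```python
from itertools import permutations
from typing import Callable, Sequence

# Hypothetical interface: a "model" takes a question and a sequence of options
# and returns the index of the option it picks.
Model = Callable[[str, Sequence[str]], int]


def accuracy_per_ordering(model: Model, question: str,
                          options: Sequence[str], answer: str) -> dict:
    """Ask the same MCQ once per ordering of its options.

    Returns a mapping from each ordering (a tuple of options) to True/False,
    depending on whether the model picked the correct answer.
    """
    results = {}
    for ordering in permutations(options):
        picked = model(question, ordering)
        results[ordering] = ordering[picked] == answer
    return results


def always_first(question: str, options: Sequence[str]) -> int:
    """Toy stand-in for a model with a strong position bias (hypothetical)."""
    return 0


if __name__ == "__main__":
    res = accuracy_per_ordering(always_first, "2 + 2 = ?",
                                ["3", "4", "5", "22"], "4")
    correct = sum(res.values())
    print(f"correct in {correct} of {len(res)} orderings")
```

With four options there are 24 orderings, and a model that always picks the first option is right only in the 6 orderings where the correct answer happens to come first. A robust model should give the same answer no matter how the options are shuffled.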

LM Arena Offers Blind Testing

[Screenshot: LM Arena 1]

I entered a very short and simple prompt about HIVE and got the results back very quickly. LM Arena takes care of the paid subscriptions, so you don't need your own to try the paid models. Users can then vote by selecting one of four options. Once the voting is complete, the model names are revealed.

[Screenshot: LM Arena 2]

The results of these votes are used to rank the models against each other. The votes come from a small sample of enthusiasts who already know about LM Arena. Since this same user base is also the one most likely to know and understand "AI", I don't think the small sample is going to be a problem.
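Arena-style leaderboards are often described in terms of Elo-type ratings built from pairwise votes. Below is a minimal sketch of the classic online Elo update, assuming each vote arrives as a `(model_a, model_b, outcome)` tuple. The outcome labels `"a"`, `"b"`, `"tie"` and `"both_bad"` are my assumed encoding of the four voting options, and `K` and `BASE` are illustrative values; LM Arena's actual method may differ (it may, for example, fit a statistical model over all votes at once rather than updating vote by vote).

```python
from collections import defaultdict

# Illustrative constants, not LM Arena's settings.
K = 32          # how far a single vote can move a rating
BASE = 1000.0   # starting rating for every model

# Assumed encoding of the vote options: score for the first model.
# "both_bad" votes are skipped in this sketch.
SCORE_A = {"a": 1.0, "b": 0.0, "tie": 0.5}


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_ratings(votes):
    """Compute Elo ratings from (model_a, model_b, outcome) tuples, in order."""
    ratings = defaultdict(lambda: BASE)
    for model_a, model_b, outcome in votes:
        if outcome == "both_bad":
            continue
        e_a = expected_score(ratings[model_a], ratings[model_b])
        delta = K * (SCORE_A[outcome] - e_a)
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Made-up votes between three placeholder model names.
    votes = [
        ("model-x", "model-y", "a"),
        ("model-x", "model-z", "a"),
        ("model-y", "model-z", "tie"),
    ]
    ratings = elo_ratings(votes)
    for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because each vote moves the two ratings by equal and opposite amounts, the total rating is conserved. The online update also depends on the order in which votes arrive, which is one reason a leaderboard might prefer fitting all votes together.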

Current Leaderboard With xAI on Top

[Screenshot: LM Arena Top 10]


I'll need to use it more before deciding. In principle, though, a ranking built from users who see side-by-side answers from two models, without knowing which models they are, and pick the better answer should be a fairly accurate one.
