Large Language Models Benchmarks

11h on MSN

China's Z.ai GLM-5.2 tops OpenAI’s GPT 5.5 model on key benchmarks

Chinese startup Z.ai has launched GLM-5.2, a powerful AI model for complex coding projects. This new large language model ...

6h on MSN (Opinion)

Multilingual benchmark evaluates how well AI interprets clinical text and health records in nine languages

Researchers at Mass General Brigham recently developed BRIDGE, a multilingual benchmark that evaluates how well large ...

22h

Z.ai’s open-weights GLM-5.2 beats GPT-5.5 on multiple long-horizon coding benchmarks for 1/6th the cost

It allows engineering teams to host frontier-level AI on their own sovereign infrastructure, entirely eliminating vendor lock ...

19h

Why Weibo’s tiny VibeThinker-3B has the AI world arguing over benchmarks again

VibeThinker-3B, a 3-billion-parameter AI model, is challenging OpenAI, Google and DeepSeek on math and coding benchmarks while reigniting ...

Anthropic sets AI performance records with new Mythos 5, Fable 5 frontier models

Anthropic PBC today introduced Claude Mythos 5 and Claude Fable 5, two large language models that it says outperform the ...

Geeky Gadgets

AI Benchmarks Are Broken: The Leaderboard Illusion

What if the tools we trust to measure progress are actually holding us back? In the rapidly evolving world of large language models (LLMs), AI benchmarks and leaderboards have become the gold standard ...

ascopubs.org

RadOncRAG: A Novel Retrieval-Augmented Generation Framework Improves Large Language Model Benchmark Performance in Radiation Oncology

Large language models (LLMs) show promise in assisting knowledge-intensive fields such as oncology, where up-to-date information and multidisciplinary expertise are critical. Traditional LLMs risk ...

Geeky Gadgets

How to Build Custom LLM Benchmarks for Your AI Applications

Have you ever wondered why off-the-shelf large language models (LLMs) sometimes fall short of delivering the precision or context you need for your specific application? Whether you’re working in a ...
