The Hallucination Leaderboard: LLMs comparison

Vectara’s Hallucination Leaderboard benchmarks how often Large Language Models (LLMs) “hallucinate,” or fabricate information, when summarizing documents. As of its November 1, 2023 update, the leaderboard shows how well-known LLMs such as GPT-4, GPT-3.5, and Llama 2 perform under that scrutiny.

The Evaluation Method

The leaderboard, based on Vectara’s Hallucination Evaluation Model, reports four metrics for each LLM:

  1. Accuracy: The percentage of responses without hallucinations.
  2. Hallucination Rate: The frequency of fabricated information in responses.
  3. Answer Rate: The proportion of prompts the model responds to.
  4. Average Summary Length: The mean word count of the model’s summaries.
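
The four metrics above can be sketched in a few lines of Python. This is a minimal illustration, not Vectara's code: the `Judgment` record and the `leaderboard_metrics` function are hypothetical names, and the sketch assumes that accuracy and hallucination rate are computed over answered prompts only, which the leaderboard snapshot suggests (the two columns always sum to 100 %) but does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Judgment:
    """One prompt's outcome (hypothetical record, for illustration only)."""
    summary: Optional[str]      # None when the model declined to answer
    hallucinated: bool = False  # ignored when summary is None


def leaderboard_metrics(judgments):
    """Compute the four leaderboard-style metrics from per-prompt judgments."""
    total = len(judgments)
    answered = [j for j in judgments if j.summary is not None]
    if total == 0 or not answered:
        raise ValueError("need at least one answered prompt")

    # Assumption: rates are taken over answered prompts, not all prompts.
    hallucination_rate = 100.0 * sum(j.hallucinated for j in answered) / len(answered)
    word_counts = [len(j.summary.split()) for j in answered]

    return {
        "accuracy": 100.0 - hallucination_rate,
        "hallucination_rate": hallucination_rate,
        "answer_rate": 100.0 * len(answered) / total,
        "avg_summary_length": sum(word_counts) / len(word_counts),
    }


sample = [
    Judgment("The cat sat on the mat."),
    Judgment("The dog flew to Mars yesterday.", hallucinated=True),
    Judgment("Rain is expected on Monday."),
    Judgment(None),  # the model gave no summary for this prompt
]
metrics = leaderboard_metrics(sample)
print(metrics)
```

With this definition, accuracy and hallucination rate are complements of each other, while answer rate is tracked separately so that a model cannot improve its accuracy simply by refusing difficult prompts unnoticed.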

Current Standings

Here’s a snapshot of the standings as of the last update:

Model               Accuracy   Hallucination Rate   Answer Rate   Avg. Summary Length (Words)
GPT 4               97.0 %     3.0 %                100.0 %       81.1
GPT 3.5             96.5 %     3.5 %                99.6 %        84.1
Llama 2 70B         94.9 %     5.1 %                99.9 %        84.9
Google Palm         87.9 %     12.1 %               92.4 %        36.2
Google Palm-Chat    72.8 %     27.2 %               88.8 %        221.1

Full Leaderboard

Behind the Rankings

The methodology involves feeding 1,000 short documents to each LLM and analyzing the summaries it produces for hallucinations. Focusing on summarization rather than overall factual accuracy means each model is judged only against the information it was given, which makes the results directly comparable across models.
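
The evaluation loop described above might look roughly like the following sketch. Everything here is an assumption for illustration: `evaluate`, `toy_summarize`, and `toy_consistency_score` are hypothetical names, the word-overlap scorer is a crude stand-in for Vectara's Hallucination Evaluation Model, and the 0.5 threshold on a 0-to-1 consistency score is an assumed convention rather than something the source states.

```python
def evaluate(documents, summarize, consistency_score, threshold=0.5):
    """Return (answered, hallucinated) counts for one model.

    summarize(doc) returns a summary string, or None if the model declines.
    consistency_score(doc, summary) returns a float in [0, 1]; scores below
    `threshold` are counted as hallucinations (assumed convention).
    """
    answered = hallucinated = 0
    for doc in documents:
        summary = summarize(doc)
        if summary is None:  # no answer: affects answer rate only
            continue
        answered += 1
        if consistency_score(doc, summary) < threshold:
            hallucinated += 1
    return answered, hallucinated


def toy_summarize(doc):
    """Stand-in for an LLM call: return the first sentence, or None if empty."""
    first = doc.split(".")[0].strip()
    return first + "." if first else None


def toy_consistency_score(doc, summary):
    """Stand-in scorer: fraction of summary words that appear in the source."""
    source_words = set(doc.lower().replace(".", "").split())
    words = summary.lower().replace(".", "").split()
    return sum(w in source_words for w in words) / len(words)


docs = ["Paris is the capital of France. It lies on the Seine.", ""]
answered, hallucinated = evaluate(docs, toy_summarize, toy_consistency_score)
print(answered, hallucinated)
```

In a real run, `summarize` would wrap a call to the LLM under test and `consistency_score` would wrap the evaluation model; the loop itself stays the same, which is what lets every model be scored against the same documents.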

Looking Ahead

Vectara plans to update this leaderboard regularly, adding more models, including GPT-4 Turbo, and expanding its scope to citation accuracy and multilingual capabilities. This ongoing effort reflects how quickly AI is changing and the need for continual evaluation in the pursuit of reliable, truthful AI-generated content.

Explore more about this fascinating field and stay updated with the latest AI trends by visiting the Hallucination Leaderboard on GitHub.
