The Hallucination Ranking of LLMs

Brain Titan
2 min read · Nov 16, 2023

The Hallucination Ranking of Large Language Models: GPT-4 hallucinates the least, Google's models the most

The leaderboard compares how often different large language models hallucinate when summarizing short documents.

GPT-4 has an accuracy of 97.0%, a hallucination rate of 3.0%, and an answer rate of 100.0%.

The two Google PaLM models are at the bottom, with Google PaLM 2 Chat at 72.8% accuracy, a 27.2% hallucination rate, and an 88.8% answer rate.

This ranking is computed by Vectara's (@vectara) hallucination evaluation model, which measures how often an LLM introduces hallucinations when summarizing documents. The leaderboard will be updated regularly as the evaluation model and the LLMs themselves are updated.

The leaderboard reports accuracy, hallucination rate, answer rate, and average summary length (in words) for each model. For example, GPT-4 has an accuracy of 97.0%, a hallucination rate of 3.0%, an answer rate of 100.0%, and an average summary length of 81.1 words. Other models, such as GPT-3.5, Llama 2 70B, and Llama 2 7B, are reported with the same metrics.

To build the ranking, Vectara trained a model to detect hallucinations in LLM output, using various open-source datasets from research on the factual consistency of summarization models. They then fed each of the LLMs above 1,000 short documents via its public API and asked it to summarize each document using only the facts presented in that document. Of these 1,000 documents, only 831 were summarized by every model; the rest were rejected by at least one model due to content restrictions. Using these 831 documents, they computed each model's overall accuracy (the share of summaries with no hallucinations) and hallucination rate (100% minus accuracy).
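To make the last step concrete, here is a minimal Python sketch of how per-model accuracy and hallucination rate could be aggregated, assuming each of the 831 summaries has already been scored for factual consistency by the evaluation model. The variable names and the 0.5 decision threshold are illustrative assumptions, not Vectara's exact code.

```python
# Illustrative aggregation of per-summary judgments into leaderboard metrics.
# `consistency_scores` is a hypothetical list of scores in [0, 1] produced by
# the hallucination evaluation model, one per summarized document (831 total).
def leaderboard_metrics(consistency_scores, threshold=0.5):
    # A summary counts as accurate (no hallucination) when its score meets
    # the threshold; the threshold value here is an assumption.
    accurate = sum(score >= threshold for score in consistency_scores)
    accuracy = 100.0 * accurate / len(consistency_scores)
    hallucination_rate = 100.0 - accuracy  # as defined in the article
    return accuracy, hallucination_rate

# Example: 806 of 831 summaries judged consistent
# -> roughly 97.0% accuracy and 3.0% hallucination rate.
acc, hall = leaderboard_metrics([1.0] * 806 + [0.0] * 25)
print(f"accuracy: {acc:.1f}%  hallucination rate: {hall:.1f}%")
```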

The model has been open-sourced for commercial use on Hugging Face: https://huggingface.co/vectara/hallucination_evaluation_model
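For reference, here is a minimal sketch of scoring (document, summary) pairs with the open-sourced model, assuming it can be loaded as a sentence-transformers CrossEncoder as described on the model card; the example texts are made up, and the model card should be checked for the currently recommended usage.

```python
# Score the factual consistency of summaries against their source documents.
# Assumes: pip install sentence-transformers
from sentence_transformers import CrossEncoder

# Assumption: the model loads as a CrossEncoder, per the model card.
model = CrossEncoder("vectara/hallucination_evaluation_model")

# Hypothetical (document, summary) pairs; scores near 1 suggest a factually
# consistent summary, scores near 0 suggest likely hallucination.
pairs = [
    ("The company reported revenue of $10M in Q3 and hired 20 engineers.",
     "The company reported $10M in Q3 revenue."),            # consistent
    ("The company reported revenue of $10M in Q3 and hired 20 engineers.",
     "The company reported a Q3 loss and laid off staff."),   # hallucinated
]
scores = model.predict(pairs)
print(scores)  # one consistency score per pair
```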

GitHub: github.com/vectara/hallucination-leaderboard
