🛡️ LLM Propensity Evaluation Leaderboard
Measuring propensities / alignment traits of the most downloaded models on HuggingFace
This board tracks the performance of the most popular language models on key alignment traits. The evaluations use standardized datasets and metrics to ensure consistency and reliability.
Who is this board for?
- Researchers and developers who want to understand the alignment characteristics of the language models they integrate into their applications.
- Organizations looking to select models based on their alignment propensities.
- AI ethics and safety teams aiming to monitor and evaluate the behavior of language models in their systems.
- AI regulators and policymakers interested in the alignment and safety of widely used language models.
Evaluation Details:
- Instruction Following Score: Measures a model's tendency to follow instructions accurately, evaluated on the IFEval dataset.
- Uncommon Facts Hallucination Rate: Measures how often a model hallucinates when asked factual questions. Evaluated on a subset of the SimpleQA dataset that explicitly asks about uncommon facts. The rate is calculated as 1 - (correct + not_attempted), where correct is the fraction of questions the model answered correctly and not_attempted is the fraction where the model admitted it did not know the answer (see the sketch after this list).
- All evals have been run using the Inspect framework from UK AISI.
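To make the hallucination-rate formula concrete, here is a minimal Python sketch. It assumes each SimpleQA-style question receives one of three grades ("correct", "incorrect", "not_attempted"); the grade labels and counts are illustrative rather than the board's actual output, and the real runs were performed with the Inspect framework, not this standalone script.

```python
# Minimal sketch of the Uncommon Facts Hallucination Rate calculation.
# Assumption: each SimpleQA-style question is graded as one of
# "correct", "incorrect", or "not_attempted" (labels are illustrative).
from collections import Counter

def hallucination_rate(grades: list[str]) -> float:
    """Return 1 - (correct + not_attempted), computed over all questions."""
    counts = Counter(grades)
    total = len(grades)
    correct = counts["correct"] / total              # answered correctly
    not_attempted = counts["not_attempted"] / total  # model admitted it did not know
    return 1.0 - (correct + not_attempted)           # everything else counts as hallucination

# Illustrative example: 10 questions -> 6 correct, 2 not attempted, 2 wrong
grades = ["correct"] * 6 + ["not_attempted"] * 2 + ["incorrect"] * 2
print(round(hallucination_rate(grades), 3))  # 0.2
```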
How to Interpret the Scores:
- Instruction Following Score: Higher scores indicate better adherence to instructions.
- Hallucination Rate: Lower rates indicate fewer hallucinations (a short comparison sketch combining both metrics follows this list).
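As a worked example of reading both columns together, the sketch below ranks candidate models by preferring higher instruction-following scores and lower hallucination rates. Only the first entry corresponds to the row in the results table below; the second model name and its numbers are hypothetical.

```python
# Sketch: comparing leaderboard entries on both metrics.
# Only the first entry comes from this board; "example-org/hypothetical-model"
# and its scores are made up for illustration.
models = [
    {"name": "context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16",
     "instruction_following": 0.719, "hallucination_rate": 0.083},
    {"name": "example-org/hypothetical-model",
     "instruction_following": 0.650, "hallucination_rate": 0.150},
]

# Higher instruction following is better, lower hallucination rate is better,
# so sort by (-instruction_following, hallucination_rate).
ranked = sorted(models, key=lambda m: (-m["instruction_following"], m["hallucination_rate"]))
for m in ranked:
    print(f"{m['name']}: IF={m['instruction_following']:.3f}, "
          f"halluc={m['hallucination_rate']:.3f}")
```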
Note: The evaluation metrics are designed to provide insights into the models' behavior in specific contexts. They may not capture all aspects of model performance or alignment.
| Model | Instruction Following Score | Uncommon Facts Hallucination Rate |
| --- | --- | --- |
| context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16 | 0.719 | 0.083 |
Last Updated: November 1, 2025