...
Target applications are those that cannot be evaluated against gold/reference data, either because the gold standard changes over time (grounded knowledge) or because similarity-based methods fail (hallucination), and those where test data is enhanced with RAG (bias detection).
- General methods:
  - extracting KG triples from LLM outputs and evaluating the results
  - enhancing example inputs for the LLM and evaluating the outputs for bias
- When KGs are needed in LLM evaluation:
  - analyzing the grounding capabilities of LLMs (knowledge coverage and factuality)
  - analyzing hallucination in LLMs (factuality)
  - analyzing inherent bias from training data
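
The first general method, extracting triples from an LLM output and checking them against a KG, can be sketched as follows. This is a minimal illustration, not a definitive implementation: the reference KG is made-up toy data, and `extract_triples`, `CAPITAL_PATTERN`, and `evaluate_against_kg` are hypothetical names introduced here. The regex extractor stands in for whatever information-extraction model or LLM prompt a real pipeline would use.

```python
import re

# Toy reference KG as a set of (subject, predicate, object) triples.
# Assumption: made-up example data, not taken from any real knowledge graph.
REFERENCE_KG = {
    ("paris", "capital_of", "france"),
    ("berlin", "capital_of", "germany"),
    ("rome", "capital_of", "italy"),
}

# Hypothetical pattern-based extractor for one relation only.
CAPITAL_PATTERN = re.compile(r"(\w+) is the capital of (\w+)", re.IGNORECASE)


def extract_triples(text):
    """Return the set of capital_of triples found in free text."""
    return {
        (subj.lower(), "capital_of", obj.lower())
        for subj, obj in CAPITAL_PATTERN.findall(text)
    }


def evaluate_against_kg(llm_output, kg):
    """Compare triples extracted from an LLM output with a reference KG."""
    extracted = extract_triples(llm_output)
    supported = extracted & kg      # triples the KG confirms
    unsupported = extracted - kg    # candidates for hallucination
    return {
        "supported": supported,
        "unsupported": unsupported,
        # Share of extracted triples confirmed by the KG (factuality proxy).
        "precision": len(supported) / len(extracted) if extracted else 0.0,
        # Share of KG triples that the output reproduced (coverage proxy).
        "coverage": len(supported) / len(kg) if kg else 0.0,
    }


if __name__ == "__main__":
    answer = (
        "Paris is the capital of France. "
        "Berlin is the capital of Germany. "
        "Madrid is the capital of Italy."
    )
    print(evaluate_against_kg(answer, REFERENCE_KG))
```

An unsupported triple is only a candidate hallucination: the KG itself may be incomplete or out of date, so in practice such triples would likely need a second check before being counted as errors.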
Answer 1: Using KGs to Evaluate LLM Knowledge Coverage
...