...

Target applications are those that cannot be evaluated with gold/reference data because the gold standard changes over time (grounded knowledge), those where similarity-based methods fail (hallucination), and those where test data is enhanced with RAG (bias detection).

  • General Methods:
    • extracting KG triples from LLM outputs and evaluating the results
    • enhancing example inputs for the LLM and evaluating the outputs for bias
  • When KGs are needed in LLM evaluation:
    • analyzing grounding capabilities of LLMs (knowledge coverage and factuality)
    • analyzing hallucination of LLMs (factuality)
    • analyzing inherent bias from training data
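
The first general method, extracting KG triples from LLM outputs and evaluating them, can be sketched as a set comparison against a reference KG. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes the triples have already been extracted from the LLM output by some upstream step, and the names `Triple`, `normalize`, and `score_triples` are hypothetical. Matching is exact after lowercasing, which a real setup would likely replace with entity linking. A triple missing from the KG is only unsupported, not necessarily false.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """A (subject, predicate, object) statement; hypothetical helper type."""
    subject: str
    predicate: str
    obj: str


def normalize(triple: Iterable[str]) -> Triple:
    # Assumption: exact match after trimming and lowercasing is good enough
    # for this sketch; real pipelines usually need entity linking.
    return Triple(*(part.strip().lower() for part in triple))


def score_triples(extracted: Iterable[Triple], reference: Iterable[Triple]) -> dict:
    """Compare triples extracted from an LLM output with a reference KG."""
    ext = {normalize(t) for t in extracted}
    ref = {normalize(t) for t in reference}
    supported = ext & ref
    return {
        # Share of extracted triples that the KG supports (factuality proxy).
        "factuality": len(supported) / len(ext) if ext else 0.0,
        # Share of KG triples that the LLM output reproduced (coverage proxy).
        "coverage": len(supported) / len(ref) if ref else 0.0,
        # Extracted triples with no support in the KG: hallucination candidates.
        "unsupported": sorted(ext - ref),
    }


reference_kg = [
    Triple("Berlin", "capital_of", "Germany"),
    Triple("Paris", "capital_of", "France"),
]
llm_triples = [
    Triple("berlin", "capital_of", "germany"),
    Triple("Paris", "capital_of", "Italy"),
]
result = score_triples(llm_triples, reference_kg)
```

In this toy example one of the two extracted triples is supported by the KG and one of the two KG triples is reproduced, so both scores are 0.5, and the Paris/Italy triple is listed as unsupported.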

Answer 1: Using KGs to Evaluate LLM Knowledge Coverage

...