...

  • Represented Knowledge: Which fact queries can the LLM answer correctly & consistently?
  • Factuality: When generating an output, are the facts an LLM uses in its answer correct?
  • Biases: How can bias in LLMs be detected and mitigated using KGs?

Brief description of the state of the art 


Answer 1: Using KGs to Evaluate LLM Represented Knowledge

...

Draft from Daniel Burkhardt

Description: This involves using knowledge graphs to verify and evaluate the knowledge represented in an LLM.

KGs hold factual knowledge for various domains, which can be used to analyze and evaluate aspects of LLMs such as knowledge coverage and factuality. KGs provide structured information for assessing how well LLMs capture and represent knowledge across domains. This involves verifying the knowledge represented in an LLM against a KG: by extracting facts from LLM outputs and comparing them with the structured data in the KG, this approach can identify knowledge gaps and areas for improvement in LLM training and performance.

(First Version): The first evaluation process can be divided into two steps; each can be executed through various techniques, which this section will not discuss. First, the LLM generates output sequences for an evaluation set of input samples. Specific KG triples are then identified and extracted from the generated output sequences; the variants for extraction and identification can be found in other subchapters of this DIN SPEC. The extracted KG triples are usually domain- or task-specific and are used to build a KG.
In the second step, this KG is analyzed. For instance, factuality can be checked by evaluating each triple in the generated KG against the context provided. Alternatively, the extracted triples can be compared with an existing, more extensive KG to analyze the knowledge coverage of the LLM.
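The two-step process above can be sketched in code. This is a minimal illustration, not a prescribed implementation: the function and variable names are hypothetical, the triple extractor is a stand-in for whichever extraction technique is chosen, and the reference KG is simplified to a plain set of (subject, relation, object) triples.

```python
# Sketch of the two-step evaluation loop: (1) extract KG triples from LLM
# outputs and build a generated KG, (2) compare it against a reference KG.
# All names are illustrative assumptions, not part of this specification.

def extract_triples(output_text):
    """Hypothetical stand-in for a domain-specific triple extractor.

    A real extractor would use entity and relation extraction; here we
    assume outputs already follow a "subject | relation | object" line format.
    """
    triples = set()
    for line in output_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.add(tuple(parts))
    return triples

def evaluate_coverage(llm_outputs, reference_kg):
    """Step 1: build a KG from LLM outputs. Step 2: compare it to a reference KG."""
    generated_kg = set()
    for output in llm_outputs:
        generated_kg |= extract_triples(output)
    correct = generated_kg & reference_kg       # facts supported by the reference KG
    missing = reference_kg - generated_kg       # knowledge gaps in the LLM
    unsupported = generated_kg - reference_kg   # facts not backed by the reference KG
    coverage = len(correct) / len(reference_kg) if reference_kg else 0.0
    return {"coverage": coverage, "missing": missing, "unsupported": unsupported}
```

In practice, the reference KG would be queried through a graph store rather than held as an in-memory set, but the coverage logic stays the same.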

Similar to previous solutions, the target object can be predicted using a QA pattern, masked language modeling, or selection of the correct statement from a multiple-choice item. However, the information embedded in a KG cannot be compared directly with the target object of an LLM due to the abstract structure of KG triples. Therefore, the output prediction has to be transformed into a meaning representation that describes the core semantic concepts and relations of an output sequence. Meaning representations should be extracted as a graph-based semantic representation. The congruence of the extracted target graph and an objective KG can then be evaluated, and missing or misplaced relations as well as missing or false nodes can be detected.

Considerations:

  • Meaning representation
  • KG congruence


Standards and Protocols and Scientific Publications:

...