...
Multi-hop Reasoning with KGs: Multi-hop reasoning involves traversing several relationships within a KG to connect multiple pieces of information. By structuring queries across these relational "hops," LLMs can access layered knowledge, making it possible to answer complex questions that require linking distant but related entities. This is particularly useful for in-depth domain-specific queries where multiple steps are essential to arrive at accurate answers [1, 6].
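The traversal described above can be sketched as a breadth-first search over KG triples; the entities, relations, and hop limit below are illustrative assumptions, not part of any particular system:

```python
from collections import deque

# A tiny illustrative KG as (subject, relation, object) triples.
# All entity and relation names are made-up examples.
TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Poland", "member_of", "EU"),
]

def find_path(triples, start, goal, max_hops=3):
    """Breadth-first search over KG edges; returns the hop sequence
    (subject, relation, object) linking start to goal, or None."""
    adjacency = {}
    for s, r, o in triples:
        adjacency.setdefault(s, []).append((r, o))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue
        for r, o in adjacency.get(node, []):
            if o not in visited:
                visited.add(o)
                queue.append((o, path + [(node, r, o)]))
    return None
```

Each element of the returned path is one "hop", so the full sequence can be handed to the LLM as an explicit reasoning chain linking the two entities.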
Tool-Augmented Reasoning: In tool-augmented reasoning, LLMs integrate external resources like KGs to aid in decision-making during reasoning. Models like ToolkenGPT [5] leverage tool-based embeddings, allowing the LLM to perform dynamic, real-time KG queries. This augmentation enables LLMs to retrieve structured knowledge that informs reasoning paths and aids logical, stepwise problem-solving [2, 5].
Consistency Checking in Reasoning: Consistency checking ensures that LLMs adhere to logical coherence throughout their reasoning process. By systematically cross-referencing LLM outputs with KG facts, systems like KONTEST [4] can evaluate the alignment of generated answers with established knowledge, identifying logical inconsistencies that may arise during reasoning. This reduces contradictions and improves the factual reliability of responses [4, 5].
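A minimal sketch of such a cross-referencing step (not KONTEST's actual implementation; the triple format and the assumption that each subject/relation pair has a known set of valid objects are illustrative):

```python
def check_consistency(claimed_triples, kg_triples):
    """Cross-reference claimed (s, r, o) triples against the KG.
    A claim conflicts if the KG records different objects for the same
    subject/relation pair; it is unsupported if the KG says nothing
    about that pair at all."""
    kg_index = {}
    for s, r, o in kg_triples:
        kg_index.setdefault((s, r), set()).add(o)
    report = {"supported": [], "conflicting": [], "unsupported": []}
    for s, r, o in claimed_triples:
        known = kg_index.get((s, r))
        if known is None:
            report["unsupported"].append((s, r, o))
        elif o in known:
            report["supported"].append((s, r, o))
        else:
            report["conflicting"].append((s, r, o))
    return report
```

The "conflicting" bucket is what a consistency checker would flag; "unsupported" claims are not necessarily wrong, only unverifiable against this KG.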
Chain of Thought (CoT) Reasoning Enhanced by KGs: CoT reasoning, when combined with KGs, supports a structured multi-step reasoning process. By organizing KG-based reasoning paths in a sequence, LLMs can maintain logical flow and improve interpretability in complex queries. This structured reasoning, enabled by tracing relationships in the KG, enhances transparency in decision-making by allowing users to follow the steps that led to the final answer [3, 7].
Graph-Constrained Reasoning: In graph-constrained reasoning, LLMs are guided by the constraints of the KG, which restricts possible reasoning paths to those that align with verified entity relationships. By adhering to KG-encoded logical structures, LLMs can reduce spurious or unrelated associations, focusing only on reasoning paths that conform to the graph’s factual framework. This enhances logical accuracy and minimizes errors in multi-step reasoning [11, 12].
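A minimal sketch of the constraint step, assuming the LLM proposes candidate hops as (relation, object) pairs (an illustrative interface, not a specific framework's API):

```python
def constrain_candidates(kg_triples, current_entity, proposed_hops):
    """Keep only the hops the KG actually licenses from current_entity.
    proposed_hops: (relation, object) pairs suggested by the LLM; any
    pair without a matching KG edge is discarded as spurious."""
    allowed = {(r, o) for s, r, o in kg_triples if s == current_entity}
    return [hop for hop in proposed_hops if hop in allowed]
```

In a full system this filter would run at every reasoning step, so the model can only extend its chain along verified edges of the graph.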
Brief description of the state of the art
The current landscape for enhancing LLM reasoning with KGs is advancing with several state-of-the-art methods and tools. Multi-hop reasoning and RAG are foundational techniques enabling LLMs to connect multiple pieces of information across KG paths, facilitating answers to complex, layered questions. Multi-hop approaches, like Paths-over-Graph (PoG), dynamically explore reasoning paths in KGs, integrating path pruning to optimize the reasoning process and focus on relevant data [1, 6, 7, 11].
Another significant development is tool-augmented reasoning, exemplified by systems such as ToolkenGPT, which equips LLMs with the ability to access and utilize KG-based tools during inference. ToolkenGPT creates embeddings for external tools (referred to as "toolkens"), enabling real-time KG lookups that aid in logical, structured reasoning by supplementing the LLM's outputs with factual data drawn directly from KGs. Similarly, Toolformer offers dynamic API-based access to KG data, facilitating reasoning with external support for tasks requiring specific, fact-based insights [2, 5].
Consistency-checking frameworks are also essential for enhancing reasoning accuracy. Systems like KONTEST evaluate LLM outputs against KG facts, ensuring logical coherence and flagging inconsistencies. This method reduces the logical errors LLMs might otherwise produce in reasoning tasks by cross-referencing generated answers with verified KG knowledge. Furthermore, GraphEval is used to assess the factuality of LLM responses, leveraging a judge model that systematically aligns generated answers with KG-derived facts [4, 13, 14].
Chain of Thought (CoT) reasoning combined with KGs enables LLMs to approach reasoning tasks in a multi-step, structured manner. By organizing KG-based reasoning paths into sequential steps, CoT supports transparency and traceability in complex queries, particularly useful for answering multi-entity questions. Lastly, graph-constrained reasoning, as seen in frameworks like Graph-Constrained Reasoning and PoG, directs LLM reasoning within predefined KG paths, minimizing irrelevant associations and enhancing logical consistency by adhering to factual constraints within the graph structure [3, 7, 11, 12].
Answer 1: KG-Guided Multi-hop Reasoning
...
Explanation of concepts: Firstly, KG triplets can be used to evaluate the knowledge represented by an LLM's parameters, i.e., how much knowledge an LLM can leverage from the training process and how consistently this knowledge can be retrieved. Secondly, KG triplets can be used to evaluate the output of an LLM: information is extracted from the output and compared with a KG to check factuality or knowledge coverage. Examples of such knowledge coverage are political positions, cultural or sporting events, or current news. Furthermore, the extracted KG triplets can be used to evaluate tasks where a similarity comparison of the LLM output is undesirable, as is the case for identifying and evaluating LLM hallucinations.
The final reason is to use KGs to enrich LLM inputs with relevant information. This method is beneficial, for example, if the goal is to use in-context learning to provide task-specific information to the LLM. In addition, targeted adversarial attacks can be carried out on the LLM to uncover biases or weak points. Both variants are explained in more detail below as examples.
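A minimal sketch of enriching an LLM input with KG facts for in-context learning; the naive "subject relation object." verbalization and the prompt layout are simplifying assumptions:

```python
def build_prompt(question, triples):
    """Prepend verbalized KG facts to a question so the LLM can use
    them in context. The flat 'subject relation object.' rendering is
    a simplification; real systems use per-relation templates."""
    facts = "\n".join(f"- {s} {r.replace('_', ' ')} {o}." for s, r, o in triples)
    return f"Known facts:\n{facts}\n\nQuestion: {question}\nAnswer:"
```

The same assembly step can serve adversarial probing: injecting deliberately contradictory or biased triples and observing whether the model follows them.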
...
- Represented Knowledge: Which fact queries can the LLM answer correctly and consistently?
- Factuality: When generating an output, are the facts an LLM uses in its answer correct?
- Biases:
...
Relational triples from a Knowledge Graph (KG) can be leveraged to create up-to-date and domain-specific datasets for evaluating knowledge within Language Models (LMs) [1, 2, 4, 5, 6]. The LLM can then be queried with the subject and relation to predict the object, using either a question-answer pattern [6], masked language modeling [4, 5], or selecting the correct statement from a multiple-choice item [1, 2]. KGs not only provide the correct answers but also enable the creation of plausible distractors (incorrect answer options), allowing the generation of multiple-choice items to evaluate the knowledge represented in an LM [1, 2].
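A minimal sketch of building such a multiple-choice item, where distractors are drawn from other objects occurring with the same relation in the KG (a common heuristic; the triples are made-up examples):

```python
def make_mc_item(triples, target, n_distractors=3):
    """Build a multiple-choice item for one (s, r, o) triple.
    Distractors are other objects that occur with the same relation,
    so they are type-plausible but factually wrong for this subject."""
    s, r, o = target
    pool = sorted({obj for subj, rel, obj in triples if rel == r and obj != o})
    return {
        "query": (s, r),                          # ask: s, r -> ?
        "answer": o,
        "options": sorted([o] + pool[:n_distractors]),
    }
```

Options are sorted here only to make the output deterministic; in a real evaluation set they would be shuffled per item.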
...
- While KGs are inherently structured to maintain consistency in factual representation, LMs do not always yield consistent answers, especially when queries are rephrased [2, 3]. Integrating KGs for evaluation-set generation can address this by allowing multiple phrasings of a single query, all linked to the same answer and relational triple. This approach helps measure an LM's robustness in recognizing equivalent rewordings of the same fact [1, 2, 3].
- When using a question-answering format (to evaluate text-generating / autoregressive LMs), the free-form answer of the model needs to be compared to the reference answer [1]. While there are multiple ways of comparing the answer to the reference [1], no single approach is ideal. Multiple-choice-based approaches avoid this problem entirely [1, 2, 6] but inherently have a limited answer space and may encourage educated guessing, simplifying the task by providing plausible options. Conversely, open-ended answers require the model to generate the correct response without cues and may align better with real-world use cases.
- Verbalizing a triple not only requires labels for the subject and object (which are typically annotated with one or multiple labels), but also a rule [1] or template [2] which translates the formal triple into natural text. This may require humans to create one or multiple rules per relation. Depending on the target language and the number of relations used, this can be a non-negligible amount of work.
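A minimal sketch of template-based verbalization, with one hand-written template per relation (the relations and template strings below are illustrative assumptions):

```python
# One manually authored template per relation; per the text above,
# every relation in the evaluation set needs its own entry.
TEMPLATES = {
    "born_in": "{subject} was born in {object}.",
    "capital_of": "{subject} is the capital of {object}.",
}

def verbalize(triple):
    """Turn a formal (s, r, o) triple into a natural-language sentence
    via the relation's template; fails loudly for uncovered relations."""
    s, r, o = triple
    template = TEMPLATES.get(r)
    if template is None:
        raise KeyError(f"no template for relation {r!r}")
    return template.format(subject=s, object=o)
```

For another target language, the template table has to be rewritten, which is where the per-relation authoring effort mentioned above accumulates.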
Standards, Protocols, and Scientific Publications:
The use of KGs for generating evaluation datasets has been successfully demonstrated in various scientific publications [1, 2, 4, 5, 6, 7].
...
Description: This approach uses knowledge graphs to analyze and evaluate various aspects of LLMs, such as knowledge coverage and biases. KGs provide a structured framework for assessing how well LLMs capture and represent knowledge across different domains, including the extent to which LLMs cover the knowledge represented in KGs. By comparing LLM outputs with the structured data in KGs, this approach can identify gaps in knowledge and areas for improvement in LLM training and performance.
(First Version): The first evaluation process can be divided into two parts, which can be executed through various techniques that this section will not discuss. First, the LLM generates output sequences based on an evaluation set of input samples. Specific KG triplets are then identified and extracted from the generated output sequences. The variants for extraction and identification can be found in other subchapters of this DIN SPEC. The extracted KG triplets are usually domain- or task-specific and are used to construct a KG.
In the second step, the KG can now be analyzed. For instance, factuality can be checked by analyzing each KG triplet in the generated KG, given the context provided. Alternatively, the extracted KG triplets can be compared with an existing, more extensive KG to analyze the knowledge coverage of an LLM.
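The comparison with a reference KG can be sketched as set overlap between extracted and reference triples; equating factuality and coverage with exact triple overlap is a simplifying assumption (real pipelines need entity and relation alignment first):

```python
def coverage_and_factuality(extracted, reference):
    """extracted / reference: iterables of (s, r, o) triples.
    Factuality: share of extracted triples confirmed by the reference KG.
    Coverage: share of reference triples the model reproduced."""
    extracted, reference = set(extracted), set(reference)
    overlap = extracted & reference
    factuality = len(overlap) / len(extracted) if extracted else 0.0
    coverage = len(overlap) / len(reference) if reference else 0.0
    return factuality, coverage
```

Low factuality points to hallucinated or distorted facts in the output; low coverage points to knowledge gaps relative to the larger KG.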
...