...

Large language models (LLMs) are typically trained with unsupervised methods on extensive datasets. Despite their impressive performance on various tasks, these models often lack the practical, real-world knowledge required for both domain-specific and enterprise applications. Furthermore, since domain-specific data is not included in the public-domain datasets used for pre-training or fine-tuning LLMs, integrating knowledge graphs (KGs) becomes fundamental for injecting proprietary knowledge into LLMs, especially for enterprise solutions. Many techniques for infusing this knowledge into LLMs during training have been researched in recent years, resulting in three main state-of-the-art approaches (Pan et al., 2024):

  1. Verbalization of KGs into LLM inputs (See answer 1)
  2. Integration of KGs during pre-training (See answer 2)
  3. Integration of KGs during fine-tuning (See answer 3)

Explanation of concepts

...

Brief description of the state-of-the-art

First draft to be created by 11 October 2024.

Regarding the verbalization of KGs into LLM inputs


Proposed solutions:

Answer 1: Integrate KGs into LLM Inputs (verbalize KG for LLM training) – Before pre-training enhancement

...

Automatic evaluation of LLMs is usually done by comparing generated model output with a desired result. Therefore, many well-established metrics, like direct matching or similarity metrics (BLEU, N-gram, ROUGE, BERTScore), are used. However, especially when the output deviates from the reference answers, conventional similarity metrics are insufficient to measure the factuality of the generated output. Incorporating information from knowledge graphs (KGs) into the evaluation can help ensure an accurate measurement of the factual integrity and reliability of LLM outputs.
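
The limitation of similarity metrics can be illustrated with a minimal sketch. The function `unigram_f1` and the example sentences below are hypothetical and only stand in for n-gram-based metrics; they are not taken from any of the cited metrics' reference implementations. The sketch shows that a factually wrong answer can score higher than a correct paraphrase when only token overlap is measured.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a candidate and a reference (toy n-gram metric)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentences, chosen only for illustration.
reference = "Paris is the capital of France"
wrong_answer = "Lyon is the capital of France"            # factually wrong, high overlap
paraphrase = "France has Paris as its capital city"       # factually right, low overlap

score_wrong = unigram_f1(wrong_answer, reference)
score_paraphrase = unigram_f1(paraphrase, reference)
print(f"wrong answer: {score_wrong:.3f}, paraphrase: {score_paraphrase:.3f}")
```

In this toy case the wrong answer shares five of six tokens with the reference, while the correct paraphrase shares only three, so the overlap score ranks them in the wrong order.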

There are several reasons to use KGs to support or enhance these evaluations.


Explanation of concepts 

  • Represented Knowledge: KG triples can be used to evaluate how much knowledge an LLM can leverage from the training process and how consistently this knowledge can be retrieved.
  • Factuality: KG triples can be used to evaluate the output of an LLM by extracting information from the output and comparing it with a KG to check factuality or knowledge coverage. Examples of such knowledge coverage are political positions, cultural or sporting events, or current news information. Furthermore, the extracted KG triples can be used to evaluate tasks or features where a similarity comparison of the LLM output is undesirable. This is the case for identifying and evaluating hallucinations of LLMs.
  • Biases: The final reason is to use KGs to enhance LLM inputs with relevant information. This method is beneficial, for example, if the goal is to use in-context learning to provide relevant information for a specific task to the LLM. In addition, planned adversarial attacks can be carried out on the LLM to uncover biases or weak points. 
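
The factuality and knowledge-coverage idea from the list above can be sketched as a set comparison between triples extracted from an LLM output and the triples of a reference KG. This is a minimal sketch under stated assumptions: the `Triple` type, the `check_against_kg` function, the exact-match normalization, and the example data are all hypothetical, and the extraction step itself is assumed to have happened already.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def normalize(triple: Triple) -> Triple:
    """Lowercase and trim each part so that trivial surface differences do not matter."""
    return Triple(*(part.strip().lower() for part in triple))


def check_against_kg(extracted: Iterable[Triple], kg: Iterable[Triple]):
    """Split extracted triples into KG-supported and unsupported ones and compute coverage.

    Coverage here is the share of KG triples that also appear in the output
    (an assumption made for this sketch, not a standard definition).
    """
    kg_norm = {normalize(t) for t in kg}
    extracted_norm = {normalize(t) for t in extracted}
    supported = extracted_norm & kg_norm
    unsupported = extracted_norm - kg_norm
    coverage = len(supported) / len(kg_norm) if kg_norm else 0.0
    return supported, unsupported, coverage


# Hypothetical reference KG and hypothetical triples extracted from an LLM answer.
reference_kg = [
    Triple("Berlin", "capital_of", "Germany"),
    Triple("Paris", "capital_of", "France"),
    Triple("Rome", "capital_of", "Italy"),
]
extracted_triples = [
    Triple("berlin", "capital_of", "germany"),
    Triple("Paris", "capital_of", "Italy"),
]

supported, unsupported, coverage = check_against_kg(extracted_triples, reference_kg)
print(len(supported), len(unsupported), round(coverage, 2))
```

An unsupported triple is only a candidate hallucination: the KG may simply be incomplete, so in practice such triples would need a further check rather than being counted as false outright.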

...

  • For meaningful graph representations, standard protocols include Abstract Meaning Representation (AMR) and Open Information Extraction (OpenIE). AMR is a semantic representation language whose structures are rooted, directed, edge-labeled, and leaf-labeled graphs. In AMR, the edges are semantic relations, and the nodes are concepts. AMR has a fixed relation vocabulary of approximately 100 relations plus the inverse of each relation. In OpenIE, on the other hand, relation triples consist of a subject, an open relation, and the object of that relation. An open relation means that OpenIE does not rely on a fixed relation vocabulary. Each sentence is therefore represented as a directed acyclic graph, and an extractor enumerates all word pairs and predicts their relations in parallel.
  • Extracting information from a text and generating or enhancing a KG from it is discussed in Chapter 4.2. NLP tasks such as named entity recognition, coreference resolution, and relation extraction are well-established problems in this field of research and are solved using either generative LLMs or fine-tuned language models. A third option, prompting an LLM to generate a KG, is based on two techniques: in-context learning and chain-of-thought reasoning (explained in Section 4).
  • KG factuality: The standard protocol for checking the factuality of a KG generated from an LLM output sequence is to encode the KG using an LLM or a GNN and predict the factuality using binary classification. For both models, context can be provided in addition to the generated KG for higher precision in the prediction. For this task, the GNN has to be fine-tuned for factuality prediction. When using an LLM, prompting can be used to predict the factuality of KG triples. The prompt can be enhanced with in-context learning examples or with the context of factual KG relations.
  • Current publications that use these techniques are GraphEval and FactGraph. GraphEval uses SOTA LLMs like LLaMA to extract and generate the KG from a given model output. It then checks whether each extracted triple is factually consistent with the provided context. FactGraph builds on text and graph encoders that are augmented with structure-aware adapters to classify factuality.
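
The prompt-based variant of the factuality check described above can be sketched as follows. This is not the GraphEval or FactGraph implementation: the prompt wording, the functions `build_prompt` and `classify_triples`, and the example data are hypothetical. The LLM is passed in as a plain callable, and `toy_llm` is a stand-in string matcher used only so that the sketch runs without a model.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

Triple = Tuple[str, str, str]             # (subject, relation, object)
Example = Tuple[str, Triple, str]         # (context, triple, "yes" or "no")


def format_triple(triple: Triple) -> str:
    return "(" + " | ".join(triple) + ")"


def build_prompt(triple: Triple, context: str, examples: Sequence[Example] = ()) -> str:
    """Build a binary-classification prompt, optionally with in-context examples."""
    lines = [
        "Decide whether the triple is supported by the context.",
        "Answer with exactly one word: yes or no.",
        "",
    ]
    for ex_context, ex_triple, ex_label in examples:
        lines += [f"Context: {ex_context}", f"Triple: {format_triple(ex_triple)}",
                  f"Answer: {ex_label}", ""]
    lines += [f"Context: {context}", f"Triple: {format_triple(triple)}", "Answer:"]
    return "\n".join(lines)


def classify_triples(triples: Iterable[Triple], context: str,
                     llm: Callable[[str], str],
                     examples: Sequence[Example] = ()) -> Dict[Triple, bool]:
    """Ask the model about every triple; True means 'judged consistent with the context'."""
    return {t: llm(build_prompt(t, context, examples)).strip().lower().startswith("yes")
            for t in triples}


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model: says 'yes' if subject and object occur in the last context."""
    lines = prompt.splitlines()
    context = [l for l in lines if l.startswith("Context: ")][-1][len("Context: "):].lower()
    triple = [l for l in lines if l.startswith("Triple: ")][-1][len("Triple: "):]
    subject, _, obj = (part.strip().lower() for part in triple.strip("()").split("|"))
    return "yes" if subject in context and obj in context else "no"


# Hypothetical context and triples extracted from a model output.
context = "Marie Curie was born in Warsaw and won the Nobel Prize in Physics in 1903."
triples = [("Marie Curie", "born in", "Warsaw"), ("Marie Curie", "born in", "Paris")]

verdicts = classify_triples(triples, context, toy_llm)
# Assumption for this sketch: the output counts as consistent only if every triple is.
output_is_consistent = all(verdicts.values())
print(verdicts, output_is_consistent)
```

Passing the model as a callable keeps the sketch independent of any particular LLM client; a real setup would replace `toy_llm` with a call to the chosen model and would likely need more robust answer parsing.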

Answer 3: Analyzing LLM Biases through KG Comparisons

...