...
In the context of integrating KGs into LLM inputs, the current state-of-the-art approach focuses on infusing knowledge without modifying the textual sequence itself. The methods proposed by Liu et al. [3] and Sun et al. [4] address the issue of "knowledge noise", a challenge highlighted by Liu et al. [3] that can arise when knowledge triples are simply concatenated with their corresponding sentences, as in the approach of Zhang et al. [5].
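As a minimal illustration of the concatenation-based approach (and of why it can introduce knowledge noise), the following Python sketch verbalizes KG triples and simply appends them to a sentence. The function names and example triples are illustrative assumptions, not taken from the cited papers; approaches such as K-BERT [3] and CoLAKE [4] instead control how the injected knowledge interacts with the original token sequence.

```python
# Minimal sketch (not from the cited papers): naive triple-to-text verbalization
# that simply concatenates KG facts with a sentence. This style of input
# construction is what can introduce "knowledge noise".

def verbalize_triple(subj: str, pred: str, obj: str) -> str:
    """Turn a (subject, predicate, object) triple into a short sentence."""
    return f"{subj} {pred.replace('_', ' ')} {obj}."

def concat_with_knowledge(sentence: str, triples: list[tuple[str, str, str]]) -> str:
    """Append verbalized triples to the sentence (naive knowledge infusion)."""
    facts = " ".join(verbalize_triple(*t) for t in triples)
    return f"{sentence} {facts}"

example = concat_with_knowledge(
    "Tim Cook visited Cupertino.",
    [("Tim Cook", "is_CEO_of", "Apple"), ("Apple", "headquartered_in", "Cupertino")],
)
print(example)
```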
Answer 1: Integrate KGs into LLM Inputs (verbalize KG for LLM training) –
...
before pre-training
...
Contributors:
- Diego Collarana (FIT)
- Daniel Baldassare (doctima) – Lead
- Michael Wetzel (Coreon)
- Rene Pietzsch (ECC)
- ...
...
Considerations:
Standards:
...
Answer 2: Integrate KGs
...
during pre-training
Description: These methods use KG knowledge directly during the LLM pre-training phase by modifying the LLM's encoder and training tasks.
- Incorporate knowledge encoders
- Insert knowledge encoding layers
- Add independent adapters
- Modify the pre-training task
Considerations:
...
encoder side of the transformer architecture and improving the training tasks.
LLMs do not process KG structure directly; therefore, a KG representation that can be combined with text embeddings is necessary, i.e., KG embeddings. There must be an alignment between the text and
...
the (sub)graph pre-training data. To allow the LLM to learn from KG embeddings, there are three main modifications to the transformer-encoder architecture (which future research may extend):
a) Incorporate a knowledge encoder to fuse textual context (text embeddings) and knowledge context (KG embeddings). The LLM could stay frozen and reuse just the output of the transformer encoder.
b) Insert knowledge encoding layers in the middle of the transformer layers to adjust the encoding mechanism, enabling the LLM to process knowledge from the KG.
c) Add independent adapters to process knowledge. These adapters match the transformer layers one-to-one and are easy to train because they do not affect the parameters of the original LLM during pre-training.
Although nothing prohibits implementing all these modifications at the same time, we recommend implementing just one of these variations during LLM pre-training.
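The following Python sketch illustrates variation (c), an independent knowledge adapter, under the assumption that precomputed KG entity embeddings have been aligned to token positions; the module name, dimensions, and fusion strategy are illustrative choices, not a reference implementation from any of the cited works.

```python
# Hedged sketch of a knowledge adapter that fuses precomputed KG entity
# embeddings with the hidden states of a frozen transformer layer.

import torch
import torch.nn as nn

class KnowledgeAdapter(nn.Module):
    """Bottleneck adapter that injects KG information into hidden states."""

    def __init__(self, hidden_dim: int, kg_dim: int, bottleneck: int = 64):
        super().__init__()
        self.kg_proj = nn.Linear(kg_dim, hidden_dim)  # map KG embeddings into text space
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden: torch.Tensor, kg_emb: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim) from a frozen transformer layer
        # kg_emb: (batch, seq_len, kg_dim) entity embeddings aligned to tokens
        #         (zeros where a token is not linked to any KG entity)
        fused = hidden + self.kg_proj(kg_emb)
        return hidden + self.up(self.act(self.down(fused)))  # residual adapter

# During pre-training, only the adapter parameters are updated; the original
# LLM weights stay frozen, e.g.:
# for p in frozen_llm.parameters(): p.requires_grad = False
```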
The pre-training task allows the LLM to learn about and model the world; thus, another option is to modify the pre-training task. On the encoder side of LLMs, the typical task is to mask words in the context. A simple modification is to mask not random words but entities represented in the KG. Another option is multi-task pre-training, i.e., performing masked-token prediction and KG link prediction jointly.
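The sketch below illustrates the entity-masking variant: instead of masking random tokens, whole entity mentions (assumed to come from entity linking against the KG) are masked. The helper names and example spans are hypothetical; the multi-task variant would add a KG link-prediction objective on top of this.

```python
# Hedged sketch of entity-aware masking: mask the token spans that belong to
# KG entity mentions rather than random tokens.

import random

MASK_TOKEN = "[MASK]"

def mask_entities(tokens: list[str], entity_spans: list[tuple[int, int]],
                  mask_prob: float = 0.5) -> list[str]:
    """Replace whole entity spans with [MASK] with probability mask_prob."""
    masked = list(tokens)
    for start, end in entity_spans:          # end index is exclusive
        if random.random() < mask_prob:
            for i in range(start, end):
                masked[i] = MASK_TOKEN
    return masked

tokens = ["Marie", "Curie", "was", "born", "in", "Warsaw", "."]
spans = [(0, 2), (5, 6)]                      # "Marie Curie", "Warsaw"
print(mask_entities(tokens, spans))
```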
Considerations:
- As a result, we have LLMs with better language understanding.
- Empirical evaluation has shown that this combination can improve reasoning capabilities in LLMs.
- Tail entities, i.e., entities not frequently mentioned in the text, are better learned and modeled by the resulting LLM.
- It is challenging to harmonize and combine heterogeneous embedding spaces, i.e., text and graph embeddings. Therefore, experts in NLP and Graph Machine Learning are required for the proper application of these methods.
- More resources are required because the pre-training time is extended.
Standards:
- TODO
Answer 3: Integrate KGs during Fine-Tuning – Post pre-training enhancement
Description: These methods inject KG knowledge into LLMs by fine-tuning them with relevant data on additional tasks. The goal is to improve the model’s performance on specific domain tasks. We focus on Parameter-Efficient Fine-Tuning (PEFT) methods for decoder-only transformers, such as GPT and Llama models, due to their significant potential to complement the LLMs widely offered by different organizations. The methods transform structured knowledge from the KG into textual descriptions, which are utilized in the following ways:
- Task specificity should go hand in hand with domain orientation. Thus, we generate fine-tuning data, leveraging the power of KGs and reasoning to build task- and domain-specific corpora for LLM fine-tuning.
- Knowledge-enhanced prompts: automatically generating prompts to improve LLM outputs in scenarios that require recommendations or explanations of causality. The neighborhood subgraph around each node is iteratively partitioned and encoded into textual sentences used as fine-tuning data. This transforms the graph structure into a format that large language models can ingest and fine-tune on; several encoding strategies can be explored (see the sketch after this list).
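The sketch below illustrates one such encoding strategy: the neighborhood subgraph around a node is partitioned into small chunks of triples, and each chunk is verbalized into sentences that can serve as fine-tuning data for a decoder-only LLM (e.g., with a PEFT method such as LoRA). The partitioning and verbalization functions are illustrative assumptions, not the implementations from the works listed under Standards.

```python
# Illustrative sketch of neighborhood partitioning and verbalization for
# generating KG-derived fine-tuning text.

from typing import Iterable

def neighborhood(kg: list[tuple[str, str, str]], node: str) -> list[tuple[str, str, str]]:
    """All triples in which the node appears as subject or object."""
    return [t for t in kg if node in (t[0], t[2])]

def partition(triples: list[tuple[str, str, str]], size: int = 3) -> Iterable[list[tuple[str, str, str]]]:
    """Split the neighborhood into small chunks of triples."""
    for i in range(0, len(triples), size):
        yield triples[i:i + size]

def verbalize(chunk: list[tuple[str, str, str]]) -> str:
    """Turn a chunk of triples into plain sentences for fine-tuning."""
    return " ".join(f"{s} {p.replace('_', ' ')} {o}." for s, p, o in chunk)

kg = [
    ("Aspirin", "treats", "headache"),
    ("Aspirin", "interacts_with", "ibuprofen"),
    ("Aspirin", "has_ingredient", "acetylsalicylic acid"),
    ("ibuprofen", "treats", "fever"),
]
finetuning_texts = [verbalize(c) for c in partition(neighborhood(kg, "Aspirin"), size=2)]
print(finetuning_texts)
```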
Considerations:
- The methods are low-cost and more straightforward to implement than pre-training LLMs.
- They can effectively improve LLMs’ performance on specific tasks.
- Suitable for domain-specific tasks and text generation scenarios that require sensitive information filtering.
- Finding the most relevant knowledge in the KG can become a bottleneck for the fine-tuning process.
- These methods may limit the LLM's ability to freely generate content.
Standards:
- Fine-Tuning Large Enterprise Language Models via Ontological Reasoning
- GLaM: Fine-Tuning Large Language Models for Domain Knowledge Graph Alignment via Neighborhood Partitioning and Generative Subgraph Encoding
- GraphGPT: Graph Instruction Tuning for Large Language Models
...
References:
- [1] S. Pan, L. Luo, Y. Wang, C. Chen, J. Wang, and X. Wu, "Unifying Large Language Models and Knowledge Graphs: A Roadmap", IEEE Trans. Knowl. Data Eng., vol. 36, no. 7, pp. 3580–3599, July 2024, doi: 10.1109/TKDE.2024.3352100.
- [2] T. Wang et al., "What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?", in Proceedings of the 39th International Conference on Machine Learning, PMLR, June 2022, pp. 22964–22984. Accessed: October 3, 2024. [Online]. Available: https://proceedings.mlr.press/v162/wang22u.html
- [3] W. Liu et al., "K-BERT: Enabling Language Representation with Knowledge Graph", arXiv:1909.07606, arXiv, September 17, 2019. [Online]. Available: http://arxiv.org/abs/1909.07606
- [4] T. Sun et al., "CoLAKE: Contextualized Language and Knowledge Embedding", in Proceedings of the 28th International Conference on Computational Linguistics, International Committee on Computational Linguistics, 2020, pp. 3660–3670, doi: 10.18653/v1/2020.coling-main.327.
- [5] Z. Zhang et al., "ERNIE: Enhanced Language Representation with Informative Entities", in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2019, pp. 1441–1451, doi: 10.18653/v1/P19-1139.
...