- Verbalization of KGs into LLM inputs (See answer 1)
- Integrate KGs during pre-training (See answer 2)
- Integrate KGs during fine-tuning (See answer 3)
Explanation of concepts
...
- Pre-training: an unsupervised training step in which a language model learns language representations by processing large unannotated corpora.
- Pre-training objectives: the techniques that guide the learning process of a model from its training data. Standard techniques for language learning are Masked Language Modeling (MLM) and Causal Language Modeling (CLM). For knowledge-aware training that integrates KG triplets directly into LLM inputs, Knowledge Masked Language Modeling (KMLM) is used.
- Masked language model (MLM): a pre-training technique where the model predicts a masked token in a sequence by considering the context of the surrounding tokens.
- Causal language model (CLM): a pre-training technique where the model is presented with a sequence of tokens and learns to predict the next token in the sequence based solely on the preceding tokens.
- Knowledge masked language model (KMLM): a knowledge-aware pre-training technique that extends MLM by adding triplets from KGs directly to the text tokens. In contrast to MLM, not only are tokens from the text sequence masked, but also entities from the triplets that have been added as tokens to the LLM input (a minimal sketch follows at the end of this list).
- Verbalization of KGs: the task of representing knowledge graphs through text, thereby transforming structured data into a text format that the LLM can process and learn from. Verbalization can take place at different stages of the LLM lifecycle, during training (pre-training, fine-tuning) or during inference (in-context learning).
...
- Embeddings: a numerical representation of data that captures semantic meaning in a continuous multidimensional space.
- Fine-tuning: the process of taking a pre-trained model and extending it for:
- Task Adaptation: adapting the model for a new specific task, such as classification or sentiment analysis
...
- Knowledge Enhancement: expanding the pre-trained model's knowledge to specialize it for a particular domain or enterprise needs
...
- Instruction Tuning: teaching the model to follow human instructions using datasets of prompts
...
- Adapter: a trainable layer that can be added to the original LLM architecture, introducing a small number of new parameters that are independent of the LLM parameters. Adapters can be used either for knowledge-based pre-training (see answer 2) or for parameter-efficient fine-tuning (PEFT).
- Parameter-Efficient Fine-Tuning (PEFT): a technique for fine-tuning models in which only a small number of the pre-trained model's parameters, or a small set of newly added parameters, are trained. This technique is useful when computational resources are limited.
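To make the masking objectives above concrete, here is a minimal sketch of how a KMLM training example might be constructed: verbalized KG triplets are appended to the text tokens, and tokens from both the text and the triplets can be masked. The whitespace tokenizer, the simple `verbalize` template, and the 15% masking probability are illustrative assumptions, not the procedure of any specific model such as ERNIE.

```python
import random

random.seed(0)
MASK = "[MASK]"

def verbalize(triple):
    # Toy template: flatten a (subject, relation, object) triple into tokens.
    subj, rel, obj = triple
    return f"{subj} {rel} {obj}".split()

def kmlm_example(text, triples, mask_prob=0.15):
    """Build one KMLM training example: append verbalized triples to the text
    tokens, then mask tokens from BOTH the text and the triples."""
    tokens = text.split() + [tok for t in triples for tok in verbalize(t)]
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(MASK)   # the model must recover this token
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)   # token is visible, nothing to predict
    return inputs, labels

inputs, labels = kmlm_example(
    "Bob Dylan wrote Blowin' in the Wind in 1962",
    [("Bob Dylan", "occupation", "songwriter")],
)
print(inputs)
```

Under plain MLM only the text tokens would be candidates for masking; KMLM extends the candidate set to the appended triplet tokens as well.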
Brief description of the state-of-the-art
...
The current state-of-the-art method to integrate knowledge graph (KG) triplets into LLM inputs is Knowledge Masked Language Modeling (KMLM), where tokens from both the text sequence and the KG triplets are masked during training to encourage the model to learn contextual relationships. For example, ERNIE [1] prepends knowledge triplets to the text sequence and randomly masks tokens from either the text or the triplets. In contrast, K-BERT integrates triplets by appending knowledge information immediately after the corresponding entities in the text tokens, while restricting the triplets from influencing the other text tokens in the sentence. This prevents semantic changes to the original sentence and mitigates the knowledge noise that can occur when tokens from the triplets and the text interact directly, as is the case in ERNIE. In addition, verbalizing knowledge graphs can improve performance, as demonstrated by KELM [7].
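As a rough illustration of the verbalization step mentioned above, the sketch below turns KG triples into natural-language sentences that could then be added to a training corpus. KELM itself uses a trained data-to-text generation model; the hand-written templates and relation names here are purely assumed for demonstration.

```python
# Hypothetical relation-to-template mapping; a real system would learn or
# curate these verbalizations rather than hard-code them.
TEMPLATES = {
    "capital_of": "{subj} is the capital of {obj}.",
    "born_in": "{subj} was born in {obj}.",
}

def verbalize_triples(triples):
    """Render (subject, relation, object) triples as plain sentences."""
    sentences = []
    for subj, rel, obj in triples:
        template = TEMPLATES.get(rel, "{subj} " + rel.replace("_", " ") + " {obj}.")
        sentences.append(template.format(subj=subj, obj=obj))
    return " ".join(sentences)

print(verbalize_triples([
    ("Berlin", "capital_of", "Germany"),
    ("Angela Merkel", "born_in", "Hamburg"),
]))
# -> "Berlin is the capital of Germany. Angela Merkel was born in Hamburg."
```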
When it comes to integrating KG knowledge directly into the training process using knowledge embeddings, current state-of-the-art techniques modify the model architecture in three main ways:
- Inserting a knowledge encoder after the transformer encoder to fuse text embeddings with knowledge embeddings after the initial text encoding, allowing the model to use both textual and knowledge-based information.
- Inserting a knowledge encoding layer between the transformer layers to enable the LLM to process knowledge from the KG.
- Adding an adapter that is trained independently of the LLM and is easier to train because it has its own set of parameters (see the sketch below).
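The adapter option lends itself to a short sketch. The bottleneck layout below (down-projection, non-linearity, up-projection, plus a residual connection) follows the common adapter pattern; the `KnowledgeAdapter` name, the dimensions, and the commented wiring to a `base_model` are illustrative assumptions rather than a specific published architecture.

```python
import torch
import torch.nn as nn

class KnowledgeAdapter(nn.Module):
    """Small trainable bottleneck inserted after a frozen transformer block.

    Only the adapter parameters receive gradients, so knowledge can be
    injected without touching the original LLM weights."""

    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # down-projection
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # up-projection
        self.act = nn.ReLU()

    def forward(self, hidden_states):
        # Residual connection keeps the original representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Sketch of the training setup: freeze the pre-trained model (here a
# placeholder `base_model`) and optimize only the adapter parameters.
# for p in base_model.parameters():
#     p.requires_grad = False
# adapter = KnowledgeAdapter()
# optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
```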
Answer 1: Integrate KGs into LLM Inputs (verbalize KG for LLM training) – before pre-training
...