...

Description: These methods use KG knowledge directly during the LLM pre-training phase by modifying the encoder side of the transformer architecture and improving the training tasks.
LLMs do not process KG structure directly; therefore, a KG representation that can be combined with text embeddings is necessary, i.e., KG embeddings. The text and the (sub)graph pre-training data must also be aligned. To allow the LLM to learn from KG embeddings, there are three main modifications to the transformer-encoder architecture (which future research may extend); a minimal sketch of variant (a) follows the list:

a) Incorporate a knowledge encoder to fuse textual context (text embeddings) and knowledge context (KG embeddings). The LLM can stay frozen, with the knowledge encoder reusing just the output of its transformer encoder.

b) Insert knowledge encoding layers in the middle of the transformer layers to adjust the encoding mechanism, enabling the LLM to process knowledge from the KG.

c) Add independent adapters to process knowledge. These adapters match the transformer layers one-to-one and are easy to train because they do not affect the parameters of the original LLM during pre-training.

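The following is a minimal sketch of variant (a), assuming PyTorch, a frozen LLM encoder, and pre-computed KG entity embeddings aligned with the input text. All names (KnowledgeFusionEncoder, text_dim, kg_dim, hidden_dim) and dimensions are illustrative assumptions, not part of any specific library or published model.

# Illustrative sketch (PyTorch): a small knowledge encoder that fuses the
# frozen LLM's token embeddings with pre-computed KG entity embeddings.
# All module and variable names here are hypothetical.
import torch
import torch.nn as nn

class KnowledgeFusionEncoder(nn.Module):
    def __init__(self, text_dim: int, kg_dim: int, hidden_dim: int, num_heads: int = 8):
        super().__init__()
        # Project the heterogeneous embedding spaces into a shared space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.kg_proj = nn.Linear(kg_dim, hidden_dim)
        # Cross-attention: tokens attend to the KG entities aligned with the text.
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, text_emb: torch.Tensor, kg_emb: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, seq_len, text_dim)      -- output of the frozen LLM encoder
        # kg_emb:   (batch, num_entities, kg_dim)   -- embeddings of linked KG entities
        q = self.text_proj(text_emb)
        kv = self.kg_proj(kg_emb)
        fused, _ = self.cross_attn(query=q, key=kv, value=kv)
        return self.norm(q + fused)  # residual connection keeps the textual signal

# Usage sketch: the LLM stays frozen; only the fusion encoder is trained.
text_emb = torch.randn(2, 16, 768)   # e.g., a BERT-base-sized encoder output (assumption)
kg_emb = torch.randn(2, 4, 200)      # e.g., TransE-style entity embeddings (assumption)
fusion = KnowledgeFusionEncoder(text_dim=768, kg_dim=200, hidden_dim=768)
out = fusion(text_emb, kg_emb)       # shape: (2, 16, 768)

Variants (b) and (c) differ mainly in where such a module is placed: between existing transformer layers, or as per-layer adapters trained while the original LLM parameters stay fixed.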

Although nothing prohibits implementing all these modifications simultaneously, we recommend implementing just one of these variations during LLM pre-training.

The pre-training task is what allows the LLM to learn and model the world. Thus, another option is to modify the pre-training task. On the encoder side of LLMs, the typical task is to mask words in the context and predict them. A simple modification is to mask not random words but entities represented in the KG. Another option is multi-task pre-training, i.e., performing masked prediction and KG link prediction jointly, as sketched below.
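Below is a minimal sketch of entity-span masking and a combined multi-task loss. It assumes the entity spans were already obtained through entity linking against the KG; all names (mask_entity_spans, combined_loss, MASK_ID) and the toy token IDs are hypothetical.

# Illustrative sketch: mask whole KG-entity spans instead of random tokens,
# and combine masked prediction with KG link prediction in one objective.
import random
import torch

MASK_ID = 103  # e.g., the [MASK] id in a BERT-style vocabulary (assumption)

def mask_entity_spans(token_ids, entity_spans, mask_prob=0.5):
    """Mask the token spans (start, end) that were linked to KG entities."""
    input_ids = list(token_ids)
    labels = [-100] * len(token_ids)      # -100 = position ignored by the LM loss
    for start, end in entity_spans:
        if random.random() < mask_prob:
            for i in range(start, end):
                labels[i] = input_ids[i]  # predict the original entity tokens
                input_ids[i] = MASK_ID
    return input_ids, labels

def combined_loss(mlm_loss: torch.Tensor, link_pred_loss: torch.Tensor, alpha: float = 0.5):
    """Multi-task objective: weighted sum of masked prediction and link prediction."""
    return alpha * mlm_loss + (1.0 - alpha) * link_pred_loss

# Toy example (token ids are illustrative, not real tokenizer output):
tokens = [101, 3000, 2003, 1996, 3007, 1997, 2762, 102]
entity_spans = [(1, 2), (6, 7)]  # token ranges linked to KG entities (assumption)
masked_ids, labels = mask_entity_spans(tokens, entity_spans, mask_prob=1.0)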

Considerations:

  • As a result, we have LLMs with better language understanding.
  • Empirical evaluation has shown that this combination can improve reasoning capabilities in LLMs.
  • Tail entities, i.e., entities not frequently mentioned in the text, are better learned and modeled by the resulting LLM.
  • Harmonizing and combining heterogeneous embedding spaces, such as text and graph embeddings, is challenging. Therefore, experts in NLP and Graph Machine Learning are required to properly apply these methods.
  • More resources are required because pre-training takes longer.

...