Page History
...
- Diego Collarana (FIT)
- Daniel Baldassare (doctima) – Lead
- Michael Wetzel (Coreon)
- Rene Pietzsch (ECC)
- ...
Description:
Verbalizing knowledge graphs for LLMs is the task of representing knowledge graphs as text so that they can be written directly into the prompt, the main input source of LLMs. Verbalization consists of finding textual representations for nodes, relationships between nodes, and their metadata. It can take place at different stages of the LLM lifecycle, during training (pre-training, instruction fine-tuning) or during inference (in-context learning), and involves:
- Marking the boundaries of graph data with special tokens, as already done for SQL queries (see "Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques")
- Choosing encoding strategies for nodes, relationships between nodes, node communities, and metadata (see "Talk like a graph: Encoding graphs for large language models", research.google)
- Deciding what needs to be verbalized and where: the system prompt for static information such as the KG schema, the user prompt for data instances (a minimal sketch follows this list)
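The sketch below illustrates this split, assuming a toy set of triples, hypothetical `<graph>`/`</graph>` boundary tokens, and an illustrative prompt layout; none of these names are prescribed by the references above.

```python
# Minimal sketch: verbalize KG triples as text and place them in a prompt.
# The <graph>/</graph> boundary tokens and the prompt layout are illustrative
# assumptions, not a fixed standard.

def verbalize_triples(triples):
    """Turn (subject, predicate, object) triples into one sentence per triple."""
    return "\n".join(f"{s} {p} {o}." for s, p, o in triples)

# Static schema information goes into the system prompt ...
kg_schema = "Person --worksFor--> Organization; Organization --locatedIn--> City"
system_prompt = (
    "You answer questions using the knowledge graph below.\n"
    f"Schema: {kg_schema}"
)

# ... while concrete data instances go into the user prompt,
# wrapped in boundary tokens that mark the graph data.
instance_triples = [
    ("Ada_Lovelace", "worksFor", "Analytical_Engine_Project"),
    ("Analytical_Engine_Project", "locatedIn", "London"),
]
user_prompt = (
    "<graph>\n" + verbalize_triples(instance_triples) + "\n</graph>\n"
    "Question: Where does Ada Lovelace work?"
)

print(system_prompt)
print(user_prompt)
```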
Considerations:
...
:
- Simple concatenation of KG triples with text (see the sketch after this list)
- Entity/Token alignment prediction
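A minimal sketch of the first option (simple concatenation), assuming the Hugging Face transformers tokenizer and a plain-text linearization of triples; the model choice and separator behaviour are illustrative assumptions only.

```python
# Minimal sketch: simple concatenation of linearized KG triples with input text.
# The BERT tokenizer and the space-separated linearization are illustrative assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Ada Lovelace wrote the first published algorithm."
triples = [("Ada_Lovelace", "occupation", "mathematician"),
           ("Ada_Lovelace", "collaboratedWith", "Charles_Babbage")]

# Linearize the triples into a single string.
linearized = " ".join(f"{s} {p} {o}" for s, p, o in triples)

# Concatenate text and triples into one input sequence; without any alignment
# signal, irrelevant triples can become "knowledge noise" for the model.
encoded = tokenizer(text, linearized, truncation=True)
print(tokenizer.decode(encoded["input_ids"]))
```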
Considerations:
- Simple concatenation of tokens and triples from the KG can cause "knowledge noise"
Standards:
- Predicting alignment links between tokens and entities
- Adding entity embeddings and an additional entity prediction task to the token-only pre-training objective (see the sketch below)
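A minimal PyTorch sketch of the last point, adding entity embeddings and a separate entity-prediction head next to the token-level objective; dimensions, vocabulary sizes, the addition-based fusion, and the toy targets are illustrative assumptions, not a specific published recipe.

```python
# Minimal sketch (PyTorch): add entity embeddings and an entity-prediction head
# on top of a token-only language-modelling objective. Dimensions, vocabulary
# sizes, and the fusion-by-addition step are illustrative assumptions.
import torch
import torch.nn as nn

class TokenEntityModel(nn.Module):
    def __init__(self, vocab_size=30522, entity_vocab_size=5000, hidden=256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.entity_emb = nn.Embedding(entity_vocab_size, hidden, padding_idx=0)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.token_head = nn.Linear(hidden, vocab_size)          # token-only objective
        self.entity_head = nn.Linear(hidden, entity_vocab_size)  # extra entity task

    def forward(self, token_ids, entity_ids):
        # Fuse entity embeddings into the token stream by simple addition.
        h = self.encoder(self.token_emb(token_ids) + self.entity_emb(entity_ids))
        return self.token_head(h), self.entity_head(h)

model = TokenEntityModel()
tokens = torch.randint(1, 30522, (2, 16))     # toy token ids
entities = torch.randint(0, 5000, (2, 16))    # aligned entity ids (0 = no entity)
token_logits, entity_logits = model(tokens, entities)

# Joint loss: token prediction plus the additional entity prediction task
# (toy targets; a real setup would use masked-token and masked-entity targets).
loss = (nn.functional.cross_entropy(token_logits.transpose(1, 2), tokens)
        + nn.functional.cross_entropy(entity_logits.transpose(1, 2), entities))
loss.backward()
```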
Answer 2: Integrate KGs during pre-training
...