How do I construct the underlying ontology of a knowledge graph with the help of an LLM? [Construction (T-Box)] – length: up to two pages
Lead: Sven
Problem statement:
According to Thomas R. Gruber (1,2), an ontology in computer science can be defined as an explicit and formal specification of a shared conceptualization. Thus, the question arises to what extent large language models (LLMs) can be utilized within the ontology engineering and evaluation process. In 1995, Grüninger and Fox developed a methodology for the manual creation of an ontology. In the following, each step is described together with the ways an LLM can support it.
Explanation of concepts: ontology, ontology engineering, ontology evaluation
Answer/Step 1: Ontology Design - User Story Generation
Description:
The process commences with the selection of a specific domain and a scenario to be modeled. In interviews with domain experts, an LLM can be used to describe the scenario and to retrieve further pertinent use cases that may be included in, or explicitly excluded from, the modeling domain. Once the scenario has been determined, user stories are generated for each use case. These also encompass the delineation of potential personas, each accompanied by a definition. An LLM can assist in formulating these texts, provided that some notes are already available (3). The text produced in this manner must still be reviewed and corrected by a human; nevertheless, the approach saves time and lets the engineer concentrate on content.
Considerations:
In this step, the LLM should be used in a chat fashion, so that the description of each user story is improved iteratively with a human in the loop. Many LLMs also accept a system prompt, which can be adapted to instruct the model to act as an ontology engineer, as sketched below.
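A minimal sketch of such a chat setup; the OpenAI Python client, the model name and the prompt wording are illustrative assumptions, and any chat-capable LLM can be substituted:

```python
# Minimal sketch (assumptions: OpenAI Python client, placeholder model name,
# illustrative prompt wording): a chat LLM acting as an ontology engineer that
# turns rough interview notes into a user story, refined with a human in the loop.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

system_prompt = (
    "You are an experienced ontology engineer. You help domain experts turn "
    "rough notes into concise user stories (persona, goal, example data) for "
    "a given modeling domain. Ask clarifying questions if the notes are ambiguous."
)
notes = "Persona: museum curator. Wants to track the provenance of artifacts ..."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat-capable model works
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Draft a user story from these notes:\n" + notes},
    ],
)
print(response.choices[0].message.content)  # to be reviewed and corrected by a human
```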
Contributors:
- Sven Hertling (FIZ), Harald Sack (FIZ), Heike Fliegl (FIZ)
- Michael Wetzel (Coreon)
Answer/Step 2: Competency Question (CQ) Generation
Description:
Once the user stories have been generated, whether manually or with the assistance of an LLM, the subsequent step is to extract competency questions (CQs) from them (4). The LLM can prove beneficial in this phase by generating CQs based on the input user story (5,6,7). A generated CQ may be complex and comprise multiple requirements. In order to ascertain whether a CQ is overly complex, the LLM can be instructed to respond with a simple 'yes' or 'no' to a corresponding prompt. Furthermore, the LLM can assist in reducing the complexity by dividing one CQ into multiple, more straightforward CQs.
The generated questions may be redundant or exhibit a high degree of semantic similarity, making it unnecessary to model each CQ. Such near-duplicates can be removed via paraphrase detection, whereby the LLM is instructed to identify sentences or questions that are semantically similar. For a large number of CQs, it is also appropriate to apply sentence transformer (SBERT) models to first embed each sentence and then retrieve similar ones via cosine similarity in the embedding space; the nearest sentences can subsequently be validated once more by an LLM. In addition, the LLM can assist in identifying CQs that are distinct but topically related, which can then be grouped together.
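A minimal sketch of the embedding-based near-duplicate detection described above, assuming the sentence-transformers library; the CQs and the threshold are illustrative:

```python
# Minimal sketch (assumption: sentence-transformers is available): embed CQs
# with an SBERT model and flag near-duplicate pairs, which are then confirmed
# or rejected by an LLM or a human.
from sentence_transformers import SentenceTransformer, util

cqs = [
    "Which artifacts were created by a given artist?",
    "What artifacts did a particular artist create?",
    "Where is an artifact currently located?",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works
embeddings = model.encode(cqs, convert_to_tensor=True)
similarities = util.cos_sim(embeddings, embeddings)

THRESHOLD = 0.85  # illustrative; tune on the actual CQ set
for i in range(len(cqs)):
    for j in range(i + 1, len(cqs)):
        if float(similarities[i][j]) > THRESHOLD:
            # candidate near-duplicates; ask an LLM for a final yes/no decision
            print(f"Possible duplicate: {cqs[i]!r} <-> {cqs[j]!r}")
```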
Considerations:
The subsequent stage is to construct the ontology itself. This is the most crucial phase; even if the LLM is capable of generating valid ontologies in a structured format, construction should be performed manually or semi-automatically, with a thorough examination of the LLM output by a human.
Answer/Step 3: Ontology Evaluation
Description:
Next, the ontology must be evaluated in relation to the competency questions (8,9). To achieve this, a formal representation of the CQs is created, typically using the SPARQL query language. In order to translate the natural language questions into SPARQL queries, the LLM requires both the ontology and the corresponding CQs, so that it can formulate queries using the identifiers defined in the ontology. Once the SPARQL queries have been generated and validated, the evaluation can be executed deterministically. Alternatively, to ascertain whether the CQs are fulfilled by the ontology, one may provide the ontology together with the CQ in the prompt and request a binary yes/no answer as to whether the ontology fulfills the CQ. At each step, one may revisit any previous step to enhance the results.
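A minimal sketch of this CQ-to-SPARQL evaluation loop; rdflib is used for parsing and querying the ontology, while the file name, model name and prompt wording are illustrative assumptions. The llm() helper defined here is referenced by the later sketches as the generic chat call:

```python
# Minimal sketch (assumptions: rdflib, OpenAI Python client, placeholder model
# and file names): ask the LLM for a SPARQL query for one CQ and check that the
# query at least parses and runs against the ontology.
from openai import OpenAI
from rdflib import Graph

client = OpenAI()

def llm(prompt: str) -> str:
    """Send a single prompt to a chat model and return the text answer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

ontology = Graph().parse("museum-ontology.ttl")  # hypothetical ontology file
cq = "Which artifacts were created by a given artist?"

sparql = llm(
    "Ontology (Turtle):\n" + ontology.serialize(format="turtle")
    + "\n\nTranslate the following competency question into a SPARQL SELECT query, "
      "using only identifiers defined in the ontology. Return only the query.\n"
    + "CQ: " + cq
)

try:
    results = ontology.query(sparql)  # fails if the generated query is invalid
    print(f"CQ appears answerable; {len(results)} result bindings")
except Exception as err:
    print(f"Generated query could not be executed: {err}")
```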
Consideration:
In order for an LLM to be applied automatically, the output must adhere to a defined structure. For instance, the generated text should comprise a valid SPARQL query or a list of competency questions that can be extracted. In such cases, it is advisable to either fine-tune the model to align it with the specific task at hand or, at the very least, provide examples of the desired output format (few-shot). Where the LLM is employed in a chat environment, instruction-tuned chat models can also be deployed directly without examples (zero-shot).
Where the ontology must be included in the prompt, it may exceed the context window of the LLM in question. In such instances, a retrieval-augmented generation (RAG) approach may be employed, whereby only a portion of the ontology is provided in the prompt. In that case, the ontology should be divided into discrete units at the topic or OWL construct level, rather than at the triple level, in order to maintain the logical coherence of the concepts.
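A minimal sketch of such concept-level chunking with rdflib; the file name is an assumption, and a complete splitter would additionally follow blank nodes used in OWL restrictions:

```python
# Minimal sketch of concept-level chunking for a RAG setup: one chunk per class,
# containing the axioms in which the class appears as subject, rather than
# splitting triple by triple.
from rdflib import Graph, RDF, OWL

ontology = Graph().parse("museum-ontology.ttl")  # hypothetical file

chunks = {}
for cls in ontology.subjects(RDF.type, OWL.Class):
    chunk = Graph()
    for triple in ontology.triples((cls, None, None)):
        chunk.add(triple)
    chunks[cls] = chunk.serialize(format="turtle")

# Each chunk is embedded separately; at prompt time only the chunks most
# relevant to the current CQ or task are inserted instead of the full ontology.
```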
Contributors:
- Sven Hertling (FIZ), Harald Sack (FIZ), Heike Fliegl (FIZ)
Answer/Step 4: Ontology Learning
Description:
Up to this point, it has been assumed that a domain expert is available. However, it is also possible to generate lightweight ontologies from natural language texts pertaining to specific domains (16,17). In such instances, an LLM can also be employed to identify potential classes or relations. With this vocabulary in mind, the time required for ontology creation can be significantly reduced.
The tasks within this step can be further divided into detecting candidate terms for classes and properties, finding corresponding types for terms appearing in the text (term typing), extracting subclass and subproperty relations (taxonomy detection), and extracting other kinds of relations. OWL is expressive enough to provide further logical constructs in an ontology, such as unions or intersections of classes, restrictions, or property chains. For this level of complexity, the LLM needs more examples and a human in the loop to make the final decision (the number of such statements is rather small, at least for domain ontologies).
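A minimal sketch of the first of these tasks; the prompt wording and the domain text are illustrative, and llm() stands for the placeholder chat helper shown in the Step 3 sketch:

```python
# Minimal sketch: ask an LLM for candidate classes, relations and subclass
# axioms found in a domain text; everything is reviewed by a human before it
# enters the ontology.
import json
from openai import OpenAI

def llm(prompt: str) -> str:  # placeholder chat helper, cf. the Step 3 sketch
    r = OpenAI().chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

domain_text = "A curator organizes exhibitions. An exhibition shows artifacts owned by a museum."

candidates = json.loads(llm(  # assumes the model returns bare JSON as instructed
    "Extract candidate ontology terms from the text below. Return only JSON with "
    "the keys 'classes', 'relations' and 'subclass_of' "
    "(a list of [subclass, superclass] pairs).\n\nText: " + domain_text
))
print(candidates["classes"])      # e.g. ["Curator", "Exhibition", "Artifact", "Museum"]
print(candidates["relations"])    # e.g. ["organizes", "shows", "ownedBy"]
print(candidates["subclass_of"])  # reviewed by a human before entering the ontology
```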
Consideration:
The created ontologies are usually not very expressive; increasing their expressiveness either requires specialized LLMs for each added level of complexity, or a human (or even a group) who makes the final decision on the correctness of the extracted terms. Given that all extracted statements are on the T-Box level, their number is often low enough to be examined manually.
Contributors:
- Sven Hertling (FIZ), Harald Sack (FIZ), Heike Fliegl (FIZ)
Answer/Step 5: Ontology Mapping
Description:
Additionally, LLMs can be employed to enhance or refine an existing ontology. One illustrative example is ontology alignment/matching, whereby semantically equivalent concepts are connected via owl:equivalentClass or owl:equivalentProperty relationships in order to improve compatibility between different ontologies (10,11,12,13). To align two ontologies, it is possible to provide both ontologies in the prompt, but this approach does not yield satisfactory results and is unsuitable for large ontologies. A more frequently employed methodology is to first retrieve potential candidates and subsequently ask the LLM whether the specified entities represent an identical concept. In addition to establishing equivalence relationships, the LLM can also be utilized to identify other kinds of relations between concepts, such as rdfs:subClassOf. These relations appear with greater frequency and would further enhance the integration of the numerous existing ontologies. Another task is to find common ontology design patterns that appear in multiple ontologies, in order to further guide ontology engineers.
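A minimal sketch of the retrieve-then-verify pattern; sentence-transformers is assumed for candidate retrieval, the labels and threshold are illustrative, and llm() is the placeholder chat helper from the Step 3 sketch:

```python
# Minimal sketch: embed class labels of both ontologies, keep the nearest
# candidate pairs, and let an LLM give a yes/no verdict per pair.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

def llm(prompt: str) -> str:  # placeholder chat helper, cf. the Step 3 sketch
    r = OpenAI().chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

source = {"ex1:Artwork": "Artwork", "ex1:Creator": "Creator"}   # IRI -> label
target = {"ex2:Artifact": "Artifact", "ex2:Author": "Author"}

encoder = SentenceTransformer("all-MiniLM-L6-v2")
src_emb = encoder.encode(list(source.values()), convert_to_tensor=True)
tgt_emb = encoder.encode(list(target.values()), convert_to_tensor=True)
scores = util.cos_sim(src_emb, tgt_emb)

for i, src_iri in enumerate(source):
    for j, tgt_iri in enumerate(target):
        if float(scores[i][j]) > 0.6:  # cheap candidate filter; threshold is a guess
            verdict = llm(f"Do '{source[src_iri]}' and '{target[tgt_iri]}' denote "
                          f"the same concept? Answer only yes or no.")
            if verdict.strip().lower().startswith("yes"):
                print(f"{src_iri} owl:equivalentClass {tgt_iri}")
```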
Considerations:
In all cases where an LLM is involved, a prompt must be formulated. In this step, an ontology needs to be provided in the prompt as well, which raises the question of its serialization. A variety of formats are possible, including serialization as RDF/XML, N-Triples, or any other RDF format, as well as a natural language representation of the ontology created through templates for the various OWL constructs. A number of scientific experiments have shown that an RDF serialization format is an effective approach, primarily because the LLM has been trained on a substantial corpus of documents containing such RDF representations.
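A minimal sketch contrasting the two options with rdflib; the file name is hypothetical and the naive splitting of hash-namespaced IRIs is a simplifying assumption:

```python
# Minimal sketch: the same ontology rendered as Turtle versus as a simple
# template-based verbalization of rdfs:subClassOf axioms.
from rdflib import Graph, RDFS, URIRef

ontology = Graph().parse("museum-ontology.ttl")  # hypothetical file

# Option 1: an RDF serialization, well represented in LLM pre-training corpora
turtle_text = ontology.serialize(format="turtle")

# Option 2: template-based natural language rendering of selected constructs
sentences = [
    f"{sub.split('#')[-1]} is a subclass of {sup.split('#')[-1]}."
    for sub, sup in ontology.subject_objects(RDFS.subClassOf)
    if isinstance(sub, URIRef) and isinstance(sup, URIRef)  # skip blank-node restrictions
]
verbalized_text = " ".join(sentences)
```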
Contributors:
- Sven Hertling (FIZ), Harald Sack (FIZ), Heike Fliegl (FIZ)
Answer/Step 6: Ontology Documentation
Description:
Another use case is to document a given ontology and to enhance or create the labels and comments for classes and properties (14,15). In particular, the multilingual capabilities of LLMs allow the labels to be translated into other languages, which in turn increases their usability for a wider range of users.
Consideration:
The semantics of the generated text (e.g., whether the concept is actually intended to be used in the described way) must be checked, especially when creating longer descriptions for classes and properties. It is also helpful to provide not only the concept itself but also its close neighborhood, or even the whole ontology, so that the overall scope of the ontology is better captured.
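A minimal sketch of building such a documentation prompt with rdflib, providing the class together with its direct neighborhood; the class IRI and file name are assumptions, and the generated labels and comments would be reviewed before being added:

```python
# Minimal sketch: build a documentation prompt for one class, including its
# direct neighborhood as context; the resulting prompt is sent to any chat model.
from rdflib import Graph, URIRef

ontology = Graph().parse("museum-ontology.ttl")      # hypothetical file
cls = URIRef("http://example.org/museum#Artifact")   # hypothetical class

neighborhood = Graph()
for triple in ontology.triples((cls, None, None)):   # axioms about the class itself
    neighborhood.add(triple)
for triple in ontology.triples((None, None, cls)):   # axioms referring to the class
    neighborhood.add(triple)

prompt = (
    "Class and its axioms (Turtle):\n"
    + neighborhood.serialize(format="turtle")
    + "\nWrite (1) a one-sentence English rdfs:comment for the class and "
      "(2) rdfs:label values in English and German."
)
# Send `prompt` to a chat model; the generated labels/comments are reviewed
# by a human before being added to the ontology.
```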
Contributors:
- Daniel Baldassare (doctima)
- Sven Hertling, Heike Fliegl (FIZ Karlsruhe)
Draft from Daniel Baldassare :
- Data model description: nodes and relationship classes
- Metadata model description: metadata of nodes and relationships (identifiers, optional and required metadata)
- Use with LLM:
  - How to use it for standard RAG (which embedding model)
  - How to use it for GraphRAG/semantic layer (which embedding model, which additional metadata)
Perhaps add translation of literals using LLMs (one sentence)
References:
- (1) https://doi.org/10.1006/knac.1993.1008
- (2) https://doi.org/10.1007/978-3-540-92673-3
- (3) https://doi.org/10.48550/arXiv.2403.05921
- (4) https://doi.org/10.1007/978-0-387-34847-6_3
- (5) Can LLMs Generate Competency Questions? (no DOI) https://www.eurecom.fr/publication/7699, https://2024.eswc-conferences.org/wp-content/uploads/2024/04/ESWC_2024_paper_268.pdf
- (6) The Role of Generative AI in Competency Question Retrofitting (no DOI) https://2024.eswc-conferences.org/wp-content/uploads/2024/05/77770001.pdf
- (7) The Polifonia Ontology Network: Building a Semantic Backbone for Musical Heritage https://doi.org/10.1007/978-3-031-47243-5_17; An Experiment in Retrofitting Competency Questions for Existing Ontologies https://dl.acm.org/doi/abs/10.1145/3605098.3636053; RevOnt: Reverse engineering of competency questions from knowledge graphs via language models https://doi.org/10.1016/j.websem.2024.100822
- (8) Ontology testing-methodology and tool https://doi.org/10.1007/978-3-642-33876-2_20
- (9) How, What and Why to test an ontology https://doi.org/10.48550/arXiv.1505.04112
- (10) Language Model Analysis for Ontology Subsumption Inference https://doi.org/10.18653/v1/2023.findings-acl.213
- (11) LLMs4OM: Matching Ontologies with Large Language Models https://doi.org/10.48550/arXiv.2404.10317 https://2024.eswc-conferences.org/wp-content/uploads/2024/05/77770022.pdf
- (12) OLaLa: Ontology Matching with Large Language Models https://doi.org/10.48550/arXiv.2311.03837
- (13) BERTMap: A BERT-Based Ontology Alignment System https://doi.org/10.1609/aaai.v36i5.20510
- (14) The Live OWL Documentation Environment: A Tool for the Automatic Generation of Ontology Documentation https://doi.org/10.1007/978-3-642-33876-2_35
- (15) WIDOCO: A Wizard for Documenting Ontologies https://doi.org/10.1007/978-3-319-68204-4_9
- (16) Towards Ontology Construction with Language Models https://doi.org/10.48550/arXiv.2309.09898
- (17) LLMs4OL: Large Language Models for Ontology Learning https://doi.org/10.1007/978-3-031-47240-4_22
How do I fill and instantiate a KG (A-Box) with knowledge contained in texts?
Lead: Desiree, Length: up to two pages
Problem statement (only a few sentences, at most one paragraph)
Populating knowledge graphs (KGs) with A-Box statements based on a given T-Box typically involves a labor-intensive manual process. In the case of unstructured data, usually text corpora, Natural Language Processing (NLP) methods can be leveraged to extract pertinent information in a structured format, enabling its integration into the KG. This process generally involves two key steps: (1) identifying typed entities and optionally linking them to existing KG resources, and (2) extracting additional A-Box statements about entities, or relationships between entities in general. Some approaches, however, combine these steps and directly extract entities as part of triples. Traditional NLP methods from the pre-LLM era typically require dedicated training to adapt them to new or modified T-Boxes, which is a significant limitation.
Explanation of concepts
- Entities (or named entities) are specific, distinct instances of classes or concepts, representing objects that typically exist in the real world, for example individuals (persons, organizations, locations, etc.) or documents (books, reports, presentations, etc.).
- KG Instances are already existing unique entities that reside in a KG. They are used as candidates in disambiguation and linking processes.
- Triples (or Statements) are formal structures that link two entities via a relationship (predicate). In the Resource Description Framework (RDF) they are written as (subject, predicate, object), while other conventions state them as (head entity, relation, tail entity).
- T-Box Statements define the conceptual structure of the knowledge graph. This comprises in particular the characterization of classes and predicates.
- A-Box Statements are facts about concrete entities. They can either describe an entity directly or its relations to other entities.
- Relationships are also called predicates or relation types. A relationship of an entity to another entity (or value) can also characterize an entity's property.
Brief description of the state-of-the-art (only a few sentences, at most one paragraph)
Add citations
Pre-trained LLMs show good natural language understanding capabilities. Moreover, owing to their large number of parameters and the size of their training data, they acquire emergent capabilities, i.e., they already perform well on tasks they were not specifically trained on (1). The state of the art shows how pre-trained LLMs can be used to advantage in knowledge graph information extraction tasks. For these tasks, the LLM is typically prompted with the input text, task-specific instructions, and contextual background information such as descriptions of entity classes or relationships. Compared to pre-LLM NLP approaches, fine-tuning LLMs to specific domains is often dispensable. However, besides providing rich contextual information, various approaches include examples in the prompt to realize in-context learning, i.e., guiding the LLM output with a few examples in order to improve output accuracy.
In general, state-of-the-art methods use LLMs for two main tasks. One is the extraction of named entities and their respective types from a set of classes defined by an ontology. The other is the extraction of triples whose components are plain text spans. The latter task can be designed as a single step or be divided into multiple subtasks, e.g., entity and relation extraction. Besides these main tasks, LLMs can also act as information linkers by disambiguating and linking previously extracted information to the KG. However, this step is often neglected, since LLMs alone show unsatisfactory results. Still, some works explore the use of LLMs in relation prediction tasks (i.e., disambiguating and linking relations). In addition, the literature shows that LLMs can act as helper methods, for instance by simplifying texts to support named entity linking (2).
In the following, we describe the two key steps to be able to fill and instantiate a KG with knowledge contained in texts.
Step 1: Extraction of Entities from Text and Linking Entities to Knowledge Graph Instances
Description: Extracting entities from texts using LLMs comprises two tasks. In a first step, named entities have to be extracted from a given text, so-called Named Entity Recognition (NER). Typically, extracted entities are also classified into types, which can come from classes defined in ontologies (T-Box). In a second step, extracted entities must be linked to the KG, so-called Named Entity Linking (NEL). In this process, the KG is searched for existing unique entity instances that are candidates for representing the entity mention found in the text.
Regarding Named Entity Recognition, LLMs need to be prompted for a suitable structured output format, since such models are trained on text generation and not specifically on labeling or classification tasks. Prompts can instruct LLMs how identified entities and their types should be returned, for instance enclosed with special markers in the text or simply listed in a structured way. Giving examples in the requested output format, i.e., using a few-shot setting with similar inputs and outputs, can make both the task and the desired output format clearer. In order to guide the typing of entities, existing ontology classes together with their descriptions can be given in the prompt.
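A minimal sketch of few-shot entity extraction with a JSON output format; the class list, the example and the llm() helper (any chat-completion call, as in the T-Box section sketches) are illustrative assumptions:

```python
# Minimal sketch: extract typed entities as a JSON list, guided by class
# descriptions and one few-shot example.
import json
from openai import OpenAI

def llm(prompt: str) -> str:  # placeholder chat helper, cf. the T-Box section sketches
    r = OpenAI().chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

classes = {
    "Person": "An individual human being.",
    "Organization": "A company, institution, or other organized body.",
}
few_shot = (
    'Text: "Ada Lovelace worked with Charles Babbage."\n'
    'Output: [{"mention": "Ada Lovelace", "type": "Person"}, '
    '{"mention": "Charles Babbage", "type": "Person"}]\n\n'
)
text = "Tim Berners-Lee founded the World Wide Web Consortium."

entities = json.loads(llm(  # assumes the model returns bare JSON as instructed
    "Extract named entities from the text and assign one of these types:\n"
    + "\n".join(f"- {cls}: {desc}" for cls, desc in classes.items())
    + "\nReturn only a JSON list of objects with keys 'mention' and 'type'.\n\n"
    + few_shot
    + f'Text: "{text}"\nOutput:'
))
# e.g. [{"mention": "Tim Berners-Lee", "type": "Person"},
#       {"mention": "World Wide Web Consortium", "type": "Organization"}]
```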
In the case of Named Entity Linking, some approaches leverage LLMs with highly specific disambiguation instructions, incorporating verbalized knowledge from KGs into prompts or combining them in multi-modal settings that include, e.g., images. Besides using LLMs directly for entity linking, they can also be utilized in pre-processing steps. For example, they can be applied to reformulate texts by unifying multiple entity mentions, thereby improving the performance of the employed entity linking methods.
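A minimal sketch of a disambiguation prompt for one extracted mention, assuming candidate KG entities with descriptions have already been retrieved (e.g., via label search); the IRIs and descriptions are illustrative:

```python
# Minimal sketch: build a disambiguation prompt for one mention against a small
# candidate set; the resulting prompt is sent to any chat model.
mention = "W3C"
sentence = "Tim Berners-Lee founded the World Wide Web Consortium (W3C) in 1994."
candidates = {
    "ex:W3C": "World Wide Web Consortium, an international web standards organization.",
    "ex:W3Schools": "A website offering web development tutorials.",
}

prompt = (
    f"Entity mention: {mention}\nSentence: {sentence}\n"
    "Candidate KG entities:\n"
    + "\n".join(f"- {iri}: {description}" for iri, description in candidates.items())
    + "\nReturn the IRI of the candidate the mention refers to, or NONE if no candidate fits."
)
# An answer of 'NONE' triggers the creation of a new KG instance for the mention.
```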
Considerations: In order to perform extraction, disambiguation and linking tasks, LLMs require suitable descriptions of entities, which are often not immediately available. Workarounds can be indirect descriptions found in texts where entities are mentioned, or textual information from neighboring resources in the KG. However, such indications might not be sufficient to help the LLM select the correct entity candidate.
Standards and Protocols:
- Possible: https://python.langchain.com/docs/how_to/output_parser_structured/ (generate structured output)
- Possible: https://persistence.uni-leipzig.org/nlp2rdf/ (use NLP Interchange Format (NIF) to state where entities were found/extracted)
Step 2: Extraction, Completion or Testing of A-Box Statements defining characteristics and relations of entities
Description: A-Box triple extraction refers to the process of identifying statements about entities in a given text. This involves identifying a subject (head entity), a predicate (relation) and an object (tail entity). Entities extracted and linked in the previous step can be considered as additional input alongside the given text. Besides extracting entities, this task also comprises the identification of relations and their linkage to unique pre-defined KG predicates, typically established during a preceding T-Box construction phase. These relations can also have defined restrictions regarding their domain and range, specifying the admissible classes of head and tail entities. When using LLMs for this task, these restrictions and general descriptions of the relations are usually part of the prompt.
Extracting triples can be performed in a single step or in multiple consecutive steps. In the single-step approach, an LLM is prompted to directly extract entire triples from the given text. In contrast, multi-step approaches typically first extract head entities as explained in the previous section and, in a second step, extract statements for these head entities by identifying fitting relations and tail entities. Alternatively, other constellations of filling triple components are possible; for instance, given two triple components, an LLM can be prompted to fill in the missing one. The latter approach, however, is more commonly used in knowledge graph completion scenarios and relies on the implicit knowledge LLMs acquired during their training on large document corpora rather than on input texts as information sources. Additionally, extracted triples can be verified, also considering their source texts: an LLM can be prompted to select the most accurate triple from multiple candidates, or to accept or reject a triple, optionally with a confidence or acceptance score. An LLM can also be used to correct faulty triples. For all substeps and approaches, example inputs and outputs can be included in the prompt to make the task clearer, showcase the desired output format, and thereby improve the LLM outputs. In addition, the examples can be complemented with explanations, which can also be demanded from the LLM itself to encourage well-founded outputs.
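A minimal sketch of single-step triple extraction constrained to pre-defined KG predicates with domain/range descriptions; the predicate IRIs, descriptions and the llm() helper are illustrative assumptions:

```python
# Minimal sketch: extract triples using only the listed predicates, returned as
# a JSON list that can be parsed programmatically.
import json
from openai import OpenAI

def llm(prompt: str) -> str:  # placeholder chat helper, cf. the T-Box section sketches
    r = OpenAI().chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

relations = {
    "ex:foundedBy": "domain: Organization, range: Person - links an organization to its founder",
    "ex:worksFor":  "domain: Person, range: Organization - employment relation",
}
text = "Tim Berners-Lee founded the World Wide Web Consortium."

triples = json.loads(llm(  # assumes the model returns bare JSON as instructed
    "Extract factual triples from the text. Use only these predicates:\n"
    + "\n".join(f"- {pred}: {desc}" for pred, desc in relations.items())
    + "\nReturn only a JSON list of [subject, predicate, object] triples.\n\n"
    + f"Text: {text}"
))
# e.g. [["World Wide Web Consortium", "ex:foundedBy", "Tim Berners-Lee"]]
```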
Extracted triples are typically linked component-wise to the KG, i.e., the head and tail entity are linked to pre-existing unique KG entities, as explained in the last section, and the identified relation is likewise linked to a defined KG predicate. The linkage of relations can either be performed already during the extraction phase, by allowing only existing predicates as relation outputs, or in a later step, by prompting an LLM to disambiguate and link relation mentions based on descriptions of the unique KG predicates. Moreover, it can also be meaningful to search the KG for existing triples that represent the same information as an extracted triple, in order to avoid duplicates. This can be done by checking whether a triple with identical components already exists, or whether there is a triple that is not identical on the component level but conveys the same semantic information.
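A minimal sketch of a component-level duplicate check with rdflib before a newly linked triple is added; the file name and IRIs are illustrative assumptions:

```python
# Minimal sketch: check whether the linked triple already exists in the KG;
# a semantic duplicate check would additionally compare against triples using
# near-synonymous predicates.
from rdflib import Graph

kg = Graph().parse("museum-kg.ttl")  # hypothetical A-Box data
ask = """
    PREFIX ex: <http://example.org/>
    ASK { ex:W3C ex:foundedBy ex:TimBernersLee }
"""
if not bool(kg.query(ask).askAnswer):
    print("Triple not yet in the KG; safe to add the newly extracted statement.")
```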
Furthermore, LLMs can also assist other NLP approaches with intermediate tasks in the triple extraction and linking process, for instance co-reference resolution, i.e., linking expressions like personal pronouns to the underlying entity (such as a person), or the simplification of extracted relation mentions to facilitate their linking.
Considerations:
- When processing larger texts that have to be split because they do not fit into the prompt, related pieces of information may end up in different chunks, so that they can no longer be extracted together as a triple.
Standards and Protocols:
- Possible: https://python.langchain.com/docs/how_to/output_parser_structured/ (generate structured output)
References
(1): Wei, Jason, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama et al. "Emergent Abilities of Large Language Models." Transactions on Machine Learning Research (2022).
(2): Borchert, Florian, Ignacio Llorca, and Matthieu-P. Schapranow. "Improving biomedical entity linking for complex entity mentions with LLM-based text simplification." Database: The Journal of Biological Databases and Curation 2024 (2024).
Notes:
General:
- Possible papers: https://github.com/RManLuo/Awesome-LLM-KG#llm-augmented-kg-completion ?
- sometimes the T-Box is constructed in parallel → but we focus here solely on A-Box statements;
Intro+Terminology:
SOTA:
Phases, describe how LLMs can be prompted (compatible format, descriptions etc.), how to get structured output (essential), some link them to the T-Box while others focus on extracting the information in a structured way, how to improve the process (few-shot)
- (LLMs can also be used for this task) → advantages (over traditional few-shot methods): no training but (few-shot) prompting or prompting based on a description → adaptability to domains; maybe also mention the emergent abilities of LLMs (good performance also on tasks they had not been trained on specifically)
- LLMs can be used as information extractors to
- extract named entities and also assign them their respective type, i.e., a class defined by the ontology (Description of classes)
- Triple extraction: many approaches focus on extracting triples with string components (i.e. just words). Triple extraction can be seen as a one step task or be divided into multiple steps (e.g. entity extraction and then relation extraction) → corresponds to triple completion or a triple testing task.
- LLMs can act as information linkers (disambiguate + link previously extracted information to the KG)
- A sentence about the linking to the KG (the actual KG completion step): often neglected. Similar to the entity extraction case, LLM-based approaches oftentimes neglect the entity and relationship linking process with respect to a knowledge graph with an existing ontology. However, similar to the entity linking process, it is also possible to link or disambiguate relations ("relation prediction") (see e.g. https://doi.org/10.48550/arXiv.2404.03868).
- LLMs can also act as helper methods (e.g., for NEL: simplify entity mentions (Borchert, 2024))
Step 1:
- See also: Survey on NER (see 4.5 for LLMs): https://arxiv.org/pdf/2401.10825
Describe the first step in more detail:
- To this regard, the LLM prompt needs to define a suitable structured output format, since LLMs are trained on text generation and not specifically on labeling or classification tasks.
- LLMs can be prompted to label found entities by returning the given text with found entities enclosed into special start and end markers (Wang et al., 2023). Here also class information can be encoded into enclosing markers (Hu et al., 2024)
- Alternatively, instead of labeling the input text, the identified entity mentions and their classes can be also returned in a structured way, e.g., as tuples, lists or dictionaries (see, e.g., Chen et al., 2024).
- In order to guide the extraction process regarding classes defined in the ontology, existing entity classes and their descriptions can be given in the prompt (see, e.g., Chen et al., 2024).
- Moreover, few-shot prompting, i.e., giving examples might be beneficial to make, in particular, the requested output format clearer (see, e.g., Ashok et al., 2023)
Describe the second step in more detail:
- Oftentimes, named entity linking (NEL), i.e., looking up the knowledge graph in order to find existing unique entity candidates for an extracted entity mention, is neglected by purely LLM-based approaches extracting named entities from text.
- More prominently, smaller language models, based on, e.g., BERT or RoBERTa, that are specially fine-tuned for this task are utilized as an end-to-end approach (Ayoola, 2023) → maybe do not necessarily mention (focus more on what is possible with LLMs)
- However, the problem of entity linking can also be formulated as a sequence-to-sequence generation task. There are, in particular, multi-modal approaches using text and images as input that disambiguate entity mentions with the help of LLMs taking into account Wikidata descriptions of entities (Song et al., 2024), approaches formulating very specific disambiguation instructions with which LLMs are prompted (Liu & Fang, 2023), or approaches employing (verbalized) triples about KG entities to disambiguate mentions (Xu et al., 2024) → latter possibility: typically done later and not directly after the entity extraction
- Also possible prompt LLM to unify multiple entity mentions (pre-step to entity linking)
- Moreover, LLMs can be also used in pre-processing steps supporting the NEL process finally conducted by other methods, e.g. by simplifying entity mentions (Borchert, 2024)
- Maybe too much detail?
- Such candidates are considered to decide whether a KG instance, representing an entity in question, already exists. If no candidate matches, a new unique KG instance representing the extracted entity is created.
- Consideration: Entity disambiguation and linking with text LLMs is often not possible due to the lack of descriptions for unique KG entities (alternatively: indirect descriptions based on entity mention contexts can be used or the KG neighborhood could be verbalized to compile a textual description → however, the information contained in the textual contexts might not be sufficient to describe a mention in a way such that a suitable entity candidate can be identified)
Step 2:
How to solve the extraction
- multiple interpretations/approaches of the task are possible:
- one step: the task can be defined as extracting whole triples directly from a given text; strategies: generate the triple directly, or generate the triple + an explanation
- as a component-filling task in which, for instance, first an entity is extracted and then the task is identifying predicate and object to get a complete triple (difference to entity identification: a component is typically not extracted independently but conditioned on the already fixed triple components and their types → are there any approaches conditioning on that??).
Component filling (Triple Completion): Triple extraction can be also formulated as a triple completion problem in which either the subject, predicate or object has to be completed given the two other components. TODO: Literal Extraction/Completion special form of object completion?
- E.g. first entity extraction then relation extraction (typical)
- Description of relations or examples can be given in the prompt or see entity extraction +linking
- searching for one missing component (entity linking → KG Completion)
- typically requires descriptions of entity
- or text-free: exploit LLM "knowledge"
- Moreover, triple candidates can also be generated in a preliminary step and an LLM can be prompted to select the most promising triple candidate for a given piece of text.
- Triple probing (or testing) - sometimes also called calibration (pre-generated outputs/multiple steps/chain-of-thought think step-by-step): Triple candidates are given to an LLM that decides which triples should be accepted, can include confidence score assignments (textual or logit-based) or yes/no decision
→ can be also corrected (input: text+triples; output: correct triples) see e.g. Zhang 2024 (Table 2): https://aclanthology.org/2024.kallm-1.12/
→ for all task interpretations: description of relations/classes/domain+range + few shot examples can be provided in the prompt; explanations (chain-of-thought)
How to link/disambiguate
- similar as in the case of entity identification, also relations as part of triples need to be linked to the KG (entity linking already discussed in the last section/paragraph)
Second step (linking):
- Triple linking on the component level
- However, similar to the entity linking process, it is also possible to link or disambiguate relations (see e.g. https://doi.org/10.48550/arXiv.2404.03868).
- but also possible on a triple level (check if two triples represent the same information / standardize them) → can be a pre-step to linking (duplicate removal)
Helper methods for the process
- Description generation
- LLMs can also assist in co-reference resolution
- https://doi.org/10.1016/j.nlp.2024.100099 Chen et al. 2024
- https://doi.org/10.1093/jamia/ocad259 Hu et al, 2024
- https://doi.org/10.48550/arXiv.2304.10428 Wang et al., 2023
- https://arxiv.org/abs/2305.15444 Ashok et al., 2023
- https://doi.org/10.1609/aaai.v38i17.29867 Song et al., 2024
- https://aclanthology.org/2023.emnlp-main.350.pdf Ayoola, 2023
- https://doi.org/10.1093/database/baae067 Borchert, 2024
- https://doi.org/10.2991/978-94-6463-264-4_79 Liu & Fang, 2023
- https://doi.org/10.48550/arXiv.2403.00953 Cao et al., 2024
- https://doi.org/10.3389/fncom.2024.1389475 Xu et al. 2024
User Interface / Access (move this section to the "prompt enhancement" section – 3.2.4?)
How do I search knowledge in KGs using Natural Language using LLMs?
- Natural Language to SPARQL
- Definition/Description:
- Text2QueryLanguage (e.g. Text2SPARQL)
- Direct translation of user queries into equivalent knowledge graph query languages by prompting an LLM
- The knowledge graph structure can be given either by including a (sub)schema or a subgraph in the query translation prompt
- KG-RAG
- Can be done by extracting knowledge graph entities and potentially relevant relationships from the user query and retrieving relevant triples from the knowledge graph, which are then given as context in the prompt containing the user query (a minimal sketch follows the references below)
- References: https://doi.org/10.48550/arXiv.2407.01409, https://ceur-ws.org/Vol-3592/paper6.pdf
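A minimal sketch of the KG-RAG variant, assuming rdflib, a hypothetical KG file and a naive keyword-based triple retrieval (a real system would use entity extraction and linking instead); llm() is again a placeholder chat call as in the earlier sketches:

```python
# Minimal sketch: retrieve triples about entities mentioned in the question and
# pass them as context to the LLM answering the question.
from openai import OpenAI
from rdflib import Graph

def llm(prompt: str) -> str:  # placeholder chat helper, cf. the earlier sketches
    r = OpenAI().chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

kg = Graph().parse("museum-kg.ttl")  # hypothetical KG dump
question = "Who founded the World Wide Web Consortium?"

# Naive retrieval: keep triples whose subject name occurs in the question.
context = [
    f"{s} {p} {o} ."
    for s, p, o in kg
    if str(s).split("/")[-1].replace("_", " ").lower() in question.lower()
]

answer = llm(
    "Answer the question using only the following triples as context:\n"
    + "\n".join(context)
    + "\n\nQuestion: " + question
)
```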
Contributors:
- Diego Collarana (FIT)
- Sven Hertling (FIZ), Harald Sack (FIZ), Heike Fliegl (FIZ)
- Desiree Heim (DFKI)
- Markus Schröder (DFKI)
- Sabine Mahr (word b sign)
Assertional Knowledge Engineering
Information Extraction
Contributors:
- Diego Collarana (FIT)
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
KG Completion (A-Box)
Link Prediction
Relation Prediction
Fact Checking / Triple Testing
Literal Completion (labels/comments/descriptions)
Entity Linking (between KGs)
Contributors:
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Entity Disambiguation
Contributors:
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Terminological Knowledge Engineering
Ontology Design
Contributors:
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Competency Question (CQ) Generation
User Stories / Personas Generation
Ontology Learning (Automated ontology design from text)
Ontology Evaluation
Contributors:
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Competency Question (CQ) generation (from given ontologies)
CQ to SPARQL
Ontology Mapping
Contributors:
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Ontology Documentation
Contributors:
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Class and Relation Descriptions/Labels
Reasoning
Approx./Probabilistic Reasoning via LLMs (LLM supported)
Contributors:
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Constraint Checking
Contributors:
- Robert David
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Data Repairs (→ maybe move to completion?)
Contributors:
- Robert David
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Downstream Tasks
KG/Ontology Embeddings
Contributors:
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Please add other Downstream Tasks ...
Please add other Downstream Tasks ...
Please add other Downstream Tasks ...
User Interface / Access
Natural Language Interface to KG
Contributors:
- Diego Collarana (FIT)
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
KG to Natural Language (verbalization)
Contributors:
- Daniel Baldassare (doctima)
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...
Multilingual Translation of Literals
Contributors:
- Please add yourself if you want to contribute ...
- Please add yourself if you want to contribute ...
- ...