
How to Implement Knowledge Graph Using Python

Information Extraction is the process of extracting information in a more structured, machine-understandable form. It comprises several subfields, none of which is easily solved on its own. One approach to storing data in a structured manner is the Knowledge Graph: a set of three-item sets called triples, where each triple combines a subject, a predicate and an object.
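For illustration, a triple can be modelled in Python as a plain (subject, predicate, object) tuple. The facts below are hand-written examples, not extracted data:

```python
# Each fact is a (subject, predicate, object) triple -- the building block
# of a knowledge graph. These example facts are written by hand.
triples = [
    ("Bhubaneswar", "is capital of", "Odisha"),
    ("Bhubaneswar", "categorised as", "Tier-2 city"),
]

for subj, pred, obj in triples:
    print(f"{subj} --[{pred}]--> {obj}")
```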


In this article, we will discuss how to build a knowledge graph using Python and Spacy.

Let’s get started.

Code Implementation

Import all the libraries required for this project.

import spacy
from spacy.lang.en import English
import networkx as nx
import matplotlib.pyplot as plt

The nodes of the graph will be the entities present in the text (here, text about a Wikipedia topic). The edges are the relations connecting these entities to each other. We will extract these components in an unsupervised way, i.e., we will use the grammar of the sentences.
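As a minimal sketch of this node/edge structure (using NetworkX, which the imports above already pull in), a single triple contributes three nodes and two edges, with the relation sitting between subject and object:

```python
import networkx as nx

# A single (subject, relation, object) triple mapped onto an undirected graph:
# the relation itself becomes a node linked to both the subject and the object.
G = nx.Graph()
subj, rel, obj = "Bhubaneswar", "categorise", "Tier-2 city"
G.add_edge(subj, rel)  # add_edge creates any missing nodes automatically
G.add_edge(rel, obj)
print(G.number_of_nodes(), G.number_of_edges())  # → 3 2
```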

The main idea is to go through a sentence and extract the subject and the object as and when they are encountered. First, we pass the text to a function, which breaks it down and assigns each token a dependency category. After we have reached the end of a sentence, we clean up any whitespace that may have remained, and we have obtained a triple. For example, for the statement "Bhubaneswar is categorised as a Tier-2 city" it gives a triple focusing on the main subject: (Bhubaneswar, categorised, Tier-2 city).

Below we have defined the code to get triples that can be used to build knowledge graphs.

def getSentences(text):
    # Split the raw text into sentences using spaCy's rule-based sentencizer
    nlp = English()
    nlp.add_pipe('sentencizer')  # spaCy v3 API; in v2 use nlp.add_pipe(nlp.create_pipe('sentencizer'))
    document = nlp(text)
    return [sent.text.strip() for sent in document.sents]
def printToken(token):
    print(token.text, "->", token.dep_)

def appendChunk(original, chunk):
    return original + ' ' + chunk

def isRelationCandidate(token):
    # Tokens whose dependency label suggests they express the relation (predicate)
    deps = ["ROOT", "adj", "attr", "agent", "amod"]
    return any(subs in token.dep_ for subs in deps)

def isConstructionCandidate(token):
    # Tokens that modify a noun and should be glued onto the subject/object chunk
    deps = ["compound", "prep", "conj", "mod"]
    return any(subs in token.dep_ for subs in deps)
def processSubjectObjectPairs(tokens):
    subject = ''
    obj = ''  # renamed from `object` to avoid shadowing the built-in
    relation = ''
    subjectConstruction = ''
    objectConstruction = ''
    for token in tokens:
        printToken(token)
        if "punct" in token.dep_:
            continue
        if isRelationCandidate(token):
            relation = appendChunk(relation, token.lemma_)
        if isConstructionCandidate(token):
            if subjectConstruction:
                subjectConstruction = appendChunk(subjectConstruction, token.text)
            if objectConstruction:
                objectConstruction = appendChunk(objectConstruction, token.text)
        if "subj" in token.dep_:
            subject = appendChunk(subject, token.text)
            subject = appendChunk(subjectConstruction, subject)
            subjectConstruction = ''
        if "obj" in token.dep_:
            obj = appendChunk(obj, token.text)
            obj = appendChunk(objectConstruction, obj)
            objectConstruction = ''
    print(subject.strip(), ",", relation.strip(), ",", obj.strip())
    return (subject.strip(), relation.strip(), obj.strip())
def processSentence(sentence):
    # nlp_model is the full spaCy pipeline loaded in the main block below
    tokens = nlp_model(sentence)
    return processSubjectObjectPairs(tokens)
def printGraph(triples):
    G = nx.Graph()
    for triple in triples:
        G.add_node(triple[0])
        G.add_node(triple[1])
        G.add_node(triple[2])
        G.add_edge(triple[0], triple[1])
        G.add_edge(triple[1], triple[2])
    pos = nx.spring_layout(G)
    plt.figure(figsize=(12, 8))
    nx.draw(G, pos, edge_color='black', width=1, linewidths=1,
            node_size=500, node_color='skyblue', alpha=0.9,
            labels={node: node for node in G.nodes()})
    plt.axis('off')
    plt.show()

The printGraph function above uses NetworkX and Matplotlib's pyplot to display the knowledge graph: each element of a triple becomes a node, and the relation node is linked by edges to both the subject and the object.
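On a server without a display, plt.show() will not render a window. A minimal sketch of a headless alternative, assuming the Agg backend and an illustrative pair of hand-written triples, saves the figure to a file instead:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend: render without a display
import matplotlib.pyplot as plt
import networkx as nx

# Illustrative triples (hand-written, not extracted by the pipeline above)
triples = [("Bhubaneswar", "be", "capital"),
           ("Bhubaneswar", "categorise", "Tier-2 city")]

G = nx.Graph()
for s, r, o in triples:
    G.add_edge(s, r)
    G.add_edge(r, o)

pos = nx.spring_layout(G, seed=42)  # fixed seed keeps the layout reproducible
plt.figure(figsize=(12, 8))
nx.draw(G, pos, node_color='skyblue', node_size=500,
        labels={n: n for n in G.nodes()})
plt.savefig('knowledge_graph.png')  # write to file instead of plt.show()
```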

if __name__ == "__main__":
    text = "Bhubaneswar is the capital and largest city of the Indian state of Odisha. The city is bounded by the Daya River " \
           "to the south and the Kuakhai River to the east; the Chandaka Wildlife Sanctuary " \
           "and Nandankanan Zoo lie in the western and northern parts of Bhubaneswar. " \
           "Bhubaneswar is categorised as a Tier-2 city. " \
           "Bhubaneswar and Cuttack are often referred to as the 'twin cities of Odisha'. " \
           "The city has a population of 1163000."
    sentences = getSentences(text)
    nlp_model = spacy.load('en_core_web_sm')
    triples = []
    print(text)
    for sentence in sentences:
        triples.append(processSentence(sentence))
    printGraph(triples)

During processing, spaCy attaches a dependency label to every token, which tells us whether a word acts as a subject or an object; the printToken calls in the loop above print each token alongside its label.

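To make the extraction flow easy to follow without loading a spaCy model, here is a simplified toy re-run of the same idea on hand-labelled tokens. The (text, dependency, lemma) labels below are illustrative, not real spaCy output, and the logic is a stripped-down sketch of processSubjectObjectPairs:

```python
# Hand-labelled (token, dependency, lemma) tuples for the sentence
# "Bhubaneswar is categorised as a Tier-2 city" -- illustrative labels only.
tokens = [
    ("Bhubaneswar", "nsubjpass", "Bhubaneswar"),
    ("is", "auxpass", "be"),
    ("categorised", "ROOT", "categorise"),
    ("as", "prep", "as"),
    ("a", "det", "a"),
    ("Tier-2", "compound", "tier-2"),
    ("city", "pobj", "city"),
]

subject, relation, obj, construction = "", "", "", ""
for text, dep, lemma in tokens:
    if dep == "compound":
        construction += " " + text      # buffer modifiers for the next noun
    elif "subj" in dep:
        subject = (construction + " " + text).strip()
        construction = ""
    elif "obj" in dep:
        obj = (construction + " " + text).strip()
        construction = ""
    elif dep == "ROOT":
        relation = lemma                # use the lemma as the relation name

print((subject, relation, obj))         # → ('Bhubaneswar', 'categorise', 'Tier-2 city')
```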

Final Thoughts

In this article, we learned how to extract information from a given text as triples and build a knowledge graph from them. This field of information extraction can be explored in more detail to learn how to extract more complex relations. Hope this article is useful to you.
