Understanding Knowledge Graphs

Introduction

Humans learn from their experiences, gaining insights and knowledge in the process. The knowledge stored in our brains is well structured, and we easily draw relations between two different but similar subjects.

To me, this is a layman's understanding of the concept of a knowledge graph.

Though the phrase “knowledge graph” has been used in the literature since at least 1972, the modern incarnation of the phrase stems from the 2012 announcement of the Google Knowledge Graph, followed by further announcements of knowledge graphs under development at Amazon, Facebook, IBM, Microsoft, Uber, and others.

Underlying all such developments is the core idea of using graphs to represent data, often enhanced with some way to explicitly represent knowledge. Employing a graph-based abstraction of knowledge has a number of benefits in such settings when compared with, for example, a relational model or NoSQL alternatives. Graphs provide a concise and intuitive abstraction for a variety of domains, where edges capture the (potentially cyclical) relations between entities inherent in social data, biological interactions, bibliographical citations and co-authorships, transport networks, and so forth. Graphs also allow maintainers to postpone the definition of a schema, letting the data and its scope evolve more flexibly than is typically possible in a relational setting, particularly for capturing incomplete knowledge.

More formally, a knowledge graph (KG) is defined as a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities. The graph of data (also known as the data graph) conforms to a graph-based data model, which may be a directed edge-labelled graph, a property graph, etc. By knowledge, we refer to something that is known. Such knowledge may be accumulated from external sources, or extracted from the knowledge graph itself. Knowledge may be composed of simple statements, such as “New Delhi is the capital of India”, or quantified statements, such as “all capitals are cities”. Simple statements can be accumulated as edges in the data graph.
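To make the definition concrete, a directed edge-labelled graph can be sketched as a set of (subject, relation, object) triples, one per edge. The entity and relation names below are illustrative, not drawn from any particular dataset:

```python
# A minimal sketch of a data graph as a set of
# (subject, relation, object) triples; each triple is one
# labelled edge. All names here are illustrative.
knowledge_graph = {
    ("New Delhi", "capital_of", "India"),
    ("New Delhi", "type", "City"),
    ("India", "type", "Country"),
}

def neighbours(graph, node):
    """Return (relation, target) pairs for edges leaving `node`."""
    return {(rel, obj) for (subj, rel, obj) in graph if subj == node}

print(neighbours(knowledge_graph, "New Delhi"))
```

Simple statements such as “New Delhi is the capital of India” map directly onto single triples, which is what makes this representation so easy to accumulate incrementally.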

Example Knowledge Graph

Knowledge Graph Embeddings

Methods for machine learning have gained significant attention in recent years. In the context of knowledge graphs, machine learning can either be used for refining the knowledge graph itself, which involves predicting new edges and/or identifying erroneous edges, or for downstream tasks, where the knowledge graph is used to train models for classification, recommendation, regression, etc., in the application domain. However, many traditional machine learning techniques assume dense numeric input representations in the form of vectors, which is quite distinct from how graphs are usually expressed.

So how can graphs – or nodes, edges, etc., thereof – be encoded as numeric vectors?

  • Translational models
  • Tensor decomposition models
  • Neural models
  • Language models
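As a taste of the first family, translational models treat each relation as a vector translation between entity embeddings: for a plausible triple (h, r, t), the model expects h + r ≈ t, as in TransE. The toy two-dimensional embeddings below are hand-picked purely for illustration; in practice they would be learnt from the data graph:

```python
import numpy as np

# Translational intuition behind TransE-style scoring: a relation is
# a translation vector, so h + r should land near t for true triples.
# These toy embeddings are hand-chosen for illustration only.
emb = {
    "New Delhi":  np.array([1.0, 0.0]),
    "India":      np.array([1.0, 1.0]),
    "capital_of": np.array([0.0, 1.0]),  # translation vector for the relation
}

def score(h, r, t):
    """Distance between (h + r) and t; lower means more plausible."""
    return float(np.linalg.norm(emb[h] + emb[r] - emb[t]))

print(score("New Delhi", "capital_of", "India"))  # 0.0 for this hand-built triple
```

Training would adjust the embedding vectors so that observed edges score low and corrupted (negative) edges score high; tensor decomposition, neural, and language models replace this scoring function with richer parameterisations.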

Graph Neural Networks

While embeddings aim to provide a dense numerical representation of graphs suitable for use within existing machine learning models, another approach is to build custom machine learning models adapted for graph-structured data. Most custom learning models for graphs are based on (artificial) neural networks, exploiting a natural correspondence between the two: a neural network already corresponds to a weighted, directed graph, where nodes serve as artificial neurons and edges serve as weighted connections (axons). However, the typical topology of a traditional neural network (more specifically, a fully-connected feed-forward neural network) is quite homogeneous, being defined in terms of sequential layers of nodes where each node in one layer is connected to all nodes in the next layer. Conversely, the topology of a data graph is quite heterogeneous, being determined by the relations between entities that its edges represent.

A graph neural network (GNN) [436] builds a neural network based on the topology of the data graph; i.e., nodes are connected to their neighbours per the data graph. Typically a model is then learnt to map input features for nodes to output features in a supervised manner; output features for example nodes may be manually labelled, or may be taken from the knowledge graph.
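A minimal sketch of this idea, under simplifying assumptions (an unlabelled toy graph, random weights, a single mean-aggregation layer rather than any particular published GNN architecture), might look like this:

```python
import numpy as np

# One message-passing layer: each node's output feature combines its
# own input feature with the mean of its neighbours' features, per the
# topology of the data graph. Graph, features, and weights are toy values.
adj = {0: [1, 2], 1: [0], 2: [0]}                       # neighbours per node
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])      # input node features

rng = np.random.default_rng(0)
W_self = rng.normal(size=(2, 2))    # transforms a node's own feature
W_neigh = rng.normal(size=(2, 2))   # transforms the aggregated neighbour feature

def gnn_layer(X, adj):
    out = np.zeros_like(X)
    for v, nbrs in adj.items():
        agg = np.mean(X[nbrs], axis=0)                          # aggregate neighbours
        out[v] = np.maximum(0.0, X[v] @ W_self + agg @ W_neigh)  # ReLU activation
    return out

H = gnn_layer(X, adj)
print(H.shape)  # one output feature vector per node: (3, 2)
```

In a supervised setting, the weight matrices would be trained so that the output features match labels for example nodes; stacking several such layers lets information propagate across multiple hops of the data graph.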


Knowledge is power.

Resources:

  • DBpedia
  • Freebase
  • HolE