Sunday 25 August 2013

Graph Databases: Analysing a wide variety of datasets, relationships, events, behavior - Part 1

Graph databases allow us to analyse and query connected data. Doing the same with a relational database is possible but quickly becomes cumbersome. Relationships are at the centre of graph modelling and are themselves data. Consider the fact 'Charlie called Betty over the phone'. We can model this in a relational database or in other kinds of store. The question to ask when data modelling is always which queries you want to support. Suppose we expand the example to include SMS and emails (with CC and BCC), and we want to find out who called Charlie and emailed Betty from the Finance department, and which of those calls or emails were returned. The relationships emailed, called and sms'ed are easy to model in a graph. This type of data modelling applies to real-world scenarios such as communication analysis, behavior analysis, networks, asset management and social networks.
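To make "relationships are themselves data" a little more concrete, here is a rough in-memory sketch in Python (not Cypher, and not how Neo4j stores anything). Only 'Charlie called Betty' comes from the example above; the other people, the relationship names and the attributes are made up for illustration.

```python
from collections import namedtuple

# A relationship is a record in its own right: it has a type and its own attributes.
Relationship = namedtuple("Relationship", "source kind target attrs")

# Hypothetical communication log; only "Charlie called Betty" is from the post.
log = [
    Relationship("Charlie", "CALLED", "Betty", {"medium": "phone"}),
    Relationship("Dana", "CALLED", "Charlie", {"medium": "phone"}),
    Relationship("Dana", "EMAILED", "Betty", {"cc": ["Charlie"]}),
    Relationship("Evan", "CALLED", "Charlie", {"medium": "phone"}),
    Relationship("Evan", "SMSED", "Betty", {}),
]


def sources(kind, target):
    """People who have a relationship of the given kind pointing at target."""
    return {r.source for r in log if r.kind == kind and r.target == target}


# Who called Charlie and also emailed Betty?
both = sources("CALLED", "Charlie") & sources("EMAILED", "Betty")
print(sorted(both))  # ['Dana']
```

Adding a new kind of communication is just a new relationship type in the log; nothing about the existing records has to change.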

Here is another example, this time from social networks. Suppose we want to find everyone who is friends with Charlie, likes Heineken, watched the movie 'Avatar' and is also friends with Betty. This is where graphs are better at modelling the data. In a graph database you deal with entities (nodes), relationships and attributes. Both nodes and relationships can have attributes, which are simply key-value pairs. The code that creates the example graph is listed later in this post. To answer a query you walk the graph to find the nodes that satisfy your criteria. Social networks such as Facebook, Google Plus and LinkedIn use such datasets to suggest connections, make recommendations and serve ads.
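The idea of nodes, relationships and key-value attributes, and of walking the graph to answer the Charlie/Betty question, can be sketched in a few lines of Python. This is only an illustration under assumptions: Amir and Zoe, the WATCHED relationship name and the 'since' and 'rating' attributes are invented here, and a real graph database does the indexing and walking for you.

```python
from collections import defaultdict

# Nodes: id -> attributes (plain key-value pairs).
nodes = {
    "charlie": {"name": "Charlie", "kind": "person"},
    "betty": {"name": "Betty", "kind": "person"},
    "amir": {"name": "Amir", "kind": "person"},      # hypothetical
    "zoe": {"name": "Zoe", "kind": "person"},        # hypothetical
    "heineken": {"name": "Heineken", "kind": "beverage"},
    "avatar": {"name": "Avatar", "kind": "movie"},
}

# Relationships: (source, type, target, attributes). They carry attributes too.
relationships = [
    ("amir", "FRIEND_WITH", "charlie", {"since": 2010}),
    ("amir", "FRIEND_WITH", "betty", {"since": 2012}),
    ("amir", "LIKES", "heineken", {}),
    ("amir", "WATCHED", "avatar", {"rating": 4}),
    ("zoe", "FRIEND_WITH", "charlie", {"since": 2011}),
    ("zoe", "LIKES", "heineken", {}),
]

# Index the relationships so we can walk them in either direction.
outgoing = defaultdict(set)
incoming = defaultdict(set)
for source, rel_type, target, _attrs in relationships:
    outgoing[(source, rel_type)].add(target)
    incoming[(target, rel_type)].add(source)


def friends_matching():
    """Start at Charlie, walk FRIEND_WITH backwards, then check each candidate."""
    found = []
    for person in sorted(incoming[("charlie", "FRIEND_WITH")]):
        if ("betty" in outgoing[(person, "FRIEND_WITH")]
                and "heineken" in outgoing[(person, "LIKES")]
                and "avatar" in outgoing[(person, "WATCHED")]):
            found.append(nodes[person]["name"])
    return found


print(friends_matching())  # ['Amir']
```

Zoe is a friend of Charlie and likes Heineken, but she has no relationship to Betty or to 'Avatar', so the walk drops her.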

The following is an example of a small graph of 10 people, their likes and their connections to one another. We use the Neo4j graph database and its query language, Cypher, for this example. The entities involved are listed below.

a) List of friends: Hari, Catherine, Ted Lawson, Brandon Bringle, Barney Bringle, Jamie Lawson, Vickie Lawson, Harriet Bringle, Julie Crawford, Mathew Alan. 

b) Other entities:
Hobbies: Coding, Walking
Food: Burger, Icecream
Beverage: Heineken, Coke
Sport: Badminton, Rugby



c) Sample relationships for our small social network

Catherine LIKES Badminton, Coding, Walking
Catherine is Friends with Jamie and Hari on the network.

Ted Lawson LIKES Rugby, Icecream
Ted is Friends with Vickie, Catherine and Brandon.
...and so on. You get the idea. The Cypher code to create this graph is listed at the end.

d) We can run queries such as "Who are friends with Hari and like Coding?" or "Who are friends with Hari and Mathew and like Coding and Icecream?".

Screenshots:
a) [Screenshot: creating the database on Neo4j]
b) [Screenshot: running query 1 mentioned above]
c) [Screenshot: running query 2 discussed above]

Sample Code: a) Creating the database

CREATE (Hari {name: "Hari", born: "1980"}),
(Cath {name: "Catherine", born: "1982"}),
(Ted {name: "Ted Lawson"}),
(Bran {name: "Brandon Bringle"}),
(Barney {name: "Barney Bringle"}),
(Jamie {name: "Jamie Lawson"}),
(Viki {name: "Vickie Lawson"}),
(Harriet {name: "Harriet Bringle"}),
(Julie {name: "Julie Crawford"}),
(Matt {name: "Mathew Alan"}),

(Hobby1 {name: "Coding", type: "tech"}),
(Food1 {name: "Veggie Beany Burger", type: "food"}),
(Beverage1 {name: "Heineken", type: "beverage"}),
(Sport1 {name: "Badminton", type: "game"}),

(Hobby2 {name: "Walking", type: "activity"}),
(Food2 {name: "Black Currant", type: "icecream"}),
(Beverage2 {name: "Coke", type: "beverage"}),
(Sport2 {name: "Rugby", type: "game"}),

(Cath)-[:LIKES]->(Sport1), (Cath)-[:LIKES]->(Hobby1), (Cath)-[:LIKES]->(Hobby2),
(Cath)-[:FRIEND_WITH]->(Hari), (Cath)-[:FRIEND_WITH]->(Jamie),

(Ted)-[:LIKES]->(Sport2), (Ted)-[:LIKES]->(Food2), 
(Ted)-[:FRIEND_WITH]->(Viki), (Ted)-[:FRIEND_WITH]->(Cath),  (Ted)-[:FRIEND_WITH]->(Bran),

(Bran)-[:LIKES]->(Food1),  (Bran)-[:LIKES]->(Food2), (Bran)-[:LIKES]->(Beverage1),
(Bran)-[:FRIEND_WITH]->(Hari), (Bran)-[:FRIEND_WITH]->(Ted),  (Bran)-[:FRIEND_WITH]->(Harriet),

(Barney)-[:LIKES]->(Beverage1), (Barney)-[:LIKES]->(Beverage2),
(Barney)-[:FRIEND_WITH]->(Hari), (Barney)-[:FRIEND_WITH]->(Ted),  (Barney)-[:FRIEND_WITH]->(Harriet),
(Barney)-[:FRIEND_WITH]->(Cath),

(Jamie)-[:LIKES]->(Hobby1), (Jamie)-[:LIKES]->(Food2),
(Jamie)-[:FRIEND_WITH]->(Hari), (Jamie)-[:FRIEND_WITH]->(Viki),  (Jamie)-[:FRIEND_WITH]->(Harriet),
(Jamie)-[:FRIEND_WITH]->(Julie),

(Viki)-[:LIKES]->(Hobby1), (Viki)-[:LIKES]->(Food2), (Viki)-[:LIKES]->(Food1),
(Viki)-[:FRIEND_WITH]->(Hari), (Viki)-[:FRIEND_WITH]->(Jamie),  (Viki)-[:FRIEND_WITH]->(Harriet),
(Viki)-[:FRIEND_WITH]->(Julie),

(Harriet)-[:LIKES]->(Food2), (Harriet)-[:LIKES]->(Food1),
(Harriet)-[:FRIEND_WITH]->(Matt), (Harriet)-[:FRIEND_WITH]->(Jamie),  (Harriet)-[:FRIEND_WITH]->(Julie),

(Julie)-[:LIKES]->(Food2), (Julie)-[:LIKES]->(Food1),(Julie)-[:LIKES]->(Hobby2), (Julie)-[:LIKES]->(Beverage2),
(Julie)-[:FRIEND_WITH]->(Matt), (Julie)-[:FRIEND_WITH]->(Jamie),  (Julie)-[:FRIEND_WITH]->(Harriet),

(Matt)-[:LIKES]->(Food1), (Matt)-[:LIKES]->(Beverage2),
(Matt)-[:FRIEND_WITH]->(Harriet), (Matt)-[:FRIEND_WITH]->(Julie);


b) Query 1
START hari=node(*)
MATCH (hari)<-[:FRIEND_WITH]-(friends), (friends)-[:LIKES]->(Hobby)
WHERE hari.name! = "Hari" AND Hobby.type! = 'tech'
RETURN DISTINCT friends.name;
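As a rough cross-check of what Query 1 is meant to return ("Who are friends with Hari and like Coding?"), here is the same lookup in plain Python over an in-memory copy of the FRIEND_WITH and LIKES data from the CREATE statement. This is not Cypher and does not talk to Neo4j. The dictionary layout and the function name are my own, and it assumes FRIEND_WITH is read in the direction it was created.

```python
# Keys are the identifiers used in the CREATE statement.
NAMES = {
    "Hari": "Hari", "Cath": "Catherine", "Ted": "Ted Lawson",
    "Bran": "Brandon Bringle", "Barney": "Barney Bringle",
    "Jamie": "Jamie Lawson", "Viki": "Vickie Lawson",
    "Harriet": "Harriet Bringle", "Julie": "Julie Crawford",
    "Matt": "Mathew Alan",
}

# person -> people they have a FRIEND_WITH relationship to (directed).
FRIEND_WITH = {
    "Cath": ["Hari", "Jamie"],
    "Ted": ["Viki", "Cath", "Bran"],
    "Bran": ["Hari", "Ted", "Harriet"],
    "Barney": ["Hari", "Ted", "Harriet", "Cath"],
    "Jamie": ["Hari", "Viki", "Harriet", "Julie"],
    "Viki": ["Hari", "Jamie", "Harriet", "Julie"],
    "Harriet": ["Matt", "Jamie", "Julie"],
    "Julie": ["Matt", "Jamie", "Harriet"],
    "Matt": ["Harriet", "Julie"],
}

# person -> the 'type' values of the things they LIKE.
LIKED_TYPES = {
    "Cath": {"game", "tech", "activity"},
    "Ted": {"game", "icecream"},
    "Bran": {"food", "icecream", "beverage"},
    "Barney": {"beverage"},
    "Jamie": {"tech", "icecream"},
    "Viki": {"tech", "icecream", "food"},
    "Harriet": {"icecream", "food"},
    "Julie": {"icecream", "food", "activity", "beverage"},
    "Matt": {"food", "beverage"},
}


def friends_of_who_like(person, liked_type):
    """Names of people with FRIEND_WITH -> person who like something of liked_type."""
    return sorted(
        NAMES[src]
        for src, targets in FRIEND_WITH.items()
        if person in targets and liked_type in LIKED_TYPES.get(src, set())
    )


print(friends_of_who_like("Hari", "tech"))
# ['Catherine', 'Jamie Lawson', 'Vickie Lawson']
```

Brandon and Barney also point a FRIEND_WITH at Hari, but neither likes anything of type "tech", so they drop out.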


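c) Query 2 (as listed, START binds hari to every node and the hobby filter is on Walking, so the result is not limited to friends of Hari who like Coding)
START hari=node(*)
MATCH (hari)<-[:FRIEND_WITH]-(friends), (friends)-[:LIKES]->(Hobby),
      (friends)-[:LIKES]->(Food), (friends)-[:FRIEND_WITH]->(target)
WHERE target.name! = "Mathew Alan" AND Food.type! = "icecream" AND Hobby.name! = "Walking"
RETURN DISTINCT friends.name;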
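To see what the WHERE clause of Query 2 selects on this data, here is a small Python sketch over an in-memory copy of the graph (again, not Cypher and not Neo4j). It mirrors the filters as listed: a FRIEND_WITH relationship to Mathew Alan, a liked thing of type "icecream", and a liked thing named "Walking". The dictionary layout and function name are assumptions of mine.

```python
NAMES = {
    "Cath": "Catherine", "Ted": "Ted Lawson", "Bran": "Brandon Bringle",
    "Barney": "Barney Bringle", "Jamie": "Jamie Lawson",
    "Viki": "Vickie Lawson", "Harriet": "Harriet Bringle",
    "Julie": "Julie Crawford", "Matt": "Mathew Alan",
}

# person -> people they have a FRIEND_WITH relationship to (directed).
FRIEND_WITH = {
    "Cath": ["Hari", "Jamie"],
    "Ted": ["Viki", "Cath", "Bran"],
    "Bran": ["Hari", "Ted", "Harriet"],
    "Barney": ["Hari", "Ted", "Harriet", "Cath"],
    "Jamie": ["Hari", "Viki", "Harriet", "Julie"],
    "Viki": ["Hari", "Jamie", "Harriet", "Julie"],
    "Harriet": ["Matt", "Jamie", "Julie"],
    "Julie": ["Matt", "Jamie", "Harriet"],
    "Matt": ["Harriet", "Julie"],
}

# person -> identifiers of the things they LIKE, as in the CREATE statement.
LIKES = {
    "Cath": ["Sport1", "Hobby1", "Hobby2"],
    "Ted": ["Sport2", "Food2"],
    "Bran": ["Food1", "Food2", "Beverage1"],
    "Barney": ["Beverage1", "Beverage2"],
    "Jamie": ["Hobby1", "Food2"],
    "Viki": ["Hobby1", "Food2", "Food1"],
    "Harriet": ["Food2", "Food1"],
    "Julie": ["Food2", "Food1", "Hobby2", "Beverage2"],
    "Matt": ["Food1", "Beverage2"],
}

# Only the properties the WHERE clause inspects.
HOBBY_NAME = {"Hobby1": "Coding", "Hobby2": "Walking"}
THING_TYPE = {"Food1": "food", "Food2": "icecream"}


def query2(target="Matt", food_type="icecream", hobby_name="Walking"):
    """People with FRIEND_WITH -> target who like the given food type and hobby.

    The unconstrained (hari) pattern only asks for some FRIEND_WITH relationship
    from the person; everyone in FRIEND_WITH has several, so it is not checked.
    """
    result = []
    for person, friends in FRIEND_WITH.items():
        liked = LIKES.get(person, [])
        if (target in friends
                and any(THING_TYPE.get(x) == food_type for x in liked)
                and any(HOBBY_NAME.get(x) == hobby_name for x in liked)):
            result.append(NAMES[person])
    return sorted(result)


print(query2())                     # ['Julie Crawford']
print(query2(hobby_name="Coding"))  # []
```

On this dataset only Harriet and Julie have a FRIEND_WITH relationship to Mathew, and of the two only Julie likes Walking; neither likes Coding, which is why swapping the hobby gives an empty result.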


References:
1) www.neo4j.org



