Greg Dziemidowicz's Blog

software development related topics

Extracting Your LinkedIn Connections Into Neo4j Graph Database

Today I explored my LinkedIn network, just for fun and as a pretext to play with Neo4j.

I managed to answer simple questions like “With whom do I have the most contacts in common?” or “What is the most popular first name in my network?” (Piotr and Marcin).

Challenge #1 – How to get the data?

I decided to use InMaps from LinkedIn. Below is a screenshot of my network:

[Screenshot of my InMaps network]

Using Chrome Developer Tools, you can inspect the network traffic generated by InMaps. Two resources are interesting for us:

http://inmaps.linkedinlabs.com/network_data
{
  "edges":[
    {"dest":"Um6QrFGUXX","src":"Uv1KrWpoXX"}, ...
  ]
}


http://inmaps.linkedinlabs.com/connections_data

{ 
  "Um6QrFGUXX":{
    "firstName":"Marcin",
    "lastName":"Nowak",
    "headline":"Developer"
  },
  "Uv1KrWpoXX":{
    "firstName":"Piotr",
    "lastName":"Kowalski",
    "headline":"Manager"
  },
  ...
}

I saved the content of both to my file system as network_data.json and connections_data.json. This is the data we can now import into Neo4j.
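Before importing, it can be worth checking that every id used in an edge also has an entry in the connections file. The following is only a minimal sketch: the helper name `unknown_edge_ids` is made up for illustration, and the inline sample stands in for the saved files.

```ruby
require 'json'

# Hypothetical helper: returns ids that appear in edges but have no
# entry in the connections document. Both arguments are parsed JSON.
def unknown_edge_ids(network_data, connections_data)
  edge_ids = network_data.fetch("edges", []).flat_map { |e| [e["src"], e["dest"]] }
  edge_ids.uniq - connections_data.keys
end

# Tiny inline sample in the same shape as the two resources.
network_data = JSON.parse('{"edges":[{"dest":"Um6QrFGUXX","src":"Uv1KrWpoXX"}]}')
connections_data = JSON.parse(<<~JSON)
  {
    "Um6QrFGUXX": {"firstName": "Marcin", "lastName": "Nowak", "headline": "Developer"},
    "Uv1KrWpoXX": {"firstName": "Piotr", "lastName": "Kowalski", "headline": "Manager"}
  }
JSON

missing = unknown_edge_ids(network_data, connections_data)
puts "edges: #{network_data["edges"].size}, people: #{connections_data.size}, unknown ids: #{missing.size}"
```

With the real files, the two `JSON.parse` calls would read from disk instead of from inline strings.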

Importing data into Neo4j

First you need Neo4j. You can get it here: http://www.neo4j.org/download

I used a simple Ruby script similar to this one to import the data:

import.rb
# gem 'neo4j-core', "~>3.0.0.alpha"
require 'json'
require 'neo4j-core'

@session = Neo4j::Session.open(:server_db, "http://localhost:7474")

def get_person id
  Neo4j::Label.find_nodes(:person, :id, id).first
end

# Create the person node, or update its properties if it already exists.
def upsert_person(key, value)
  properties = {
    id: key,
    firstname: value["firstName"],
    lastname: value["lastName"],
    headline: value["headline"]
  }

  existing = get_person(key)
  if existing
    existing.update_props(properties)
  else
    Neo4j::Node.create(properties, :person)
  end
end

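# Create a :connected relationship between two people,
# or update it if one already exists.
def upsert_connection(id, other_id, type)
  properties = {type: type}

  person = get_person(id)
  connection = get_person(other_id)

  # Check both directions so that neither a repeated edge nor its
  # reverse creates a duplicate relationship when the script is re-run.
  existing = person.rels(type: :connected, dir: :both, between: connection).first
  if existing
    existing.update_props(properties)
  else
    person.create_rel(:connected, connection, properties)
  end
end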

network_data = JSON.parse(File.read('network_data.json'))
connections_data = JSON.parse(File.read('connections_data.json'))

connections_data.each do |key, value|
  upsert_person key, value
end

network_data["edges"].each do |edge|
  upsert_connection(edge["dest"], edge["src"], "src")
end
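As an aside, the find-then-create pattern above could probably also be expressed with Cypher's MERGE clause, which matches a pattern or creates it if it is missing. The sketch below is not the script I ran. It only builds query strings and parameter hashes, the helper names `person_merge` and `connection_merge` are made up, and it assumes a Neo4j version that accepts `$name` parameters (older 2.x servers use `{name}` instead). It also leaves out the `type` property that the script sets on the relationship. How the statements are sent to the server depends on the driver and is not shown.

```ruby
# Hypothetical helper: a parameterised MERGE for one entry of connections_data.
def person_merge(id, attrs)
  {
    query: "MERGE (p:person {id: $id}) " \
           "SET p.firstname = $firstname, p.lastname = $lastname, p.headline = $headline",
    params: {
      id: id,
      firstname: attrs["firstName"],
      lastname: attrs["lastName"],
      headline: attrs["headline"]
    }
  }
end

# Hypothetical helper: a MERGE for one edge. The pattern is undirected,
# so an existing relationship in either direction is reused.
def connection_merge(edge)
  {
    query: "MATCH (a:person {id: $src}), (b:person {id: $dest}) " \
           "MERGE (a)-[:connected]-(b)",
    params: { src: edge["src"], dest: edge["dest"] }
  }
end

statement = person_merge("Um6QrFGUXX",
                         { "firstName" => "Marcin", "lastName" => "Nowak", "headline" => "Developer" })
puts statement[:query]
puts statement[:params].inspect
```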

Exploring the data

Once the data was imported, I went to http://localhost:7474/browser/ and started experimenting with queries.

With whom do I have the most contacts in common?

MATCH    (user)-[r]-(friend)
WITH     user, count(friend) AS friends
ORDER BY friends DESC
WHERE    friends > 90 
RETURN   user.firstname, user.lastname, user.headline, friends
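The query counts, for every person, how many relationships touch them. Assuming the edges only link my own connections to each other, that number is how many contacts we have in common. The same count can be cross-checked without Neo4j straight from the `edges` array. This is a rough sketch with a made-up helper name and made-up sample ids.

```ruby
# Hypothetical helper: count how many edges touch each id, treating edges
# as undirected, and return [id, count] pairs sorted by count, highest first.
def connection_counts(edges)
  counts = Hash.new(0)
  edges.each do |edge|
    counts[edge["src"]] += 1
    counts[edge["dest"]] += 1
  end
  counts.sort_by { |id, count| [-count, id] }
end

# Made-up sample edges in the same shape as network_data["edges"].
edges = [
  { "src" => "a", "dest" => "b" },
  { "src" => "a", "dest" => "c" },
  { "src" => "b", "dest" => "c" },
  { "src" => "a", "dest" => "d" }
]
connection_counts(edges).each { |id, count| puts "#{id}: #{count}" }
```

The `friends > 90` threshold in the query is specific to my network; the sketch simply lists everyone in descending order.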

What is the most popular first name in my network?

MATCH    (user)
WITH     user.firstname as firstname, 
         collect(DISTINCT user.lastname) as lastnames,  
         count(DISTINCT user.lastname) as c
ORDER    BY c DESC     
RETURN   firstname, lastnames, c

("Piotr", "Marcin", "Pawel", "Anna", "Paul", "Lukasz")
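For comparison, roughly the same grouping can be done in plain Ruby on the parsed connections file. This is a sketch under assumptions: the helper name is made up, and the sample people are invented apart from the two names shown in the JSON excerpt earlier.

```ruby
# Hypothetical helper: group people by first name and count distinct last
# names, mirroring collect(DISTINCT ...) and count(DISTINCT ...) in the query.
def first_name_counts(connections_data)
  by_first = Hash.new { |hash, key| hash[key] = [] }
  connections_data.each_value do |person|
    by_first[person["firstName"]] << person["lastName"]
  end
  by_first
    .map { |first, lasts| [first, lasts.compact.uniq] }
    .sort_by { |first, lasts| [-lasts.size, first.to_s] }
    .map { |first, lasts| { firstname: first, lastnames: lasts, c: lasts.size } }
end

# Sample data in the shape of connections_data (ids and most names invented).
people = {
  "1" => { "firstName" => "Piotr", "lastName" => "Kowalski" },
  "2" => { "firstName" => "Piotr", "lastName" => "Nowak" },
  "3" => { "firstName" => "Marcin", "lastName" => "Nowak" },
  "4" => { "firstName" => "Piotr", "lastName" => "Nowak" }
}
first_name_counts(people).each { |row| puts row.inspect }
```

Like the query, this counts distinct last names per first name, so two people sharing both names are counted once.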

What are the most popular headlines in my network?

MATCH    (user)
WITH     left(upper(trim(user.headline)), 12) as headline, 
         collect(DISTINCT (user.firstname + user.lastname)) as names,  
         count(DISTINCT (user.firstname + user.lastname)) as c
ORDER    BY c DESC
RETURN   headline, names, c


("RECRUITMENT", "SOFTWARE DEV", "SOFTWARE ENG")
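The query buckets headlines by trimming whitespace, upper-casing, and keeping the first 12 characters, which is why "Software Developer" and "Software Engineer" show up as "SOFTWARE DEV" and "SOFTWARE ENG". A small Ruby sketch of the same normalisation follows; the helper names and sample people are made up for illustration.

```ruby
# Hypothetical helper: the Ruby equivalent of left(upper(trim(headline)), 12).
def headline_key(headline)
  headline.to_s.strip.upcase[0, 12]
end

# Hypothetical helper: group people by normalised headline and return
# [key, distinct names] pairs, largest group first.
def headline_counts(connections_data)
  connections_data.each_value
                  .group_by { |person| headline_key(person["headline"]) }
                  .map { |key, people| [key, people.map { |p| "#{p["firstName"]}#{p["lastName"]}" }.uniq] }
                  .sort_by { |key, names| [-names.size, key] }
end

# Invented sample data in the shape of connections_data.
people = {
  "1" => { "firstName" => "Anna", "lastName" => "A", "headline" => "Software Developer at X" },
  "2" => { "firstName" => "Pawel", "lastName" => "B", "headline" => " software developer " },
  "3" => { "firstName" => "Lukasz", "lastName" => "C", "headline" => "Software Engineer" }
}
headline_counts(people).each { |key, names| puts "#{key}: #{names.size} #{names.inspect}" }
```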

That’s it for today ;)
