CS224W Lecture 4 Link Analysis: PageRank
introduction
- the web as a graph
- nodes: webpage, edges: hyperlinks
- directed graph
- all nodes are not equally important
- => rank the pages using link structure
- link analysis algorithm (to compute importance of nodes)
- page rank
- personalized page rank (PRR)
- random walk with restarts
PageRank
- Links as votes
- => page rank
- long term distribution of the surfers
- page rank: pincipal eigenvector of stochastic adjacency matrix
- we can efficiently solve with power iteration.
- page rank summary
-
how to solve pagerank
- problem:
- spider trap (all out links are within the group)
- spider traps are not a problem, but with traps, page rank scores are not what we want
- solution: random jump (or teleport)
- some pages are dead ends (no out link)
- dead ends are a problem
- solution: 100% teleport
- spider trap (all out links are within the group)
- problem:
Random Walk with Restarts and Personalized PageRank
Matrix Factorization and Node Embeddings
- simplest approach -> embedding lookup
- =>
- limitation:
- cannot obtain embeddings for nodes not in the trainset
- cannot capture structural similarity
- cannot utilize node, edge, and graph features
November 11, 2021
Tags:
cs224w