Visualizing Internal Link Juice Flow and Power with Gephi

Screaming Frog, SEMrush, and other SEO tools provide us with a staggering amount of data to help us understand our internal links.

However, making sense of all this data can be overwhelming at times. If you’re looking to make further improvements to your links, you may benefit from seeing things from a fresh perspective.

(Screenshot of Internal Link Juice Flow for seoinc.com from Gephi using Fruchterman Reingold Layout)

In this tutorial, I’m going to show you how you can use Gephi and the URL rating metric from Ahrefs to visualize your site’s internal linking structure. You’ll be able to see not only how but where link juice flows between your internal links.

What You’ll Need

  • Screaming Frog (or similar crawler such as Xenu)
  • Excel (or a similar spreadsheet program with VLOOKUP functionality, such as Google Sheets)
  • Ahrefs (or similar backlink tool such as Majestic or Open Site Explorer)
  • Gephi (Compatible with Windows, Mac OS X and Linux)

What You Can Do with This

Working through this process will let you visualize your internal linking structure and internal links by URL Rating, anchor text, follow status, status code, and more.

Gephi as an interactive tool will help you quickly identify gaps and deficiencies in your internal linking and identify outliers and clusters. You’ll gain a new perspective on your site’s internal linking that you can’t get from looking at raw data from tools such as SEMrush and Screaming Frog.

Getting Started

Screaming Frog

Start by running a crawl on your website with Screaming Frog. Once the crawl finishes, go to the Bulk Export option, select the Response Codes drop down menu, and select Success (2xx) Inlinks.

Save the file in .CSV format in a location you will remember.

Ahrefs

Open up Ahrefs and enter your domain in the site explorer. Under the Pages section on the left-hand side, click Best by links, then click on the Export option on the upper right.

Once you get the Export option, be sure to select Full Export so you get the URL Rating for all of the pages Ahrefs has been able to crawl. Make sure the Microsoft Excel (UTF-16) button is selected beneath the .CSV format menu. Then click Start Export.

You should now have two .CSV files. The first is from Screaming Frog, which will have all of your internal links that are currently resolving with 200 status codes. Your second should be from Ahrefs with a list of all your pages and their URL ratings.
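If you’d like to peek at the Ahrefs export outside of Excel, here is a minimal Python sketch of reading it. It assumes the UTF-16 export is tab-separated text with the URL in the first column and the URL Rating in the second; the header names and URLs in the sample are made up, so check them against your own file.

```python
import csv

def parse_ratings(raw_bytes, delimiter="\t"):
    """Map each URL (first column) to its URL Rating (second column)."""
    # Assumption: the file is UTF-16 text; decode() consumes the byte-order mark.
    text = raw_bytes.decode("utf-16")
    rows = csv.reader(text.splitlines(), delimiter=delimiter)
    next(rows, None)  # skip the header row
    ratings = {}
    for row in rows:
        if len(row) < 2:
            continue  # blank or truncated line
        try:
            ratings[row[0].strip()] = float(row[1])
        except ValueError:
            continue  # non-numeric rating cell; leave this URL unrated
    return ratings

# Hypothetical sample standing in for a real export file.
sample = (
    "URL\tURL Rating\n"
    "https://example.com/\t45\n"
    "https://example.com/blog/\t30\n"
).encode("utf-16")

ratings = parse_ratings(sample)
```

If your export turns out to be comma-separated instead, passing delimiter="," should be the only change needed.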

Next, we’ll need to clean up and remove some of the unnecessary data from the spreadsheets. Then we’ll go about consolidating all of the important information into a single .CSV file.

Open both .CSV files, then drag the Ahrefs .CSV tab into the Screaming Frog Excel window so that both sheets sit in the same workbook.

In your success_(2xx)_inlinks file, go ahead and remove the first row that says “Success (2xx) Inlinks.” I recommend adding a filter to the Type column and removing all entries with a Type other than “HREF.” Once all of the CSS, JS, SWF, and IMG entries are gone, you can delete the Type column itself.
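For anyone who prefers a script to filters, the same cleanup can be sketched in Python. This is only an illustration: it assumes the export has a title row followed by a header row, and that the columns are named Type, Source and Destination. The other column names and the example.com URLs are placeholders.

```python
import csv

def clean_inlinks(raw_text):
    """Drop the title row, keep only HREF rows, and remove the Type column."""
    lines = raw_text.splitlines()
    # The export begins with a title row ("Success (2xx) Inlinks");
    # skip it so the real header row comes first.
    if lines and lines[0].strip().strip('"').lower().startswith("success (2xx)"):
        lines = lines[1:]
    cleaned = []
    for row in csv.DictReader(lines):
        if (row.get("Type") or "").strip().upper() != "HREF":
            continue  # CSS, JS, SWF, IMG and anything else that is not a hyperlink
        row.pop("Type", None)
        cleaned.append(row)
    return cleaned

# Hypothetical sample in the assumed export layout.
sample = '''"Success (2xx) Inlinks"
"Type","Source","Destination","Anchor","Status Code","Follow"
"HREF","https://example.com/","https://example.com/blog/","Blog","200","true"
"CSS","https://example.com/","https://example.com/style.css","","200","true"
"IMG","https://example.com/blog/","https://example.com/logo.png","","200","true"
"HREF","https://example.com/blog/","https://example.com/","Home","200","true"
'''

rows = clean_inlinks(sample)
```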

Now we can add the URL Rating to the success_(2xx)_inlinks file for each internal link source using VLOOKUP functionality. Create a new column next to the Source column and call it URL Rating.

In the cell below the URL Rating label, add the following formula:

=VLOOKUP($A2,'seoinc.com-best-pages-by-links-'!$A$2:$B$3500,2,FALSE)

What does this formula do? It takes each row’s Source URL, looks for an exact match in the Ahrefs sheet, and fills in the URL Rating column with the rating it finds there.

  • Be sure to change the name of the second spread sheet (the Ahrefs spreadsheet), as your domain name will be different. So change seoinc.com to your website.
  • You will also need to change the range of the rows selected from the Ahrefs spreadsheet (the number next to $B). My example above has 3500 rows so that is where I chose to end the formula.
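If it helps to see the lookup spelled out, here is roughly what the formula does, written as a small Python sketch. The function name, the dictionary of ratings, and the example.com URLs are all hypothetical; only the exact-match behaviour mirrors VLOOKUP with FALSE as the last argument.

```python
def add_url_rating(links, ratings):
    """Return copies of the link rows with a "URL Rating" value added.

    The match on the Source URL is exact, as with VLOOKUP(..., FALSE).
    Sources missing from the ratings table get None, the counterpart of #N/A.
    """
    return [{**link, "URL Rating": ratings.get(link["Source"])} for link in links]

# Hypothetical data: URL -> URL Rating, as it would come from the Ahrefs sheet.
ratings = {"https://example.com/": 45, "https://example.com/blog/": 30}
links = [
    {"Source": "https://example.com/", "Destination": "https://example.com/blog/"},
    {"Source": "https://example.com/blog/", "Destination": "https://example.com/"},
    {"Source": "https://example.com/new-page/", "Destination": "https://example.com/"},
]

rated = add_url_rating(links, ratings)
```

Because the match is exact, small differences such as a trailing slash or http versus https will cause a miss, in a script just as in the spreadsheet.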

Once you have the formula plugged in for the first cell, you should see the corresponding URL Rating number for your first Source URL.

Now we need to fill up the rest of the rows under column B with the appropriate URL Rating by auto filling the formula. All you have to do is double click the green box in the bottom right corner of that cell.

You may see some rows with a URL Rating of #N/A. Don’t worry: this will happen on URLs that Ahrefs hasn’t crawled yet. It’s safe to say that these pages likely have a very low URL rating, so you can leave these entries as they are, as Gephi will read the values as null once you import the spreadsheet.

Change the Destination column label text to Target. Then save the active spreadsheet file as a .CSV.
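The rename and save can also be scripted. The sketch below is a minimal example, assuming each row is a dictionary with a Destination key; it returns the CSV text rather than writing a file so it is easy to inspect. Missing ratings (None) come out as empty cells.

```python
import csv
import io

def to_gephi_edges_csv(rows):
    """Serialize link rows as CSV text, renaming Destination to Target."""
    if not rows:
        return ""
    renamed = [
        # Rebuild each row so "Destination" becomes "Target" in the same position.
        {("Target" if key == "Destination" else key): value for key, value in row.items()}
        for row in rows
    ]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(renamed[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(renamed)  # None values are written as empty cells
    return out.getvalue()

# Hypothetical rows in the layout built in the previous steps.
csv_text = to_gephi_edges_csv([
    {"Source": "https://example.com/", "URL Rating": 45,
     "Destination": "https://example.com/blog/"},
    {"Source": "https://example.com/new-page/", "URL Rating": None,
     "Destination": "https://example.com/"},
])
```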

Visualizing the Data

Now you’re ready to import the spreadsheet into Gephi. This will let you see a diagram of how your internal links are structured.

Gephi

Open Gephi, and click “New Project” in the dialog that pops up.

Next, click on the File menu and choose the Import Spreadsheet option in the drop down.

Find the .CSV file that we made in the previous step, the one with our combined spreadsheets. Select the Edges table option under As table.

Click Next, and Gephi will ask you to assign a data type for each column. You will want to change the URL Rating data type from the default String option to Float. I would also recommend changing the Status Code data type to Float and changing the Follow data type to Boolean. Make sure the Create missing nodes option at the bottom is checked, then click Finish.
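If you want to check ahead of time that your columns will fit those data types, a quick Python sketch like the one below can flag surprises. It is not how Gephi parses the file, just an illustration; it assumes the columns are named URL Rating, Status Code and Follow, and that Follow holds the text true or false.

```python
def coerce_row(row):
    """Convert one edge row's text values to Float, Float and Boolean."""
    def to_float(text):
        try:
            return float(text)
        except (TypeError, ValueError):
            return None  # e.g. "#N/A", an empty cell, or a missing column

    return {
        **row,
        "URL Rating": to_float(row.get("URL Rating")),
        "Status Code": to_float(row.get("Status Code")),
        # Assumption: the crawler writes "true"/"false" in this column.
        "Follow": str(row.get("Follow", "")).strip().lower() == "true",
    }

# Hypothetical row with a rating that the lookup could not find.
typed = coerce_row({
    "Source": "https://example.com/new-page/",
    "Target": "https://example.com/",
    "URL Rating": "#N/A",
    "Status Code": "200",
    "Follow": "TRUE",
})
```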

Now you should have some data in your Data Laboratory tab. Within the Data Laboratory tab, your data is split up into two tabs.

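In one tab, you have your Nodes. These are the URLs from your spreadsheet, with one node for each distinct page that appears as a Source or a Target. In your other tab you have your Edges, which are the links themselves: each row runs from a Source URL to a Target URL and carries all of its accompanying data.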

Now, let’s go to the Overview tab and start visualizing your data! You should see a dark blob, which represents all of your unfiltered data.

We’ll want to apply a Layout to make the information digestible and visually pleasing. A few Layouts do a good job of visualizing this type of data such as Yifan Hu, Force Atlas, and Fruchterman Reingold.

In my example I will be using the Fruchterman Reingold layout, as it yields a visualization with less overlap so all of my Nodes are more visible. The Force Atlas and Fruchterman Reingold layouts tend to run indefinitely, so you can pause them after they have run for about a minute (once the Graph has mostly settled). Yifan Hu, by contrast, stops on its own once it is finished, so you don’t need to worry about pausing it.

You should see something similar to this now (although probably without the two clusters like mine):

This data is still pretty ugly and indigestible, so let’s change the appearance of our Nodes and Edges based on the various attributes in our data.

First, under the Appearance tab, click on the Nodes tab, make sure the color palette option is selected, and choose Ranking with In-Degree as the attribute (this colors the nodes by their number of incoming links). Let’s change the color scheme from the white-to-green gradient to one that runs from light yellow to dark red to create a nice looking heat map, then click the Apply button when you are satisfied with the color scheme.

Now you should have something that looks similar to this, with the colored nodes based on incoming links:

There’s still room for improvement in the appearance department, though. Let’s also change the size of the nodes based on the number of incoming links using In-Degree.

Select the second button on the upper right (the one with the different sized circles), and under the Nodes tab click on Ranking. Choose In-Degree (number of incoming links) from the drop-down menu again and change the Min size to 10 and Max size to 30 for now. Click Apply.
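To make In-Degree and the size range a little more concrete, here is a small Python sketch of the arithmetic. It counts each Source to Target pair once (on the assumption that repeated links between the same two pages are merged into a single edge) and then maps the counts onto the 10 to 30 range with a plain linear interpolation. Gephi’s own scaling may differ in detail, and the example paths are made up.

```python
from collections import Counter

def in_degree(edges):
    """Count incoming links per URL; pages that only link out get 0."""
    counts = Counter()
    for source, target in set(edges):  # each Source -> Target pair counts once
        counts[target] += 1
        counts.setdefault(source, 0)
    return dict(counts)

def scale_sizes(degrees, min_size=10.0, max_size=30.0):
    """Map in-degree values linearly onto [min_size, max_size]."""
    lowest, highest = min(degrees.values()), max(degrees.values())
    span = highest - lowest
    if span == 0:
        return {url: min_size for url in degrees}
    return {
        url: min_size + (degree - lowest) / span * (max_size - min_size)
        for url, degree in degrees.items()
    }

# Hypothetical internal links as (Source, Target) pairs.
edges = [
    ("/", "/blog/"), ("/", "/services/"),
    ("/blog/", "/"), ("/blog/", "/services/"),
    ("/services/", "/"),
    ("/blog/post-1/", "/"), ("/blog/post-1/", "/blog/"),
]

degrees = in_degree(edges)
sizes = scale_sizes(degrees)
```

In this toy example the home page collects the most incoming links and ends up at the maximum size, while the blog post nobody links to stays at the minimum.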

Still looks kind of spooky, doesn’t it? Now let’s change the color scheme of the incoming links. We’ll use a heat map based on their URL rating so we can see where your most powerful internal links are.

Click on the Edges tab and go to the Ranking option with the color palette selected. Choose URL Rating as our attribute and change the color scheme to a heat map like before, but with some darker colors. Click Apply.

Now your incoming links are colored by URL Rating!

Looking good. At this point we can explore your data and interpret it.

You can use your mouse to hover over nodes. Right click a node and select the option to inspect it in the Data Laboratory to see what page the node represents.

In this particular example, the darkest/largest node in the center is seoinc’s home page. The node that is right next to our home page that is lighter/smaller represents our blog. You’ll notice that in the graphs I’ve created, it looks like two cells dividing under a microscope.

The smaller circle in the bottom right with some of the darker/larger nodes represents our website’s service pages. We heavily promote those pages internally on our website, so they soak up a lot of our internal link juice.

The links in the outer orbit of the graph represent some of our blog posts that have very few internal links being pointed at them. Some of the blogs that we internally link to more often make up the inner orbit surrounding our home page, blog, and service pages.

I didn’t have any disconnected outliers in my example, but you’ll certainly want to keep an eye out for these on your site. Each outlier you find is an opportunity to improve your internal linking.
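You can also back up what you see in the graph with a quick list. The Python sketch below is just one way to do it: it returns the pages that are linked from at most a given number of distinct pages, weakest first. The threshold, function name and paths are all made up for the example.

```python
from collections import defaultdict

def weakly_linked(edges, max_inlinks=1):
    """Return URLs linked from at most `max_inlinks` distinct pages, weakest first."""
    linking_pages = defaultdict(set)
    for source, target in edges:
        linking_pages[target].add(source)
        linking_pages.setdefault(source, set())  # register pages that only link out
    found = [
        (len(pages), url)
        for url, pages in linking_pages.items()
        if len(pages) <= max_inlinks
    ]
    return [url for _, url in sorted(found)]

# Hypothetical internal links as (Source, Target) pairs.
edges = [
    ("/", "/blog/"), ("/", "/services/"),
    ("/blog/", "/"), ("/services/", "/"),
    ("/blog/", "/services/"),
    ("/blog/", "/blog/old-post/"),
    ("/blog/old-post/", "/"),
]

candidates = weakly_linked(edges, max_inlinks=1)
```

Note that a crawl only finds pages it can reach through links, so a page with no internal links at all may never show up in the export in the first place.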

Now that we’ve examined some of the data, let’s go over to the Preview tab and tidy up your graph a little bit so you can show your colleagues how awesome you are.

The preview tab will be blank until you click on the Refresh button. Click it, then you should see a curved version of the graph:

Feel free to experiment with various options and presets to get what you’re looking for, but I think this is the most visually pleasing. More importantly, it also lets you easily see where the pages with your strongest URL ratings are linking to.

Continue the Experiments

Now that you’ve got the hang of Gephi, feel free to experiment with your internal linking data. There are a lot of cool things that you can highlight such as follow and nofollow links, internal anchor text, status codes, and more.
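As a starting point for the anchor text idea, here is a small Python sketch that tallies which anchor texts point at each page. It assumes your edge rows have Target and Anchor columns (the Anchor column name is a guess at what the crawler export calls it) and it lower-cases the text so that near-duplicates are counted together.

```python
from collections import Counter, defaultdict

def anchors_by_target(rows):
    """Tally the anchor texts used for links to each Target URL."""
    tally = defaultdict(Counter)
    for row in rows:
        anchor = (row.get("Anchor") or "").strip().lower()
        tally[row["Target"]][anchor] += 1
    return tally

# Hypothetical edge rows.
rows = [
    {"Source": "/", "Target": "/services/", "Anchor": "SEO Services"},
    {"Source": "/blog/", "Target": "/services/", "Anchor": "seo services "},
    {"Source": "/blog/", "Target": "/services/", "Anchor": "our services"},
]

tally = anchors_by_target(rows)
top_anchor = tally["/services/"].most_common(1)
```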

Want to take your newfound skills to the next level? You can look into visualizing your external link data and other information as well. The possibilities are endless!

Note: The opinions expressed in this article are the views of the author, and not necessarily the views of Caphyon, its staff, or its partners.

Author: John Caiozzo

John Caiozzo is an SEO Analyst at SEO Inc., one of the top Search Engine Optimization companies in the world since 1997. John specializes in creating advanced technical SEO solutions and strategies to drive more traffic and conversions to client websites.

6 thoughts on “Visualizing Internal Link Juice Flow and Power with Gephi”

  1. Dear John,

    Your post appeared at a fabulous time. I’d been wanting to play with Gephi ever since I saw a link in SELand (i think). So far, I’ve been procrastinating it. After reading the post I suddenly have an itch to test it out.

    I noticed you’ve selected only the 200 status URLs. I wonder how the visualization would change if we also added the 404 URL data. Or maybe studied only the 404 URL data.
    Interesting, would you say?

    Thanks
    Rajiv

    1. Hey Rajiv,

      Thank you! Our website has maybe 1 or 2 404 errors so a graph wouldn’t be interesting for our site, however, I think it could be extremely useful for a new project or client as it may reveal clusters of broken URL’s from previous platform migrations, broken product categories, etc…

  2. Your VLOOKUP formula didn’t work for me, so I found another approach (replace ; with , if your localization needs it):

    =INDEX(bestpages!$A$1:$A$1000; MATCH($A2; bestpages!$B$2:$B$1000;0))

  3. Nice! Now if only Screaming Frog would take into account the position of the link. An inlink now gets the same value. But we all know that Google takes into account if the internal link is inside a menu, or in-text, or in the footer (least value).

    If anyone would know a tool that could calculate that internal flow mathematical/googly correct, let us know!

  4. Hi,

    Thank you for this great internal linking method. Is it a problem if my webpage contains two identical links with different anchor text? I’m asking because I use WordPress for blogging and there is a “Recent Post widget” in the sidebar. Sometimes, if I link to a “recent post” in my upcoming article, then the “recent post” will get two links with different anchor text.

    What do you suggest on this issue?

    Thank You
