Misbehaving Data: When Direct Traffic Isn’t Direct Traffic

SEOs have long regarded Google with a counter-intuitive mix of wild fascination and aloof suspicion. The technological juggernaut that is Google aggregates a large amount of data – data that many assume is private – and has a great deal of control over how it organizes and uses that data, whether for good or (don’t be) evil.

When dealing with Google in search engine optimization, any good SEO will attempt to follow Google’s rules and guidelines to get results, but the rules change (sometimes without our knowledge).

Earlier this year, I wrote about how Google has made changes to the SERP space to incorporate more AdWords results at the expense of Organic results. And there have been numerous instances of the company using a policy of silence or secrecy when it comes to what data SEOs and searchers alike can access.

Google has often veiled the data it doesn’t want users to see. One example is its shift to hiding search terms from Google Analytics a few years back. Google also makes mistakes when it comes to data – for example, the ongoing problem of spam referral traffic registering fake hits in Google Analytics.

Which brings up the problem this article addresses: over the past few years, SEO practitioners have noticed that the “Direct Traffic” bucket in Google Analytics has shifted. This article explores why and what to do about it. But before that…

Let’s Head Back to School! (the Google Analytics Academy, that is)

In Google Analytics, traffic is segmented into source categories: Organic, Direct, Paid, Referral and Other. Digital Marketers and SEOs have an implicit trust that when a session hits their sites, Google will place the session in the correct traffic “bucket.”

But simply put, this isn’t always the case. Traffic is mislabeled or ends up in the wrong bucket all the time.

For this article, I would like to discuss “Dark Traffic”, which is a fancy way of saying “direct traffic that isn’t supposed to be labeled as direct traffic.”

Top SEO publications and even the general media have talked about the prevalence of Dark Traffic. This trend has produced some particularly doomsday-esque headlines, such as a 2012 article from The Atlantic titled “Dark Social: We Have the Whole History of the Web Wrong.” That article not only dubbed the phenomenon “Dark Social” (which later morphed into “Dark Traffic”) but is also cited as one of the first instances where this type of traffic caught marketers’ attention.

Why do SEOs care?

Simply put, this “free-for-all” data paints an inaccurate and potentially messy picture of your marketing efforts, both offline and online, and makes it harder to evaluate the success of your SEO services. It amounts to an attribution error: which channel gets the credit for the traffic or conversions that are on the rise due to this Dark Traffic?

If your direct traffic begins to inflate, your organic results will appear smaller by comparison. This can greatly change your outlook on the data you are gathering, giving you unrealistic benchmarks and a distorted view of your progress towards your goals.
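To make the effect concrete, here is a minimal sketch using made-up session counts. The numbers, the channel names and the 25% leak rate are all assumptions chosen for illustration, not measurements from any real site.

```python
def channel_shares(sessions):
    """Return each channel's share of total sessions, in percent."""
    total = sum(sessions.values())
    return {channel: round(100 * count / total, 1)
            for channel, count in sessions.items()}

def misattribute(sessions, channel, fraction):
    """Move a fraction of one channel's sessions into the direct bucket."""
    moved = int(sessions[channel] * fraction)
    reported = dict(sessions)
    reported[channel] -= moved
    reported["direct"] += moved
    return reported

# Hypothetical "true" picture: what the channels would look like
# if every session were labeled correctly.
true_sessions = {"organic": 6000, "direct": 2000, "referral": 2000}

# Assume a quarter of organic sessions lose their referrer on the way in.
reported_sessions = misattribute(true_sessions, "organic", 0.25)

print(channel_shares(true_sessions))
print(channel_shares(reported_sessions))
```

In this invented scenario the site did nothing differently, yet the reported organic share drops from 60% to 45% while direct climbs from 20% to 35%.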

Every session should be thoroughly examined. How can you evaluate the success of all the great PR, social media and link building that your company is doing? There is a diverse body of referral sources out there. Is it voice search? Social media? Apps?

Just how widespread is this?

Groupon asked the same question. In 2014, Groupon conducted a study which revealed that nearly 60% of the direct traffic to their longer URL pages was from organic search results that were miscategorized as direct.

To Google’s credit, there are countless ways that visitors can land on a site, and new technologies, apps and other digital ecosystems are being built every day. Giving Google the benefit of the doubt, adapting to these rapidly changing technologies very likely makes accurate reporting difficult.

Your site and your data may be affected, especially if you’ve noticed that organic traffic has been gradually decreasing, while your direct traffic ratio grows. Here’s a non-exhaustive list of possible types of traffic that are ending up in your “direct traffic bucket.”

Dark Traffic could potentially be:

  • Apps – Links shared and activated in mobile apps can appear as direct traffic.
  • SMS / Chat – A link sent through a messaging program or through text can end up in the direct bucket.
  • Social Media – Social media has made reporting difficult for SEO professionals. Not only can the traffic be misattributed, but some social media platforms have begun hosting content on their own sites. Facebook’s Instant Articles are one example of this type of in-app visit.
  • Email – The link to the cat meme sent by your coworker doesn’t necessarily provide the site you visit with that referral data.
  • Secure Browsing or Incognito Mode – Some web users can cloak their referral data through “incognito” browsing or browsing securely.
  • Bots – They are here and numerous. Bots are notorious for complicating analytics data and they are no exception when it comes to Dark Traffic.
  • HTTPS to HTTP – Traffic that travels from an HTTPS domain to a non-HTTPS domain can shed referral data, making it appear as a direct visit in Google Analytics.

Actionable Steps

Now that SEOs have identified this phenomenon of Dark Traffic, what can be done to work around it?

Below, this article will explore some ways to identify a traffic source for misattributed direct traffic.

Although there are services that claim to clear up the direct traffic issue, the techniques below have been curated from some of the best SEO resources online so you can get your hands dirty digging into the data with (mostly) free solutions.

Each site will have a unique direct traffic profile

There’s no rule book for identifying this traffic, so experimenting with your site’s unique direct traffic scenario will yield the best results. The most successful strategy will be to test all of these techniques and see which works best for your site.

1. Gather all social media posts and other potentially linking content published within the date range in which the direct sessions were logged in Google Analytics. A modest scope would be 30 days and an ambitious one would be 90 days. Find and categorize all of the potential referring links and store them so you can check against them when looking at session data.

It’s likely that you will need to get in touch with your Marketing and PR team to see what publications and channels they have been using. Also, getting your backlink profile from Majestic can help you get a complete picture.

2. Export your Google Analytics Direct Traffic data to Excel, look at the landing page URLs, and try to correlate them with those recent or active referral sources.

a. Check URLs of Landing Pages

Longer URLs are typically referrals: web users navigate to main pages directly, but are less likely to remember and type a long URL.

a1. Pay Attention to URL Structure

If you find particularly long and specific URLs, you can look them up with your favorite backlink checking tool. This could give you a list of possible sources on the web.

a2. Test in Real Time

After you have managed to find an IP address and the likely source site for a direct traffic visit, try to replicate the behavior: start on the referring URL in another browser and follow the link to your site. If the visit registers as direct again, you can be reasonably sure that referral traffic from that link is being misattributed as direct.
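If you would rather script the landing page check from step 2 than eyeball it in Excel, something like the following Python sketch could work. The column names (`landing_page`, `sessions`), the sample rows and both thresholds are assumptions for illustration; adjust them to match your own export.

```python
import csv
import io
from urllib.parse import urlsplit

# Hypothetical export: one row per landing page in the Direct channel.
# Column names are assumptions; rename them to match your own file.
SAMPLE_EXPORT = """landing_page,sessions
/,1200
/blog/,310
/blog/2016/10/how-to-read-direct-traffic-reports/,95
/products/widgets/blue-widget-large?color=blue,40
"""

def path_depth(url):
    """Count non-empty path segments, e.g. /blog/post/ has depth 2."""
    return len([seg for seg in urlsplit(url).path.split("/") if seg])

def likely_misattributed(rows, min_depth=2, min_length=25):
    """Return landing pages that are deep or long enough that few
    visitors would type them by hand. The thresholds are arbitrary
    starting points, not rules."""
    flagged = []
    for row in rows:
        url = row["landing_page"]
        if path_depth(url) >= min_depth or len(url) >= min_length:
            flagged.append((url, int(row["sessions"])))
    # Largest session counts first, so the biggest leaks get checked first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
for url, sessions in likely_misattributed(rows):
    print(sessions, url)
```

The flagged URLs are only candidates. Each one still needs to be looked up in a backlink tool and tested in real time as described above.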

3. Getting Technical

a. Examining a Logfile or using Opentracker

These activities can tell you what the server was doing at the time of the session and can yield the IP address of the visit.

b. Grouping similar IP Addresses

Grouping IP addresses together can help reveal a pattern, if there is one. In Episode 41 of the podcast Experts on the Wire, Dan Shure and Marshall Simmonds note that the IP address will often belong to the visitor’s Internet Service Provider. But sometimes webmasters get lucky and see that a session’s IP address is connected to a referring site, such as a social media platform.
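The log file and IP grouping steps can be roughed out in a few lines of Python. This is a sketch under several assumptions: that your server writes the common “combined” access log format used by Apache and nginx, that visitors arrive over IPv4, and that a /24 block is a useful grouping size. The sample lines are invented and use IP ranges reserved for documentation.

```python
import ipaddress
import re
from collections import defaultdict

# Assumes the "combined" access log format; adjust the pattern if
# your server logs in a different layout.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Made-up sample lines, not real traffic.
SAMPLE_LOG = [
    '203.0.113.7 - - [21/Oct/2016:10:00:01 +0000] "GET /blog/long-post/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [21/Oct/2016:10:00:05 +0000] "GET /blog/long-post/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [21/Oct/2016:10:01:00 +0000] "GET / HTTP/1.1" 200 128 "https://example.com/" "Mozilla/5.0"',
]

def group_referrerless_hits(lines, prefix=24):
    """Group hits that arrived without a referrer by IPv4 network."""
    groups = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match["referrer"] not in ("", "-"):
            continue  # skip unparsable lines and hits that kept a referrer
        # strict=False lets us pass a host address and get its network back.
        network = ipaddress.ip_network(f'{match["ip"]}/{prefix}', strict=False)
        groups[str(network)].append(match["request"])
    return dict(groups)

for network, requests in group_referrerless_hits(SAMPLE_LOG).items():
    print(network, len(requests), requests)
```

A cluster of referrer-less hits from one network landing on the same deep URL is worth a closer look, although, as noted above, the network will often just turn out to be an ISP.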

4. For the future:

a. URL Builder

Google’s URL Builder tool lets you manually add campaign parameters to a link, which you can then use in your email marketing and social media posts. When a visitor follows that link, the source, medium and campaign values you set are passed through to Analytics, so the session is attributed to that campaign rather than to direct.
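If you have many links to tag, the same result can be scripted. The helper below is a minimal sketch, not Google’s tool: `tag_url` is a hypothetical function name, and the example domain and campaign values are placeholders. It appends the standard `utm_source`, `utm_medium` and `utm_campaign` parameters that Google Analytics reads.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def tag_url(url, source, medium, campaign):
    """Append utm_source, utm_medium and utm_campaign to a URL,
    keeping any other query parameters it already has."""
    parts = urlsplit(url)
    # Drop any existing utm_* values so the link is not tagged twice.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith("utm_")
    ]
    params += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))

print(tag_url(
    "https://www.example.com/blog/post/?ref=1",
    source="newsletter",
    medium="email",
    campaign="october_digest",
))
```

Keeping the source, medium and campaign names consistent across links matters more than the tooling, since Analytics treats differently spelled values as separate campaigns.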

b. Google Tag Manager

It’s an SEO’s best friend. Making custom tags for your various campaigns can help you force source and medium data from the campaigns to your Analytics.

c. Block Bots in GA

Bots are useful, annoying, and here to stay. In the “View Settings” tab in Google Analytics, Google provides the option “Exclude all hits from known bots and spiders”. This is not a 100% fix, and there are hundreds of more advanced filters out there, but make sure you have this box checked.

Conclusion

Traffic misattribution is a real problem in Google Analytics. Google is a titanic company with thousands of projects on the back burner, and sometimes even Google misses important problems and errors. SEOs should still act with awareness of the problem and plan their reporting efforts accordingly.

Note: The opinions expressed in this article are the views of the author, and not necessarily the views of Caphyon, its staff, or its partners.

Author: Phil Mackie

Phil Mackie is a Digital Analyst for seoWorks, an Australia-based SEO service company. Phil loves to discuss and ponder Google’s next big move, while helping clients in different industries meet their Organic Traffic Goals.

7 thoughts on “Misbehaving Data: When Direct Traffic Isn’t Direct Traffic”

  1. What a terrific article. I’ve had times where my clients’ analytics show 90% direct traffic, leaving a mere 10% of other data that I am to use to measure trends!? It is frustrating. The Not – Set in the Queries section of Analytics is also a source of frustration. Arg!

  2. Interesting read Phil. This is a great but very intriguing topic. Thanks for the info! I have a question though, what do you think of traffic that comes from links you leave when commenting on blogs? it’s a direct traffic right? because it’s becoming too spammy at some point lately.

    1. Hi Emmerey, Apologies on my late reply. Actually, many blogs will not have their referral data removed. You can clearly see where the source is on those links most of the time.

      Larger social media platforms, you are far less likely to see that data pull through.

  3. Great article. One issue we are having is that on October 21, 2016, our direct traffic data basically flipped with our organic traffic data. So we went from 60% organic and 20% direct to 60% direct and 20% organic. And for the last year our organic traffic has been slowly decreasing. I am stumped. and can’t see any changes that would have caused this. Feel like doing an audit? We need help!
