
Making Sensible Employee Internet Reports for the Modern Web (Part 2)

Update: The technique described in this series of blog articles has since been improved upon and integrated into WebSpy Vantage 3.0 via the Origin Domain summary, which is present when analyzing any log files that contain URLs. We’ve called this feature Site Clean. It is also available in our separate Fastvue Reporter applications. See further details about our unique Site Clean engine.

In part one of this series, we looked at some of the challenges of reporting on the modern web. Advertising, visitor tracking, CDNs, widgets and APIs all contribute to the problem of cluttering employee Internet reports and making it difficult to find out what sites a user actually visited.

In this part of the series, we look at how we can derive some sense from all the noise the modern web creates.

Making Employee Internet Reports Make Sense Again

So how do we make sense of all this and create meaningful web reports that actually reflect people’s online activity? A knee-jerk reaction would be to create a domain filter that removes all the advertising, tracking, CDNs and widgets from your reports. But then what would you be left with? Perhaps a 10 KB hit to the original website? That’s not very useful.

A better solution is to utilize both Referrer URLs and Mime Types to group resources under the original requesting site.

Referrer URLs

The Referrer URL is the URL someone was on before they accessed the current URL. It is typically used to discover how a visitor found your website. For example, if the Referrer URL is a Google search page, then the visitor found your site through a Google search.

However, Referrer URLs can also be used to find the original site that requested a web resource. For example, when browsing Facebook, a friend’s profile picture will be downloaded from the CDN fbexternal-a.akamaihd.net, but the Referrer URL is still set to facebook.com as shown below.

Facebook Profile Picture Referer Example
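
To make the Facebook example concrete, here is a minimal Python sketch that compares the host of a requested URL with the host of its Referrer URL. The function names and the sample URL paths are hypothetical and used only for illustration; they do not come from any real log file or from WebSpy Vantage.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased host name of a URL, or None if it has none."""
    return urlsplit(url).hostname


def is_third_party(request_url, referrer_url):
    """True when the hit was requested by a page on a different host."""
    referrer_host = host_of(referrer_url)
    return referrer_host is not None and referrer_host != host_of(request_url)


# Hypothetical values modelled on the profile picture example.
request_url = "http://fbexternal-a.akamaihd.net/picture.jpg"
referrer_url = "http://www.facebook.com/"

print(host_of(request_url))                       # host that served the image
print(host_of(referrer_url))                      # site the user was actually on
print(is_third_party(request_url, referrer_url))  # True for this pair
```

A hit with an empty Referrer URL is treated here as not third party, which is an assumption of this sketch rather than anything the logs guarantee.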

So wouldn’t it be cool if web reports displayed the Referrer URL instead of the requesting URL for Advertising, Tracking, CDNs and Widgets? Yes. It would be very cool.

But how do we work out if a web resource is one of these things?

Again, a knee-jerk reaction would be to create a domain list, or even use URL Categories if you’re analyzing logs from a secure web gateway or UTM that does URL filtering. But creating and maintaining a list like this would be a nightmare! Fortunately, there is a better way.

Mime Type

If you look at all the advertising, tracking, and CDN hits, you’ll notice that most of them are images, scripts, CSS files, streaming media content and so on. Fortunately, there is a field in most log files called Mime Type that identifies the type of resource. Here are some Mime Types for common web resources:

  • image/png
  • text/javascript
  • text/css
  • application/octet-stream

For normal original web pages, the Mime Type is usually one of the following four types:

  • text/plain
  • text/html
  • text/html;charset=utf-8
  • text/html; charset=iso-8859-1

Note: The charset=… part is not technically part of the Mime Type; however, Forefront TMG logs it as such. When analyzing Forefront TMG logs, we therefore need to take these strings into account.
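
One way to cope with the charset variations is to strip any parameters before comparing the Mime Type. The Python sketch below is only an illustration under that assumption; the function names are hypothetical and this is not how WebSpy Vantage itself is configured.

```python
# Mime Types the article treats as "normal original web pages".
PAGE_MIME_TYPES = {"text/plain", "text/html"}


def normalize_mime_type(raw):
    """Drop parameters such as charset and lowercase the type/subtype."""
    return raw.split(";", 1)[0].strip().lower()


def is_page(raw):
    """True when the logged Mime Type denotes an original web page."""
    return normalize_mime_type(raw) in PAGE_MIME_TYPES


for logged in ("text/html;charset=utf-8",
               "text/html; charset=iso-8859-1",
               "image/png"):
    print(logged, "->", normalize_mime_type(logged), is_page(logged))
```

With this normalization, the two charset variants logged by Forefront TMG both reduce to text/html, so only two page types need to be listed.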

The Facebook profile picture from my example above has a Mime Type of image:

Facebook Profile Picture MimeType Example

The Idea

So in theory, we can make a more sensible looking web report with the following assumption:

Anything with a Mime Type other than text/plain, text/html, text/html;charset=utf-8, or text/html; charset=iso-8859-1 is a web resource (image, script, etc.).

For web resources, display the Referrer URL, and for everything else (original HTML pages), display the original requesting URL.

Of course, this will also display the Referrer URL for web resources hosted on the original site, but that doesn’t really matter, as the Referrer URL will still be the original requesting site.
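
The rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the WebSpy Vantage implementation covered in part three: the function names and the sample hits are hypothetical, and falling back to the requesting URL when the Referrer URL is empty is an added assumption that the rule itself does not cover.

```python
from collections import defaultdict
from urllib.parse import urlsplit

PAGE_MIME_TYPES = {"text/plain", "text/html"}


def is_page(mime_type):
    """True for original web pages; charset parameters are ignored."""
    return mime_type.split(";", 1)[0].strip().lower() in PAGE_MIME_TYPES


def display_url(url, referrer, mime_type):
    """Pick the URL to report for a single hit."""
    if is_page(mime_type) or not referrer:
        # Assumption: with no Referrer URL, fall back to the requesting URL.
        return url
    return referrer


def bytes_per_site(hits):
    """Sum bytes per host of the displayed URL."""
    totals = defaultdict(int)
    for url, referrer, mime_type, size in hits:
        shown = display_url(url, referrer, mime_type)
        totals[urlsplit(shown).hostname or shown] += size
    return dict(totals)


# Hypothetical hits: (url, referrer, mime type, bytes).
hits = [
    ("http://www.facebook.com/", "", "text/html; charset=utf-8", 10_000),
    ("http://fbexternal-a.akamaihd.net/picture.jpg",
     "http://www.facebook.com/", "image/jpeg", 50_000),
    ("http://www.facebook.com/style.css",
     "http://www.facebook.com/", "text/css", 5_000),
]

print(bytes_per_site(hits))
```

In this made-up sample, all three hits are reported under www.facebook.com, including the image served from the CDN.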

So how do we go about doing this in WebSpy Vantage? Continue on to part three in this series to find out!

See also:

  • Making Sensible Employee Internet Reports for the Modern Web (Part 4)
  • Making Sensible Employee Internet Reports for the Modern Web (Part 5)
  • Making Sensible Employee Internet Reports for the Modern Web (Part 3)
  • Making Sensible Employee Internet Reports for the Modern Web (Part 1)
  • The Best Way To Report On Websites

By Scott | October 3rd, 2013 | Employee Internet Reports, How To, Log File Analysis, Microsoft Threat Management Gateway, Reports, Tips and Best Practices, Vantage, Web Browsing Analysis, WebSpy


About the Author: Scott

Co-founder and Chief Product Officer at Fastvue. I spend my time making sense of the way firewalls and web gateways log traffic so that our customers don't have to!
