WebSpy Vantage 3.0

Making Sensible Employee Internet Reports for the Modern Web (Part 4)

Through parts one, two, and three of this series, we explained the challenges of creating employee Internet reports for the Modern Web, proposed a solution, and implemented it using Custom Expressions in WebSpy Vantage. In this fourth part of the series, we will look at some ways to further improve the Custom Expression.

Tweaking the Sensible Sites Expression

To see why these ‘junk’ sites are still appearing, let’s add a new node to the report called ‘Debugging’ that shows the Sensible Site alongside the original domain and the referrer domain. We’ll also show the Mime Type and URL Category.

  1. Go back to your Report template and duplicate the Sensible Sites node in your report by copying and pasting it:
    • Right-click the Sensible Sites section and click Copy.
    • Then click the top/root node (Sensible Site Report node) and click Paste.
  2. Now double-click the second ‘Sensible Sites’ node that you just pasted.
  3. On the General page, rename the node to Debugging.
  4. Still on the General page under the columns section, click Add | Key.
  5. Select Site Domain and click OK.
  6. Click Add | Key again. Select Referrer Domain and click OK.
  7. Click Add | Key again. Select Mime Type and click OK.
  8. As I’m using Forefront TMG, I’m also going to check out the URL Categories for the sites. If you are too, click Add | Key again. Select URL Category and click OK.
  9. Rearrange the new key columns to push them all up to the top of the column listing.
  10. Click Next and sort the node by Sensible Sites Ascending. This will sort the list alphabetically by the sensible site.
  11. Click OK to save the new node to your report.
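
If it helps to picture what the Debugging node produces, here is a rough Python sketch of the same idea: group hits by the key columns added above, count hits per combination, and sort by Sensible Site. This is not WebSpy Vantage code, and the node's exact aggregation is an assumption here. The field names (sensible_site, site_domain and so on) are hypothetical, and each hit is assumed to already carry the Sensible Site value produced by the custom expression from part three. The sample hits are made up.

```python
from collections import Counter

# Hypothetical field names for the Debugging node's key columns, in display
# order. Assumption: each hit already carries the Sensible Site value that the
# custom expression from part three produced.
DEBUG_KEYS = ("sensible_site", "site_domain", "referrer_domain",
              "mime_type", "url_category")


def debugging_rows(hits):
    """Group hits by the Debugging key columns and count hits per group."""
    counts = Counter(tuple(hit[key] for key in DEBUG_KEYS) for hit in hits)
    rows = [(*group, count) for group, count in counts.items()]
    # Sort alphabetically by Sensible Site, as in step 10 (stable for ties).
    return sorted(rows, key=lambda row: row[0])


# Made-up hits for illustration only.
sample_hits = [
    {"sensible_site": "example.org", "site_domain": "example-cdn.com",
     "referrer_domain": "example.org", "mime_type": "image/png",
     "url_category": "Edge Content Servers/Infrastructure"},
    {"sensible_site": "example.org", "site_domain": "example-cdn.com",
     "referrer_domain": "example.org", "mime_type": "image/png",
     "url_category": "Edge Content Servers/Infrastructure"},
    {"sensible_site": "example.com", "site_domain": "example.com",
     "referrer_domain": "", "mime_type": "text/html",
     "url_category": "News"},
]

for row in debugging_rows(sample_hits):
    print(row)
```

Rows whose Sensible Site is blank sort to the top, which is one quick way to spot the hits behind a blank entry in the Sensible Sites section.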

Now run the report again. It will include a new section called ‘Debugging’ that looks like this.

Fixing Blank Referrer URLs

You can see why the third site in my Sensible Sites report was blank: there are 49 hits in my data set where the Referrer URL is blank. We can modify the custom expression to fall back to the original requesting URL in that case.

Here’s the updated custom expression:

iif([MimeType] = "text/plain" || [MimeType] = "text/html" || [MimeType] = "text/html;charset=utf-8" || [MimeType] = "text/html; charset=iso-8859-1", domain([Site.Host]), iif(domain([Referrer.Host]) = '' || domain([Referrer.Host]) = '-', domain([Site.Host]), domain([Referrer.Host])))

In other words: if the Mime Type is any one of our four ‘normal page’ Mime Types (as discussed in part two), show the original Site Domain; otherwise, if the Referrer Domain is blank or ‘-’, show the original Site Domain; otherwise, show the Referrer Domain.
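
For readers who find nested iif() calls hard to follow, here is a rough Python model of the same branching. It is only an illustration, not WebSpy Vantage code. It assumes each hit is a dictionary whose hypothetical site_domain and referrer_domain keys hold the values of domain([Site.Host]) and domain([Referrer.Host]), and that the Mime Type comparison is an exact string match. The example domains are made up.

```python
# A plain-Python model of the blank-referrer Sensible Site expression.
# Assumption: each hit is a dict; "site_domain" and "referrer_domain" stand in
# for domain([Site.Host]) and domain([Referrer.Host]) in WebSpy Vantage.

# The four 'normal page' Mime Types from part two, compared exactly as written.
PAGE_MIME_TYPES = {
    "text/plain",
    "text/html",
    "text/html;charset=utf-8",
    "text/html; charset=iso-8859-1",
}

# Referrer values the expression treats as "no referrer".
BLANK_REFERRERS = {"", "-"}


def sensible_site(hit: dict) -> str:
    """Return the domain this hit should be reported under."""
    if hit["mime_type"] in PAGE_MIME_TYPES:
        # A normal page: report the site that was actually requested.
        return hit["site_domain"]
    if hit["referrer_domain"] in BLANK_REFERRERS:
        # No usable referrer: fall back to the requested site.
        return hit["site_domain"]
    # Embedded resource: attribute it to the page that referred it.
    return hit["referrer_domain"]


# Made-up hits for illustration only.
print(sensible_site({"mime_type": "image/png", "site_domain": "example-cdn.com",
                     "referrer_domain": "example.org"}))  # example.org
print(sensible_site({"mime_type": "image/png", "site_domain": "example-cdn.com",
                     "referrer_domain": "-"}))  # example-cdn.com
```

The two fallback checks map one-to-one onto the inner iif(), so a hit with no referrer is reported under its own domain instead of collapsing into a blank entry.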

Phew… Got that? 🙂

Even though this new expression puts more of the ‘junky’ sites back into the report, you’ll notice that the sites users actually visited still dominate.

You may be tempted to leave the Custom Expression the way it was so that these ‘junky’ sites get grouped under the blank site; however, I recommend against that.

Unfortunately, embedded YouTube videos do not include a Referrer URL for the streaming media content, nor do many other embedded web page elements that use iFrames. Other applications, such as Windows Update, also omit the Referrer URL. If you do not use the new expression above, these important applications will be hidden under the blank referrer.

Improving with URL Categories

You may also notice some other situations where it makes sense to show the Referrer URL even when the Mime Type is text/plain or text/html. For example:

In this case, while I was browsing techcrunch.com, my browser requested a resource with the Mime Type text/html from the advertising site atwola.com. This is common: displaying a Facebook ‘Like’ button on a page, for example, requires some plain HTML in addition to scripts and images.

My Forefront TMG server has correctly classified this hit as Web Ads, so we can improve the custom expression to show the Referrer URL for Web Ads whenever one is present.

iif( ([MimeType] = "text/plain" || [MimeType] = "text/html" || [MimeType] = "text/html;charset=utf-8" || [MimeType] = "text/html; charset=iso-8859-1" ) && [UrlCategoryName] != 'Web Ads', domain([Site.Host]), iif( domain([Referrer.Host]) = '' || domain([Referrer.Host]) = '-', domain([Site.Host]), domain([Referrer.Host]) ) )
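
As a rough illustration of what the extra && clause changes, here is the same logic modeled in Python. Again this is not WebSpy Vantage code: the dictionary keys are hypothetical stand-ins for domain([Site.Host]), domain([Referrer.Host]) and [UrlCategoryName], and the referrer value in the atwola.com example is assumed for illustration.

```python
# Plain-Python model of the expression with the Web Ads exception.
# Assumption: hits are dicts; "site_domain"/"referrer_domain" stand in for
# domain([Site.Host])/domain([Referrer.Host]), "url_category" for [UrlCategoryName].

PAGE_MIME_TYPES = {
    "text/plain",
    "text/html",
    "text/html;charset=utf-8",
    "text/html; charset=iso-8859-1",
}
BLANK_REFERRERS = {"", "-"}


def sensible_site(hit: dict) -> str:
    """Return the domain this hit should be reported under."""
    is_page = hit["mime_type"] in PAGE_MIME_TYPES
    if is_page and hit["url_category"] != "Web Ads":
        # A normal page that is not an ad: keep the requested site.
        return hit["site_domain"]
    if hit["referrer_domain"] in BLANK_REFERRERS:
        # No usable referrer, even for an ad: fall back to the requested site.
        return hit["site_domain"]
    # Ads and embedded resources are attributed to the referring page.
    return hit["referrer_domain"]


# The atwola.com case above: text/html, but categorized as Web Ads.
print(sensible_site({"mime_type": "text/html", "url_category": "Web Ads",
                     "site_domain": "atwola.com",
                     "referrer_domain": "techcrunch.com"}))  # techcrunch.com
```

Note that a Web Ads hit with a blank referrer still falls back to its own domain, because the category check only skips the first branch.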

Let’s take this one step further and include the URL Category for CDNs (content delivery networks). Forefront TMG categorizes CDNs as Edge Content Servers/Infrastructure.

iif( ([MimeType] = "text/plain" || [MimeType] = "text/html" || [MimeType] = "text/html;charset=utf-8" || [MimeType] = "text/html; charset=iso-8859-1" ) && ([UrlCategoryName] != 'Web Ads' && [UrlCategoryName] != 'Edge Content Servers/Infrastructure'), domain([Site.Host]), iif( domain([Referrer.Host]) = '' || domain([Referrer.Host]) = '-', domain([Site.Host]), domain([Referrer.Host]) ) )

For those playing at home, this says: if the Mime Type is any one of our four ‘normal page’ Mime Types AND the URL Category is not Web Ads AND the URL Category is not Edge Content Servers/Infrastructure, show the original Site Domain; otherwise, if the Referrer Domain is blank or ‘-’, show the original Site Domain; otherwise, show the Referrer Domain.
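
The chained != comparisons are easier to maintain as a single list of ‘attribute to referrer’ categories. Here is a rough Python model of the final expression written that way; it is an illustration only, with hypothetical dictionary keys standing in for the WebSpy fields and made-up example domains.

```python
# Plain-Python model of the final expression: normal-page Mime Types are
# attributed to the referrer when the URL Category is in REFERRER_CATEGORIES.
# Assumption: hits are dicts with hypothetical keys mirroring the WebSpy fields.

PAGE_MIME_TYPES = {
    "text/plain",
    "text/html",
    "text/html;charset=utf-8",
    "text/html; charset=iso-8859-1",
}
# Categories whose hits should be reported under the referring page.
REFERRER_CATEGORIES = {"Web Ads", "Edge Content Servers/Infrastructure"}
BLANK_REFERRERS = {"", "-"}


def sensible_site(hit: dict) -> str:
    """Return the domain this hit should be reported under."""
    is_page = hit["mime_type"] in PAGE_MIME_TYPES
    if is_page and hit["url_category"] not in REFERRER_CATEGORIES:
        # A normal page outside the listed categories: keep the requested site.
        return hit["site_domain"]
    if hit["referrer_domain"] in BLANK_REFERRERS:
        # No usable referrer: fall back to the requested site.
        return hit["site_domain"]
    return hit["referrer_domain"]


# Made-up CDN hit: text/html served from a CDN while browsing example.org.
print(sensible_site({"mime_type": "text/html",
                     "url_category": "Edge Content Servers/Infrastructure",
                     "site_domain": "example-cdn.com",
                     "referrer_domain": "example.org"}))  # example.org
```

Adding another category later means adding one entry to the set, which mirrors adding one more && [UrlCategoryName] != '...' clause to the expression.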

The great thing about including URL Categories in the expression is that it gives you a way to control which sites are attributed to their referrer. You can use Forefront TMG’s URL Overrides to re-classify sites as Web Ads or Edge Content Servers/Infrastructure to ensure that the Referrer URL is shown whenever possible.
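
To show why re-classifying a site flips how it is reported, here is a rough Python sketch. The override table is a made-up stand-in for TMG URL Overrides, keyed by domain purely for simplicity; in reality TMG applies overrides when it categorizes the request, not in WebSpy, and the domains and the ‘Uncategorized’ label are hypothetical.

```python
# Sketch: how re-classifying a site changes which domain it is reported under.
# Assumptions: hits are dicts with hypothetical keys; URL_OVERRIDES is a made-up
# domain -> category table standing in for TMG URL Overrides.

PAGE_MIME_TYPES = {
    "text/plain",
    "text/html",
    "text/html;charset=utf-8",
    "text/html; charset=iso-8859-1",
}
REFERRER_CATEGORIES = {"Web Ads", "Edge Content Servers/Infrastructure"}
BLANK_REFERRERS = {"", "-"}

URL_OVERRIDES = {"example-widgets.com": "Web Ads"}


def apply_overrides(hit: dict, overrides: dict) -> dict:
    """Return a copy of the hit, re-categorized if its site is overridden."""
    category = overrides.get(hit["site_domain"], hit["url_category"])
    return {**hit, "url_category": category}


def sensible_site(hit: dict) -> str:
    """Same branching as the final custom expression above."""
    is_page = hit["mime_type"] in PAGE_MIME_TYPES
    if is_page and hit["url_category"] not in REFERRER_CATEGORIES:
        return hit["site_domain"]
    if hit["referrer_domain"] in BLANK_REFERRERS:
        return hit["site_domain"]
    return hit["referrer_domain"]


widget_hit = {"mime_type": "text/html", "url_category": "Uncategorized",
              "site_domain": "example-widgets.com", "referrer_domain": "example.org"}
print(sensible_site(widget_hit))                                  # example-widgets.com
print(sensible_site(apply_overrides(widget_hit, URL_OVERRIDES)))  # example.org
```

The expression itself does not change; only the category recorded for the hit does, which is why overrides are a convenient way to tune the report.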

Let’s rerun the report and check out the Debugging section again.

In the screenshot above, I’ve highlighted the rows where the URL Category is either Web Ads or Edge Content Servers/Infrastructure and the Mime Type is one of the four ‘normal page’ Mime Types. You can see that the Sensible Site now correctly uses the Referrer URL.

We’ll check out the finished report in the fifth and final part of this series.
