WebSpy Vantage 3.0

A Complete Guide to Useful Reverse Proxy Reporting

Reverse proxy reporting (using WebSpy Vantage Ultimate) is a great way to gain insight into how people on the Internet are using your published web sites or web applications.

Forward proxy reporting is all about the users accessing content on the Internet from within your corporate network. Reverse proxy reporting is the opposite. When you have a web site that is hosted on your internal or DMZ network and you publish it to the Internet through a device like Microsoft Forefront TMG or Sophos UTM’s Web Application Firewall feature, you have a reverse proxy scenario.

Proxy Logs vs Web Analytics Apps

One question that often comes up is why one should use reverse proxy logs to analyse a site’s usage rather than something like Google Analytics. There are numerous reasons, but if your published application or business application is not simply a public website, the following are the most important reasons:

Sounds good, how do I set up Reverse Proxy Reports?

This guide will step you through the basics to get going. We always recommend working with a small, manageable dataset to speed up the development of filters, templates and reports. If you are following along, we suggest importing a single log file or a limited number of files. That said, if the sample is too small you might not get good visibility, and if it’s too big it will slow down your testing.

Filter by Source Network

Some environments are fortunate enough to have separate, discrete forward and reverse proxy setups. That is, the forward proxy and the reverse proxy do not share the same device, such as a single Microsoft Forefront TMG Server instance. Even if your environment is consolidated, with both the forward and reverse proxy on the same device, you can still make use of this guide. The key is knowing how to filter by source network.

Reverse proxy traffic always has the Internet as the source. In Microsoft Forefront TMG, that is defined as the External Network. By isolating traffic that was initiated on the External Network you eliminate other sources of traffic such as internal, DMZ or VPN.

Since you would generally be running very different reports for forward and reverse proxy traffic, it makes sense to create a separate Storage in WebSpy Vantage for reverse proxy reporting. Specifying the source network in an Import Filter makes sure that your Storage only contains the reverse proxy log data, which makes further filtering and analysis simpler and faster.
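In Vantage this selection is done with an Import Filter. Purely to illustrate the logic, the same idea can be sketched in Python outside of Vantage. The field names (`source_network`, `rule`, `client_ip`) and the sample records are hypothetical and are not the actual TMG or Vantage log schema.

```python
# Hypothetical, simplified log records; real proxy logs carry many more fields.
SAMPLE_RECORDS = [
    {"source_network": "External", "rule": "Publish Webmail", "client_ip": "203.0.113.10"},
    {"source_network": "Internal", "rule": "Allow Web Access", "client_ip": "10.0.0.25"},
    {"source_network": "VPN Clients", "rule": "Allow Web Access", "client_ip": "10.8.0.4"},
    {"source_network": "External", "rule": "Publish Intranet", "client_ip": "198.51.100.7"},
]


def keep_source_network(records, network="External"):
    """Return only the records whose traffic was initiated on the given network."""
    return [r for r in records if r["source_network"] == network]


reverse_proxy_records = keep_source_network(SAMPLE_RECORDS)
for record in reverse_proxy_records:
    print(record["rule"], record["client_ip"])
```

Filtering at import time, rather than at report time, means every later summary only ever sees the reverse proxy traffic.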

Create a Storage and import your log files

Your raw log files need to be imported into a Storage, which will act as a “database” against which the reports will be generated.

Summary analysis

Once your (filtered) log data has been imported you can have a look to see if everything is as you expect it to be. The Summary Analysis in Ad-hoc mode will also show you what log data is available for use in generating templates and reports.

To verify that the import filter was successful check the Source Network node and confirm that only External is listed.

Identifying your published web sites

Using the same ad-hoc Summary Analysis, check the Rule node and verify that your “publishing rules” are the only ones listed. Having a separate publishing rule for each site makes analysis and reporting a little easier because each rule can be considered a ‘site’. If, however, you have multiple sites published through a single rule, you can use the Site Name field to differentiate between sites.

The inverse of this is when you have a single application that is published on multiple servers with multiple rules. In this case you would want to combine the rules for reporting purposes. The best way to do this is by using an Alias.

Now that the Alias has been created we can define groups and add values to them. The easiest way is to simply use the Analysis we already have open.

This gives you a single entry for the whole application, even though it is published across multiple rules or servers. The alias not only cleans up your view, but it can also be used when specifying filters and generating reports. The alias consolidates the log data, but the underlying per-rule detail is still available if you want to break it up again.

Selecting the alias will trigger a drill down and from here you will be able to see the individual rules again if you change the Alias view to ‘No Alias’.
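Conceptually, an alias is a mapping from several raw values to one group name, with the raw values kept for drill-down. The sketch below illustrates that idea in Python; it is not how Vantage implements aliases, and the rule names, group name and hit counts are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical alias definition: group name -> publishing rules it combines.
RULE_ALIAS = {
    "Customer Portal": ["Portal Server 1", "Portal Server 2"],
}

# Hypothetical hit counts per publishing rule.
HITS_PER_RULE = {
    "Portal Server 1": 1200,
    "Portal Server 2": 800,
    "Publish Webmail": 500,
}


def apply_alias(hits_per_rule, alias):
    """Sum hits per alias group while keeping the per-rule breakdown."""
    rule_to_group = {rule: group for group, rules in alias.items() for rule in rules}
    totals = Counter()
    breakdown = defaultdict(dict)
    for rule, hits in hits_per_rule.items():
        # Rules that are not part of any group keep their own name.
        group = rule_to_group.get(rule, rule)
        totals[group] += hits
        breakdown[group][rule] = hits
    return totals, breakdown


totals, breakdown = apply_alias(HITS_PER_RULE, RULE_ALIAS)
print(totals["Customer Portal"])     # combined view, like the aliased node
print(breakdown["Customer Portal"])  # individual rules, like the 'No Alias' view
```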

In practice you would use both methods. You may for instance want to get an overall picture of your Exchange web site usage. In many cases this would be published with a single rule, but it will contain multiple site names such as webmail, autodiscover, legacy and so on. If you want to get more details on which components are used, you would use Site Names without an alias.

Useful information for reverse proxy reports

Now that we have a method to isolate the various published web sites or applications, we can investigate the kind of information you might want to report on. In the reverse proxy scenario you typically want to know:


Since the usernames are generally not known, you can approximate who the user is by looking at the Source IP and the User Agent fields. Together they help identify unique users and the device types they are using.
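As a rough illustration of this idea, the sketch below counts distinct (Source IP, User Agent) pairs in Python. The field names and sample hits are hypothetical. The count is only an approximation: several people behind one NAT address with identical browsers collapse into a single pair, and one person on two devices counts twice.

```python
# Hypothetical hits; only the two fields relevant to visitor identification are shown.
HITS = [
    {"source_ip": "203.0.113.10", "user_agent": "Mozilla/5.0 (iPhone) Safari"},
    {"source_ip": "203.0.113.10", "user_agent": "Mozilla/5.0 (iPhone) Safari"},
    {"source_ip": "203.0.113.10", "user_agent": "Mozilla/5.0 (Windows NT 10.0) Chrome"},
    {"source_ip": "198.51.100.7", "user_agent": "Mozilla/5.0 (Windows NT 10.0) Chrome"},
]


def approximate_visitors(hits):
    """Return the set of distinct (source IP, user agent) pairs."""
    return {(hit["source_ip"], hit["user_agent"]) for hit in hits}


visitors = approximate_visitors(HITS)
print(len(visitors), "approximate unique visitors")
```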


Knowing what your user base is accessing can tell you many things about your application. The following fields will give you better insight: Site URL, MIME Type, Operation (GET, POST) and Protocol (HTTP or HTTPS).


Typically, this would be information requiring date and time, so the fields of interest here would be Date, Day of Week and Hour.
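To show what the Hour and Day of Week summaries amount to, here is a small Python sketch that buckets request timestamps. The timestamps and their format are assumptions made for the example; Vantage derives these fields from the log for you.

```python
from collections import Counter
from datetime import datetime

# Hypothetical request timestamps in an assumed "YYYY-MM-DD HH:MM:SS" format.
TIMESTAMPS = [
    "2013-05-06 09:15:02",
    "2013-05-06 09:47:30",
    "2013-05-06 14:03:11",
    "2013-05-07 09:05:44",
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
# Fixed English names avoid depending on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def hits_by_hour(timestamps):
    """Count hits per hour of day (0-23)."""
    return Counter(datetime.strptime(ts, TIME_FORMAT).hour for ts in timestamps)


def hits_by_weekday(timestamps):
    """Count hits per day of the week."""
    return Counter(WEEKDAYS[datetime.strptime(ts, TIME_FORMAT).weekday()] for ts in timestamps)


by_hour = hits_by_hour(TIMESTAMPS)
by_weekday = hits_by_weekday(TIMESTAMPS)
busiest_hour, busiest_count = by_hour.most_common(1)[0]
print(f"Busiest hour: {busiest_hour}:00 with {busiest_count} hits")
```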


Knowing where your site’s users are coming from can be very useful, especially if you are trying to measure the effectiveness of advertising campaigns. The fields of interest here would include Referrer Domain and Referrer URL.
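The relationship between the two referrer fields can be sketched as follows: the domain is the host part of the referrer URL. This Python example uses the standard library's `urllib.parse.urlsplit`; the sample URLs are made up, and stripping a leading `www.` is a choice made for this sketch rather than documented Vantage behaviour.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical Referrer URL values; an empty string stands for a direct visit.
REFERRER_URLS = [
    "https://www.google.co.za/search?q=example",
    "https://www.facebook.com/some/page",
    "https://www.google.co.za/",
    "",
]


def referrer_domain(url):
    """Return the referrer's host name without a leading 'www.', or None if absent."""
    host = urlsplit(url).hostname
    if host is None:
        return None
    return host[4:] if host.startswith("www.") else host


domains = Counter(
    domain for domain in map(referrer_domain, REFERRER_URLS) if domain is not None
)
for domain, hits in domains.most_common():
    print(domain, hits)
```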

The ad-hoc analysis would have shown you that there are a huge number of fields available, most of them adding little or no value to reverse proxy reporting.

To consolidate things, but still give you enough information to work with, we can reduce the required summaries down to the following:

Create a Reverse Proxy Report Template

Having all of the data in an analysis is great for doing ad-hoc drill-down investigations, but most likely you will need the information to be reduced and condensed into an easy-to-digest report. Next we are going to set up a simple report template that will show the key pieces of information per site:

Now that a blank template has been created, we can add the fields we are interested in. The steps below add seven nodes to the template. This looks like a lot to do but it is really quick and easy. The template will also be available for download below so you can skip this step if you’re using Microsoft Forefront TMG.

By the time you are done you should have something that looks like this:

Using the Reverse Proxy Template in an Analysis

The template we created can be used to generate a report (Word, PDF, CSV etc), but the same template can also be used for doing a Summary Analysis. This is a great way to check your template structure.

You can now browse through the Analysis and you will see that it is much cleaner to look through than the default ad-hoc analysis we ran earlier. A nice feature of running the template on the Summaries tab as a template-based analysis is that you can still drill down past the bounds of what you defined in the report template. For instance, you can click the http protocol to get more details on the resources not served over HTTPS.

When you click on an item to drill down into it, all of the available summaries are displayed. Also note that in the navigation bar at the top, your filters are being cumulatively applied.

Distributing Reverse Proxy Reports

Next you can generate a report in one of the many different formats available within WebSpy Vantage. These reports can then be automatically emailed out on a schedule, or could be published and made available through the web module.

Once the task is done you will have a separate report for each Web Application for the past week only.

The Results!

Each report contains loads of useful information. We can determine that the bulk of the site’s traffic peaks between 9 and 10 AM. The bulk of the users are connecting via Safari browsers, and the biggest external sources for users connecting to the site are google.co.za followed by facebook.com, while only 87 hits came from pinterest.com (30th on the list).

By looking at the source IPs, we can determine that a lot of users of the site stick around, click through multiple links and download a fair amount of content.

Applying some knowledge about which of the site’s content is served over HTTP and which over HTTPS, we can tell that a high percentage of users are actively logging in, not simply browsing without converting to actual sales.

Furthermore, because we have a holistic picture of all the sites being published, we know that this particular site consumes 70% of all the available bandwidth to the hosting site.
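A share figure like this is simply each site's bytes divided by the total across all published sites. The Python sketch below shows the arithmetic with invented site names and byte counts; they are not the figures from the report described above.

```python
# Hypothetical bytes transferred per published site over the reporting period.
BYTES_PER_SITE = {
    "shop": 7_000_000,
    "webmail": 2_000_000,
    "intranet": 1_000_000,
}


def bandwidth_share(bytes_per_site):
    """Return each site's percentage of the total bytes transferred."""
    total = sum(bytes_per_site.values())
    if total == 0:
        # No traffic at all: report 0% everywhere instead of dividing by zero.
        return {site: 0.0 for site in bytes_per_site}
    return {site: 100.0 * size / total for site, size in bytes_per_site.items()}


shares = bandwidth_share(BYTES_PER_SITE)
for site, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{site}: {share:.1f}%")
```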

This level of information can be extremely useful in retail settings, but can be equally important for corporate web applications. Usage patterns can, for example, help you determine the best windows for system maintenance.

Since report templates are almost infinitely customizable, you can tune them to show the exact data you are interested in.

What next?

This was an introduction to exploring the information available to you in a reverse proxy log file. Typically you would want to import more log files into your Storage, and report across larger time frames. At this point the summary and report generation time will start to increase, but the bigger picture becomes more accurate and valuable.

You may also want to tweak your template to show just the right level of information for your report audience.

Lastly, you should automate log importing, report generation and publishing, and purging of log data older than what is required. All of this is easily accomplished via the Tasks tab in WebSpy Vantage.

I hope this basic ‘start to finish’ tutorial was helpful in not just creating a basic reverse proxy report, but also in showing you some of the awesome reporting possibilities with WebSpy Vantage.

Vantage supports log files from over 200 popular network devices, and with its comprehensive aliasing feature, is the most flexible log analysis and reporting framework you’ll find.
