Today I reviewed the Google Analytics Platform Principles modules, which explain how the Collection, Configuration, Processing and Reporting components of the Google Analytics platform work. Compared to the Digital Analytics Fundamentals modules, which I blogged about a couple of days ago, the four units of Platform Principles give a much more practical and technical explanation of Google Analytics. Google recommends completing Fundamentals first, since Platform Principles assumes some basic knowledge of Google Analytics.
Of everything covered in these units, what I found most interesting is the use of sampling in reporting. Google delivers what it calls “standard reports”, but you can customize them by adding things like extra segments or dimensions. Each customization increases the amount of data that has to be collected and processed to build the report. To keep these reports fast, Google uses statistical sampling: it analyzes a portion of the total population and assumes that portion represents the population as a whole. The same technique is used in all kinds of research, from psychological studies to election polling.
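To make the idea concrete, here is a minimal Python sketch of simple random sampling applied to session data. It is an illustration under my own assumptions, not Google's actual sampling algorithm: the session list, the roughly 3% conversion rate and the function name sampled_estimate are all hypothetical. It counts conversions in a random 10% of sessions and scales that count up by the inverse of the sampling fraction to estimate the total.

```python
import random


def sampled_estimate(sessions, sample_size, seed=0):
    """Estimate the total of a per-session metric from a simple random sample.

    Hypothetical helper: generic simple random sampling without replacement,
    not Google's actual sampling method.
    """
    rng = random.Random(seed)
    sample = rng.sample(sessions, sample_size)
    # Scale the sample total by the inverse of the sampling fraction.
    return sum(sample) * len(sessions) / sample_size


# Hypothetical data: 100,000 sessions, 1 = converted, 0 = did not (about 3% convert).
data_rng = random.Random(42)
sessions = [1 if data_rng.random() < 0.03 else 0 for _ in range(100_000)]

true_total = sum(sessions)
estimate = sampled_estimate(sessions, sample_size=10_000)
print(f"actual conversions: {true_total}, estimate from a 10% sample: {estimate:.0f}")
```

The estimate lands close to the actual count without reading 90% of the sessions, which is the trade-off sampling makes: speed in exchange for some uncertainty.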
Marketers using Google Analytics will likely want to customize their reports and therefore must confront the question: what is the right sample size?
Google does allow marketers to change the sample size, though it warns that larger samples make reports slower to generate. To choose a sample size, marketers first need to know how many sessions fall within a given set of report specifications (the population). Next, they must estimate how much the actions users take within sessions vary (the variance). Finally, they should decide how confident they need to be in the results and what margin of error is acceptable. An April 2013 blog post from Qualtrics explains how these values can be combined to calculate a required sample size. In practice, however, it is difficult to estimate the variance of session data (and therefore its standard deviation) accurately, so a rough, conceptual grasp of these statistical principles is often enough to improve the accuracy of reports.
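As a rough sketch of the kind of calculation involved, the Python below uses the textbook formula for the sample size needed to estimate a mean, n0 = (z·σ/E)², followed by the finite population correction n = n0 / (1 + (n0 − 1)/N), where N is the number of sessions. This is a standard formula, not necessarily the exact one in the Qualtrics post, and the function name and example inputs (one million sessions, a standard deviation of 3 pageviews per session, a margin of error of 0.1 pageviews) are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(population, std_dev, margin_of_error, confidence=0.95):
    """Sessions to sample so the estimated mean is within +/- margin_of_error.

    Hypothetical helper: textbook n0 = (z * sigma / E)^2 plus the finite
    population correction for a population of `population` sessions.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    n0 = (z * std_dev / margin_of_error) ** 2
    n = n0 / (1 + (n0 - 1) / population)  # smaller populations need fewer samples
    return math.ceil(n)


# Hypothetical inputs: 1,000,000 sessions, sigma = 3 pageviews per session,
# and a tolerance of +/- 0.1 pageviews at 95% confidence.
print(required_sample_size(1_000_000, std_dev=3.0, margin_of_error=0.1))
```

For a yes/no metric such as whether a session converted, the standard deviation is sqrt(p(1 − p)) for a conversion rate p. Plugging in p = 0.5 gives the largest (most conservative) sample size, which is one way to sidestep the difficulty of estimating variance.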
If speed does not matter in your reporting process, it is tempting to set the sample size to the entire data set. However, Google has a sampling limit and will sample any report that exceeds a maximum number of sessions. If you run into this limit, keep in mind that the more total sessions there are relative to the sample size, and the greater the expected variance across sessions, the less accurately the sample will represent the whole.
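That relationship can be checked with the margin-of-error formula for a sampled mean, z·σ/√n, multiplied by the finite population correction √((N − n)/(N − 1)). The sketch below, again with hypothetical numbers and a hypothetical function name, holds the sample size fixed at 100,000 sessions and varies the total number of sessions and the standard deviation.

```python
import math
from statistics import NormalDist


def margin_of_error(sample_size, population, std_dev, confidence=0.95):
    """Half-width of the confidence interval for a mean estimated from a sample.

    Hypothetical helper for illustration; applies the finite population
    correction sqrt((N - n) / (N - 1)) for N total sessions.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return z * std_dev / math.sqrt(sample_size) * fpc


SAMPLE = 100_000  # hypothetical fixed sample size (sessions)

for std_dev in (2.0, 4.0):  # hypothetical per-session standard deviations
    for population in (200_000, 1_000_000, 10_000_000):
        moe = margin_of_error(SAMPLE, population, std_dev)
        print(f"sigma={std_dev}, sessions={population:>10,}: +/- {moe:.4f}")
```

The margin of error grows as total sessions grow relative to the sample and as the standard deviation grows. The correction factor approaches 1 once the total is many times the sample size, so beyond that point variance matters far more than the raw session count.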
If all of this back-of-the-envelope statistics seems like more trouble than it’s worth, Google will happily put a price on avoiding it: the Premium version of Google Analytics does not have a sampling limit.