Free Whitepaper: Statistical Outlier Analysis Using Box & Whisker Plots

These FREE Sample Box & Whisker Outlier Template Reports show how statistical outliers can impact forecasting, trend analysis, clustering analytics, and simple descriptive analytics.

Outliers Distort Data Analysis

 

If you use even basic statistics to analyze data, you need to know whether your datasets contain outliers. Outliers are data points that are statistically unusually high or low within a column of data (e.g., outstanding credit card balances for each customer, or customer satisfaction scores). Forget about accurately forecasting, segmenting, reporting on, or modeling customer behaviour until you find the outliers in your data and decide how to handle them.

The message alert to the right is the result of a trend analysis conducted using a box & whisker plot data analysis template. It summarizes whether the trend is positive or negative and whether it is weak, moderate or strong. The summary message also provides the corresponding correlation coefficient (R) and coefficient of determination (R-squared), which help with the statistical interpretation of the trend. But the outliers that you see in the box & whisker plot affect how accurate and reliable the trend analysis is, and the problem becomes visible once you plot the data in a scatter diagram.

Box & whisker plot theory also provides an alternative to the K-Means and hierarchical clustering algorithms commonly used for database segmentation analytics. It relies on the median to identify segments, whereas K-Means (and hierarchical clustering with mean-based linkage) uses the mean (average) to define segments, which makes those methods sensitive to outliers and anomalies.
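To make the effect on trend analysis concrete, here is a minimal Python sketch. It is not the template's own logic: the monthly sales figures are made up, the whisker limits use the common 1.5 x IQR rule, and the weak/moderate/strong cut-offs (0.3 and 0.7) are assumptions chosen only for illustration.

```python
from statistics import mean, quantiles

def pearson_r(xs, ys):
    """Pearson correlation coefficient (R) of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def whisker_limits(values, k=1.5):
    """Lower and upper box plot limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def describe_trend(r):
    """Label a trend; the 0.3 and 0.7 cut-offs are illustrative assumptions."""
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    strength = "strong" if size >= 0.7 else "moderate" if size >= 0.3 else "weak"
    return f"{strength} {direction}"

# Hypothetical data: steady growth with one unusually high month.
months = list(range(1, 13))
sales = [10, 12, 13, 15, 16, 95, 19, 21, 22, 24, 25, 27]

low, high = whisker_limits(sales)
kept = [(m, s) for m, s in zip(months, sales) if low <= s <= high]
clean_months, clean_sales = zip(*kept)

r_all = pearson_r(months, sales)
r_clean = pearson_r(clean_months, clean_sales)

for label, r in (("with outliers", r_all), ("outliers excluded", r_clean)):
    print(f"{label}: R={r:.2f}, R-squared={r * r:.2f}, trend is {describe_trend(r)}")
```

In this made-up series a single extreme month is enough to turn a strong positive trend into a weak one, which is why the outliers deserve a look before the R and R-squared values are trusted.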

 

Outliers Skew Descriptive Statistics

Many common statistical techniques, such as regression, Analysis of Variance (ANOVA), t-tests, and even simple correlations (R or R-squared) with other variables, are most reliable when a numerical variable (e.g., a dollar amount) is roughly symmetric, so that its average, median and mode are close to one another. Unusually high or low outlier values in your dataset pull the average above or below the median and mode.
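As a small illustration of that pull on the average, the Python sketch below compares the three statistics for a made-up column of balances before and after one extreme value is added. The numbers are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical customer balances in dollars.
balances = [200, 250, 250, 250, 300]
with_outlier = balances + [5000]  # one unusually high balance

for label, data in (("no outlier", balances), ("one outlier", with_outlier)):
    print(f"{label}: mean={mean(data):.2f}, median={median(data)}, mode={mode(data)}")
```

The median and mode stay at 250 in both cases, while the mean moves from 250 to more than 1,000 once the extreme balance is included.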

 

 

Exclude or Replace Outliers

 

You can treat outliers in your data in a few different ways before doing an analysis. You can exclude the outlier values in a variable from your analysis to improve the accuracy of forecasts or statistical analysis. Or you can replace the outlier values in a variable with its median. But if you use the box & whisker plot to find and visualize outliers, you can instead replace outlier values with the upper or lower limit (whisker) values that can be calculated using the theory of the box & whisker plot.

There is a great deal of research about outlier detection in datasets, and some of this research is posted in Outlier and Anomaly Detection Research. The FREE Outlier and Anomaly Detection Template will help you to automatically find and visualize data outliers using the theory of the box & whisker plot. But what if you like pure numbers? Can you use simple descriptive statistics to quickly check whether you do, in fact, have outliers? The answer lies in the basic high school statistics class in which you discussed (if you were awake for it) three simple descriptive statistics: the mean, median and mode.

Find Outliers by Hand
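Worked by hand, the box & whisker limits and the three treatments described under Exclude or Replace Outliers take only a few lines. The Python sketch below is one possible reading rather than the template's implementation: it assumes the common 1.5 x IQR rule for the whisker limits, uses the median of the full column as the replacement value, and runs on made-up data. All function names are hypothetical.

```python
from statistics import median, quantiles

def whisker_limits(values, k=1.5):
    """Lower and upper box plot limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def exclude_outliers(values):
    """Treatment 1: drop every value outside the whisker limits."""
    low, high = whisker_limits(values)
    return [v for v in values if low <= v <= high]

def replace_with_median(values):
    """Treatment 2: swap each outlier for the median of the column."""
    low, high = whisker_limits(values)
    mid = median(values)
    return [v if low <= v <= high else mid for v in values]

def cap_at_whiskers(values):
    """Treatment 3: pull each outlier back to the nearest whisker limit."""
    low, high = whisker_limits(values)
    return [min(max(v, low), high) for v in values]

# Hypothetical column with one unusually high value.
scores = [12, 14, 15, 15, 16, 18, 19, 21, 95]

print("limits:  ", whisker_limits(scores))
print("excluded:", exclude_outliers(scores))
print("median:  ", replace_with_median(scores))
print("capped:  ", cap_at_whiskers(scores))
```

Excluding shortens the column, median replacement pulls the value to the centre, and capping keeps the point at the edge of the plausible range. Which one suits a given analysis is a judgment call.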

 

If you remember, the mean is the average value across a set of data points. You find it by adding up the numbers and dividing the total by the count of data points. For the same set of numbers, you can identify the one number that occurs most frequently: this is your mode. Finally, sort your data points from highest to lowest and find the value for which half the numbers are below and the other half are above: you now have your median (with an even count, take the average of the two middle values). If these three descriptive statistics are the same, or nearly the same, the column is roughly symmetric and extreme values are not distorting the average. However, if they differ noticeably, you can turn to two more advanced descriptive statistics, skewness and kurtosis, to gauge how lopsided and heavy-tailed the data is and therefore how likely it is that outliers are present.
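Skewness and kurtosis can also be worked out by hand. The Python sketch below uses the simple moment-based (population) formulas, which is an assumption on my part: spreadsheet and statistics packages often apply sample-size corrections and may return slightly different values. The two data columns are made up.

```python
from statistics import mean

def central_moment(values, order):
    """Average of (x - mean) raised to the given power."""
    mu = mean(values)
    return sum((v - mu) ** order for v in values) / len(values)

def skewness(values):
    """Moment-based skewness: m3 / m2^1.5 (0 for symmetric data)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal curve)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]       # hypothetical, evenly spread
with_outlier = [1, 2, 3, 4, 50]   # same column with one extreme value

for label, data in (("symmetric", symmetric), ("with outlier", with_outlier)):
    print(f"{label}: skewness={skewness(data):.2f}, "
          f"excess kurtosis={excess_kurtosis(data):.2f}")
```

A skewness near zero points to a symmetric column, while a clearly positive or negative value suggests a long tail on one side. Both statistics hint at outliers rather than prove them, so a box & whisker plot is still worth checking.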

Find Outliers Automatically

 

However, if you want to avoid these calculations and get right to the point of finding and visualizing outliers in your datasets, try the FREE Outlier and Anomaly Detection Template, which automatically finds and visualizes data outliers using the theory of the box & whisker plot. The following web-based, interactive box plot outlier data analysis reports show how easy it is to turn the output of the Box Plot Outlier Data Analysis Templates into shareable, interactive, online reports. Once you download and run one of the Box Plot Outlier Data Analysis Templates, save your output as an Excel workbook. Then place that workbook in any location on your OneDrive cloud storage account.

Share Web-based Outlier Reports

 

Next, browse to that workbook on OneDrive, select it, and choose the Embed or Share option to generate a dynamic link to your output (as you would with a YouTube video), which you can use to embed your report in a website, blog or other online platform. This process uses the built-in XML and VBA capabilities for creating and deploying Excel projects as online reports with dynamic pivot tables, charts, slicers, and features such as trendlines. You can also take the iframe code that you get when choosing to embed a YouTube video and substitute the link to your outlier report for the video URL.

So, if you want to plot outliers and anomalies in your datasets and share your Box Plot Outlier Data Analysis with others, try this out. It is a great process for any data science student who wants to forecast or find patterns in large datasets.