Plot Outliers and Anomalies

The box plot graph algorithm analyzes each value in a dataset and flags "High Outlier" and "Low Outlier" based on the following two formulas which are applied to the interquartile range (IQR) statistic of one column of data:

High Outlier > 75th percentile + 1.5(IQR)
Low Outlier < 25th percentile - 1.5(IQR)


Outlier values cause the Average to be either much higher or much lower than the Median and Mode statistics. And, as a result, when you plot a histogram of your data, it will not look symmetrical (not an evenly shaped bell curve). A box & whisker plot of your data will actually show you which data points are outlier values that are impacting your Average. Your other descriptive statistics shown above - such as Skewness, Kurtosis, UPPER and LOWER LIMIT - will confirm what you see with the Average, Median, Mode, histogram, and box & whisker plot.

Removing outlier values (or replacing them with the Average value) from an analysis can make forecasts, modeling and insights using your data more powerful. You can see in the output above how the Average without outliers is very close to the Median and Mode descriptive statistics. Ideally, we look for the Average, Median and Mode statistics to be identical - making our histogram symmetrical and causing our box & whisker plot to show no outlier data points. If this is true, and the data being analyzed is from a randomly generated sample of a larger population (eg. customers), we can then start to make conclusions about the population from analysis of the Excel Find Outliers AddIn output.