Data only has meaning when it is understood in relation to something else. Four common ways of putting data in context are percent change, normalization, standardization, and relative ranking. These transformations are very useful for building trading systems, and many machine learning techniques do not work well unless the data has been normalized in some form.
Imagine a hypothetical stock that has a price of $100 when you buy it. It gains $10. Another stock has a price of $10 and also gains $10. The absolute gains are the same, but the first stock gained 10% while the second gained 100%. Alternatively, imagine building a futures trading system that buys at the open when the market opens up by more than some minimum number of points. Over time, though, the base price will change, so the meaning of any given point value will vary. One way to solve both problems is to calculate the percentage change. In decimal form, percent change is simply:
Percent Change = (New Number - Original Number) / Original Number
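To make the arithmetic concrete, here is a minimal Python sketch of the decimal percent-change formula. Python is used only for illustration, and the function name `percent_change` is hypothetical. It reproduces the two stocks from the example above.

```python
def percent_change(original: float, new: float) -> float:
    """Decimal percent change: (new - original) / original."""
    if original == 0:
        raise ValueError("percent change is undefined when the original value is 0")
    return (new - original) / original


# The two hypothetical stocks from the text, each gaining $10.
stock_a = percent_change(100.0, 110.0)  # $100 -> $110
stock_b = percent_change(10.0, 20.0)    # $10  -> $20

print(f"Stock A: {stock_a:.0%}")  # Stock A: 10%
print(f"Stock B: {stock_b:.0%}")  # Stock B: 100%
```

The same $10 move is 0.10 in one case and 1.0 in the other, which is exactly the information the absolute point change hides.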
Note: An astute reader observed that percentage change may not work as expected with the most commonly used back-adjusted continuous contract method. The usual back-adjustment adds or subtracts the roll gaps from historical prices. This preserves point differences but not ratios, so any multiplication or division of price can give misleading results. There are other methods for creating continuous contracts that do not suffer from this problem, although they suffer from other problems. I plan to look for a more comprehensive solution at some point in the future. In the meantime, as a sanity check, one can develop and/or test against the cash market or the historical front-month contracts.
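The following Python sketch illustrates the problem with hypothetical numbers. It assumes an additive (point-based) back-adjustment, in which a constant roll gap is subtracted from older prices. The prices and the 40-point gap are made up for the example.

```python
def percent_change(original: float, new: float) -> float:
    """Decimal percent change: (new - original) / original."""
    return (new - original) / original


# Hypothetical raw prices for an older contract month.
raw = [50.0, 55.0]

# Hypothetical additive back-adjustment: subtract a constant 40-point roll gap
# from all older history so it lines up with the newer contract.
ROLL_GAP = 40.0
adjusted = [p - ROLL_GAP for p in raw]

# Point changes survive the shift...
print(raw[1] - raw[0], adjusted[1] - adjusted[0])          # 5.0 5.0
# ...but percent changes do not.
print(percent_change(*raw), percent_change(*adjusted))     # 0.1 0.5
```

The point change is identical in both series. The percent change, however, jumps from 10% to 50% because the denominator has shifted.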
Normalization can be used to rescale a set of input values into a fixed, known range of output values. We will show how normalization can be used to create an adaptive bounded indicator in a future article. Normalization tends to squash (compress) the data into that range. The formula for min-max normalization is:
Normalized Value = (Current Value - Minimum Value) / (Maximum Value - Minimum Value)
For example, take the current value, such as the close. Subtract the minimum close over some look-back period, then divide the result by the maximum close minus the minimum close over the same period. The unbounded close is remapped into a fixed range from 0 to 1. This is known as a linear remapping. A more advanced form of normalization uses a non-linear function such as the sigmoid.
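A minimal Python sketch of rolling min-max normalization follows, assuming the look-back window includes the current bar. Several details are illustrative assumptions, not a standard: the function name, the `None` placeholders during warm-up, and the choice of 0.5 for a flat window.

```python
from typing import List, Optional


def min_max_normalize(closes: List[float], lookback: int) -> List[Optional[float]]:
    """Rescale each close to [0, 1] using the min/max of the last `lookback` closes.

    The window includes the current bar. Bars before a full window exists are
    returned as None. If the window is flat (max == min), 0.5 is returned as an
    arbitrary midpoint to avoid dividing by zero.
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    out: List[Optional[float]] = []
    for i, close in enumerate(closes):
        if i + 1 < lookback:
            out.append(None)  # not enough history yet
            continue
        window = closes[i + 1 - lookback : i + 1]
        lo, hi = min(window), max(window)
        out.append(0.5 if hi == lo else (close - lo) / (hi - lo))
    return out


closes = [10.0, 12.0, 11.0, 15.0, 13.0]  # hypothetical closes
print(min_max_normalize(closes, lookback=3))  # [None, None, 0.5, 1.0, 0.5]
```

Because the current close is always inside its own window, every output lies between 0 and 1. A close at the top of its recent range maps to 1.0.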
Standardization, or the z-score, remaps data to measure how many standard deviations a value lies from its mean. Imagine you have a trend system and you want to program it to buy or sell in the direction of the trend. Different futures products can differ in both price and volatility. Using standardization, we can program the system to buy when price deviates from its mean by some number of standard deviations, so that the same threshold means the same thing across products. Traders will be familiar with standardization from the well-known Bollinger Bands, which plot the prices that lie, typically, 2 standard deviations above and below a moving average of price. The formula for standardization is X minus mu (the average) divided by sigma (the standard deviation, i.e. the square root of the variance):
z = (X - mu) / sigma
However, most trading platforms have a built-in standardize or z-score function, so there is rarely a need to develop it yourself.
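A rolling z-score can be sketched in Python as follows. It assumes a simple moving average and the population standard deviation, over a window that includes the current bar. Platforms differ on these details. The data are hypothetical and chosen so the last value lands exactly 2 standard deviations above its mean, which is where the upper Bollinger Band would sit with the same settings.

```python
import statistics
from typing import List, Optional


def rolling_zscore(prices: List[float], lookback: int) -> List[Optional[float]]:
    """z = (x - mean) / stdev over the last `lookback` prices, current bar included.

    Uses the population standard deviation (statistics.pstdev); some platforms
    use the sample version instead, which gives slightly different values.
    Returns None until a full window exists, and when the window has zero spread.
    """
    if lookback < 2:
        raise ValueError("lookback must be at least 2")
    out: List[Optional[float]] = []
    for i, x in enumerate(prices):
        if i + 1 < lookback:
            out.append(None)  # not enough history yet
            continue
        window = prices[i + 1 - lookback : i + 1]
        mu = statistics.fmean(window)     # simple average of the window
        sigma = statistics.pstdev(window)  # population standard deviation
        out.append(None if sigma == 0 else (x - mu) / sigma)
    return out


prices = [10.0, 10.0, 10.0, 10.0, 15.0]  # hypothetical prices
print(rolling_zscore(prices, lookback=5))  # [None, None, None, None, 2.0]
```

A rule such as "buy when z > 2" then carries the same meaning for a low-priced, quiet market as for a high-priced, volatile one.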
The last type of normalization we will cover is percent rank, a form of relative ranking. I first became aware of percent rank for trading applications from some of David Varadi’s work. It is a simple concept based on the idea of a frequency distribution, which records how often each value (or range of values) occurs in a data set. The percent rank of a value tells you what percentage of the data were equal to or less than that value over some period. For normally distributed data, the percentile rank can be calculated directly from the z-score. However, market prices are not normally distributed, so that shortcut is unreliable. The basic way to calculate percent rank is to build a sorted data table and find where the value falls within it.
The percentile rank formula is R = (P / 100) × (N + 1), where R is the rank order of the score, P is the percentile rank, and N is the number of scores in the distribution.
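The sorted-table approach and the rank formula above can be sketched in Python. `percent_rank_sorted` and `rank_from_percentile` are hypothetical names. `bisect.bisect_right` and `statistics.NormalDist` are standard-library functions. The last line shows the normal-distribution shortcut from z-score to percentile, which is only valid if the data really are normal.

```python
import bisect
from statistics import NormalDist
from typing import List


def percent_rank_sorted(data: List[float], value: float) -> float:
    """Fraction of `data` that is <= `value`, found by searching a sorted copy."""
    table = sorted(data)
    # bisect_right returns the index just past the last entry <= value,
    # which equals the count of entries <= value.
    count_at_or_below = bisect.bisect_right(table, value)
    return count_at_or_below / len(table)


def rank_from_percentile(p: float, n: int) -> float:
    """R = (P / 100) * (N + 1): rank order of percentile P among N scores."""
    return p / 100 * (n + 1)


data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # hypothetical values
print(percent_rank_sorted(data, 4.0))   # 0.625 (5 of 8 values are <= 4)
print(rank_from_percentile(50, 19))     # 10.0 (the median of 19 scores is the 10th)

# Only under a normality assumption does a z-score map straight to a percentile:
print(round(NormalDist().cdf(2.0), 4))  # 0.9772
```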
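There is also a faster version that avoids sorting: it simply counts how many values in the look-back window are less than or equal to the value being ranked. The EasyLanguage code for the fast version is provided below.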
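```
// PercentRankFast: original source unknown
Inputs: DataSet(NumericSeries), ValueToRank(NumericSimple), Length(NumericSimple);
Vars: R(0), X(0);
R = 0;
// Count the values in the last Length bars (DataSet[0] .. DataSet[Length-1]) that are <= ValueToRank
for X = 1 to Length begin
    if DataSet[X-1] <= ValueToRank then
        R = R + 1;
end;
// Dividing by Length keeps the result between 0 and 1
PercentRankFast = R / Length;
```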
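For readers outside TradeStation, here is the same counting idea sketched in Python. It assumes `series` is in chronological order, so the current bar (`DataSet[0]` in EasyLanguage) is the last element. It divides by `length` so the result stays between 0 and 1. The name `percent_rank_fast` and the sample closes are hypothetical.

```python
from typing import List


def percent_rank_fast(series: List[float], value_to_rank: float, length: int) -> float:
    """Fraction of the most recent `length` values that are <= value_to_rank.

    `series` is chronological, so series[-1] is the current bar. No sorting is
    needed: a single O(length) pass counts the qualifying values.
    """
    if not 1 <= length <= len(series):
        raise ValueError("length must be between 1 and len(series)")
    window = series[-length:]  # the last `length` bars, current bar included
    count = sum(1 for v in window if v <= value_to_rank)
    return count / length


closes = [101.0, 99.5, 102.0, 100.5, 103.0]  # hypothetical closes
print(percent_rank_fast(closes, closes[-1], length=5))  # 1.0 (today's close is the highest)
print(percent_rank_fast(closes, 100.5, length=5))       # 0.4 (2 of 5 values are <= 100.5)
```

Because it makes a single pass over the window with no sort, the cost per bar is O(length). Re-sorting the window on every bar would cost O(length log length).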