Percent Change, Normalization, Standardization, Percent Rank

Data only has meaning when it is understood in relation to something else. Four common ways of putting data in context are percent change, normalization, standardization, and relative ranking. These transformations are very useful for building trading systems, and many machine learning techniques do not work well unless the data has been normalized in some form.

Imagine a hypothetical stock that has a price of $100 when you buy it. It gains $10. Another stock has a price of $10 and also gains $10. The absolute gains are the same but the first stock gained 10% while the second gained 100%. Alternatively, one might imagine building a futures trading system to buy at the open when the market opens up above some minimum point threshold. Over time though, the base price will change and thus the meaning of any given point value will vary. One way to solve both problems is to calculate the percentage change. The decimal form for percent change is simply:

Percent Change = (New Number-Original Number) / Original Number
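The two-stock example above can be checked with a few lines of Python. This is only a minimal sketch: the function name `percent_change` is an illustrative assumption, not a function from any trading platform.

```python
def percent_change(original, new):
    """Return the decimal percent change from original to new."""
    if original == 0:
        raise ValueError("percent change is undefined when the original value is 0")
    return (new - original) / original


# Both stocks gain $10, but from different starting prices.
stock_a = percent_change(100.0, 110.0)  # $100 stock gains $10
stock_b = percent_change(10.0, 20.0)    # $10 stock gains $10
print(f"{stock_a:.0%} vs {stock_b:.0%}")  # prints "10% vs 100%"
```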

Note: An astute reader observed that with the most commonly used continuous contract method, back adjustment, percentage change may not work as expected. In fact, any multiplication or division of price may not work as expected, because back adjustment shifts historical prices by a fixed number of points and therefore changes their ratios. There are other methods for creating continuous contracts that do not suffer from these problems (but they suffer from other problems). I plan to seek a more comprehensive solution to this problem at some point in the future. However, as a sanity check one can develop and/or test against the cash market or the historical front-month contracts.
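A small Python sketch may make the issue concrete. It assumes a back adjustment that subtracts the same number of points from every historical price. The prices and the 8-point roll adjustment are made-up numbers chosen purely for illustration.

```python
def percent_change(original, new):
    """Return the decimal percent change from original to new."""
    return (new - original) / original


yesterday, today = 100.0, 104.9
roll_adjustment = 8.0  # hypothetical cumulative roll gap subtracted from history

# Percent change on the prices as they originally traded.
raw = percent_change(yesterday, today)

# The same two bars after back adjustment has shifted both prices down.
adjusted = percent_change(yesterday - roll_adjustment, today - roll_adjustment)

# The point move is unchanged by the shift; only the ratio changes.
point_move_raw = today - yesterday
point_move_adj = (today - roll_adjustment) - (yesterday - roll_adjustment)

print(f"raw: {raw:.2%}, back-adjusted: {adjusted:.2%}")
```

With these numbers the raw move is a little under 5% while the back-adjusted move is a little over 5%, so a rule such as "buy on a gain above 5%" would fire on one series and not on the other.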

Normalization can be used to rescale a set of input values into a fixed or known set of output values. We will show you how normalization can be used to create an adaptive bounded indicator in a future article. Normalization tends to squash the data. The formula for min-max normalization is given below:

Normalized Value = (Value - Minimum) / (Maximum - Minimum)

As an example, you take the current value, such as the close, subtract the minimum close (over some look-back period), and divide the result by the maximum close minus the minimum close. The unbounded close is remapped into a fixed range from 0 to 1. This is known as a linear remapping. A more advanced form of normalization is to use a non-linear normalization function such as the sigmoid.
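The min-max remapping described above, along with a sigmoid for comparison, can be sketched in Python as follows. The function names, the sample closes, and the choice to return 0.5 for a flat window are all assumptions made for illustration.

```python
import math


def min_max_normalize(value, window):
    """Linearly remap value into [0, 1] using the min and max of window."""
    lo, hi = min(window), max(window)
    if hi == lo:
        return 0.5  # assumption: a flat window has no range, so use the midpoint
    return (value - lo) / (hi - lo)


def sigmoid(x):
    """Non-linear squashing of a real number into (0, 1).

    Fine for moderate inputs; very large negative x can overflow math.exp.
    """
    return 1.0 / (1.0 + math.exp(-x))


closes = [101.0, 103.5, 99.0, 104.0, 102.0]  # made-up closes, oldest first
current = closes[-1]

# (102 - 99) / (104 - 99) = 0.6
print(min_max_normalize(current, closes))
```

Note that the linear version is only bounded for values inside the look-back window; a value outside the window's range maps below 0 or above 1.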

Standardization, or z-score, can be thought of as remapping data to measure how many standard deviations a value differs from its mean. Imagine you have a trend system and you want to program it to buy or sell in the direction of the trend. Different futures products might differ in both their price and volatility; using standardization, we can program the system to buy when price deviates some number of standard units from its mean. Traders will be familiar with standardization from the famous Bollinger Bands, which simply plot the prices that lie, typically, 2 standard deviations above and below the average price. Virtually all trading platforms have a built-in standardize or z-score function, so there is rarely a need to develop it yourself. The formula for standardization is X minus mu (the average), with the result divided by sigma (the standard deviation, which is the square root of the variance):

z = (X - mu) / sigma
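For readers who want to see the calculation spelled out, here is a minimal Python sketch using the standard library. The function name `z_score`, the sample data, and the use of the population standard deviation are assumptions; platforms may differ on which standard deviation they use.

```python
import statistics


def z_score(value, window):
    """Number of standard deviations that value lies from the mean of window."""
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window)  # population std dev (an assumption)
    if sigma == 0:
        return 0.0  # assumption: treat a flat window as "no deviation"
    return (value - mu) / sigma


closes = [10.0, 12.0, 14.0, 16.0, 18.0]  # made-up closes, oldest first
z = z_score(closes[-1], closes)

# Bollinger-style bands: the prices 2 standard deviations from the mean.
mean = statistics.fmean(closes)
sigma = statistics.pstdev(closes)
upper_band = mean + 2 * sigma
lower_band = mean - 2 * sigma

print(round(z, 4), round(lower_band, 4), round(upper_band, 4))
```

A rule such as "buy when z exceeds some threshold" then means the same thing across products with different prices and volatilities.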

Finally, the last type of normalization we will cover is percent rank, a form of relative ranking. I first became aware of percent rank for trading applications from some of David Varadi’s work. It is a simple concept that is based on the idea of a frequency distribution. A frequency distribution is simply a tabulation of how often each value (or range of values) occurs. The percent rank of a value tells you what percentage of the data were equal to or less than that value over some period. For normally distributed data, the percentile rank can be calculated from the z-score. However, market prices are not normally distributed, so the rank is computed directly from the data. The basic calculation requires building a sorted data table and uses the following formula:

The percentile rank formula is: R = (P / 100) × (N + 1), where R is the rank order of the score, P is the percentile rank, and N is the number of scores in the distribution.
Source http://study.com/academy/lesson/percentile-rank-in-statistics-definition-formula-quiz.html
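As a quick worked example of the rank formula, the Python sketch below finds the score at the 50th percentile of a small sorted table. The function name and the scores are made up for illustration.

```python
def rank_from_percentile(p, n):
    """Rank position R for percentile p (0 to 100) among n sorted scores."""
    return p / 100 * (n + 1)


scores = sorted([55, 60, 65, 70, 75, 80, 85, 90, 95])  # made-up scores, N = 9

r = rank_from_percentile(50, len(scores))  # 0.5 * (9 + 1) = 5.0
median_score = scores[int(r) - 1]          # ranks are 1-based, lists are 0-based

print(r, median_score)  # prints "5.0 75"
```

When R is not a whole number, the score has to be interpolated between the two neighbouring ranks, which this sketch does not attempt.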

However, there is a fast version that skips the sort. The EasyLanguage code for the fast version is provided below.


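//Original source unknown
//Counts how many of the last Length values of DataSet are <= ValueToRank
//and scales that count by Length-1.
inputs:
    ValueToRank(numericsimple),
    DataSet(numericseries),
    Length(numericsimple);
variables:
    R(0), X(0);
R = 0;
//DataSet[0] is the current bar, DataSet[Length-1] the oldest bar in the window
for X = 1 to Length begin
    if DataSet[X-1] <= ValueToRank then
        R = R + 1;
end;
//Note: Length must be greater than 1. If ValueToRank is itself the highest
//value in the window, R equals Length and the result is slightly above 1.
PercentRankFast = R/(Length-1);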
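For readers without EasyLanguage, the same counting idea can be sketched in Python. This is not the author's code. `percent_rank_fast_port` mirrors the function above, including its division by Length-1. `percent_rank` is a variation that divides by the window length instead, which matches the "equal to or less than" definition and keeps the result between 0 and 1. Both names and the sample data are assumptions.

```python
def percent_rank_fast_port(value, window):
    """Direct port of the EasyLanguage logic: count of values <= value over (n - 1)."""
    if len(window) < 2:
        raise ValueError("window needs at least 2 values")
    count = sum(1 for x in window if x <= value)
    return count / (len(window) - 1)


def percent_rank(value, window):
    """Variation: fraction of values in window that are <= value, in [0, 1]."""
    if not window:
        raise ValueError("window must not be empty")
    count = sum(1 for x in window if x <= value)
    return count / len(window)


closes = [101.0, 103.5, 99.0, 104.0, 102.0]  # made-up closes, oldest first

# 3 of the 5 closes are <= 102.0
print(percent_rank(closes[-1], closes))

# The highest close in the window shows the difference between the two scalings.
print(percent_rank(104.0, closes), percent_rank_fast_port(104.0, closes))
```

Neither version needs a sorted table; each makes a single pass over the look-back window.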

References:
https://en.wikipedia.org/wiki/Frequency_distribution
https://stats.stackexchange.com/questions/10289/whats-the-difference-between-normalization-and-standardization

 

9 thoughts on “Percent Change, Normalization, Standardization, Percent Rank”

  1. @Simon Sure, you can use percent change from the prior day's close for futures systems, or calculate the average percentage move over a recent lookback period. While there are no guarantees, it is reasonable to suspect such methods could be more robust than using raw point values. Of course, you can come up with scenarios where any method wouldn’t make sense, as well.


    1. Why is it reasonable to suspect that percent changes are more robust with continuous contracts? You provide no proof. Good luck backtesting with such a method if the continuous contract instrument goes near zero: you will have a completely invalid backtest. Cheers!


      1. @Simon I focus primarily on short-term trading methods where the technicalities of trading continuous contracts over longer time horizons are not really relevant. The basic idea behind the percent change is that it is a relative measure and thus potentially more robust to changing markets. Please differentiate between a percent change chart and using percent change for calculating short-term indicators or measures. But yes, you should test, and you may find absolute values work fine or better for certain scenarios. There is more than one way to solve a problem: you might find that regular re-optimization with absolute values works better.


  2. But this site is about backtesting right? And you are advocating a backtesting method: percent change with continuous contracts – that is completely invalid – especially on older trades. Plus, you do not even seem to understand the issue here based on your responses (regular reoptimization will not fix the issue). Colour me very concerned with this. I shall refrain from further comment as it does not seem to be sinking in for you.


    1. @Simon Sorry, I am not following. There are several ways that continuous futures contracts can be created (back adjusted, forward adjusted, ratio, percentage based, stitched, etc.). For my needs, I have not seen any real issues with the default methods used in TradeStation and, honestly, other than being aware of the methods, I have not studied them. You are welcome to explain or provide an example of the scenario where you think calculating a percent change for a short-term indicator would give erroneous results, or whatever scenario you are thinking of.


      1. How do you hold yourself as an expert in backtesting? I don’t get this…

        Let’s say your rule is “buy if percent gain is greater than 5%”

        Today the price is 104.9, yesterday was 100, so no buy was hit. (4.9 percent gain)

        Now, let’s say after a few months of contract rollover and back adjusting, the roll amount is 8. So the old price of 100 becomes 92, and old price of 104.9 becomes 96.9. Now you rerun the backtest: 96.9/92 = 5.3% and buy is signalled.

        2 datastreams supposedly of the same thing, but giving different results.

        I’ll leave it to you how to fix this.


  3. @Simon Thanks for the explanation. First, let us notice that the difference you created was very small. If your system is subject to very different results with such small changes then it probably will not work anyway. However, if you want to be exact then I will just provide some possible solutions below:

    1. Build against the continuous contract but then verify against non back adjusted contracts.
    2. Build against the index and then verify against the futures contract(s) when possible.
    3. Use a different back adjustment algorithm, such as “percentage adjusted”. For more information, see
    http://www.futuresmag.com/2008/07/15/continuous-data-not-so-easy.
    4. See http://adamhgrimes.com/blog/how-to-calculate-futures-rolls/ for a good explanation, as well.

    For more information on what Simon is referring to:
    “The only way around this pitfall is to take special care that division or multiplication is not used with price data in back-adjusted continuous contracts.”, From “Building Winning Algorithmic Trading Systems” Davey.

    “The Proportional back-adjustment principles offered here were inspired by Thomas Stridman, who discussed the idea in his article “Data Pros and Cons” in the June, 1998 issue of Futures Magazine.”

    “Ratio Adjusted Linking Method: This revolutionary new method of linking commodity contracts is described in detail in the June ’98 issue of “FUTURES” magazine. (See “Data: Pros & Cons” by Thomas Strictsman). Ratio adjusted series never go negative and back test far superior to other methods.”

    “When using continuous futures data, results cannot be expressed as percentage changes. Cash prices for underlying futures contracts are available on downloading services, such as Commodity Systems. (CSI), Boca Raton, Florida.”, Trading Systems and Methods Kaufman.

