Pathway to Failure: Why most trading systems might be “setup” to fail

One possible reason that most trading systems fail is that they never had a chance in the first place. You see, most retail trading development environments, such as TradeStation and NinjaTrader, predominantly support defining a trading system using only one technique, and that technique is basically handcrafting decision trees in a structured programming language.

On the other hand, many of the recent advances in machine learning and AI that have enabled computers to do tasks previously thought impossible, such as recognizing faces, categorizing objects, and driving cars, make use of deep learning.

Please note, I am not stating that it is impossible to build profitable trading systems using the traditional techniques. Most of my systems have been built just that way, including the system I’m offering for sale. The real point is that the majority of system developers are led down a narrow path, a restricted set of possibilities, for achieving success. There are biases at work against you. Unless you become keenly aware of these “availability biases”, you are more likely to be negatively influenced by them.

Traditional trading environments are severely limited when compared to today’s machine learning environments, which support dozens of machine learning methods such as neural networks, deep learning models, numerous decision tree generation algorithms, support vector machines, linear regression, etc.

To recap, system developers have an availability bias toward one technique for developing their trading systems, but it gets worse, way worse. These environments also have a strong availability bias toward backtesting price data. Programming against other types of data, such as order flow, order book, sentiment, reports, and news events, will require a lot more legwork and grit from the developer.

Finally, there is a distinct lack of powerful robustness and statistical testing. Everyone knows it is easy to create a backtest that looks good.

What about some of the retail environments that support generative techniques? These systems claim to auto-generate working systems for the developer. The reality is that, without expert understanding, the probability of success with most generative techniques is going to be even worse. A very experienced developer can use such techniques to come up with new ideas, but in general they won’t work well unless the developer is already attuned to the numerous sorts of problems they are likely to encounter. Those environments do offer more robustness testing; the problem is that they primarily try various combinations of indicators with different lookback lengths. One thing we know is that markets are relatively efficient, which means that the further you go back in history, the less likely the data is to be relevant. As such, it doesn’t make a lot of sense to search for systems that might work by combining random indicators with long lookback periods. In other words, these sorts of environments might open new possibilities to experienced developers but would be especially ill-advised for new developers, to whom they are often marketed and who are least likely to be able to identify and correct the problem systems.

There is one last point on why most systems fail. Most environments can only reliably backtest market order systems, and most systematic tracking services only support market orders (to ensure all subscribers get similar fills). However, in my active discretionary day trading, I always attempt to capture at least one side of the spread. There is rarely a need to use market orders on both sides of the trade unless the trade has failed and one is trying to exit at market. I typically enter at market and exit with a limit order. Think about it: if you exit with a market order, then you are stating that you can (1) pick the direction and (2) pick the peak of the move in real time. A limit order exit does not require the second. Sometimes exiting with a market order is required, but the majority of the time a limit order exit would be sufficient. The question, then, is why most developers/tracking services still insist on exiting with market orders. Again, the only reason I can think of is that when you have subscribers, you need to keep everything synchronized. I understand that, but it makes it much more difficult to build a day trading system when you willy-nilly give up the spread or have to assume worst-case scenarios.
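To make the limit order testing problem concrete, here is a minimal sketch of one conservative way to decide whether a limit exit would have filled in a bar-based backtest. The names `Bar`, `simulate_limit_exit`, and `require_trade_through` are hypothetical, and the "price must trade one tick through the limit" rule is an assumption commonly used to avoid optimistic fills, not something any particular platform is known to implement this way.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Bar:
    """One price bar after entry (hypothetical minimal structure)."""
    high: float
    low: float
    close: float


def simulate_limit_exit(
    bars: Sequence[Bar],
    limit_price: float,
    tick_size: float,
    is_long: bool = True,
    require_trade_through: bool = True,
) -> Optional[Tuple[int, float]]:
    """Return (bar_index, fill_price) for the first bar where the limit
    exit is assumed filled, or None if it never fills.

    With require_trade_through=True, price must trade at least one tick
    beyond the limit (conservative). With False, a touch counts as a
    fill (optimistic, since a resting order may sit unfilled in the queue).
    """
    buffer = tick_size if require_trade_through else 0.0
    for i, bar in enumerate(bars):
        if is_long:
            # Exiting a long: selling at the limit, so price must rise to it.
            if bar.high >= limit_price + buffer:
                return i, limit_price
        else:
            # Exiting a short: buying at the limit, so price must fall to it.
            if bar.low <= limit_price - buffer:
                return i, limit_price
    return None
```

Running the same system under both settings gives a rough bracket: if it is only profitable when a touch counts as a fill, the limit order exits deserve more scrutiny.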

Now, let me add that if you want to create more of a “market making” system, then I suspect you will need to layer in many orders and keep your risk/exposure low. Futures are not suitable products for that style of trading for retail traders due to the large contract sizes.

Let’s recap, then, why most trading systems are set up to fail:

  1. Availability bias of technique limits most developers to handcrafting decision trees. Sure, one can add machine learning libraries or code their own libraries but that is hard work and not “built-in”.
  2. Availability bias of data will prevent most developers from incorporating order book, order flow, sentiment, and other sorts of data.
  3. Lack of built-in robustness testing makes it difficult to validate systems.
  4. Lack of realistic limit order testing makes it difficult to validate limit order exits.
  5. Large contract sizes make it difficult to deploy strategies that have multiple entries/exits (which would most likely benefit limit order strategies).
  6. Generative systems typically involve randomly searching long lookbacks or using older data. Old data is simply not likely to be useful, as most markets are efficient. These environments open up interesting creative possibilities for experienced developers but probably require even more experience and skill to use properly.
  7. In addition to those problems, low granularity (i.e. large contract sizes) and overnight margin make it more difficult for small futures traders to (1) hold positions over time and (2) trade a portfolio of systems. Both of these things decrease the probability of success.
  8. While this is not specific to backtesting environments, small traders also tend to day trade exclusively. The problem with this approach is that the markets typically offer great day trading opportunities only a relatively small portion of the time.

And we’ll end on a positive note with how a developer can change up the equation. Such a developer:

  1. Breaks free of the availability bias of technique by linking traditional environments to machine learning platforms/libraries.
  2. Does not limit themselves exclusively to price data.
  3. Creates their own robustness testing processes to ensure systems are robust.
  4. Models limit order fill probabilities and uses market/limit models where appropriate.
  5. If market making style trading is desired, looks at alternative markets that provide rebates for liquidity and/or offer very small position sizing.
  6. If using generative techniques, uses market cognition to apply appropriate structure and constraints to such techniques.
  7. Builds from a few good systems and leverages them for greater return. Probably focuses primarily on day trading systems but keeps the larger context in mind and seeks high return-to-risk opportunities.
  8. Again, probably focuses on day trades but looks for exceptional swing trading opportunities as well. Diversifies beyond a single market to capture more day trading opportunity.

3 thoughts on “Pathway to Failure: Why most trading systems might be “setup” to fail”

  1. From D. Washington.


    This is a very well written article. I have been developing day trading systems with NinjaTrader for about 1 year now and often ask myself, “Should I be focusing more on backtesting using recent market data, maybe 1-2 years back instead of 10+ years?” What are your thoughts? I am only interested in day trading systems.

    Also, can you please explain what you mean by “Old data is simply not likely to be useful, as most markets are efficient.”

    Also, what are your recommendations for books or ideas on how to create my own robustness testing processes to ensure systems are robust? I am a good programmer with NinjaTrader, but I believe my workflow and testing process is not well defined.

    Thank you


    1. @D. Washington

      Regarding length of backtest, this is a very difficult and complex problem but one that can be solved or, at least, reasoned about. However, there is probably not a “one-size-fits-all” answer. First, when you backtest over a long history, you can be more certain that you found a real edge (or at minimum a stable set of parameters that made money). However, if you test/develop routinely on the same set of data, then you run the risk that you are merely data-mining. There is some truth to this. In order to beat the market, you have to know what it is more likely to do, which is itself data mining. That’s why I said, don’t be afraid of data-mining provided you can do it “right”.

      The other problem though is that market regimes change. That’s because markets are non-stationary but also complex. The S&P 500 seems to have several 10 year or more periods where it changes regimes. And, there is another problem: it is very difficult to find anything that works “great” over a long history.

      That’s why some developers will ask why you should bother to test over, say, 2008 when the market today is not like 2008. In other words, they argue that you should test over data that is similar to the current conditions. The real problem with a system developed on only 1-2 years of data is that there’s going to be a real question of whether it can be trusted.

      The answer? I would suggest either (1) backtesting over a long history or (2) backtesting over a shorter history and validating with Walk Forward Analysis (WFA). Some developers, myself included, don’t like WFA because there’s something a little unsettling about trading a system that is changing itself. To help with that, you might track all possible WFA variations over the entire history. You can see how the variations change, and that might give you more confidence.
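As a rough illustration of walk forward analysis, the sketch below rolls a training window and an out-of-sample test window across the history, picks the best-performing parameter set on each training window, and records how it did on the following test window. The function names and the `pnl_by_param` layout (one precomputed per-bar P&L series per parameter set) are assumptions made for the example, and selecting by total in-sample P&L is just the simplest possible criterion.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def walk_forward_windows(
    n_bars: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs that roll forward by
    one test window at a time, so test windows never overlap."""
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


def run_walk_forward(
    pnl_by_param: Dict[str, Sequence[float]],
    train_size: int,
    test_size: int,
) -> List[Tuple[str, float]]:
    """For each window, choose the parameter set with the highest
    in-sample P&L and report its out-of-sample P&L.

    Returns a list of (chosen_param, out_of_sample_pnl), one per window.
    """
    n_bars = min(len(series) for series in pnl_by_param.values())
    results = []
    for train, test in walk_forward_windows(n_bars, train_size, test_size):
        best = max(
            pnl_by_param,
            key=lambda name: sum(pnl_by_param[name][i] for i in train),
        )
        out_of_sample = sum(pnl_by_param[best][i] for i in test)
        results.append((best, out_of_sample))
    return results
```

The list of chosen parameter sets is the part that speaks to the unease about a system that changes itself: if the choice flips on nearly every window, that instability is visible before any money is at risk.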

      Right, regarding “old data” not being likely to be useful: you have to understand the context. I am referring to generative systems. These systems primarily build systems by iterating over lookbacks or even lagging various types of indicators. In other words, it isn’t the most intelligent approach in the world. Markets are forward looking and discount the past. The efficient market view argues that, in general, the most recent prices (yesterday’s prices) are going to be the best predictors of today’s prices. In other words, yesterday’s close will be the best predictor of today’s close, yesterday’s high will predict today’s high, etc. However, there are some exceptions, of course. Statistically, you need a certain minimum number of observations for a result to be meaningful. In that sense, you have to give up the most immediate information in order to obtain a more complete answer. However, that should probably be viewed as a cost, and the cost must be understood. There are always certain exceptions. I think the academic understanding of “momentum” is that it is an effect that can take several months to register. I guess my point is that if you’re using really old data, then you need to understand why you would want to do that. Of course, these sorts of generative systems tend to develop systems that way because, with more data, it is easier to curve-fit some result that looked okay in the past.
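The "yesterday’s close is the best predictor of today’s close" argument can be illustrated on a simulated random walk, which is an idealized stand-in for an efficient market and not real price data. The sketch below compares the average error of forecasting each price with the previous price against forecasting it with a long moving average. All function names and parameter values are made up for the example.

```python
import random
from statistics import mean
from typing import List


def simulate_random_walk(
    n: int, seed: int = 0, start: float = 100.0, step_sd: float = 1.0
) -> List[float]:
    """Simulate prices whose changes are independent Gaussian steps."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] + rng.gauss(0.0, step_sd))
    return prices


def mae_last_value(prices: List[float], warmup: int) -> float:
    """Mean absolute error when today's forecast is yesterday's price."""
    return mean(abs(prices[t] - prices[t - 1]) for t in range(warmup, len(prices)))


def mae_moving_average(prices: List[float], lookback: int) -> float:
    """Mean absolute error when today's forecast is the average of the
    previous `lookback` prices."""
    errors = []
    window_sum = sum(prices[:lookback])
    for t in range(lookback, len(prices)):
        forecast = window_sum / lookback
        errors.append(abs(prices[t] - forecast))
        # Slide the window forward by one price.
        window_sum += prices[t] - prices[t - lookback]
    return mean(errors)


if __name__ == "__main__":
    prices = simulate_random_walk(5000, seed=42)
    print("last-value MAE:    ", mae_last_value(prices, warmup=50))
    print("50-bar average MAE:", mae_moving_average(prices, lookback=50))
```

On a pure random walk the long average loses by construction, because it weights stale prices. Real markets are not pure random walks, which is where exceptions such as momentum come in, but the comparison shows the cost of reaching further back that any long-lookback system has to justify.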

      In regards to robustness testing, there are several different sorts of tests. You can do sensitivity analysis, which basically means varying the parameters over a range. You can skip trades or start trading on different dates. You can test your system on similar instruments. You can run Monte Carlo analysis. Of course, WFA can be used as well. Kevin Davey recommends an “incubation process”. This is probably the best approach: track the system in real time but with paper trading only. If your system was the product of too much hindsight bias, there’s a pretty good chance it will break down quickly. Something not talked about as much is to avoid trading any system at maximum leverage. In other words, scale the system down to where your max DD (drawdown) is, say, 25% or, even better, 15%, and then accept the return. Most of us do not have accounts large enough to make that attractive, but I do think a lot of the problems come from trying to over-leverage a working system.
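Two of those ideas, Monte Carlo analysis and scaling down to a target drawdown, can be sketched together. The example below reshuffles the order of a system's per-trade returns many times to get a distribution of maximum drawdowns, then looks for the largest position-size fraction whose 95th-percentile drawdown stays under a target. The function names, the use of simple reshuffling, the 95th percentile, and the coarse 10% steps are all assumptions chosen to keep the sketch short.

```python
import random
from typing import List, Optional, Sequence


def max_drawdown(trade_returns: Sequence[float]) -> float:
    """Largest peak-to-valley loss (as a fraction of the peak) of an
    equity curve that compounds the given per-trade fractional returns."""
    equity = peak = 1.0
    worst = 0.0
    for r in trade_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def monte_carlo_drawdowns(
    trade_returns: Sequence[float], n_runs: int = 1000, seed: int = 0
) -> List[float]:
    """Shuffle the trade order n_runs times and return the sorted list
    of maximum drawdowns, one per shuffled ordering."""
    rng = random.Random(seed)
    trades = list(trade_returns)
    drawdowns = []
    for _ in range(n_runs):
        rng.shuffle(trades)
        drawdowns.append(max_drawdown(trades))
    return sorted(drawdowns)


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Simple nearest-rank style percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(pct / 100 * len(sorted_values)))
    return sorted_values[index]


def largest_fraction_within_target(
    trade_returns: Sequence[float],
    target_dd: float,
    pct: float = 95.0,
    n_runs: int = 1000,
    seed: int = 0,
) -> Optional[float]:
    """Try position-size fractions 1.0, 0.9, ..., 0.1 and return the
    first whose pct-th percentile Monte Carlo drawdown is within
    target_dd, or None if even 0.1 is too aggressive."""
    for step in range(10, 0, -1):
        fraction = step / 10
        scaled = [r * fraction for r in trade_returns]
        dds = monte_carlo_drawdowns(scaled, n_runs=n_runs, seed=seed)
        if percentile(dds, pct) <= target_dd:
            return fraction
    return None
```

Reshuffling assumes trades are independent of one another, so it understates drawdown risk for systems whose losses cluster. It is a sanity check on the backtested drawdown, not a replacement for incubation.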


      1. Thank you Curtis,

        I appreciate your detailed response and time.

        This is a very interesting subject and one not discussed in much detail online. I am reading Kevin Davey’s book now and have not reached the incubation process discussion yet.

        I have thought about backtesting over recent data (1-3 years), as long as the system produces a trade sample large enough to be statistically significant, and then analyzing the performance metrics. I search for edge in intraday markets, so I develop systems for intraday trading. I believe the trade sample is key when picking a backtesting period for intraday systems.

        Thanks for the comments on robustness testing.

