August 31st, 2016

The Problem With ETF Backtests

The fund industry is busy manufacturing new smart beta funds with backtested track records that “prove” their strategy. However, backtests can easily be massaged to gin up performance. When considering any new product, run it through this seven-point smell test to see if you are investing in the real deal.

A growing number of “smart beta” ETFs and mutual funds rely heavily, if not exclusively, on backtesting. Sometimes the results match expectations, but not always. For products that stumble, the trouble is usually related to dubious research. Backtesting risk, in other words, has become familiar terrain in the money game. As a result, evaluating smart beta products usually boils down to considering the merits of the underlying backtest, especially if the fund has a limited track record.

Ideally, you run a backtest and, if the output looks encouraging, you invest in the strategy (or launch an ETF) and reap the rewards. Reality, alas, isn’t always so easy.

A new study documents just how pervasive, and therefore hazardous, backtesting can be for a new generation of ETFs and other investment products. That doesn’t mean that every backtest is bogus. But as the authors of “Quantifying Backtest Overfitting in Alternative Beta Strategies” say, the concept of “trust but verify” should be standard practice when navigating the new world order of investment strategies.

The “alternative beta” focus in the paper, which comes from The Journal of Portfolio Management, covers a wider range of strategies than most smart beta funds do. Alternative beta encompasses the use of leverage, shorting, and a broader set of asset classes.

By contrast, most smart beta ETFs and mutual funds are limited to long-only strategies in stocks and bonds. Nonetheless, the study’s caveats are no less relevant for conventional factor funds that are marketed as smart beta, since the common denominator is the use of backtesting for rationalizing a given strategy.

“Our results support the recent warnings in finance literature regarding ‘factor fishing,’ multiple testing, overfitting, as well as selection and reporting biases in financial research and product development,” the authors report. “The findings highlight the importance of detailed due diligence on quantitative strategies, and suggest that backtested performance and risk measures may offer limited value to practical alternative beta strategy selection and portfolio management.”

The rise of smart beta quants

Several developments have converged in recent years to create a fertile landscape for launching smart beta funds that owe some or all of their design features to backtesting. First and foremost is the combination of falling computer prices and greater processing power. The adoption of high-end quantitative analytics in nearly every aspect of money management has become standard practice.

Meantime, the demand for innovative ETF designs has increased—partly due to the rise of low-cost indexing for tracking conventional benchmarks, such as the S&P 500. Traditional active management is under pressure, inspiring a refocus on rules-based strategies that can be branded as smart beta, which is often labeled as a quasi-passive strategy.

The financial industry, in short, has become an assembly line for smart beta products. BlackRock, the biggest ETF manager, recently projected that assets in smart beta ETFs will reach $1 trillion globally by 2020 and $2.4 trillion by 2025. That compares with less than $500 billion in smart beta assets worldwide, in more than 800 exchange-traded products, at 2015’s midpoint, according to Morningstar.

Who let the dogs out?

There’s nothing inherently wrong with searching through market history for superior strategies. Investors have been doing no less for as long as there have been investors. The difference in the 21st century is that it’s easier than ever to sift through mountains of data, which can be a powerful tool for boosting performance, lowering risk, or both. But it can be a trap when backtesting is abused.

The pitfalls are well known to veteran quants. But for nefarious marketing reasons, or perhaps due to sloppy research habits, there’s no shortage of ETFs linked to questionable backtests. Smart beta, in other words, can be sidetracked by dumb backtests.

Even well-designed backtests are no guarantee of stellar results once the strategy goes live. Denys Glushkov, research director at the University of Pennsylvania’s Wharton Research Data Services, studied 164 smart beta ETFs for the 2003-2014 period and found “no conclusive empirical evidence to support the hypothesis that SB ETFs outperform their risk-adjusted benchmarks.” Some funds performed better than others, of course, but within the broad category of smart beta ETFs, there are plenty of dogs that drag down the average results.

A common snag is what’s known as data mining, which is an occupational hazard in the dark art of analyzing history for insights into the future. In other words, if you throw enough data at the wall, and test and retest it long enough, something will stick. The tests that fall flat have a habit of never seeing the light of day. As Dimitris Melas, global head of equity research at MSCI, once quipped, “I’ve never seen a bad backtest.”
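
The “something will stick” effect is easy to reproduce. The sketch below is purely illustrative: it simulates 200 hypothetical strategies whose monthly returns are random noise with zero expected return, then singles out the one with the best backtest. The function names, the strategy count, the five-year window, and the 4% monthly volatility are all assumptions made for this example, not figures from any study mentioned here.

```python
import random
import statistics

def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio from monthly returns (risk-free rate assumed to be zero)."""
    mean = statistics.mean(monthly_returns)
    stdev = statistics.stdev(monthly_returns)
    return (mean / stdev) * 12 ** 0.5

def simulate_no_skill_strategies(n_strategies=200, n_months=60, monthly_vol=0.04, seed=42):
    """Return one Sharpe ratio per strategy; every strategy has zero true edge."""
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_strategies):
        returns = [rng.gauss(0.0, monthly_vol) for _ in range(n_months)]
        sharpes.append(annualized_sharpe(returns))
    return sharpes

sharpes = simulate_no_skill_strategies()
best = max(sharpes)
median = statistics.median(sharpes)
print(f"Median Sharpe across all strategies: {median:.2f}")
print(f"Best Sharpe (the backtest that gets shown): {best:.2f}")
```

The median should sit near zero, as it should for strategies with no skill, while the best of the batch can look like a genuine discovery. If only the winner is reported, a reader has no way to tell the difference.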

Savvy investors are aware of the bias and act accordingly. For instance, as a precaution against confusing random results with genuine investment intelligence, it’s become industry practice to discount reported Sharpe ratios (risk-adjusted returns) in financial research by 50%, according to “Backtesting” in last year’s fall issue of The Journal of Portfolio Management.
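
As a rough illustration of that rule of thumb, the helper below applies a flat haircut to a reported Sharpe ratio. The function name and the sample value of 1.2 are hypothetical, and a flat 50% discount is only the simple convention described above, not a method tailored to any particular backtest.

```python
def haircut_sharpe(reported_sharpe, haircut=0.50):
    """Discount a reported Sharpe ratio by a fixed fraction (50% by default)."""
    if not 0.0 <= haircut <= 1.0:
        raise ValueError("haircut must be between 0 and 1")
    return reported_sharpe * (1.0 - haircut)

reported = 1.2  # hypothetical Sharpe ratio taken from a fund's backtest
print(haircut_sharpe(reported))  # → 0.6
```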

Recognizing the central role of backtesting in the development of smart beta ETFs, and the potential for spurious results, is a key reason for maintaining a cautious view about the explosion of products in this niche. Some ETFs may be well-designed and more or less live up to expectations through time. But clear-cut winners are likely to be the exception to the rule, courtesy of market reality. There’s a finite supply of market-beating alpha to go around.

Professor Bill Sharpe laid out the hard facts in “The Arithmetic of Active Management” a quarter of a century ago in the Financial Analysts Journal. “Properly measured, the average actively managed dollar must underperform the average passively managed dollar, net of costs,” he wrote. In other words, not every smart beta product can deliver the goods.

In fact, it’s reasonable to expect that most will deliver average results at best. This isn’t an opinion, but a statement of mathematical fact. “The market return will be a weighted average of the returns on the securities within the market,” Sharpe explained, which implies that the smart beta losers will far outnumber the winners.
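
Sharpe’s arithmetic can be checked with a few lines of code. The sketch below assumes the market is split between passive and active dollars and that passive dollars earn the market return before costs. It then solves the weighted-average identity for the average active return. The 8% market return, the 40% passive share, and both cost figures are hypothetical numbers chosen only to make the arithmetic visible.

```python
def active_return_before_costs(market_return, passive_weight):
    """Solve market = w_p * passive + (1 - w_p) * active for the active return,
    given that passive dollars earn the market return before costs."""
    active_weight = 1.0 - passive_weight
    passive_return = market_return
    return (market_return - passive_weight * passive_return) / active_weight

market = 0.08          # hypothetical market return
passive_share = 0.40   # hypothetical share of dollars that are indexed
passive_cost = 0.001   # hypothetical annual cost of passive investing
active_cost = 0.008    # hypothetical annual cost of active investing

active_gross = active_return_before_costs(market, passive_share)
passive_net = market - passive_cost
active_net = active_gross - active_cost
print(f"Average active dollar, before costs: {active_gross:.2%}")
print(f"Average passive dollar, net of costs: {passive_net:.2%}")
print(f"Average active dollar, net of costs: {active_net:.2%}")
```

Whatever passive share is plugged in, the average active dollar earns the market return before costs, so the higher-cost group must trail after costs.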

Looking for the real deal

Intelligently designed smart beta funds can still be useful, but to juice the odds of success, it’s essential to avoid the dogs. A good way to start is by keeping a wary eye open for funds that rely on shaky research. There are no shortcuts for identifying ETFs that rely on questionable backtesting strategies, but there are some general guidelines to consider that can help you spot trouble in advance.

Start with common sense

A new fund that promises the moon based on a backtest is probably an accident waiting to happen. It’s a cliché, but if something sounds too good to be true, it probably is. Warning signs include marketing hype that promises to outperform the market consistently and/or by a wide margin with low risk. No one’s that smart (or lucky), regardless of how many backtests are run.

Ask for a copy of the backtest

Even if you never read it, the response to your inquiry may speak volumes. Was the fund company forthcoming by offering a link to a relevant study? Or did you have to pass through Dante’s nine circles of hell to obtain the research or even get a straight answer? The latter could be an early warning that the underlying strategy may lead you down the road to investment perdition.

What are the anticipated trading costs?

There are many backtests that look great on paper, but descend into mediocrity or worse after adjusting for taxes and trading costs. It’s not always clear where to draw the line for estimating a reasonable maximum amount of trades in a given year. But it’s surely a warning sign if a backtest doesn’t offer some insight into how the trades are expected to impact results in the real world.
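
A back-of-the-envelope cost adjustment shows how quickly turnover can eat into a paper return. The sketch below assumes a simple linear cost model in which each dollar traded costs a fixed fraction. The 9% gross return, the 0.40% cost per dollar traded, and the turnover levels are hypothetical, and taxes are ignored entirely.

```python
def net_annual_return(gross_return, annual_turnover, cost_per_unit_turnover):
    """Subtract estimated trading costs from a gross backtested return.

    annual_turnover: fraction of the portfolio traded per year (2.0 means 200%).
    cost_per_unit_turnover: all-in cost as a fraction of the amount traded.
    """
    return gross_return - annual_turnover * cost_per_unit_turnover

gross = 0.09  # hypothetical gross return from a backtest
for turnover in (0.25, 1.0, 3.0):
    net = net_annual_return(gross, turnover, cost_per_unit_turnover=0.004)
    print(f"turnover {turnover:>4.0%}: net return {net:.2%}")
```

Under these assumptions, a strategy that turns over its portfolio three times a year gives up 1.2 percentage points annually before taxes are even considered.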

Are there too many rules?

All else equal, simpler is better in money management. All else is never equal, of course, but simplicity is still a tough act to beat. The greater the number of moving parts in a strategy, the greater the opportunity for something to go wrong. That’s part of the reason why buy-and-hold is usually competitive—even a monkey could execute the strategy flawlessly.

Granted, adding a degree of complication can be productive, perhaps even necessary, depending on what you’re trying to achieve, but at some point, more becomes less. The tipping point isn’t always obvious, but a strategy’s probably pushing its luck if the rules can’t be summarized in a paragraph or two—or if you need a PhD to translate the backtest.

Does the backtest suffer from overfitting?

Studies that suffer from overfitting, intentionally or otherwise, reflect random results rather than a fundamental relationship. With powerful computers at everyone’s disposal, it’s easy to veer into the land of overfitting. The basic recipe: Tweak a model’s parameters until the historical results look spectacular. Almost any strategy can be modified to generate an impressive track record, but this is rarely a reliable way to uncover strategies that will succeed beyond the test period (also known as the out-of-sample results).

Unfortunately, it’s difficult to evaluate overfitting risk unless the researchers were candid about the number of trials that preceded the final test. The higher the number of trials, the higher the risk of overfitting. This information isn’t usually available, but you may be able to glean some insight by reading the study or asking the fund company for perspective. Once again, the response may be revealing—did it seem like they’re trying to hide something?
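
The link between the number of trials and overfitting risk can be sketched with a simulation. In the example below, every trial is a hypothetical strategy variant whose returns are pure noise, and the code records the best Sharpe ratio seen after 1, 10, 100, and 1,000 trials. The ten-year window, the 4% monthly volatility, and the function names are assumptions for illustration only.

```python
import random
import statistics

def noise_sharpe(rng, n_months=120, monthly_vol=0.04):
    """Annualized Sharpe ratio of a strategy whose returns are pure noise."""
    returns = [rng.gauss(0.0, monthly_vol) for _ in range(n_months)]
    return statistics.mean(returns) / statistics.stdev(returns) * 12 ** 0.5

def best_sharpe_by_trial_count(trial_counts, seed=7):
    """Best Sharpe seen after each number of trials, using one shared stream of trials."""
    rng = random.Random(seed)
    results = {}
    best = float("-inf")
    for trial in range(1, max(trial_counts) + 1):
        best = max(best, noise_sharpe(rng))
        if trial in trial_counts:
            results[trial] = best
    return results

best_by_count = best_sharpe_by_trial_count([1, 10, 100, 1000])
for count, sharpe in best_by_count.items():
    print(f"{count:>5} trials: best in-sample Sharpe {sharpe:.2f}")
```

Because each larger batch contains the smaller ones, the best result can only stay the same or improve as trials accumulate, even though no variant has any real edge. That is why the trial count matters when judging a reported backtest.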

Has a third party verified the results?

Unless the backtest has been vetted by an independent source, don’t assume that the results are accurate. As one cautionary tale from the trenches, consider a study that appeared last year in the International Review of Finance: “Market Timing With Moving Averages,” by Paskalis Glabadanidis. The author outlined an impressive equity strategy using moving averages that produced “risk-adjusted returns of 3%-7% per year after transaction costs.”

Not too shabby. But another researcher, Professor Valeriy Zakamulin at the University of Agder, demonstrated that Glabadanidis’ impressive performance result was “due to simulating the trading with look-ahead bias.” After correcting for this bias, which came from using market signals that weren’t available in real time, the performance “is only marginally better than that of the corresponding buy-and-hold strategy.”
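
To see how look-ahead bias can flatter a moving-average rule, consider the toy simulation below. It is not a reconstruction of either paper’s method. It builds 600 months of driftless random-walk prices, so there is nothing real to time, and then runs the same rule twice: once with a signal based only on past prices, and once with a signal that peeks at the current month’s close. The 10-month window, the volatility, and all function names are hypothetical.

```python
import random

def simulate_prices(n_months=600, monthly_vol=0.04, seed=1):
    """Driftless random-walk price series: there is nothing real to time."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_months):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0, monthly_vol)))
    return prices

def timing_returns(prices, window=10, look_ahead=False):
    """Monthly returns of a rule that holds the asset when price > moving average.

    look_ahead=False: the signal uses prices through month t-1 and earns month t's return.
    look_ahead=True:  the signal uses month t's closing price and still earns month t's
                      return, so it trades on information not yet available.
    """
    strategy_returns = []
    for t in range(window + 1, len(prices)):
        signal_month = t if look_ahead else t - 1
        recent = prices[signal_month - window + 1 : signal_month + 1]
        moving_average = sum(recent) / window
        invested = prices[signal_month] > moving_average
        month_return = prices[t] / prices[t - 1] - 1.0
        strategy_returns.append(month_return if invested else 0.0)
    return strategy_returns

prices = simulate_prices()
honest = timing_returns(prices, look_ahead=False)
biased = timing_returns(prices, look_ahead=True)
print(f"Average monthly return, honest signal: {sum(honest) / len(honest):.3%}")
print(f"Average monthly return, look-ahead signal: {sum(biased) / len(biased):.3%}")
```

The only difference between the two runs is a one-month shift in when the signal is read, which is the kind of detail an independent reviewer is well placed to catch.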

How does the strategy compare with a passive or semipassive index of the same assets?

This is standard procedure for any backtest worthy of the name. If a reasonable benchmark is missing—or designed poorly—the study’s conclusions may be flawed.
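
A minimal version of that comparison looks at the strategy’s return in excess of the benchmark, period by period, along with how variable that excess is. The sketch below uses two short, made-up return series. The function name and every number in it are hypothetical.

```python
import statistics

def compare_to_benchmark(strategy_returns, benchmark_returns):
    """Average excess return and its volatility (tracking error) versus a benchmark."""
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("return series must cover the same periods")
    excess = [s - b for s, b in zip(strategy_returns, benchmark_returns)]
    return statistics.mean(excess), statistics.stdev(excess)

# Hypothetical monthly returns for a backtested strategy and a passive index.
strategy = [0.021, -0.013, 0.034, 0.008, -0.027, 0.015]
benchmark = [0.018, -0.010, 0.030, 0.010, -0.025, 0.012]

mean_excess, tracking_error = compare_to_benchmark(strategy, benchmark)
print(f"Average monthly excess return: {mean_excess:.3%}")
print(f"Tracking error (monthly): {tracking_error:.3%}")
```

A small average excess return paired with a tracking error several times larger suggests the apparent edge may be indistinguishable from noise.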

The list above is hardly exhaustive, but it offers a general sense of the concepts that play a role in determining whether a backtest is informative or just garbage. Even when a backtest passes the smell test, a degree of skepticism is still warranted. As many veteran quants will tell you, backtests are generally more reliable in identifying strategies that don’t work as opposed to shining a light on a sure thing.

Nonetheless, a robust backtest can tip the odds in our favor when it comes to sorting investment strategies. That said, let’s recognize that most of this advantage is likely to emerge in the negative—steering clear of the losers as opposed to detecting the winners.

To paraphrase a famous quote by the statistician George Box and apply it to backtests (he was referring to models): All backtests are wrong, but some are useful.
