Today we are unpacking one of the most important yet most overlooked aspects of modern quantitative investing: backtesting, especially when machine learning is involved.
We explore key lessons from the influential 2018 paper "A Backtesting Protocol in the Era of Machine Learning" by Rob Arnott, Campbell Harvey, and Nobel laureate Harry Markowitz. Though the paper is a few years old, its relevance has only grown. This post distills its insights into a practical, jargon-free roadmap for investors, researchers, and anyone working with data.
Why Backtesting Needs Rethinking
Backtesting is simple in concept: test an investment strategy on past data to see how it would have performed. But in today's world of massive datasets and easy access to machine learning, it is dangerously easy to fall into the trap of overfitting: finding patterns that look great on paper but won't hold up in the real world.
Consider an example from the paper: a strategy that selects stocks based on the letters in their ticker symbols ("S3U3"). It showed an impressive 6% annual alpha over 50 years, yet it was built on nonsense. That is both the power and the peril of data mining.
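To see how easily chance produces "alpha," here is a minimal simulation of our own (not from the paper): generate a thousand purely random strategies and report the best one.

```python
import numpy as np

rng = np.random.default_rng(42)

n_strategies = 1_000   # random "strategies" tried
n_months = 600         # 50 years of monthly returns

# Every strategy is pure noise: zero true alpha, 5% monthly volatility
returns = rng.normal(loc=0.0, scale=0.05, size=(n_strategies, n_months))

# Annualized mean return of each strategy
annualized = returns.mean(axis=1) * 12

print(f"Best 'alpha' among {n_strategies} random strategies: {annualized.max():.1%}")
# Typically prints several percent per year, found in pure noise
```

The winning strategy here has no skill behind it at all; it simply won the lottery of a thousand coin flips.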
Introducing the 7-Point Backtesting Protocol
This protocol acts as a checklist for conducting robust, reliable financial research in the age of AI.
1. Research Motivation
Start with a solid economic rationale before building models. Don't work backward by inventing a story after finding a pattern. A clear foundation rooted in real-world logic is essential.
2. Multiple Testing and Statistical Methods
If you try enough models, one will look good just by chance. Be transparent about how many tests you have run, and adjust your statistical significance thresholds accordingly: the more darts you throw, the better the odds that one hits the bullseye, whether or not you are skilled. A simple adjustment is sketched below.
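Here is a minimal sketch of the simplest such adjustment, a Bonferroni correction (the literature offers more refined methods): the more strategies you test, the higher the t-statistic hurdle any single result must clear.

```python
from scipy import stats

def required_t_stat(n_tests: int, alpha: float = 0.05, df: int = 599) -> float:
    """t-statistic hurdle after a Bonferroni correction for n_tests trials.

    Bonferroni divides the significance level by the number of tests run.
    df = 599 assumes roughly 600 monthly observations (50 years).
    """
    adjusted_alpha = alpha / n_tests
    return stats.t.ppf(1 - adjusted_alpha / 2, df)  # two-sided test

for n in (1, 10, 100, 1000):
    print(f"{n:>5} tests tried -> required |t| = {required_t_stat(n):.2f}")
# 1 test needs |t| of about 1.96; 1,000 tests push the hurdle above 4
```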
3. Data Integrity
Bad data yields bad conclusions. Clearly define your data sample upfront. Avoid:
Backfilling
Cherry-picking time periods
Unjustified exclusion of outliers
Adjusting data treatments (like winsorization) after seeing results
Predefine your data handling choices and stick to them.
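As a concrete (and purely illustrative) way to honor that rule, the sketch below fixes winsorization limits as constants before any results exist and routes every dataset through the same function.

```python
import numpy as np

# Declared up front, before any backtest results are seen
WINSOR_LIMITS = (0.01, 0.99)  # clip at the 1st and 99th percentiles

def winsorize(values: np.ndarray) -> np.ndarray:
    """Clip extreme observations at the pre-declared percentile limits."""
    lo, hi = np.quantile(values, WINSOR_LIMITS)
    return np.clip(values, lo, hi)

# Example: monthly returns containing a likely data error
raw = np.array([0.01, -0.02, 0.03, 5.00, -0.01])  # +500% is almost surely bad data
print(winsorize(raw))  # the outlier is clipped, per the rule fixed in advance
```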
4. Cross-Validation
Out-of-sample testing is key, but even it has limits in finance: our knowledge of what actually happened biases our model choices. Also, avoid tweaking models after seeing out-of-sample performance. That's like cheating on a test with the answer key.
Include trading costs and execution realities in both in-sample and out-of-sample analysis.
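Here is a minimal walk-forward sketch under those constraints (the function names and the 0.1% cost figure are hypothetical): at each month, the position depends only on data available before that date, and a one-way cost is charged whenever the position changes.

```python
import numpy as np

def walk_forward_net(signal, returns, warmup=120, cost_per_trade=0.001):
    """Evaluate a toy strategy walk-forward, net of trading costs.

    signal, returns: aligned monthly arrays.
    cost_per_trade:  assumed one-way cost (0.1% here, purely illustrative).
    """
    net, prev_pos = [], 0.0
    for t in range(warmup, len(returns)):
        # Stand-in for any model fitted strictly on data before month t:
        # hold the asset only if the signal's past mean is positive.
        pos = 1.0 if signal[:t].mean() > 0 else 0.0
        cost = cost_per_trade * abs(pos - prev_pos)
        net.append(pos * returns[t] - cost)
        prev_pos = pos
    return np.array(net)

rng = np.random.default_rng(0)
sig, ret = rng.normal(size=600), rng.normal(0.005, 0.04, size=600)
print(f"Mean net monthly return: {walk_forward_net(sig, ret).mean():.4%}")
```

Real pipelines add more safeguards (purging overlapping labels, embargo periods), but the core discipline is the same: no information from month t or later may touch the decision made at month t.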
5. Model Dynamics
Markets evolve. Your strategy might work now but fail later due to:
Regulatory changes
Tech innovations
Shifts in investor behavior
And remember the Heisenberg Principle of finance: exploiting a market inefficiency can cause it to vanish. As more people pile in, the edge fades.
6. Model Complexity
Machine learning thrives on complexity, but more variables mean a higher risk of overfitting, especially given the limited history financial data offers. Prefer simpler, regularized models.
Interpretability matters too. Understand how your model works; if it doesn't make economic sense, it's probably overfit.
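As one hedged illustration of that preference, here is a minimal Lasso sketch on simulated data: regularization zeroes out predictors that carry no signal, leaving a small model you can actually interpret.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n_obs, n_signals = 240, 50  # 20 years of monthly data, 50 candidate signals

X = rng.normal(size=(n_obs, n_signals))
# Only the first two signals are real; the other 48 are pure noise
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=n_obs)

model = Lasso(alpha=0.1).fit(X, y)
kept = np.flatnonzero(model.coef_)
print(f"Signals the Lasso kept: {kept}")  # typically just [0 1]
```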
7. Research Culture
Rigorous research isn’t just about individuals—it’s about the culture.
Are researchers rewarded for robust strategies or flashy backtests?
Are failures treated as learning opportunities or buried?
Create an environment that prizes intellectual honesty, transparency, and robustness.
Real-World Applications
This protocol isn’t just for finance. Its lessons apply to any field using data for decision-making—from marketing to medicine.
Consumers of research, not just its creators, have a role. Demand clarity, logic, and honesty. Ask hard questions.
Key Takeaways for Investors
Don't be dazzled by complex models.
Start with strong economic reasoning.
Treat backtest results with skepticism—especially when they look too good.
Account for real-world factors like fees and market changes.
Foster a research culture that values truth over flash.
Final Thoughts
We live in a world where data is abundant and machine learning tools are powerful. But the fundamentals of good research (clarity, transparency, humility) have never been more important.
So the next time you're tempted by a seductive backtest or a flashy AI model, pause and revisit this protocol. It could save you from chasing illusions and help you build strategies that actually work.
FAQs
What is overfitting in backtesting? Overfitting occurs when a model captures noise instead of signal—performing well in backtests but failing in real-world applications.
Why is cross-validation tricky in finance? Because researchers often know what happened historically, which subtly biases their model development—even when using "out-of-sample" data.
What are trading costs, and why do they matter? Costs like commissions, bid-ask spreads, and slippage can eat away at returns. Ignoring them can make backtests unrealistically optimistic.
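As a rough, purely illustrative calculation: a strategy that turns over its entire portfolio each month and pays 0.10% per round trip surrenders about 12 × 0.10% = 1.2% per year before delivering any alpha at all.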
Can simple models outperform complex ones? Yes. Especially when the data is limited, simpler models often generalize better and are less prone to overfitting.
How can I spot bad research? Look for lack of transparency, unclear economic reasoning, ignored trading costs, and failure to test robustness. If it looks too good to be true, it probably is.
Hashtags
#Backtesting #MachineLearning #QuantResearch #InvestmentStrategy #FinancialData #DataScience #AIinFinance #ResearchIntegrity #TradingAlgorithms #DeepDive