Newsletter: The Role of Data in Financial Modeling and Risk Management
How Data Quality and Quantity Affect Model Accuracy and Stability
Much emphasis has been placed on developing accurate and robust financial models, whether for pricing, trading, or risk management. However, a crucial yet often overlooked component of any quantitative system is the reliability of the underlying data. In this edition, we explore some issues with financial data and how to address them.
Web-Only Posts Recap
Below is a summary of the web-only posts I published last week.
Momentum in the Option Market, Part 3
Return and Variance Risk Premia in the Bitcoin Market
Using Hurst Exponent on the Volatility of Volatility Indices
Predicting Covariance Matrices of Returns
Trading Equity Indices Using Time Series Models
Incorporating Reflexivity into the Black–Scholes–Merton Framework
How to Deal with Missing Financial Data?
In the financial industry, data plays a critical role in enabling managers to make informed decisions and manage risk effectively. Yet financial data is often missing or incomplete, and it can be difficult to obtain due to a lack of standardization and varying regulatory requirements. Incomplete or inaccurate data can lead to flawed analysis, incorrect decision-making, and increased risk.
Reference [1] studied the missing data in firms’ fundamentals and proposed methods for imputing the missing data.
Findings
-Missing financial data affects more than 70% of firms, representing approximately half of total market capitalization.
-The authors find that missing firm fundamentals exhibit complex, systematic patterns rather than occurring randomly, making traditional ad-hoc imputation methods unreliable.
-They propose a novel imputation method that utilizes both time-series and cross-sectional dependencies in the data to estimate missing values.
-The method accommodates general systematic patterns of missingness and generates a fully observed panel of firm fundamentals.
-The paper demonstrates that addressing missing data properly has significant implications for estimating risk premia, identifying cross-sectional anomalies, and improving portfolio construction.
-The issue of missing data extends beyond firm fundamentals to other financial domains such as analyst forecasts (I/B/E/S), ESG ratings, and other large financial datasets.
-The problem is expected to be even more pronounced in international data and with the rapid expansion of Big Data in finance.
-The authors emphasize that as data sources grow in volume and complexity, developing robust imputation methods will become increasingly critical.
In summary, the paper provides foundational principles and general guidelines for handling missing data, offering a framework that can be applied to a wide range of financial research and practical applications.
We think that the proposed data imputation methods can be applied not only to fundamental data but also to financial derivatives data, such as options.
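To make the idea concrete, here is a toy Python sketch of imputing a firm-fundamentals panel by blending time-series information (each firm's last observed value) with cross-sectional information (the per-date mean across firms). This is only an illustration of the general principle, not the authors' factor-based method; the blend weight and the forward-fill/cross-sectional-mean components are simplifying assumptions.

```python
import numpy as np
import pandas as pd

def impute_panel(df, weight=0.5):
    """Toy imputation blending time-series and cross-sectional information.
    df: DataFrame indexed by date, one column per firm.
    NOT the paper's method - just a simple blend for illustration."""
    ts = df.ffill()                                  # time-series: carry forward last observation
    cs = df.mean(axis=1)                             # cross-section: per-date mean across firms
    cs_df = pd.DataFrame({c: cs for c in df.columns})
    blend = weight * ts + (1 - weight) * cs_df       # weighted combination of the two signals
    out = df.fillna(blend)                           # keep observed values, fill gaps with the blend
    out = out.fillna(cs_df)                          # firms with no prior observation: cross-sectional mean
    return out
```

For example, a firm whose fundamental is missing in one quarter gets a value halfway between its last reported figure and the cross-sectional mean for that quarter; observed entries are left untouched.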
Reference
[1] Bryzgalova, Svetlana, Sven Lerner, Martin Lettau, and Markus Pelger, "Missing Financial Data," SSRN 4106794.
Predicting Realized Volatility Using High-Frequency Data: Is More Data Always Better?
A common belief in strategy design is that ‘more data is better.’ But is this always true? Reference [2] examined the impact of the quantity of data on predicting realized volatility. Specifically, it focused on the accuracy of volatility forecasts as a function of data sampling frequency. The study was conducted on Brent crude oil futures and used GARCH-family models for volatility forecasting.
Findings
-The research explores whether increased data availability through higher-frequency sampling leads to improved forecast precision.
-The study employs several GARCH models using Brent crude oil futures data to assess how sampling frequency influences forecasting performance.
-In-sample results show that higher sampling frequencies improve model fit, indicated by lower AIC/BIC values and higher log-likelihood scores.
-Out-of-sample analysis reveals a more complex picture—higher sampling frequencies do not consistently reduce forecast errors.
-Regression analysis demonstrates that variations in forecast errors are only marginally explained by sampling frequency changes.
-Both linear and polynomial regressions yield similar results, with low adjusted R² values and weak correlations between frequency and error metrics.
-The findings challenge the prevailing assumption that higher-frequency data necessarily enhance forecast precision.
-The study concludes that lower-frequency sampling may sometimes yield better forecasts, depending on model structure and data quality.
-The paper emphasizes the need to balance the benefits and drawbacks of high-frequency data collection in volatility prediction.
-It calls for further research across different assets, markets, and modeling approaches to identify optimal sampling frequencies.
In short, increasing the data sampling frequency improves in-sample fit. Out of sample, however, higher sampling frequencies do not consistently improve forecast accuracy and can even worsen it.
This result is surprising, and the author offered some explanations for the counterintuitive outcome. In my opinion, financial time series are usually noisy, so more data is not necessarily better: higher-frequency sampling can amplify the noise rather than add information.
Another key insight from the article is the importance of out-of-sample testing, as results can differ from, and sometimes even contradict, in-sample outcomes.
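To illustrate the out-of-sample evaluation point, here is a minimal Python sketch that produces one-step-ahead variance forecasts and scores them only on a held-out tail of the sample. It uses an EWMA (RiskMetrics-style) recursion as a lightweight stand-in for the paper's GARCH models, and squared returns as a noisy proxy for realized variance; both choices, along with the decay parameter and split fraction, are assumptions for illustration.

```python
import numpy as np

def ewma_var_forecast(returns, lam=0.94):
    """One-step-ahead variance forecasts via an EWMA recursion,
    a lightweight stand-in for GARCH (not the paper's models):
        sigma2[t+1] = lam * sigma2[t] + (1 - lam) * r[t]**2
    Element t of the output is the forecast made for period t
    using only information available through period t-1."""
    var = np.empty_like(returns)
    var[0] = np.var(returns)             # initialize at the unconditional sample variance
    for t in range(len(returns) - 1):
        var[t + 1] = lam * var[t] + (1 - lam) * returns[t] ** 2
    return var

def oos_mse(returns, lam=0.94, split=0.7):
    """Score forecasts only on the held-out tail (after `split`),
    using squared returns as a noisy realized-variance proxy."""
    n = int(len(returns) * split)
    fc = ewma_var_forecast(returns, lam)
    return np.mean((returns[n:] ** 2 - fc[n:]) ** 2)
```

A model or sampling scheme can then be judged by its held-out error rather than its in-sample fit, which is the distinction the paper's results turn on.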
Additional reading
Also, see the discussion on the quantity of data used in machine learning in the following newsletter:
Machine Learning in Financial Markets: When It Works and When It Doesn’t
Reference
[2] Hervé N. Mugemana, "Evaluating the Impact of Sampling Frequency on Volatility Forecast Accuracy," 2024, Inland Norway University of Applied Sciences.
Closing Thoughts
Both studies underscore the central role of high-quality data in financial modeling, trading, and risk management. Whether it is the frequency at which data are sampled or the completeness of firm-level fundamentals, the integrity of input data directly determines the reliability of forecasts, model calibration, and investment decisions. As financial markets become increasingly data-driven, the ability to collect, process, and validate information with precision will remain a defining edge for both researchers and practitioners.
Educational Video
Leveraging Data Science for Robust Trading Strategies
In this video, Marti Castany discusses the integration of data science techniques into both market analysis and trading strategy development. He walks through the process from generating trading hypotheses, conducting tests, constructing strategies, and using synthetic data, to backtesting and implementing strategies in live markets. Marti also highlights one technique he has found particularly useful in trading strategy development.
The video further details Marti’s background in electrical engineering, early work with discrete time series, and subsequent experience in data analytics through startups and consultancy projects. These roles involved processing large-scale datasets from brokerage firms to model client profitability and executing cloud-based data ingestion and analysis. The discussion illustrates the practical application of data science methods in trading and the development of systematic strategies.
Around the Quantosphere
-How hedge funds performed in September (reuters)
-The AI trade could rapidly unravel—and one hedge fund is preparing for the fallout (cnbc)
-BofA Has Options Play to Bet on Tech Rally as Hedge Funds Sell (yahoo finance)
-I left quant trading to work in AI—here’s why I gave up a secure career to join a startup (msn)
-Hedge funds have to be big (bloomberg)
-Eisler Capital to shut down after poor performance and high costs (reuters)
-Hedge funds and high-frequency traders are converging (ft)
-Pay at Citadel in London: £500k to £21m (efinancialcareers)
-Hedge fund stars making so much they’re hiring agents (msn)
Recent Newsletters
Below is a summary of the weekly newsletters I sent out recently.
-Volatility Risk Premium Across Different Asset Classes (13 min)
-When Trading Systems Break Down: Causes of Decay and Stop Criteria (12 min)
-Volatility Targeting Across Asset Pricing Factors and Industry Portfolios (12 min)
-Tail Risk Hedging Using Option Signals and Bond ETFs (12 min)
-Stochastic Volatility Models for Capturing ETF Dynamics and Option Term Structures (11 min)
Refer a Friend
If you like this newsletter, then help us grow by referring a friend or two. As a token of appreciation, we’ll send you PDFs that include links to our blog posts about financial derivatives, time series analysis, and trading strategies, along with the accompanying Excel files or Python code.
1 referral:
https://harbourfronts.com/wp-content/uploads/2024/12/fin_deriv-1.gif
2 referrals:
https://harbourfronts.com/wp-content/uploads/2024/12/risk_trading.gif
Use the referral link below or the “Share” button on any post.
Disclaimer
This newsletter is not investment advice. It is provided solely for entertainment and educational purposes. Always consult a financial professional before making any investment decisions.
We are not responsible for any outcomes arising from the use of the content and codes provided in the outbound links. By continuing to read this newsletter, you acknowledge and agree to this disclaimer.

