
Reproducible Quantitative Research – Beyond Pure MCP Workflows

Posted October 9, 2025 at 11:47 am

Deltaray

The article “Reproducible Quantitative Research – Beyond Pure MCP Workflows” was originally published on Deltaray blog.

Reproducibility is the cornerstone of credible quantitative research. In both academic papers and proprietary trading strategy development, results mean little if others cannot replicate them. Yet in quantitative finance, reproducibility remains challenging due to proprietary data, complex methodologies, and now, increasingly autonomous AI agents.

The latest AI coding assistants, such as Claude Code, Google Gemini, and OpenAI's Codex, combined with the Model Context Protocol (MCP), have revolutionized research workflows.

They can compress weeks of development into hours.

But this power comes with a hidden cost: when AI agents operate autonomously, they can undermine the very reproducibility that makes research credible.

In this post, we’ll explore how modern AI tools are transforming quantitative research, why pure agentic workflows threaten reproducibility, and a better approach to address these challenges.


The Evolution: From Text Generator to Active Researcher

AI assistants have rapidly evolved from simple code completion tools into active research partners. Early large language models could only suggest text based on their training.

Today’s AI Agents can autonomously:

  • Fetch and analyze historical market data
  • Execute complex multi-step research workflows
  • Run backtests and statistical tests
  • Generate visualizations and reports
  • Commit results to version control

This transformation was enabled by giving LLMs tool-use capabilities. Claude Code was the first to achieve this: since its initial version, it has not just suggested code but actively taken actions on your behalf. It maintains project-wide awareness, navigates documentation, and performs complex tasks from natural language prompts.

OpenAI and Google have since caught up with Codex and Gemini Code, matching the functionality of Claude Code.

To generalize tool use, Anthropic introduced the Model Context Protocol (MCP) in late 2024.


Understanding MCP: Power and Pitfalls

The Model Context Protocol (MCP) is Anthropic's open-source standard providing a uniform API that allows AI models to interact with external tools and services. Instead of hard-coding specific integrations, MCP servers expose tools that AI agents can invoke as needed.
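To make this concrete, here is a minimal sketch of the server side, using the official MCP Python SDK's FastMCP helper; the moving-average tool itself is a hypothetical example:

```python
# Minimal MCP server exposing one tool via the official Python SDK's
# FastMCP helper. The sma tool is a hypothetical example.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("quant-tools")

@mcp.tool()
def sma(prices: list[float], window: int = 20) -> list[float]:
    """Simple moving average over a price series."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio to an MCP-capable agent
```

Any MCP-capable client (Claude Code, Codex, Gemini) can then discover and invoke sma without bespoke integration code.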

MCP in Quantitative Research

Common MCP applications in quantitative research include financial data retrieval, technical indicators, portfolio optimization, and backtesting (several concrete projects appear in the Danger-zone section below). One tool deserves a special mention first:

Zen-MCP: The Multi-Model Orchestrator

While not strictly related to quantitative finance, zen-mcp is worth mentioning (and using).

This open-source orchestrator extends agentic coding tools to enable multi-model AI workflows. In practice, this means you can use models from OpenAI, Google, Anthropic, and many other providers in the same session. For example, one model can design the task, another can implement it, and a third can review it.

The different models can collaborate and chat with each other, which is impressive to see in action.


The Reproducibility Problem

While powerful, autonomous AI agents introduce several reproducibility challenges.

1. The Opacity Problem

When AI agents autonomously fetch data, perform analysis, and generate results, the exact steps often remain hidden. Unlike executing a script – where every transformation is visible – AI agent workflows can be black boxes. You might get results, but understanding how those results were obtained becomes difficult or impossible.

2. Non-Deterministic Execution

AI models may take different approaches to solving the same problem across runs. This non-determinism means:

  • The same research question might yield different methodologies
  • Data processing steps may vary
  • Rate limits or tier changes can trigger model fallbacks (e.g., Opus → Sonnet), altering tool choices and outputs

Note: Read more about defeating nondeterminism in LLM inference on the Thinking Machines blog.
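On the fallback point above: silent downgrades are detectable wherever you control the API call, because the response reports which model actually served the request. A minimal sketch using the Anthropic Python SDK (the model name and guard logic are illustrative assumptions):

```python
# Sketch: verify which model actually served a request, to catch silent
# fallbacks or tier changes. The model name is an illustrative assumption.
import anthropic

REQUESTED_MODEL = "claude-opus-4-20250514"

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model=REQUESTED_MODEL,
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize this backtest log."}],
)

# The response echoes the serving model; log it with every result.
if response.model != REQUESTED_MODEL:
    raise RuntimeError(
        f"Model fallback detected: requested {REQUESTED_MODEL}, "
        f"served by {response.model}"
    )
```

Inside agentic tools the fallback happens internally, but the same principle applies: record the serving model ID alongside every output.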

3. Hidden State and Dependencies

Similar to the notorious “hidden state” problem in Jupyter notebooks, where cell execution order affects results, AI agents compound the issue in several ways (a mitigation sketch follows the list):

  • Dynamically choosing data sources without documentation
  • Using different libraries or methods without explicit tracking
  • Making assumptions that aren’t recorded
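A partial mitigation for the untracked-dependency problem is to snapshot the execution environment alongside every result. A minimal sketch using only the Python standard library (the package list is an assumption; list whatever your analysis imports):

```python
# Sketch: record interpreter, platform, and library versions behind a
# result, so "which libraries did the agent use?" has a checkable answer.
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata

def capture_environment(packages: list[str]) -> dict:
    """Snapshot interpreter, platform, and installed package versions."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {pkg: metadata.version(pkg) for pkg in packages},
    }

# Hypothetical package list; adapt to your project.
snapshot = capture_environment(["pandas", "numpy"])
with open("environment.json", "w") as f:
    json.dump(snapshot, f, indent=2)
```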

Danger-zone: Agentic Trading

Several open-source projects use MCP for quantitative analysis and trading:

  • Maverick MCP: “financial data analysis, technical indicators, and portfolio optimization tools directly to your Claude Desktop”
  • PrimoAgent: “multi agent AI stock analysis system … to provide comprehensive daily trading insights and next-day price predictions”
  • Alpaca’s example of building an MCP-based trading workflow

As teaching demos, they are excellent: they reduce integration friction and demonstrate how quickly you can reach a working prototype. In practice, however, running trading strategies this way is too risky.

Danger Zone

In these projects, the trading logic relies on the model’s output, which is inherently non-deterministic and can change over time. To make things worse, you may get downgraded to a cheaper model mid-session due to rate limits or quota exhaustion.

This means the exact same results cannot be reproduced, even if the rules, data, and environment are fixed.

A Simple Solution

Instead of relying on MCP agents to research and trade, use them to generate code that you review, version-control, and run in a controlled environment. Over time, the accumulated code can be curated into a strategy or research library.
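For instance, instead of letting an agent emit trading decisions directly, have it generate a deterministic, reviewable rule such as the sketch below (a hypothetical moving-average crossover for illustration, not a recommendation):

```python
# Sketch: an AI-generated but human-reviewed, fully deterministic signal
# function. Given the same prices it always returns the same signals;
# no model call sits in the execution path.
import pandas as pd

def sma_crossover_signal(prices: pd.Series, fast: int = 20, slow: int = 50) -> pd.Series:
    """Return +1 while the fast SMA is above the slow SMA, else -1."""
    fast_sma = prices.rolling(fast).mean()
    slow_sma = prices.rolling(slow).mean()
    signal = (fast_sma > slow_sma).astype(int) * 2 - 1
    return signal.where(slow_sma.notna())  # undefined during warm-up
```

Because the function is pure, a backtest over a pinned data snapshot reproduces exactly, regardless of which model originally drafted the code.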

Based on our experience with hundreds of hours of AI-assisted strategy and product development, we recommend:

1. Treat AI as a Code Generator, Not an Autonomous Agent

  • Use AI to generate reproducible scripts and analysis code
  • Review AI-generated plans and code before execution
  • Maintain human oversight of critical decisions

2. Version Control Everything

  • Regularly commit all analysis scripts, strategies, and utilities to version control
  • Include the models and the prompts used to generate the code (e.g., in the Pull Request description)
  • Document data sources, extraction timestamps, and filters in a data catalog (see the manifest sketch after this list)
  • Persist backtest results, metrics, and visualizations in structured storage
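A minimal sketch of such a record, assuming a git checkout and hypothetical file paths (all field names are illustrative):

```python
# Sketch: write a run manifest tying results to the exact code, data,
# and parameters that produced them. Paths and fields are illustrative.
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def sha256(path: str) -> str:
    """Content hash of a file, for pinning data snapshots."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

manifest = {
    "run_at": datetime.now(timezone.utc).isoformat(),
    "git_commit": subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip(),
    "data": {
        "path": "data/prices.parquet",
        "sha256": sha256("data/prices.parquet"),
    },
    "params": {"fast": 20, "slow": 50},
}

Path("runs").mkdir(exist_ok=True)
with open("runs/manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```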

3. Use the Right Model for the Task

  • Use the most capable model to create a detailed plan for the task (e.g., Anthropic Opus or GPT-5 at the time of writing)
  • Review the plan using different models to gain confidence (e.g., OpenAI's o3-pro, Gemini 2.5 Pro)
  • Use the detailed implementation plan to generate the code; a less capable model suffices here (e.g., Anthropic Sonnet)
  • Review the generated code using different models (e.g., Gemini 2.5 Pro, R1, and o3-pro)
  • Conduct the final review of the generated code using human expertise

Note:

Over time, we hope every MCP server and agentic tool will generate audit logs of every tool call, including inputs, outputs, model IDs, and timestamps. This would resolve several reproducibility issues.
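Until then, you can approximate such logs wherever you control the tool layer. A minimal sketch of an append-only audit log for tool calls (the wrapped tool and log path are assumptions):

```python
# Sketch: append-only audit log recording inputs, outputs, and
# timestamps for every tool call. The decorated tool is hypothetical.
import functools
import json
from datetime import datetime, timezone

AUDIT_LOG = "tool_calls.jsonl"

def audited(tool):
    """Wrap a tool so every invocation is appended as a JSON line."""
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        result = tool(*args, **kwargs)
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tool": tool.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
        }
        with open(AUDIT_LOG, "a") as f:
            f.write(json.dumps(record, default=str) + "\n")
        return result
    return wrapper

@audited
def fetch_prices(symbol: str, start: str, end: str) -> list[float]:
    """Hypothetical data tool; replace with a real source."""
    return [100.0, 101.5, 99.8]  # placeholder values for illustration
```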


Conclusion

Pure MCP agentic workflows are productivity rockets, but also reproducibility traps. For credible research, treat agents as compilers and planners rather than autonomous researchers. Generate code, pin environments and data, and log every run.

If a result can’t be reproduced from code, config, data snapshot, and a manifest, it’s not research—it’s a demo.

