Stanford AA 228 · Decision Making Under Uncertainty

Algorithmic Trading
with Markov Decision
Processes

A model-based reinforcement learning approach to stock trading that outperforms naive strategies using RSI-based state classification and value iteration.

01

Project Overview

This project develops a low-frequency algorithmic trading strategy designed for investment firms with significant market influence. Unlike high-frequency trading, our approach operates on weekly timeframes, making decisions that can meaningfully impact stock prices.

We frame the trading problem as a Markov Decision Process, using the Relative Strength Index (RSI) to classify market states and computing optimal policies through value iteration. The model-based approach leverages historical data from 2000-2020 to construct explicit transition and reward matrices.

Key insight: While model-free methods like Q-learning require massive datasets, stock trading offers limited data (at most ~252 trading days per year, and only ~52 weekly observations), making model-based approaches more practical and interpretable.

MDP State Space

OVERSOLD: RSI < μ − σ
NEUTRAL: μ − σ ≤ RSI ≤ μ + σ
OVERBOUGHT: RSI > μ + σ

Actions: BUY · HOLD · SELL
02

Technical Approach

State Classification

States derived from RSI (Relative Strength Index), a momentum indicator measuring price change magnitude. Weekly RSI values categorized into three states based on statistical thresholds (mean ± std).
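A minimal sketch of this classification, assuming a standard 14-period RSI computed with simple moving averages; the `rsi` and `classify_states` helpers are illustrative, not the project's exact implementation:

```python
import numpy as np

def rsi(closes, window=14):
    """Relative Strength Index over a rolling window (simple averages)."""
    deltas = np.diff(closes)
    out = np.full(len(closes), np.nan)
    for i in range(window, len(closes)):
        d = deltas[i - window:i]
        gain = d[d > 0].sum() / window
        loss = -d[d < 0].sum() / window
        out[i] = 100.0 if loss == 0 else 100 - 100 / (1 + gain / loss)
    return out

def classify_states(rsi_vals):
    """Map each RSI value to OVERSOLD / NEUTRAL / OVERBOUGHT via mean ± std."""
    vals = rsi_vals[~np.isnan(rsi_vals)]
    mu, sigma = vals.mean(), vals.std()
    def label(r):
        if r < mu - sigma:
            return "OVERSOLD"
        if r > mu + sigma:
            return "OVERBOUGHT"
        return "NEUTRAL"
    return [label(r) for r in vals]
```

Because the thresholds come from each stock's own RSI distribution, the state boundaries adapt automatically to volatile versus stable tickers.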

Action Classification

Actions labeled based on deviation from expected closing price using 12-week trend regression. Buy if price exceeds +0.5σ, Sell if below -0.5σ, Hold within ±0.25σ.
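A sketch of this labeling rule, with two stated assumptions: σ is taken to be the residual standard deviation of the 12-week regression fit, and weeks falling in the band between ±0.25σ and ±0.5σ (which the rule leaves unlabeled) are marked `None`:

```python
import numpy as np

def label_actions(closes, lookback=12, buy_sd=0.5, hold_sd=0.25):
    """Label each week BUY/SELL/HOLD by the deviation of the actual close
    from the close predicted by a linear trend fit over the prior weeks."""
    labels = []
    for t in range(lookback, len(closes)):
        window = closes[t - lookback:t]
        x = np.arange(lookback)
        slope, intercept = np.polyfit(x, window, 1)
        expected = slope * lookback + intercept        # extrapolate one step ahead
        resid_sd = np.std(window - (slope * x + intercept))
        dev = (closes[t] - expected) / resid_sd if resid_sd > 0 else 0.0
        if dev > buy_sd:
            labels.append("BUY")
        elif dev < -buy_sd:
            labels.append("SELL")
        elif abs(dev) <= hold_sd:
            labels.append("HOLD")
        else:
            labels.append(None)  # ambiguous band between the two thresholds
    return labels
```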

Transition Matrix T

Captures P(s'|s,a) — probability of transitioning between states given an action. Built from observed state-action-state frequencies in training data.
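The frequency-counting construction can be sketched as follows; the uniform fallback for (state, action) pairs never observed in training is an assumption, not necessarily the project's choice:

```python
import numpy as np

STATES = ["OVERSOLD", "NEUTRAL", "OVERBOUGHT"]
ACTIONS = ["BUY", "HOLD", "SELL"]

def transition_matrix(states, actions):
    """T[s, a, s'] = P(s' | s, a), estimated from observed frequencies."""
    counts = np.zeros((3, 3, 3))
    for t in range(len(states) - 1):
        if actions[t] is None:
            continue
        s = STATES.index(states[t])
        a = ACTIONS.index(actions[t])
        sp = STATES.index(states[t + 1])
        counts[s, a, sp] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Unseen (s, a) pairs fall back to a uniform distribution over s'.
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1 / 3)
```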

Reward Matrix R

Records expected % price change for each state-action pair. Computed as average observed reward for each (state, action) combination.
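The reward estimate is a per-pair average, which can be sketched as:

```python
import numpy as np

STATES = ["OVERSOLD", "NEUTRAL", "OVERBOUGHT"]
ACTIONS = ["BUY", "HOLD", "SELL"]

def reward_matrix(states, actions, returns):
    """R[s, a] = mean observed % price change for each (state, action) pair."""
    sums = np.zeros((3, 3))
    counts = np.zeros((3, 3))
    for s_lbl, a_lbl, r in zip(states, actions, returns):
        if a_lbl is None:
            continue
        s, a = STATES.index(s_lbl), ACTIONS.index(a_lbl)
        sums[s, a] += r
        counts[s, a] += 1
    # Pairs with no observations get reward 0 rather than NaN.
    return np.divide(sums, counts, out=np.zeros((3, 3)), where=counts > 0)
```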

Value Iteration

Iteratively computes optimal utility U(s) for each state until convergence. Discount factor γ=0.9 balances immediate vs. future rewards.
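With T and R in hand, value iteration is a short fixed-point loop; this sketch uses a convergence tolerance of 1e-8, an assumed detail:

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, tol=1e-8):
    """Iterate U(s) = max_a [ R(s,a) + gamma * sum_s' T(s,a,s') U(s') ]."""
    U = np.zeros(T.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * E[U(s') | s, a]
        Q = R + gamma * np.einsum("sap,p->sa", T, U)
        U_new = Q.max(axis=1)
        if np.max(np.abs(U_new - U)) < tol:
            return U_new
        U = U_new
```

Because γ < 1 the Bellman update is a contraction, so the loop is guaranteed to converge regardless of the initial utilities.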

Policy Extraction

Optimal policy π*(s) extracted by selecting action maximizing expected utility. Results in one of 3³ = 27 possible policies across the three states.
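Extraction is a single greedy pass over the converged utilities:

```python
import numpy as np

def extract_policy(T, R, U, gamma=0.9):
    """pi*(s) = argmax_a [ R(s,a) + gamma * sum_s' T(s,a,s') U(s') ]."""
    Q = R + gamma * np.einsum("sap,p->sa", T, U)
    return Q.argmax(axis=1)  # one action index per state -> one of 3^3 = 27 policies
```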

BELLMAN OPTIMALITY EQUATION

a*(s) = argmax_a [ R(s,a) + γ Σ_{s'} T(s,a,s') · U(s') ]

a*(s): Optimal action for state s
R(s,a): Expected reward for taking action a in state s
γ = 0.9: Discount factor balancing immediate vs. future rewards
T(s,a,s'): Probability of transitioning to s' after taking a in s
U(s'): Utility of successor state s'
Σ_{s'}: Sum over all possible successor states
03

Results & Analysis

NVDA (NVIDIA)

[Policy histogram: the computed optimal policy achieved the best return of all 27 policies]

JNJ (Johnson & Johnson)

[Policy histogram: the optimal policy underperformed several alternatives]

Aggregate Performance: Top 20 Stocks by Market Cap

Normalized returns averaged across AAPL, MSFT, GOOGL, AMZN, NVDA, TSLA, META, and 13 others

Policy            Normalized Return
Optimal           0.52
sell-buy-hold     0.44
sell-hold-buy     0.43
buy-buy-buy       0.31
hold-sell-sell    0.08
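A ranking like the one above can be sketched by enumerating all 27 deterministic policies on held-out data. This is a heavily simplified stand-in for the project's evaluation: the position mapping (BUY → +1, HOLD → 0, SELL → −1) and the plain cumulative-return score are assumptions, not the normalization actually used.

```python
import itertools
import numpy as np

STATES = ["OVERSOLD", "NEUTRAL", "OVERBOUGHT"]
ACTIONS = ["BUY", "HOLD", "SELL"]
POSITION = {"BUY": 1.0, "HOLD": 0.0, "SELL": -1.0}  # assumed mapping

def backtest(policy, states, returns):
    """Cumulative return of a fixed state->action policy on held-out weeks."""
    pos = np.array([POSITION[policy[s]] for s in states])
    return float(np.sum(pos * np.asarray(returns)))

def rank_policies(states, returns):
    """Score all 3^3 = 27 deterministic policies, best first."""
    scored = []
    for acts in itertools.product(ACTIONS, repeat=3):
        policy = dict(zip(STATES, acts))
        name = "-".join(a.lower() for a in acts)
        scored.append((name, backtest(policy, states, returns)))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)
```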

✓ Momentum-Driven Stocks

High-volatility, momentum-driven stocks (NVDA, TSLA) showed strongest performance under optimal policy. RSI-based state classification captures short-term sentiment effectively for speculative assets.

✓ Aggregate Outperformance

Optimal policy ranked #1 across all 27 possible policies when averaged over 20 stocks, validating the model-based approach for portfolio-level decision making.

⚠ Stable Stock Limitations

Low-volatility stocks (JNJ, PG) showed weak or negative returns under optimal policy. Prices driven by fundamentals rather than momentum—RSI less predictive.

⚠ Data Constraints

Limited historical data (~1000 weekly observations) constrains transition matrix accuracy. COVID-era data excluded to avoid anomalous patterns.