---
library_name: stable-baselines3
tags:
- reinforcement-learning
- finance
- stock-trading
- deep-reinforcement-learning
- dqn
- ppo
- a2c
model-index:
- name: RL-Trading-Agents
  results:
  - task:
      type: reinforcement-learning
      name: Stock Trading
    metrics:
    - type: sharpe_ratio
      value: Variable
    - type: total_return
      value: Variable
---

# 🤖 Multi-Agent Reinforcement Learning Trading System

This repository contains trained deep reinforcement learning agents for automated stock trading. The agents were trained with `stable-baselines3` on a custom Gymnasium (formerly OpenAI Gym) environment simulating the US stock market (AAPL, MSFT, GOOGL).

## 🧠 Models

The following algorithms were used:

1. **DQN (Deep Q-Network)**: An off-policy, value-based algorithm suited to discrete action spaces.
2. **PPO (Proximal Policy Optimization)**: An on-policy policy-gradient method known for its training stability.
3. **A2C (Advantage Actor-Critic)**: A synchronous, deterministic variant of A3C; an actor-critic policy-gradient method.
4. **Ensemble**: A meta-voter that takes the majority decision of the three agents above.
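The ensemble's majority vote can be sketched as follows. The tie-break rule is an assumption: the card does not say what happens when all three agents disagree, so this sketch falls back to HOLD.

```python
from collections import Counter

HOLD, BUY, SELL = 0, 1, 2

def ensemble_action(dqn_action: int, ppo_action: int, a2c_action: int) -> int:
    """Majority vote over the three agents' discrete actions.

    Tie-break (assumption, not specified in the card): if all three
    agents disagree, fall back to HOLD.
    """
    votes = Counter([dqn_action, ppo_action, a2c_action])
    action, count = votes.most_common(1)[0]
    return action if count >= 2 else HOLD
```

Any action chosen by at least two of the three agents wins; a three-way split yields HOLD.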

## 📊 Training Data

The models were trained on technical indicators derived from historical daily price data (2018-2024):

* **Returns**: Daily percentage change.
* **RSI (14)**: Relative Strength Index.
* **MACD**: Moving Average Convergence Divergence.
* **Bollinger Bands**: Volatility measure.
* **Volume Ratio**: Relative volume intensity.
* **Market Regime**: Bull/bear trend classification.
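As an illustration of the feature pipeline, RSI(14) can be computed from closing prices as below. This is a generic sketch using Wilder's smoothing; the card does not state which RSI variant the repository uses.

```python
def rsi(closes: list[float], period: int = 14) -> float:
    """Relative Strength Index using Wilder's smoothing (an assumption;
    the card does not specify the exact RSI variant)."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    # Seed the averages with a simple mean, then apply Wilder's smoothing.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0  # prices only rose over the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

A strictly rising series returns 100, a strictly falling one returns 0, and mixed series land in between.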

## 🔗 Related Data

* **Dataset Repository**: [AdityaaXD/Multi-Agent_Reinforcement_Learning_Trading_System_Data](https://huggingface.co/AdityaaXD/Multi-Agent_Reinforcement_Learning_Trading_System_Data)
* **GitHub Repository**: [ADITYA-tp01/Multi-Agent-Reinforcement-Learning-Trading-System-Data](https://github.com/ADITYA-tp01/Multi-Agent-Reinforcement-Learning-Trading-System-Data)

## 🎮 Environment (`TradingEnv`)

* **Action Space**: `Discrete(3)`; `0: HOLD`, `1: BUY`, `2: SELL`.
* **Observation Space**: `Box(10,)`; normalized technical features plus portfolio state.
* **Reward**: Profit and loss (PnL) minus transaction costs and drawdown penalties.
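The reward shaping described above can be illustrated with a stripped-down, pure-Python environment. This is not the repository's `TradingEnv` (which subclasses `gymnasium.Env` and emits the 10-dimensional observation); the cost rate and drawdown coefficient here are illustrative assumptions.

```python
HOLD, BUY, SELL = 0, 1, 2

class MiniTradingEnv:
    """Sketch of the reward logic only: PnL minus transaction costs
    minus a drawdown penalty. Coefficients are illustrative assumptions,
    not values from the repository."""

    def __init__(self, prices, cost_rate=0.001, drawdown_coef=0.1):
        self.prices = prices
        self.cost_rate = cost_rate          # proportional transaction cost
        self.drawdown_coef = drawdown_coef  # penalty weight on drawdown
        self.reset()

    def reset(self):
        self.t = 0
        self.position = 0       # 0 = flat, 1 = long
        self.equity = 1.0
        self.peak_equity = 1.0
        return self._obs()

    def _obs(self):
        return (self.prices[self.t], self.position, self.equity)

    def step(self, action):
        price, next_price = self.prices[self.t], self.prices[self.t + 1]
        cost = 0.0
        if action == BUY and self.position == 0:
            self.position, cost = 1, self.cost_rate
        elif action == SELL and self.position == 1:
            self.position, cost = 0, self.cost_rate
        # PnL from holding through the next bar, minus costs.
        pnl = self.position * (next_price - price) / price
        self.equity *= (1.0 + pnl - cost)
        self.peak_equity = max(self.peak_equity, self.equity)
        drawdown = (self.peak_equity - self.equity) / self.peak_equity
        reward = pnl - cost - self.drawdown_coef * drawdown
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self._obs(), reward, done, {}
```

Buying before a rise yields a positive reward net of the transaction cost; holding through a fall is penalized both by the negative PnL and by the drawdown term.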

## 🚀 Usage

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Build the environment (the custom TradingEnv wrapper is required)
# env = TradingEnv(df)
# obs, info = env.reset()

# Load a trained model
model = PPO.load("ppo_AAPL.zip")

# Predict an action for the current observation
action, _ = model.predict(obs, deterministic=True)
```
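Once the environment is constructed, a standard Gymnasium-style rollout loop can evaluate an agent over one episode. This is a generic sketch, not code from the repository; it assumes the Gymnasium five-tuple `step` signature and that, as with stable-baselines3 models, `model.predict(obs)` returns an `(action, state)` pair.

```python
def run_episode(env, model, deterministic=True):
    """Roll one episode and return the total reward.

    Assumes the Gymnasium API: env.reset() -> (obs, info) and
    env.step(a) -> (obs, reward, terminated, truncated, info).
    """
    obs, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action, _ = model.predict(obs, deterministic=deterministic)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward
```

The same loop works for DQN, PPO, and A2C, since all three expose the same `predict` interface in stable-baselines3.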

## 📈 Performance

Performance varies by ticker and market condition. See the generated `results/` CSVs for detailed Sharpe ratio and maximum drawdown statistics per agent.
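For reference, the Sharpe ratio and maximum drawdown reported in the CSVs can be recomputed from a daily-return series roughly as follows. This is a sketch under common conventions (252 trading days per year, sample standard deviation); the repository's exact conventions are not stated in the card.

```python
import math

def sharpe_ratio(daily_returns, risk_free=0.0, periods=252):
    """Annualized Sharpe ratio from daily returns (assumes a 252-day
    year; the card does not state the annualization convention)."""
    n = len(daily_returns)
    excess = [r - risk_free / periods for r in daily_returns]
    mean = sum(excess) / n
    var = sum((r - mean) ** 2 for r in excess) / (n - 1)
    std = math.sqrt(var)
    if std == 0:
        return 0.0
    return (mean / std) * math.sqrt(periods)

def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak, mdd = equity_curve[0], 0.0
    for v in equity_curve:
        peak = max(peak, v)
        mdd = max(mdd, (peak - v) / peak)
    return mdd
```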

## 🛠️ Credits

Developed by **Adityaraj Suman** as part of the Multi-Agent RL Trading System project.