Update README.md

# Background

```text
The models provided here were created using open source modeling techniques
provided in https://github.com/taoshidev/time-series-prediction-subnet (TSPS).
They were trained using runnable/miner_training.py and tested against
existing models and dummy models in runnable/miner_testing.py.
```

# Build Strategy

This section outlines the strategy used to build the models.

## Understanding Dataset Used
```text
The dataset used to build the models can be generated using
runnable/generate_historical_data.py. A lookback period between June 2022 and
July 2023 on the 5m interval was used to train the models. Through analysis,
this dataset was chosen because historical data from before June 2022 consists
of strongly trending price movement, or movement from a period where Bitcoin's
market cap was too small to be relevant to where Bitcoin is now.

Therefore, more recent data was used, which correlates to the current market
cap and macroeconomic conditions, where it's uncertain we'll continue to get
highly trending Bitcoin data.

Testing data between June 2023 and Nov 2023 was used to determine performance
of the models. This was tested using the runnable/miner_testing.py file with a
separately generated test dataset from runnable/generate_historical_data.py.
```
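
The date-based train/test split described above can be sketched as follows. This is an illustration only — the real dataset comes from runnable/generate_historical_data.py, and the close series and exact boundary dates here are fabricated stand-ins:

```python
import numpy as np
import pandas as pd

# Fabricated 5m close series standing in for the generated historical data.
idx = pd.date_range("2022-06-01", "2023-11-30", freq="5min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"close": rng.normal(27000, 500, len(idx))}, index=idx)

# Train on June 2022 - June 2023; hold out the later months for testing.
train = df.loc["2022-06-01":"2023-06-30"]
test = df.loc["2023-07-01":"2023-11-30"]
```

Keeping the test window strictly after the training window avoids leaking future data into training.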

## Understanding Model Creation
```text
As of now, the TSPS infrastructure only provides close, high, low, and volume. It
also provides financial indicators such as RSI, MACD, and Bollinger Bands, but
they were not used for the purposes of training these models.

The models were derived using a variety of windows and iterations through the
June 2022 to June 2023 dataset. The strategy to derive the models was the
following:

base_mining_model = BaseMiningModel(len(prep_dataset.T)) \
    .set_neurons([[1024, 0]]) \
    .set_model_dir(f'mining_models/model1.h5')
base_mining_model.train(prep_dataset, epochs=25)

where an LSTM model was created using few or no stacked layers. Most of the
v4 models are not stacked, as for the most part they performed better when not
stacked. This could very likely change as more feature inputs are added (this
is being worked on as part of the open source infra in TSPS). A window size of
100 best predicted the outcome, as derived in mining_objects/base_mining_model.py.
```
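
The window-size-100 sample construction can be sketched in plain NumPy. The real logic lives in mining_objects/base_mining_model.py; the helper name and the exact horizon pairing below are hypothetical, chosen only to illustrate the idea:

```python
import numpy as np

def make_windows(series: np.ndarray, window_size: int = 100, horizon: int = 100):
    """Pair each run of `window_size` consecutive closes with the close
    `horizon` steps after the window ends (hypothetical pairing)."""
    X, y = [], []
    for start in range(len(series) - window_size - horizon + 1):
        X.append(series[start:start + window_size])
        y.append(series[start + window_size + horizon - 1])
    return np.array(X), np.array(y)

closes = np.arange(1000, dtype=float)  # stand-in for real close data
X, y = make_windows(closes)
```

With 1000 closes, a 100-wide window, and a 100-step horizon, this yields 801 training samples.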

## Understanding Training Decisions
```text
Training the model used the previous 601 rows of data as input. This is because
500 rows were used to batch, and we are looking to predict 100 rows into the
future (the challenge presented in the Time Series Prediction Subnet). Measures
were taken to ensure all of the training data was trained on.

Each set of 601 rows was trained on 25 times, inside another loop which iterated
over the entirety of the dataset from 6/22 to 6/23 50 times. This gave the model
the ability to get granular with details yet not overfit to any single set of
rows at once. Therefore, a multi-layered looping structure was used to derive
the models:

for x in range(50):
    for i in range(25):
        train_model()
```
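
One plausible reading of the looping scheme above is sketched below — the helper name is hypothetical and the non-overlapping stepping is an assumption, but it shows how 601-row sets can tile the dataset inside the outer 50-pass loop:

```python
import numpy as np

def iter_row_sets(data: np.ndarray, set_size: int = 601, step: int = 601):
    """Yield consecutive 601-row sets covering the dataset (hypothetical helper)."""
    for start in range(0, len(data) - set_size + 1, step):
        yield data[start:start + set_size]

data = np.arange(6010, dtype=float)  # stand-in for the 6/22-6/23 dataset
sets = list(iter_row_sets(data))

# 50 passes over the dataset; each 601-row set is trained on 25 times.
for x in range(50):
    for row_set in sets:
        pass  # base_mining_model.train(row_set, epochs=25) in the real flow
```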

## Strategy to Predict
```text
The strategy to predict 100 closes into the future was to use a 1-step
methodology: predict a single step 100 intervals into the future, and connect
the information by generating a line from the last close to that prediction 100
closes ahead. By doing so, the model could learn to predict a single step rather
than all 100, where loss could continue to increase with each misstep.
```
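
The line-drawing step above is a simple linear interpolation. A minimal sketch, where the last close and the model's t+100 output are fabricated example values:

```python
import numpy as np

last_close = 27000.0
predicted_close_t100 = 27500.0  # hypothetical model output for t+100

# 101 evenly spaced points from the last close to the prediction,
# then drop the known first point to leave the 100 predicted closes.
predicted_path = np.linspace(last_close, predicted_close_t100, num=101)[1:]
```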

# Model V5
```text
Recommendations on how to perform better than V4 and what Model V5 will look
like are outlined below:

1. Concentrate on more difficult moves
2. Get more granular data (1m)
3. Get more data sources
4. Use more predicted steps

-- Concentrate on more difficult moves

The Time Series Prediction Subnet will reward models that are capable of
predicting more "difficult" movements in the market over those that are less
difficult. Therefore, a strategy of training your model on larger or bigger
magnitude movements would be a good consideration. Some additional details on
how difficulty is calculated will be released soon, but it is a combination of
the magnitude of the movement with the std dev of the movement in the predicted
interval.

-- Get more granular data (1m)

With these larger magnitude movements, a strategy of getting more granular with
the data is recommended. Using 1m data to train rather than 5m would help the
models predict better.

-- Get more data sources

Beyond using financial market indicators like RSI, MACD, and Bollinger Bands,
the TSPS open source infra will gather information for miners to help train.

The TSPS infrastructure will be adding data scrapers and using them to
automatically gather information for you. The following pieces of information
will be gathered and accessible through the open source infra:

- Bitcoin open interest
- Bitcoin OHLCV data
- Bitcoin funding rate
- DXY OHLCV data
- Gold OHLCV data
- S&P 500 OHLCV data
- Bitcoin dominance
- Historical news data (sentiment analysis)

Using this information will provide models with signals they can use to better
predict prices, as markets correlate in movement and Bitcoin responds to other
markets.

-- Use more predicted steps

Rather than only predicting a single step at the 100th predicted close in the
future, predict more steps. This can be achieved by training multiple models,
for example, 10 models each 10 closes apart into the future (10, 20, 30, 40,
50, 60, 70, 80, 90, 100), or by using a multi-step model with 10 steps. Both
will achieve more granularity in predictions and therefore a much better
RMSE-based score.
```
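
The "more predicted steps" idea can be sketched by connecting several anchor predictions instead of a single one. The 10 anchor values below are fabricated stand-ins for the outputs of 10 hypothetical single-step models at 10, 20, ..., 100 closes ahead:

```python
import numpy as np

steps = np.arange(10, 101, 10)                 # 10, 20, ..., 100 closes ahead
anchor_preds = 27000.0 + np.sqrt(steps) * 10   # stand-in model outputs

# Connect the last known close (t+0) through the 10 anchors into a full
# 100-close path via piecewise-linear interpolation.
full_path = np.interp(np.arange(1, 101),
                      np.concatenate(([0], steps)),
                      np.concatenate(([27000.0], anchor_preds)))

def rmse(pred: np.ndarray, actual: np.ndarray) -> float:
    """Root mean squared error between a predicted and an actual path."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))
```

A curved 10-anchor path can track realistic price moves more closely than one straight line to the 100th close, which is why the extra granularity tends to improve RMSE.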