Update README.md

# Background

```text
The models provided here were created using open source modeling techniques
provided in https://github.com/taoshidev/time-series-prediction-subnet (TSPS).
They were trained using runnable/miner_training.py and tested against
existing models and dummy models in runnable/miner_testing.py.
```

# Build Strategy

This section outlines the strategy used to build the models.

## Understanding Dataset Used
```text
The dataset used to build the models can be generated using
runnable/generate_historical_data.py. A lookback period between June 2022 and
July 2023 on the 5m interval was used to train the models. Through analysis,
this dataset was chosen because historical data from before June 2022 consists
of strongly trending price movement, or movement from a period where Bitcoin's
market cap was too small to be relevant to where Bitcoin is now.

Therefore, more recent data was used, which correlates to the current market
cap and macroeconomic conditions, where it's uncertain we'll continue to get
highly trending Bitcoin data.

Testing data between June 2023 and Nov 2023 was used to determine performance
of the models. This was tested using the runnable/miner_testing.py file with a
separately generated test dataset from runnable/generate_historical_data.py.
```
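
The date-based train/test split described above can be sketched as follows. This is an illustration only — the real dataset comes from runnable/generate_historical_data.py, and the close series and exact boundary dates here are fabricated stand-ins:

```python
import numpy as np
import pandas as pd

# Fabricated 5m close series standing in for the generated historical data.
idx = pd.date_range("2022-06-01", "2023-11-30", freq="5min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"close": rng.normal(27000, 500, len(idx))}, index=idx)

# Train on June 2022 - June 2023; hold out the later months for testing.
train = df.loc["2022-06-01":"2023-06-30"]
test = df.loc["2023-07-01":"2023-11-30"]
```

Keeping the test window strictly after the training window avoids leaking future data into training.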

## Understanding Model Creation
```text
As of now, the TSPS infrastructure only provides close, high, low, and volume. It
also provides financial indicators such as RSI, MACD, and Bollinger Bands, but
they were not used for the purposes of training these models.

The models were derived using a variety of windows and iterations through the
June 2022 to June 2023 dataset. The strategy to derive the models was the
following:

base_mining_model = BaseMiningModel(len(prep_dataset.T)) \
    .set_neurons([[1024, 0]]) \
    .set_model_dir(f'mining_models/model1.h5')
base_mining_model.train(prep_dataset, epochs=25)

where an LSTM model was created using few or no stacked layers. Most of the
v4 models are not stacked, as for the most part they performed better when not
stacked. This could very likely change as more feature inputs are added (this
is being worked on as part of the open source infra in TSPS). A window size of
100 best predicted the outcome, as derived in mining_objects/base_mining_model.py.
```
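
The window-size-100 sample construction can be sketched in plain NumPy. The real logic lives in mining_objects/base_mining_model.py; the helper name and the exact horizon pairing below are hypothetical, chosen only to illustrate the idea:

```python
import numpy as np

def make_windows(series: np.ndarray, window_size: int = 100, horizon: int = 100):
    """Pair each run of `window_size` consecutive closes with the close
    `horizon` steps after the window ends (hypothetical pairing)."""
    X, y = [], []
    for start in range(len(series) - window_size - horizon + 1):
        X.append(series[start:start + window_size])
        y.append(series[start + window_size + horizon - 1])
    return np.array(X), np.array(y)

closes = np.arange(1000, dtype=float)  # stand-in for real close data
X, y = make_windows(closes)
```

With 1000 closes, a 100-wide window, and a 100-step horizon, this yields 801 training samples.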

## Understanding Training Decisions
```text
Training the model used the previous 601 rows of data as input. This is because
500 rows were used to batch, and we are looking to predict 100 rows into the
future (the challenge presented in the Time Series Prediction Subnet). Measures
were taken to ensure all of the training data was trained on.

Each set of 601 rows was trained on 25 times, inside another loop which iterated
over the entirety of the dataset from 6/22 to 6/23 50 times. This gave the model
the ability to get granular with details yet not overfit to any single set of
rows at once. Therefore, a multi-layered looping structure was used to derive
the models:

for x in range(50):
    for i in range(25):
        train_model()
```
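
One plausible reading of the looping scheme above is sketched below — the helper name is hypothetical and the non-overlapping stepping is an assumption, but it shows how 601-row sets can tile the dataset inside the outer 50-pass loop:

```python
import numpy as np

def iter_row_sets(data: np.ndarray, set_size: int = 601, step: int = 601):
    """Yield consecutive 601-row sets covering the dataset (hypothetical helper)."""
    for start in range(0, len(data) - set_size + 1, step):
        yield data[start:start + set_size]

data = np.arange(6010, dtype=float)  # stand-in for the 6/22-6/23 dataset
sets = list(iter_row_sets(data))

# 50 passes over the dataset; each 601-row set is trained on 25 times.
for x in range(50):
    for row_set in sets:
        pass  # base_mining_model.train(row_set, epochs=25) in the real flow
```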

## Strategy to Predict
```text
The strategy to predict 100 closes into the future was to use a 1-step
methodology: predict a single step 100 intervals into the future, and connect
the information by generating a line from the last close to that prediction 100
closes ahead. By doing so, the model could learn to predict a single step rather
than all 100, where loss could continue to increase with each misstep.
```
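
The line-drawing step above is a simple linear interpolation. A minimal sketch, where the last close and the model's t+100 output are fabricated example values:

```python
import numpy as np

last_close = 27000.0
predicted_close_t100 = 27500.0  # hypothetical model output for t+100

# 101 evenly spaced points from the last close to the prediction,
# then drop the known first point to leave the 100 predicted closes.
predicted_path = np.linspace(last_close, predicted_close_t100, num=101)[1:]
```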

# Model V5
```text
Recommendations on how to perform better than V4 and what Model V5 will look
like are outlined below:

1. Concentrate on more difficult moves
2. Get more granular data (1m)
3. Get more data sources
4. Use more predicted steps

-- Concentrate on more difficult moves

The Time Series Prediction Subnet will reward models that are capable of
predicting more "difficult" movements in the market over those that are less
difficult. Therefore, a strategy of training your model on larger or bigger
magnitude movements would be a good consideration. Some additional details on
how difficulty is calculated will be released soon, but it is a combination of
the magnitude of the movement with the std dev of the movement in the predicted
interval.

-- Get more granular data (1m)

With these larger magnitude movements, a strategy of getting more granular with
the data is recommended. Using 1m data to train rather than 5m would help the
models predict better.

-- Get more data sources

Beyond using financial market indicators like RSI, MACD, and Bollinger Bands,
the TSPS open source infra will gather information for miners to help train.

The TSPS infrastructure will be adding data scrapers and using them to
automatically gather information for you. The following pieces of information
will be gathered and accessible through the open source infra:

- Bitcoin open interest
- Bitcoin OHLCV data
- Bitcoin funding rate
- DXY OHLCV data
- Gold OHLCV data
- S&P 500 OHLCV data
- Bitcoin dominance
- Historical news data (sentiment analysis)

Using this information will provide models with signals they can use to better
predict prices, as markets correlate in movement and Bitcoin responds to other
markets.

-- Use more predicted steps

Rather than only predicting a single step at the 100th predicted close in the
future, predict more steps. This can be achieved by training multiple models,
for example, 10 models each 10 closes apart into the future (10, 20, 30, 40,
50, 60, 70, 80, 90, 100), or by using a multi-step model with 10 steps. Both
will achieve more granularity in predictions and therefore a much better
RMSE-based score.
```
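
The "more predicted steps" idea can be sketched by connecting several anchor predictions instead of a single one. The 10 anchor values below are fabricated stand-ins for the outputs of 10 hypothetical single-step models at 10, 20, ..., 100 closes ahead:

```python
import numpy as np

steps = np.arange(10, 101, 10)                 # 10, 20, ..., 100 closes ahead
anchor_preds = 27000.0 + np.sqrt(steps) * 10   # stand-in model outputs

# Connect the last known close (t+0) through the 10 anchors into a full
# 100-close path via piecewise-linear interpolation.
full_path = np.interp(np.arange(1, 101),
                      np.concatenate(([0], steps)),
                      np.concatenate(([27000.0], anchor_preds)))

def rmse(pred: np.ndarray, actual: np.ndarray) -> float:
    """Root mean squared error between a predicted and an actual path."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))
```

A curved 10-anchor path can track realistic price moves more closely than one straight line to the 100th close, which is why the extra granularity tends to improve RMSE.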