This article was published as a part of the Data Science Blogathon
This tutorial shows how to build a deep learning model that predicts stock prices with the TensorFlow framework. It is an advanced TensorFlow project, so you should already be comfortable with the basics of stock prices. You can also check one of my favourite Analytics Vidhya articles on the same topic.
Stock Price Data import and preparation
Heinz exported the stock data to a CSV file. The dataset contains n = 41,266 minutes of data covering 500 stocks traded from April to August 2017, as well as the price of the S&P 500 index.
# Data import
import numpy as np
import pandas as pd

data = pd.read_csv('data_stocks.csv')
# Drop the date variable
data = data.drop(columns=['DATE'])
# Dataset dimensions
n = data.shape[0]
p = data.shape[1]
# Convert the data to a NumPy array
data = data.values
This is the S&P 500 time series, plotted with pyplot.plot(data['SP500']) before the conversion to a NumPy array:
Image 1
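Interesting point: since the ultimate goal is to “predict” the value of the index in the near future, the S&P 500 values in the dataset are shifted one minute ahead, so each row pairs the stock prices at time t with the index value at time t + 1.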
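To make the one-minute lead concrete, here is a minimal sketch of how such a target could be built from an unshifted series. The helper name make_lead_target and the toy numbers are assumptions for illustration only; the tutorial's CSV is described as already containing the shifted index, so this step is not part of the original workflow.

```python
import numpy as np

def make_lead_target(features, index_values, lead=1):
    """Pair the features at time t with the index value at time t + lead."""
    if lead < 1:
        raise ValueError("lead must be at least 1")
    features = np.asarray(features, dtype=float)
    index_values = np.asarray(index_values, dtype=float)
    # The last `lead` rows have no future value to predict, so they are dropped.
    X = features[:-lead]
    y = index_values[lead:]
    return X, y

# Toy data: 5 minutes, 2 stocks (values are made up).
prices = [[10.0, 20.0], [10.1, 20.2], [10.2, 20.1], [10.3, 20.3], [10.4, 20.4]]
index = [100.0, 101.0, 102.0, 103.0, 104.0]

X_demo, y_demo = make_lead_target(prices, index, lead=1)
print(X_demo.shape, y_demo.shape)  # (4, 2) (4,)
print(y_demo)                      # index values from minute 1 onward
```

The first row of X_demo (prices at minute 0) is paired with the index at minute 1, which is the relationship the model is asked to learn.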
Preparing data for testing and training
The dataset was split chronologically into two parts, one for training and one for testing. The training data accounts for 80% of the total and covers the period from April to approximately the end of July 2017; the test data covers the remainder and ends in August 2017.
# Data for testing and training
train_start = 0
train_end = int(np.floor(0.8 * n))
test_start = train_end
test_end = n
data_train = data[np.arange(train_start, train_end), :]
data_test = data[np.arange(test_start, test_end), :]
There are many approaches to time series cross-validation, from generating forecasts with or without refitting to more elaborate concepts such as time series bootstrap resampling. The latter draws repeated samples from the remainder of the seasonal decomposition of the time series, which makes it possible to simulate samples that follow the same seasonal pattern as the original time series without being exact copies of its values.
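As an illustration of forecasting with refitting, the sketch below generates expanding-window splits: each fold trains on everything before its test window. The function name expanding_window_splits and its parameters are hypothetical and not part of the original tutorial, which uses a single 80/20 split.

```python
def expanding_window_splits(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each fold trains on everything before its test window, so the model
    never sees observations that come after the ones it is evaluated on.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        test_end = test_start + test_size
        yield range(0, test_start), range(test_start, test_end)

for train_idx, test_idx in expanding_window_splits(n_samples=10, n_folds=3, test_size=2):
    print(list(train_idx), list(test_idx))
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]
# [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

In practice the model would be refitted on each training range and scored on the matching test range, and the scores averaged.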
Data scaling
Most neural network architectures benefit from scaling the inputs (and sometimes the outputs). The reason is that common activation functions such as the sigmoid or the hyperbolic tangent produce outputs in the intervals [0, 1] and [-1, 1], respectively. Nowadays, rectified linear unit (ReLU) activations are most commonly used, and they are unbounded from above; the inputs and targets are scaled here anyway. Heinz did this with scikit-learn's MinMaxScaler in Python:
# Data scaling
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# Fit the scaler on the training data only
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)
# Build X and y (column 0 is the S&P 500 target)
X_train = data_train[:, 1:]
y_train = data_train[:, 0]
X_test = data_test[:, 1:]
y_test = data_test[:, 0]
Note: be careful about which part of the data is scaled and when. A common mistake is to scale the entire dataset before splitting it into training and test data. This is an error because scaling relies on statistics of the data, namely the minimum and maximum of each variable. In real-life time series forecasting, you do not have information from future observations at the time a forecast is made. Therefore, the scaling statistics must be computed on the training data only and then applied to the test data. Otherwise, the model uses information “from the future” (that is, from the test sample) when generating predictions, which biases the forecasting metrics in an optimistic direction.
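The following sketch spells out by hand what fitting on the training data and transforming the test data means. The helper names fit_min_max and apply_min_max and the toy arrays are assumptions for illustration; it is not a reimplementation of scikit-learn's scaler.

```python
import numpy as np

def fit_min_max(train):
    """Return per-column minimum and range computed on the training data only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero.
    col_range[col_range == 0] = 1.0
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale data with previously fitted statistics."""
    return (data - col_min) / col_range

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[4.0, 25.0]])  # first column exceeds the training maximum

col_min, col_range = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_range)
test_scaled = apply_min_max(test, col_min, col_range)
print(train_scaled.min(), train_scaled.max())  # 0.0 1.0
print(test_scaled)  # first value is 1.5, second is 0.75
```

A scaled test value above 1 is expected here: the test data contains a price the training period never reached, and the scaler is not allowed to know that in advance.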
Introduction to TensorFlow
TensorFlow is a great product and currently one of the most popular frameworks for solving machine learning problems and building neural networks. The backend is written in C++, but Python is usually used to control it. TensorFlow is built around the concept of a computational graph. This approach allows users to define mathematical operations as elements of a graph of data, variables, and operators. Since neural networks are, in fact, graphs of data and mathematical operations, TensorFlow is well suited to neural networks and machine learning. The example below shows a graph that solves the problem of adding two numbers:
Image 2
The picture above shows two numbers that need to be added. They are stored in the variables a and b. The values travel through the graph and arrive at the node represented by the square, where the addition takes place. The result of the operation is written to another variable, c. The variables used can be thought of as placeholders: any numbers fed into a and b are added, and the result is written to c.
This is how TensorFlow really works: the user declares an abstract representation of the model through placeholders and variables. After that, the placeholders are filled with real data, and the calculations take place. The example above is described by the following TensorFlow code:
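# Import TensorFlow
# Note: this tutorial uses the TensorFlow 1.x graph API. Under TensorFlow 2.x,
# the same calls are available via:
#   import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()
import tensorflow as tf

# Define a and b as placeholders
a = tf.placeholder(dtype=tf.int8)
b = tf.placeholder(dtype=tf.int8)
# Define the addition
c = tf.add(a, b)
# Initialize the graph
graph = tf.Session()
# Run the graph
graph.run(c, feed_dict={a: 5, b: 4})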
After the TensorFlow library is imported, two placeholders are defined using tf.placeholder(). They correspond to the two blue circles on the left side of the image above. After that, the addition operation is defined using tf.add(). The result of the operation is c = 9. With the placeholders in place, the graph can be executed for any integer values of a and b. This example is extremely simple, and real neural networks are much more complicated, but it shows the principles of the framework.
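The "define first, run later" idea can also be sketched without TensorFlow. The tiny classes below (Placeholder, Add, and the run function) are hypothetical stand-ins written for this illustration; they only mimic the concept and are not how TensorFlow is implemented.

```python
class Placeholder:
    """A named slot whose value is supplied only when the graph is run."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self]

class Add:
    """A node that adds the results of its two input nodes."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)

def run(node, feed):
    """Evaluate a node given a mapping from placeholders to values."""
    return node.evaluate(feed)

# Step 1: describe the computation. Nothing is calculated yet.
a = Placeholder("a")
b = Placeholder("b")
c = Add(a, b)

# Step 2: supply values and execute.
print(run(c, {a: 5, b: 4}))    # 9
print(run(c, {a: 10, b: -3}))  # 7
```

The same graph object c is reused with different inputs, which is the role the feed_dict argument plays in the TensorFlow example.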
Placeholders
As stated above, it all starts with placeholders. To implement the model, two placeholders are needed: X contains the input data for the network (the stock prices of all S&P 500 constituents at time T = t), and Y contains the output data (the value of the S&P 500 index at time T = t + 1).
The placeholder X has the shape [None, n_stocks], which means the input is a 2-D matrix; the placeholder Y has the shape [None], which means the output is a 1-D vector. It is important to understand which shapes of input and output data the neural network needs and to organize them accordingly.
# Placeholders
X = tf.placeholder(dtype=tf.float32, shape=[None, n_stocks])
Y = tf.placeholder(dtype=tf.float32, shape=[None])
The None argument means that at this point we do not yet know the number of observations that will pass through the neural network graph during each run, so it remains flexible. Later, the batch_size variable will be defined, which controls the number of observations during the training run.
Variables
In addition to placeholders, there is another important element in the TensorFlow universe: variables. While placeholders hold the input and target data fed into the graph, variables act as flexible containers within the graph. They are allowed to change during the execution of the graph. Weights and biases are represented as variables so that they can be adapted during training. Variables must be initialized before training starts.
The model consists of four hidden layers. The first contains 1,024 neurons, slightly more than twice the size of the input. Each subsequent hidden layer is half the size of the previous one: 512, 256, and 128 neurons. Reducing the number of neurons from layer to layer compresses the information that the network extracted in the previous layers. Other network architectures and neuron configurations are possible, but this tutorial uses this model:
# Model architecture parameters
n_stocks = 500
n_neurons_1 = 1024
n_neurons_2 = 512
n_neurons_3 = 256
n_neurons_4 = 128
n_target = 1
# Layer 1: variables for hidden weights and biases
W_hidden_1 = tf.Variable(weight_initializer([n_stocks, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))
# Layer 2: variables for hidden weights and biases
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
# Layer 3: variables for hidden weights and biases
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
# Layer 4: variables for hidden weights and biases
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))
# Output layer: variables for output weights and biases
W_out = tf.Variable(weight_initializer([n_neurons_4, n_target]))
bias_out = tf.Variable(bias_initializer([n_target]))
It is important to understand which variable dimensions are required for the different layers. As a rule of thumb for multilayer perceptrons, the second dimension of the previous layer's weight matrix is the first dimension of the current layer's weight matrix. It sounds complicated, but the bottom line is that each layer passes its output as input to the next layer. The bias dimension equals the second dimension of the current layer's weight matrix, which corresponds to the number of neurons in that layer.
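The rule of thumb can be checked with a few lines of plain Python. The sketch below only uses the layer sizes given in the text; the variable names are chosen for this illustration.

```python
layer_sizes = [500, 1024, 512, 256, 128, 1]  # inputs, four hidden layers, output

total_parameters = 0
shapes = []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weight_shape = (n_in, n_out)  # rows: previous layer, columns: current layer
    bias_shape = (n_out,)         # one bias per neuron in the current layer
    shapes.append((weight_shape, bias_shape))
    total_parameters += n_in * n_out + n_out

for weight_shape, bias_shape in shapes:
    print("weights", weight_shape, "biases", bias_shape)
print("trainable parameters:", total_parameters)  # 1202177
```

Each weight matrix's second dimension reappears as the first dimension of the next one, which is what allows the matrix multiplications to be chained.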
Network architecture development
After defining the required weight and bias variables, the network topology, that is, the architecture of the network, needs to be specified. The placeholders (data) and the variables (weights and biases) are combined into a system of sequential matrix multiplications. The hidden layers are then transformed by activation functions. These functions are important elements of the network architecture because they introduce non-linearity into the system. There are dozens of activation functions, and one of the most common is the rectified linear unit (ReLU). This guide uses it:
# Hidden layers
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2), bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3), bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4), bias_hidden_4))
# Output layer (must be transposed)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))
The image below illustrates the architecture of the network. The model consists of three main blocks: the input layer, the hidden layers, and the output layer. This architecture is called a feed-forward network, which means that batches of data move strictly from left to right through the structure. In other architectures, such as recurrent neural networks, data can also flow in other directions inside the network.
Image 3
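To follow the shapes through the network, here is a NumPy sketch of the same feed-forward pass. The random weights, the batch of 8 observations, and the helper names relu and forward are assumptions for illustration; the tutorial itself builds the pass as a TensorFlow graph.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0.0)

def forward(X, weights, biases):
    """Feed-forward pass: ReLU on hidden layers, linear output layer."""
    activation = X
    for W, b in zip(weights[:-1], biases[:-1]):
        activation = relu(activation @ W + b)
    # The output layer has no activation; transpose to shape (1, batch).
    return (activation @ weights[-1] + biases[-1]).T

rng = np.random.default_rng(0)
layer_sizes = [500, 1024, 512, 256, 128, 1]
weights = [rng.uniform(-0.05, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

batch = rng.uniform(0.0, 1.0, size=(8, 500))  # 8 observations, 500 stocks
out_demo = forward(batch, weights, biases)
print(out_demo.shape)  # (1, 8)
```

The data only ever moves from one layer to the next, which is what "feed-forward" refers to.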
Cost function
The cost function of the network generates a measure of the deviation between the network's predictions and the actually observed targets during training. For regression problems, the mean squared error (MSE) function is commonly used. MSE computes the average squared deviation between predictions and targets, but in general any differentiable function can be used to measure the deviation between them.
# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, Y))
MSE has certain properties, such as being smooth and differentiable everywhere, that are advantageous for the general optimization problem to be solved.
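As a small worked example, the sketch below computes the MSE by hand in NumPy and also shows why the shapes of predictions and targets have to agree. The arrays and the function name mean_squared_error are made up for this illustration.

```python
import numpy as np

def mean_squared_error(predictions, targets):
    """Average of the squared element-wise differences."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.mean((predictions - targets) ** 2)

demo_preds = [0.2, 0.5, 0.9]
demo_targets = [0.0, 0.5, 1.0]
print(mean_squared_error(demo_preds, demo_targets))  # (0.04 + 0.0 + 0.01) / 3, about 0.0167

# Shape pitfall: a column of predictions against a flat target vector.
col = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1), like an untransposed output
vec = np.array([1.0, 2.0, 3.0])        # shape (3,), like the targets
print((col - vec).shape)    # (3, 3): broadcasting compares every pair
print((col.T - vec).shape)  # (1, 3): element-wise, as intended
```

This broadcasting behaviour is a plausible reason for the transpose applied to the network output before it is compared with Y.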
Optimizer
The optimizer takes care of the computations needed to adapt the network's weight and bias variables during training. These computations involve calculating so-called gradients, which indicate the direction in which the weights and biases have to be changed in order to minimize the cost function. The development of stable and fast optimizers is one of the main research topics in neural networks.
# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)
Here, Adam, one of the most common optimizers in machine learning, is used. Adam stands for Adaptive Moment Estimation and can be seen as a combination of two other popular optimizers, AdaGrad and RMSProp.
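To give a feel for what "adaptive moment estimation" means, here is a minimal one-dimensional sketch of the Adam update rule applied to a toy quadratic. The function name adam_minimize, the toy objective, and the learning rate of 0.1 are assumptions for illustration; this is not TensorFlow's implementation.

```python
import math

def adam_minimize(grad_fn, x0, steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a one-dimensional function with the Adam update rule."""
    x = x0
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # close to 3
```

The step is scaled by the running estimate of the squared gradient, so each parameter effectively gets its own step size.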
Initializers
Initializers are used to initialize the network's variables before training starts. Since neural networks are trained using numerical optimization techniques, the starting point of the optimization problem is one of the most important factors in finding a good solution. TensorFlow offers various initializers, each of which takes a different approach. This tutorial uses tf.variance_scaling_initializer(), which implements one of the standard initialization strategies. Because the variable definitions above call weight_initializer and bias_initializer, the initializers have to be defined before those variables when the code is run.
# Initializers
sigma = 1
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg", distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()
Note: TensorFlow allows different initialization functions to be defined for different variables within the graph. However, in most cases, a single shared initialization scheme is sufficient.
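The idea behind variance scaling with mode="fan_avg" and a uniform distribution can be sketched in NumPy as follows. As far as I understand it, the weights are drawn uniformly so that their variance equals scale divided by the average of the input and output sizes; treat the function below (a hypothetical variance_scaling_uniform) as an illustration of that idea rather than TensorFlow's exact code.

```python
import numpy as np

def variance_scaling_uniform(shape, scale=1.0, rng=None):
    """Uniform initialization whose variance is scale / fan_avg."""
    rng = np.random.default_rng() if rng is None else rng
    fan_in, fan_out = shape
    fan_avg = (fan_in + fan_out) / 2.0
    # A uniform distribution on [-limit, limit] has variance limit^2 / 3.
    limit = np.sqrt(3.0 * scale / fan_avg)
    return rng.uniform(-limit, limit, size=shape)

W_demo = variance_scaling_uniform((500, 1024), rng=np.random.default_rng(0))
target_variance = 1.0 / ((500 + 1024) / 2.0)
print(W_demo.shape)                   # (500, 1024)
print(W_demo.var(), target_variance)  # empirical vs. target variance
```

Larger layers get smaller initial weights, which keeps the scale of the activations roughly stable from layer to layer.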
Training the neural network
Finally, the model needs to be trained, which is usually done with mini-batch training. During mini-batch training, random data samples of size batch_size are drawn from the training dataset and fed into the neural network. The training dataset is divided into n / batch_size batches, where n is the number of training observations, and these batches are sent to the network one after another. At this point, the placeholders X and Y come into play: they store the input and target data and present them to the network.
A sampled batch of X flows through the network until it reaches the output layer. There, TensorFlow compares the model's predictions with the actually observed targets Y for the current batch. After that, TensorFlow performs an optimization step and updates the network parameters. Once the weights and biases have been updated, the process is repeated for the next batch. The procedure continues until all batches have been sent to the neural network. One full sweep over all batches is called an “epoch”.
The network training stops when the maximum number of epochs is reached or when another predefined stopping criterion is triggered.
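One common example of such a stopping criterion is patience-based early stopping. The sketch below is a hypothetical helper (the class name EarlyStopping and the loss values are made up) and is not used in the tutorial's training loop, which simply runs for a fixed number of epochs.

```python
class EarlyStopping:
    """Signal a stop once the monitored loss has not improved for `patience` checks."""
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, loss):
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# Made-up validation losses, one per epoch.
losses = [0.50, 0.30, 0.25, 0.26, 0.27, 0.25, 0.24]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 5
```

The monitored loss should come from a validation set held out from the training data, not from the test set, so that the final test score stays untouched.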
# Create session
import matplotlib.pyplot as plt

net = tf.Session()
# Run the initializer
net.run(tf.global_variables_initializer())

# Set up an interactive plot
plt.ion()
fig = plt.figure()
ax1 = fig.add_subplot(111)
line1, = ax1.plot(y_test)
line2, = ax1.plot(y_test * 0.5)
plt.show()

# Number of epochs and batch size
epochs = 10
batch_size = 256

for e in range(epochs):

    # Shuffle the training data
    shuffle_indices = np.random.permutation(np.arange(len(y_train)))
    X_train = X_train[shuffle_indices]
    y_train = y_train[shuffle_indices]

    # Mini-batch training
    for i in range(0, len(y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = y_train[start:start + batch_size]
        # Run the optimizer with the batch
        net.run(opt, feed_dict={X: batch_x, Y: batch_y})

        # Show progress every 5 batches
        if np.mod(i, 5) == 0:
            # Prediction on the test set
            pred = net.run(out, feed_dict={X: X_test})
            line2.set_ydata(pred)
            plt.title('Epoch ' + str(e) + ', Batch ' + str(i))
            file_name = 'img' + str(e) + '_batch_' + str(i) + '.jpg'
            plt.savefig(file_name)
            plt.pause(0.01)

# Print the final MSE after training
mse_final = net.run(mse, feed_dict={X: X_test, Y: y_test})
print(mse_final)
During training, the network's predictions on the test set were evaluated every fifth batch and visualized. In addition, the images were saved to disk, and a video animation of the training process was later created from them.
As you can see, the neural network quickly adapts to the basic shape of the time series and then continues to learn finer patterns in the data. After 10 epochs, the predictions are very close to the test data. The final test MSE is 0.00078 (a very small value because the targets are scaled). The mean absolute percentage error (MAPE) of the forecast on the test set is 5.31%, which is a very good result. Note that this is only a fit to the test data, not a measure of performance in a real-world scenario.
Image 4
Scatter plot between predicted and real prices of the S&P
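For reference, the mean absolute percentage error can be computed as in the sketch below. The function name and the numbers are made up for illustration; the source does not show how its 5.31% figure was calculated.

```python
import numpy as np

def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent; assumes no actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

demo_actual = [100.0, 200.0, 400.0]
demo_predicted = [110.0, 190.0, 400.0]
mape_demo = mean_absolute_percentage_error(demo_actual, demo_predicted)
print(mape_demo)  # (10% + 5% + 0%) / 3, about 5.0
```

Because the metric divides by the actual values, it becomes unstable when those values are close to zero, which can happen with min-max-scaled targets.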
Conclusion
This result can be improved in many ways, from tuning the number of layers and neurons to choosing other initialization and activation schemes. In addition, other types of deep learning models, such as recurrent neural networks, can be used, which may also lead to better results.
References
Image 1 – https://www.programmersought.com/images/67/c6bc6001c81422682c8e76284365ef73.JPEG
Image 2 – https://www.programmersought.com/images/67/c6bc6001c81422682c8e76284365ef73.JPEG
Image 3 – https://www.programmersought.com/images/67/c6bc6001c81422682c8e76284365ef73.JPEG
Image 4 – https://www.programmersought.com/images/67/c6bc6001c81422682c8e76284365ef73.JPEG
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
Source: https://www.analyticsvidhya.com/blog/2021/09/predicting-stock-prices-with-tensorflow/