STOCK MARKET PREDICTION USING NEURAL NETWORKS

An example for time-series prediction

by Dr. Valentin Steinhauer

 

Short description

 

Time series prediction plays a big role in economics. Stock market prices, as well as energy consumption, can be predicted to support decision making. This tutorial shows one possible approach to using neural networks for this kind of prediction. It extends the Neuroph tutorial called "Time Series Prediction", which gives a good theoretical basis for prediction. To show how it works, we trained the network with DAX (German stock index) data for one month (02.03.2009 to 30.03.2009) in order to predict the value on 31.03.2009. As a strategy, we take sequences of 4 days to predict each 5th day; in the training set, the 5th day is the supervised value. The DAX data can be downloaded, for example, from the following URL: http://download.finance.yahoo.com/d/quotes.csv?s=^GDAXI&f=sl1d1t1c1ohgv&e=.cs

A training set generator (StockFileReader, StockSocketReader and TrainingData) is available for download as part of the NetBeans project; however, it is not integrated into the main program, to keep the source code simple. Test dataset:

double[ ][ ] days = {{2,3,2009,3710.07}, {3,3,2009,3690.72}, {4,3,2009,3890.94}, {5,3,2009,3695.49}, {6,3,2009,3666.41}, {9,3,2009,3692.03}, {10,3,2009,3886.98}, {11,3,2009,3914.1}, {12,3,2009,3956.22}, {13,3,2009,3953.6}, {16,3,2009,4044.54}, {17,3,2009,3987.77}, {18,3,2009,3996.32}, {19,3,2009,4043.46}, {20,3,2009,4068.74}, {23,3,2009,4176.37}, {24,3,2009,4187.36}, {25,3,2009,4223.29}, {26,3,2009,4259.37}, {27,3,2009,4203.55}, {30,3,2009,3989.23}, {31,3,2009,4084.76}};

 

The first 3 values in every record give the date of the DAX level; the last value is the DAX level itself. The next step is the normalization of the training data into the range (0-1). It can be done in two steps:

  1. Find the maximum DAX value: maxDax = max(days[k][3], k = 0 .. days.length-1)
  2. Calculate the normalized values:
    daxnorm[i] = (days[i][3] / maxDax) * 0.8 + 0.1, where the factors 0.8 and 0.1 are used to avoid very small (0.0...) and very large (0.9999) values. As a simplification, we have simply divided by 10000 instead.
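The two normalization variants above can be sketched in plain Java (no Neuroph classes needed); the class and method names here are our own, the constants 0.8, 0.1 and 10000 are the ones from the tutorial:

```java
public class DaxNormalization {

    // Variant 1: scale by the maximum value into roughly the range (0.1, 0.9).
    static double[] normalizeByMax(double[][] days) {
        double maxDax = 0;
        for (double[] day : days) {
            maxDax = Math.max(maxDax, day[3]); // day[3] is the DAX closing level
        }
        double[] daxnorm = new double[days.length];
        for (int i = 0; i < days.length; i++) {
            daxnorm[i] = (days[i][3] / maxDax) * 0.8 + 0.1;
        }
        return daxnorm;
    }

    // Variant 2: the simplification actually used in the tutorial - divide by 10000.
    static double[] normalizeBy10000(double[][] days) {
        double[] daxnorm = new double[days.length];
        for (int i = 0; i < days.length; i++) {
            daxnorm[i] = days[i][3] / 10000.0;
        }
        return daxnorm;
    }

    public static void main(String[] args) {
        double[][] days = {{2, 3, 2009, 3710.07}, {31, 3, 2009, 4084.76}};
        System.out.println(normalizeBy10000(days)[0]); // 0.371007
    }
}
```

Note that with variant 1 the largest value always maps exactly to 0.9, while variant 2 only works because the DAX stayed safely below 10000 in this period.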

Next, the network topology is defined: the type of network, the number of layers and the number of neurons per layer. There is no strict rule for this; it is usually determined experimentally. However, the common type of network used for prediction is a multilayer perceptron. A common recommendation is to use 2n+1 nodes in the hidden layer, where n is the number of input nodes. The output layer has only one node in this case. Good results were obtained with the following topology and parameter set: maxIterations = 10000, learningRate = 0.7, maxError = 0.0001, with the training set organized as follows:

 

 

TrainingSet trainingSet = new TrainingSet();
for (int i = 0; i < daxnorm.length - 5; i++) {
    // Allocate fresh arrays in each iteration so that every training
    // element keeps its own input/output values.
    double[] in = new double[4];
    double[] out = new double[1];
    for (int j = i; j < i + 4; j++) {
        in[j - i] = daxnorm[j];
    }
    out[0] = daxnorm[i + 4];
    trainingSet.addElement(new SupervisedTrainingElement(in, out));
}

The resulting training rows (raw DAX values; in each row the first four values are the inputs and the fifth is the supervised output):

3710.07 3690.72 3890.94 3695.49 3666.41
3690.72 3890.94 3695.49 3666.41 3692.03
3890.94 3695.49 3666.41 3692.03 3886.98
3695.49 3666.41 3692.03 3886.98 3914.10
3666.41 3692.03 3886.98 3914.10 3956.22
3692.03 3886.98 3914.10 3956.22 3953.60
3886.98 3914.10 3956.22 3953.60 4044.54
3914.10 3956.22 3953.60 4044.54 3987.77
3956.22 3953.60 4044.54 3987.77 3996.32
3953.60 4044.54 3987.77 3996.32 4043.46
4044.54 3987.77 3996.32 4043.46 4068.74
3987.77 3996.32 4043.46 4068.74 4176.37
3996.32 4043.46 4068.74 4176.37 4187.36
4043.46 4068.74 4176.37 4187.36 4223.29
4068.74 4176.37 4187.36 4223.29 4259.37
4176.37 4187.36 4223.29 4259.37 4203.55
4187.36 4223.29 4259.37 4203.55 3989.23
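The sliding-window construction that produces these rows can be reproduced in plain Java (without Neuroph); the class and method names are our own. Note that the loop bound deliberately excludes the final day, which is the prediction target:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WindowBuilder {

    // Mirrors the loop from the tutorial: each window holds 4 input values
    // plus the 5th (supervised) value; the last day of the series - the
    // prediction target - never appears in a training window.
    static List<double[]> makeWindows(double[] series) {
        List<double[]> windows = new ArrayList<>();
        for (int i = 0; i < series.length - 5; i++) {
            windows.add(Arrays.copyOfRange(series, i, i + 5));
        }
        return windows;
    }

    public static void main(String[] args) {
        double[] dax = {3710.07, 3690.72, 3890.94, 3695.49, 3666.41, 3692.03,
                3886.98, 3914.10, 3956.22, 3953.60, 4044.54, 3987.77, 3996.32,
                4043.46, 4068.74, 4176.37, 4187.36, 4223.29, 4259.37, 4203.55,
                3989.23, 4084.76};
        System.out.println(makeWindows(dax).size()); // 17 rows, as in the table above
    }
}
```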

 

At this point we are ready to train and test the network. For testing we use a prepared data set in which the DAX values of the last four trading days of the month (25., 26., 27. and 30.03.2009) are given, to predict the value on 31.03.2009.

 

neuralNet.learnInSameThread(trainingSet);
TrainingSet testSet = new TrainingSet();
// Inputs are normalized the same way as the training data: divided by 10000.
testSet.addElement(new TrainingElement(new double[]{4223.0D / 10000.0D, 4259.0D / 10000.0D, 4203.0D / 10000.0D, 3989.0D / 10000.0D}));
for (TrainingElement testElement : testSet.trainingElements()) {
    neuralNet.setInput(testElement.getInput());
    neuralNet.calculate();
    // Multiply the network output by 10000 to undo the normalization.
    Vector<Double> networkOutput = neuralNet.getOutput();
}

 

Since the network is initialized with random weight values, the test results differ from calculation to calculation. Five test runs produced the following predictions for 31.03.2009: 4084.61, 4081.28, 4073.08, 4075.22 and 4087.42.
Such a collection of differently initialized networks is a so-called committee, and together it gives a much more reliable result than a single network run. The value officially announced on that day was 4084.76. This is still far from a directly usable trading result, although the calculations with Neuroph already look good. Good results were also obtained with the Neuroph package in several other market predictions.

The next step towards better quantitative results is to change the sequence of calculations carried out in the previous example: we can use concurrent calculations to create the committee. The committee not only improves stability, it also allows effective relative control of the training conditions; the relative scattering of the committee's results is the figure of merit in this case. To create the concurrency we used the jetlang package. The following table was produced with 10 members of the committee.
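The committee statistics can be sketched in plain Java (leaving out the jetlang concurrency). The method names are our own, and we assume "relative scattering" means the standard deviation of the members' predictions relative to their mean; the sample values are the five single-network predictions quoted earlier:

```java
public class CommitteeStats {

    // Committee prediction: the mean of the individual networks' outputs.
    static double mean(double[] predictions) {
        double sum = 0;
        for (double p : predictions) sum += p;
        return sum / predictions.length;
    }

    // Relative scattering in percent: standard deviation divided by the mean.
    static double relativeScatteringPercent(double[] predictions) {
        double m = mean(predictions);
        double var = 0;
        for (double p : predictions) var += (p - m) * (p - m);
        return Math.sqrt(var / predictions.length) / m * 100.0;
    }

    public static void main(String[] args) {
        double[] predictions = {4084.61, 4081.28, 4073.08, 4075.22, 4087.42};
        System.out.printf("committee value: %.2f, scattering: %.2f%%%n",
                mean(predictions), relativeScatteringPercent(predictions));
    }
}
```

A small scattering means the members agree with each other, which is why it can serve as a relative check of the training conditions even when the true future value is unknown.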

 

Topology  Max Error  Learning Rate  Scattering %  Predicted Value  Max Iterations
4,2,1     0.0001     0.6            0.04          4029             10000
4,3,1     0.0001     0.6            0.06          4041             10000
4,4,1     0.0001     0.6            0.08          4047             10000
4,9,1     0.0001     0.6            0.15          4084             10000
4,15,1    0.0001     0.6            0.09          4123             10000
4,31,1    0.0001     0.6            0.03          4145             10000
The 0.15% scattering is to be interpreted as the maximum sensitivity of the network (multilayer perceptron) to the given training set. It occurs with a topology of 4 input neurons, 9 hidden neurons and 1 output neuron, which is exactly the 2n+1 hidden-layer rule mentioned above; this topology also gives the prediction (4084) closest to the officially announced value.

 

 

Source Code

 

The sources for this tutorial are available in the package

org.neuroph.samples.stockmarket

or download complete NetBeans projects from

http://neuroph.sourceforge.net/samples/StockMarketPrediction.zip
http://neuroph.sourceforge.net/samples/StockMarketCommittee.zip

 

See also:

Time Series Prediction Tutorial
Multi Layer Perceptron Tutorial