Multivariate Natural Evolution Strategy

I've just started playing around with Reinforcement Learning and came across the Natural Evolution Strategy (NES). I roughly understand how it works, but I'm very new to Python. I found the following code, which implements the NES algorithm:



https://github.com/huseinzol05/Stock-Prediction-Models/blob/master/agent/updated-NES-google.ipynb



import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
import seaborn as sns
import random
sns.set()

# CSV containing the TSLA stock history in the form of
# [Date, Open, High, Low, Close, Adj Close, Volume] from
# Yahoo! Finance
df = pd.read_csv('TSLA.csv')
df.head()


def get_state(data, t, n):
    # state at time t = the n - 1 consecutive price differences ending at t,
    # padding with the first price when the window reaches before the start
    d = t - n + 1
    block = data[d : t + 1] if d >= 0 else -d * [data[0]] + data[0 : t + 1]
    res = []
    for i in range(n - 1):
        res.append(block[i + 1] - block[i])
    return np.array([res])


close = df.Close.values.tolist()
window_size = 30
skip = 1
l = len(close) - 1


class Deep_Evolution_Strategy:

    inputs = None

    def __init__(
        self, weights, reward_function, population_size, sigma, learning_rate
    ):
        self.weights = weights
        self.reward_function = reward_function
        self.population_size = population_size
        self.sigma = sigma
        self.learning_rate = learning_rate

    def _get_weight_from_population(self, weights, population):
        # perturb every weight matrix with sigma-scaled noise
        weights_population = []
        for index, i in enumerate(population):
            jittered = self.sigma * i
            weights_population.append(weights[index] + jittered)
        return weights_population

    def get_weights(self):
        return self.weights

    def train(self, epoch = 100, print_every = 1):
        lasttime = time.time()
        for i in range(epoch):
            # sample one Gaussian perturbation per population member
            population = []
            rewards = np.zeros(self.population_size)
            for k in range(self.population_size):
                x = []
                for w in self.weights:
                    x.append(np.random.randn(*w.shape))
                population.append(x)
            # evaluate every perturbed candidate
            for k in range(self.population_size):
                weights_population = self._get_weight_from_population(
                    self.weights, population[k]
                )
                rewards[k] = self.reward_function(weights_population)
            # normalize rewards, then take the NES gradient step
            rewards = (rewards - np.mean(rewards)) / np.std(rewards)
            for index, w in enumerate(self.weights):
                A = np.array([p[index] for p in population])
                self.weights[index] = (
                    w
                    + self.learning_rate
                    / (self.population_size * self.sigma)
                    * np.dot(A.T, rewards).T
                )


class Model:
    def __init__(self, input_size, layer_size, output_size):
        self.weights = [
            np.random.randn(input_size, layer_size),
            np.random.randn(layer_size, output_size),
            np.random.randn(layer_size, 1),
            np.random.randn(1, layer_size),
        ]

    def predict(self, inputs):
        feed = np.dot(inputs, self.weights[0]) + self.weights[-1]
        decision = np.dot(feed, self.weights[1])   # 3 logits: hold / buy / sell
        buy = np.dot(feed, self.weights[2])        # how many units to trade
        return decision, buy

    def get_weights(self):
        return self.weights

    def set_weights(self, weights):
        self.weights = weights


class Agent:

    POPULATION_SIZE = 15
    SIGMA = 0.1
    LEARNING_RATE = 0.03

    def __init__(self, model, money, max_buy, max_sell):
        self.model = model
        self.initial_money = money
        self.max_buy = max_buy
        self.max_sell = max_sell
        self.es = Deep_Evolution_Strategy(
            self.model.get_weights(),
            self.get_reward,
            self.POPULATION_SIZE,
            self.SIGMA,
            self.LEARNING_RATE,
        )

    def act(self, sequence):
        decision, buy = self.model.predict(np.array(sequence))
        return np.argmax(decision[0]), int(buy[0])

    def get_reward(self, weights):
        # simulate trading over the whole series and return the percent profit
        initial_money = self.initial_money
        starting_money = initial_money
        self.model.weights = weights
        state = get_state(close, 0, window_size + 1)
        inventory = []
        quantity = 0
        for t in range(0, l, skip):
            action, buy = self.act(state)
            next_state = get_state(close, t + 1, window_size + 1)
            if action == 1 and initial_money >= close[t]:
                if buy < 0:
                    buy = 1
                if buy > self.max_buy:
                    buy_units = self.max_buy
                else:
                    buy_units = buy
                total_buy = buy_units * close[t]
                initial_money -= total_buy
                inventory.append(total_buy)
                quantity += buy_units
            elif action == 2 and len(inventory) > 0:
                if quantity > self.max_sell:
                    sell_units = self.max_sell
                else:
                    sell_units = quantity
                quantity -= sell_units
                total_sell = sell_units * close[t]
                initial_money += total_sell

            state = next_state
        return ((initial_money - starting_money) / starting_money) * 100

    def fit(self, iterations, checkpoint):
        self.es.train(iterations, print_every = checkpoint)

    def buy(self):
        # replay the learned policy once, recording buy and sell times
        initial_money = self.initial_money
        state = get_state(close, 0, window_size + 1)
        starting_money = initial_money
        states_sell = []
        states_buy = []
        inventory = []
        quantity = 0
        for t in range(0, l, skip):
            action, buy = self.act(state)
            next_state = get_state(close, t + 1, window_size + 1)
            if action == 1 and initial_money >= close[t]:
                if buy < 0:
                    buy = 1
                if buy > self.max_buy:
                    buy_units = self.max_buy
                else:
                    buy_units = buy
                total_buy = buy_units * close[t]
                initial_money -= total_buy
                inventory.append(total_buy)
                quantity += buy_units
                states_buy.append(t)
            elif action == 2 and len(inventory) > 0:
                bought_price = inventory.pop(0)
                if quantity > self.max_sell:
                    sell_units = self.max_sell
                else:
                    sell_units = quantity
                if sell_units < 1:
                    continue
                quantity -= sell_units
                total_sell = sell_units * close[t]
                initial_money += total_sell
                states_sell.append(t)
                try:
                    invest = ((total_sell - bought_price) / bought_price) * 100
                except:
                    invest = 0
            state = next_state

        invest = ((initial_money - starting_money) / starting_money) * 100


model = Model(window_size, 500, 3)
agent = Agent(model, 10000, 5, 5)
agent.fit(500, 10)
agent.buy()
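
For what it's worth, my current understanding is that the update at the end of train() is the usual NES gradient estimate: every weight matrix is nudged along the Gaussian noise samples, weighted by the normalized rewards they earned. A standalone sketch of that single step, with made-up shapes and rewards just for illustration:

import numpy as np

pop_size, sigma, lr = 15, 0.1, 0.03
w = np.zeros((4, 3))                       # one hypothetical weight matrix
noise = np.random.randn(pop_size, 4, 3)    # one Gaussian perturbation per population member
rewards = np.random.randn(pop_size)        # stand-in for the normalized rewards

# move w along the reward-weighted average of the noise directions,
# which is what np.dot(A.T, rewards).T computes in the code above
w = w + lr / (pop_size * sigma) * np.tensordot(rewards, noise, axes=1)
print(w.shape)   # (4, 3) -- the update preserves the weight shape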


As you can see, it is used for stock prediction and relies only on the Close column, but I would like to try it with more features, say High and Low.
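
To be concrete about the input: with the original one-dimensional list of closes, get_state (defined above) returns a row vector of the consecutive price differences inside the window. A quick illustration with made-up prices:

prices = [10.0, 11.0, 10.5, 12.0, 12.5]   # hypothetical close prices
print(get_state(prices, 3, 4))
# [[ 1.  -0.5  1.5]] -- shape (1, n - 1): the differences within the window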



I'm struggling to change it to work with this two-dimensional list. I've tried a simple change:



close = df.loc[:,['Close','Open']].values.tolist()


This adds one more value to every row of the list, but when I run the code I get an error as soon as I execute the agent.fit() call:



agent.fit(iterations = 500, checkpoint = 10)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-225-d97697984016> in <module>()
----> 1 agent.fit(iterations = 500, checkpoint = 10)

<ipython-input-223-35d9fbba5756> in fit(self, iterations, checkpoint)
66
67 def fit(self, iterations, checkpoint):
---> 68 self.es.train(iterations, print_every = checkpoint)
69
70 def buy(self):

<ipython-input-220-84ca345091f4> in train(self, epoch, print_every)
33 self.weights, population[k]
34 )
---> 35 rewards[k] = self.reward_function(weights_population)
36 rewards = (rewards - np.mean(rewards)) / np.std(rewards)
37

<ipython-input-223-35d9fbba5756> in get_reward(self, weights)
36
37 self.model.weights = weights
---> 38 state = get_state(self.close, 0, self.window_size + 1)
39 inventory = []
40 quantity = 0

<ipython-input-219-0df8d8be24a9> in get_state(data, t, n)
4 res = []
5 for i in range(n - 1):
----> 6 res.append(block[i + 1] - block[i])
7 return np.array([res])

TypeError: unsupported operand type(s) for -: 'list' and 'list'
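
If I read the traceback correctly, the subtraction in get_state now happens between the row lists instead of between floats, and plain Python lists don't support the - operator. A minimal reproduction with made-up numbers:

[314.0, 310.0] - [312.0, 308.0]
# TypeError: unsupported operand type(s) for -: 'list' and 'list'

import numpy as np
np.array([314.0, 310.0]) - np.array([312.0, 308.0])
# array([2., 2.]) -- NumPy arrays subtract elementwise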


I assume the first step is to update my Model class to use a different input_size parameter, right?
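
In case it helps frame an answer, this is the direction I imagine the fix takes (a rough sketch, untested; get_state_multi and n_features are my own placeholder names): make the data a NumPy array so the differences work elementwise, flatten each window of per-feature differences into one input vector, scale input_size by the number of features, and keep a single price column for the cash logic, since comparisons like initial_money >= close[t] also break when each row is a list.

data = df[['Close', 'Open']].values          # shape (days, n_features), a NumPy array
n_features = data.shape[1]

def get_state_multi(data, t, n):
    d = t - n + 1
    if d >= 0:
        block = data[d : t + 1]
    else:
        # pad the start of the window by repeating the first row
        block = np.concatenate([np.tile(data[0], (-d, 1)), data[0 : t + 1]])
    res = block[1:] - block[:-1]             # row-wise differences, shape (n - 1, n_features)
    return res.reshape(1, -1)                # flatten to a (1, (n - 1) * n_features) input

# the model input grows by a factor of n_features
model = Model(window_size * n_features, 500, 3)

# the buy/sell accounting keeps using one price column, e.g. Close
close = df.Close.values.tolist()

Is that roughly the right approach, or do other parts of Agent need to change as well?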



Any help would be appreciated! Thanks

Tags: python-3.x deep-learning artificial-intelligence reinforcement-learning evolutionary-algorithm