Multivariate Natural Evolution Strategy The Next CEO of Stack OverflowApplying Darwinian evolution to programmingEvolution strategy with individual stepsizesDoes Python have a built in function for string natural sort?Explain the Differential Evolution methodWhat are the differences between genetic algorithms and evolution strategies?Evolution StrategiesHow is reproduce processing in (μ,λ) evolution strategy algorithm?How is the equation in “Evolution Strategies as a Scalable Alternative to Reinforcement Learning” derived?Get evolution log from a pygmo archipelagoDifference between Evolutionary Strategies and Reinforcement Learning?
How do I go from 300 unfinished/half written blog posts, to published posts?
Bold, vivid family
How do I make a variable always equal to the result of some calculations?
How to solve a differential equation with a term to a power?
How powerful is the invisibility granted by the Gloom Stalker ranger's Umbral Sight feature?
Return the Closest Prime Number
Interfacing a button to MCU (and PC) with 50m long cable
What benefits would be gained by using human laborers instead of drones in deep sea mining?
Elegant way to replace substring in a regex with optional groups in Python?
What was the first Unix version to run on a microcomputer?
Are there any unintended negative consequences to allowing PCs to gain multiple levels at once in a short milestone-XP game?
Are there any limitations on attacking while grappling?
What happened in Rome, when the western empire "fell"?
Won the lottery - how do I keep the money?
Does it take more energy to get to Venus or to Mars?
How to start emacs in "nothing" mode (`fundamental-mode`)
Novel about a guy who is possessed by the divine essence and the world ends?
How did people program for Consoles with multiple CPUs?
Why didn't Khan get resurrected in the Genesis Explosion?
Make solar eclipses exceedingly rare, but still have new moons
Example of a Mathematician/Physicist whose Other Publications during their PhD eclipsed their PhD Thesis
If a black hole is created from light, can this black hole then move at speed of light?
How to count occurrences of text in a file?
Calculus II Question
Multivariate Natural Evolution Strategy
The Next CEO of Stack OverflowApplying Darwinian evolution to programmingEvolution strategy with individual stepsizesDoes Python have a built in function for string natural sort?Explain the Differential Evolution methodWhat are the differences between genetic algorithms and evolution strategies?Evolution StrategiesHow is reproduce processing in (μ,λ) evolution strategy algorithm?How is the equation in “Evolution Strategies as a Scalable Alternative to Reinforcement Learning” derived?Get evolution log from a pygmo archipelagoDifference between Evolutionary Strategies and Reinforcement Learning?
I've just started to play around with Reinforcement Learning these days and I found the Natural Evolution Strategy, I kind of understand how it works, but I'm very new with Python and I found this code which basically implements the NES algorithm
https://github.com/huseinzol05/Stock-Prediction-Models/blob/master/agent/updated-NES-google.ipynb
import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
import seaborn as sns
import random
sns.set()
# CSV containing the TSLA stock predictions in the form of
# [Date, Open, High, Low, Close, Adj Close, Volume] from
# Yahoo! Finance
df = pd.read_csv('TSLA.csv')
df.head()
def get_state(data, t, n):
d = t - n + 1
block = data[d : t + 1] if d >= 0 else -d * [data[0]] + data[0 : t + 1]
res = []
for i in range(n - 1):
res.append(block[i + 1] - block[i])
return np.array([res])
close = df.Close.values.tolist()
window_size = 30
skip = 1
l = len(close) - 1
class Deep_Evolution_Strategy:
inputs = None
def __init__(
self, weights, reward_function, population_size, sigma, learning_rate
):
self.weights = weights
self.reward_function = reward_function
self.population_size = population_size
self.sigma = sigma
self.learning_rate = learning_rate
def _get_weight_from_population(self, weights, population):
weights_population = []
for index, i in enumerate(population):
jittered = self.sigma * i
weights_population.append(weights[index] + jittered)
return weights_population
def get_weights(self):
return self.weights
def train(self, epoch = 100, print_every = 1):
lasttime = time.time()
for i in range(epoch):
population = []
rewards = np.zeros(self.population_size)
for k in range(self.population_size):
x = []
for w in self.weights:
x.append(np.random.randn(*w.shape))
population.append(x)
for k in range(self.population_size):
weights_population = self._get_weight_from_population(self.weights, population[k])
rewards[k] = self.reward_function(weights_population)
rewards = (rewards - np.mean(rewards)) / np.std(rewards)
for index, w in enumerate(self.weights):
A = np.array([p[index] for p in population])
self.weights[index] = (
w
+ self.learning_rate
/ (self.population_size * self.sigma)
* np.dot(A.T, rewards).T
)
class Model:
def __init__(self, input_size, layer_size, output_size):
self.weights = [
np.random.randn(input_size, layer_size),
np.random.randn(layer_size, output_size),
np.random.randn(layer_size, 1),
np.random.randn(1, layer_size),
]
def predict(self, inputs):
feed = np.dot(inputs, self.weights[0]) + self.weights[-1]
decision = np.dot(feed, self.weights[1])
buy = np.dot(feed, self.weights[2])
return decision, buy
def get_weights(self):
return self.weights
def set_weights(self, weights):
self.weights = weights
class Agent:
POPULATION_SIZE = 15
SIGMA = 0.1
LEARNING_RATE = 0.03
def __init__(self, model, money, max_buy, max_sell):
self.model = model
self.initial_money = money
self.max_buy = max_buy
self.max_sell = max_sell
self.es = Deep_Evolution_Strategy(
self.model.get_weights(),
self.get_reward,
self.POPULATION_SIZE,
self.SIGMA,
self.LEARNING_RATE,
)
def act(self, sequence):
decision, buy = self.model.predict(np.array(sequence))
return np.argmax(decision[0]), int(buy[0])
def get_reward(self, weights):
initial_money = self.initial_money
starting_money = initial_money
self.model.weights = weights
state = get_state(close, 0, window_size + 1)
inventory = []
quantity = 0
for t in range(0, l, skip):
action, buy = self.act(state)
next_state = get_state(close, t + 1, window_size + 1)
if action == 1 and initial_money >= close[t]:
if buy < 0:
buy = 1
if buy > self.max_buy:
buy_units = self.max_buy
else:
buy_units = buy
total_buy = buy_units * close[t]
initial_money -= total_buy
inventory.append(total_buy)
quantity += buy_units
elif action == 2 and len(inventory) > 0:
if quantity > self.max_sell:
sell_units = self.max_sell
else:
sell_units = quantity
quantity -= sell_units
total_sell = sell_units * close[t]
initial_money += total_sell
state = next_state
return ((initial_money - starting_money) / starting_money) * 100
def fit(self, iterations, checkpoint):
self.es.train(iterations, print_every = checkpoint)
def buy(self):
initial_money = self.initial_money
state = get_state(close, 0, window_size + 1)
starting_money = initial_money
states_sell = []
states_buy = []
inventory = []
quantity = 0
for t in range(0, l, skip):
action, buy = self.act(state)
next_state = get_state(close, t + 1, window_size + 1)
if action == 1 and initial_money >= close[t]:
if buy < 0:
buy = 1
if buy > self.max_buy:
buy_units = self.max_buy
else:
buy_units = buy
total_buy = buy_units * close[t]
initial_money -= total_buy
inventory.append(total_buy)
quantity += buy_units
states_buy.append(t)
elif action == 2 and len(inventory) > 0:
bought_price = inventory.pop(0)
if quantity > self.max_sell:
sell_units = self.max_sell
else:
sell_units = quantity
if sell_units < 1:
continue
quantity -= sell_units
total_sell = sell_units * close[t]
initial_money += total_sell
states_sell.append(t)
try:
invest = ((total_sell - bought_price) / bought_price) * 100
except:
invest = 0
state = next_state
invest = ((initial_money - starting_money) / starting_money) * 100
model = Model(window_size, 500, 3)
agent = Agent(model, 10000, 5, 5)
agent.fit(500, 10)
agent.buy()
As you can see, it is being used for stock prediction and it only uses the Close column, but I would like to try it with more parameters, let's say High and Low.
I'm struggling when I need to change it to use this 2 dimensional list. I've tried a simple change:
close = df.loc[:,['Close','Open']].values.tolist()
Which adds one more property at every row of the list. But when I run the code I start to see errors when I execute the agent.fit() call:
agent.fit(iterations = 500, checkpoint = 10)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-225-d97697984016> in <module>()
----> 1 agent.fit(iterations = 500, checkpoint = 10)
<ipython-input-223-35d9fbba5756> in fit(self, iterations, checkpoint)
66
67 def fit(self, iterations, checkpoint):
---> 68 self.es.train(iterations, print_every = checkpoint)
69
70 def buy(self):
<ipython-input-220-84ca345091f4> in train(self, epoch, print_every)
33 self.weights, population[k]
34 )
---> 35 rewards[k] = self.reward_function(weights_population)
36 rewards = (rewards - np.mean(rewards)) / np.std(rewards)
37
<ipython-input-223-35d9fbba5756> in get_reward(self, weights)
36
37 self.model.weights = weights
---> 38 state = get_state(self.close, 0, self.window_size + 1)
39 inventory = []
40 quantity = 0
<ipython-input-219-0df8d8be24a9> in get_state(data, t, n)
4 res = []
5 for i in range(n - 1):
----> 6 res.append(block[i + 1] - block[i])
7 return np.array([res])
TypeError: unsupported operand type(s) for -: 'list' and 'list'
I assume that the first step is that I need to update my Model class to use a different input_size parameter right?
Any help would be appreciated! Thanks
python-3.x deep-learning artificial-intelligence reinforcement-learning evolutionary-algorithm
add a comment |
I've just started to play around with Reinforcement Learning these days and I found the Natural Evolution Strategy, I kind of understand how it works, but I'm very new with Python and I found this code which basically implements the NES algorithm
https://github.com/huseinzol05/Stock-Prediction-Models/blob/master/agent/updated-NES-google.ipynb
import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
import seaborn as sns
import random
sns.set()
# CSV containing the TSLA stock predictions in the form of
# [Date, Open, High, Low, Close, Adj Close, Volume] from
# Yahoo! Finance
df = pd.read_csv('TSLA.csv')
df.head()
def get_state(data, t, n):
d = t - n + 1
block = data[d : t + 1] if d >= 0 else -d * [data[0]] + data[0 : t + 1]
res = []
for i in range(n - 1):
res.append(block[i + 1] - block[i])
return np.array([res])
close = df.Close.values.tolist()
window_size = 30
skip = 1
l = len(close) - 1
class Deep_Evolution_Strategy:
inputs = None
def __init__(
self, weights, reward_function, population_size, sigma, learning_rate
):
self.weights = weights
self.reward_function = reward_function
self.population_size = population_size
self.sigma = sigma
self.learning_rate = learning_rate
def _get_weight_from_population(self, weights, population):
weights_population = []
for index, i in enumerate(population):
jittered = self.sigma * i
weights_population.append(weights[index] + jittered)
return weights_population
def get_weights(self):
return self.weights
def train(self, epoch = 100, print_every = 1):
lasttime = time.time()
for i in range(epoch):
population = []
rewards = np.zeros(self.population_size)
for k in range(self.population_size):
x = []
for w in self.weights:
x.append(np.random.randn(*w.shape))
population.append(x)
for k in range(self.population_size):
weights_population = self._get_weight_from_population(self.weights, population[k])
rewards[k] = self.reward_function(weights_population)
rewards = (rewards - np.mean(rewards)) / np.std(rewards)
for index, w in enumerate(self.weights):
A = np.array([p[index] for p in population])
self.weights[index] = (
w
+ self.learning_rate
/ (self.population_size * self.sigma)
* np.dot(A.T, rewards).T
)
class Model:
def __init__(self, input_size, layer_size, output_size):
self.weights = [
np.random.randn(input_size, layer_size),
np.random.randn(layer_size, output_size),
np.random.randn(layer_size, 1),
np.random.randn(1, layer_size),
]
def predict(self, inputs):
feed = np.dot(inputs, self.weights[0]) + self.weights[-1]
decision = np.dot(feed, self.weights[1])
buy = np.dot(feed, self.weights[2])
return decision, buy
def get_weights(self):
return self.weights
def set_weights(self, weights):
self.weights = weights
class Agent:
POPULATION_SIZE = 15
SIGMA = 0.1
LEARNING_RATE = 0.03
def __init__(self, model, money, max_buy, max_sell):
self.model = model
self.initial_money = money
self.max_buy = max_buy
self.max_sell = max_sell
self.es = Deep_Evolution_Strategy(
self.model.get_weights(),
self.get_reward,
self.POPULATION_SIZE,
self.SIGMA,
self.LEARNING_RATE,
)
def act(self, sequence):
decision, buy = self.model.predict(np.array(sequence))
return np.argmax(decision[0]), int(buy[0])
def get_reward(self, weights):
initial_money = self.initial_money
starting_money = initial_money
self.model.weights = weights
state = get_state(close, 0, window_size + 1)
inventory = []
quantity = 0
for t in range(0, l, skip):
action, buy = self.act(state)
next_state = get_state(close, t + 1, window_size + 1)
if action == 1 and initial_money >= close[t]:
if buy < 0:
buy = 1
if buy > self.max_buy:
buy_units = self.max_buy
else:
buy_units = buy
total_buy = buy_units * close[t]
initial_money -= total_buy
inventory.append(total_buy)
quantity += buy_units
elif action == 2 and len(inventory) > 0:
if quantity > self.max_sell:
sell_units = self.max_sell
else:
sell_units = quantity
quantity -= sell_units
total_sell = sell_units * close[t]
initial_money += total_sell
state = next_state
return ((initial_money - starting_money) / starting_money) * 100
def fit(self, iterations, checkpoint):
self.es.train(iterations, print_every = checkpoint)
def buy(self):
initial_money = self.initial_money
state = get_state(close, 0, window_size + 1)
starting_money = initial_money
states_sell = []
states_buy = []
inventory = []
quantity = 0
for t in range(0, l, skip):
action, buy = self.act(state)
next_state = get_state(close, t + 1, window_size + 1)
if action == 1 and initial_money >= close[t]:
if buy < 0:
buy = 1
if buy > self.max_buy:
buy_units = self.max_buy
else:
buy_units = buy
total_buy = buy_units * close[t]
initial_money -= total_buy
inventory.append(total_buy)
quantity += buy_units
states_buy.append(t)
elif action == 2 and len(inventory) > 0:
bought_price = inventory.pop(0)
if quantity > self.max_sell:
sell_units = self.max_sell
else:
sell_units = quantity
if sell_units < 1:
continue
quantity -= sell_units
total_sell = sell_units * close[t]
initial_money += total_sell
states_sell.append(t)
try:
invest = ((total_sell - bought_price) / bought_price) * 100
except:
invest = 0
state = next_state
invest = ((initial_money - starting_money) / starting_money) * 100
model = Model(window_size, 500, 3)
agent = Agent(model, 10000, 5, 5)
agent.fit(500, 10)
agent.buy()
As you can see, it is being used for stock prediction and it only uses the Close column, but I would like to try it with more parameters, let's say High and Low.
I'm struggling when I need to change it to use this 2 dimensional list. I've tried a simple change:
close = df.loc[:,['Close','Open']].values.tolist()
Which adds one more property at every row of the list. But when I run the code I start to see errors when I execute the agent.fit() call:
agent.fit(iterations = 500, checkpoint = 10)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-225-d97697984016> in <module>()
----> 1 agent.fit(iterations = 500, checkpoint = 10)
<ipython-input-223-35d9fbba5756> in fit(self, iterations, checkpoint)
66
67 def fit(self, iterations, checkpoint):
---> 68 self.es.train(iterations, print_every = checkpoint)
69
70 def buy(self):
<ipython-input-220-84ca345091f4> in train(self, epoch, print_every)
33 self.weights, population[k]
34 )
---> 35 rewards[k] = self.reward_function(weights_population)
36 rewards = (rewards - np.mean(rewards)) / np.std(rewards)
37
<ipython-input-223-35d9fbba5756> in get_reward(self, weights)
36
37 self.model.weights = weights
---> 38 state = get_state(self.close, 0, self.window_size + 1)
39 inventory = []
40 quantity = 0
<ipython-input-219-0df8d8be24a9> in get_state(data, t, n)
4 res = []
5 for i in range(n - 1):
----> 6 res.append(block[i + 1] - block[i])
7 return np.array([res])
TypeError: unsupported operand type(s) for -: 'list' and 'list'
I assume that the first step is that I need to update my Model class to use a different input_size parameter right?
Any help would be appreciated! Thanks
python-3.x deep-learning artificial-intelligence reinforcement-learning evolutionary-algorithm
add a comment |
I've just started to play around with Reinforcement Learning these days and I found the Natural Evolution Strategy, I kind of understand how it works, but I'm very new with Python and I found this code which basically implements the NES algorithm
https://github.com/huseinzol05/Stock-Prediction-Models/blob/master/agent/updated-NES-google.ipynb
import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
import seaborn as sns
import random
sns.set()
# CSV containing the TSLA stock predictions in the form of
# [Date, Open, High, Low, Close, Adj Close, Volume] from
# Yahoo! Finance
df = pd.read_csv('TSLA.csv')
df.head()
def get_state(data, t, n):
d = t - n + 1
block = data[d : t + 1] if d >= 0 else -d * [data[0]] + data[0 : t + 1]
res = []
for i in range(n - 1):
res.append(block[i + 1] - block[i])
return np.array([res])
close = df.Close.values.tolist()
window_size = 30
skip = 1
l = len(close) - 1
class Deep_Evolution_Strategy:
inputs = None
def __init__(
self, weights, reward_function, population_size, sigma, learning_rate
):
self.weights = weights
self.reward_function = reward_function
self.population_size = population_size
self.sigma = sigma
self.learning_rate = learning_rate
def _get_weight_from_population(self, weights, population):
weights_population = []
for index, i in enumerate(population):
jittered = self.sigma * i
weights_population.append(weights[index] + jittered)
return weights_population
def get_weights(self):
return self.weights
def train(self, epoch = 100, print_every = 1):
lasttime = time.time()
for i in range(epoch):
population = []
rewards = np.zeros(self.population_size)
for k in range(self.population_size):
x = []
for w in self.weights:
x.append(np.random.randn(*w.shape))
population.append(x)
for k in range(self.population_size):
weights_population = self._get_weight_from_population(self.weights, population[k])
rewards[k] = self.reward_function(weights_population)
rewards = (rewards - np.mean(rewards)) / np.std(rewards)
for index, w in enumerate(self.weights):
A = np.array([p[index] for p in population])
self.weights[index] = (
w
+ self.learning_rate
/ (self.population_size * self.sigma)
* np.dot(A.T, rewards).T
)
class Model:
def __init__(self, input_size, layer_size, output_size):
self.weights = [
np.random.randn(input_size, layer_size),
np.random.randn(layer_size, output_size),
np.random.randn(layer_size, 1),
np.random.randn(1, layer_size),
]
def predict(self, inputs):
feed = np.dot(inputs, self.weights[0]) + self.weights[-1]
decision = np.dot(feed, self.weights[1])
buy = np.dot(feed, self.weights[2])
return decision, buy
def get_weights(self):
return self.weights
def set_weights(self, weights):
self.weights = weights
class Agent:
POPULATION_SIZE = 15
SIGMA = 0.1
LEARNING_RATE = 0.03
def __init__(self, model, money, max_buy, max_sell):
self.model = model
self.initial_money = money
self.max_buy = max_buy
self.max_sell = max_sell
self.es = Deep_Evolution_Strategy(
self.model.get_weights(),
self.get_reward,
self.POPULATION_SIZE,
self.SIGMA,
self.LEARNING_RATE,
)
def act(self, sequence):
decision, buy = self.model.predict(np.array(sequence))
return np.argmax(decision[0]), int(buy[0])
def get_reward(self, weights):
initial_money = self.initial_money
starting_money = initial_money
self.model.weights = weights
state = get_state(close, 0, window_size + 1)
inventory = []
quantity = 0
for t in range(0, l, skip):
action, buy = self.act(state)
next_state = get_state(close, t + 1, window_size + 1)
if action == 1 and initial_money >= close[t]:
if buy < 0:
buy = 1
if buy > self.max_buy:
buy_units = self.max_buy
else:
buy_units = buy
total_buy = buy_units * close[t]
initial_money -= total_buy
inventory.append(total_buy)
quantity += buy_units
elif action == 2 and len(inventory) > 0:
if quantity > self.max_sell:
sell_units = self.max_sell
else:
sell_units = quantity
quantity -= sell_units
total_sell = sell_units * close[t]
initial_money += total_sell
state = next_state
return ((initial_money - starting_money) / starting_money) * 100
def fit(self, iterations, checkpoint):
self.es.train(iterations, print_every = checkpoint)
def buy(self):
initial_money = self.initial_money
state = get_state(close, 0, window_size + 1)
starting_money = initial_money
states_sell = []
states_buy = []
inventory = []
quantity = 0
for t in range(0, l, skip):
action, buy = self.act(state)
next_state = get_state(close, t + 1, window_size + 1)
if action == 1 and initial_money >= close[t]:
if buy < 0:
buy = 1
if buy > self.max_buy:
buy_units = self.max_buy
else:
buy_units = buy
total_buy = buy_units * close[t]
initial_money -= total_buy
inventory.append(total_buy)
quantity += buy_units
states_buy.append(t)
elif action == 2 and len(inventory) > 0:
bought_price = inventory.pop(0)
if quantity > self.max_sell:
sell_units = self.max_sell
else:
sell_units = quantity
if sell_units < 1:
continue
quantity -= sell_units
total_sell = sell_units * close[t]
initial_money += total_sell
states_sell.append(t)
try:
invest = ((total_sell - bought_price) / bought_price) * 100
except:
invest = 0
state = next_state
invest = ((initial_money - starting_money) / starting_money) * 100
model = Model(window_size, 500, 3)
agent = Agent(model, 10000, 5, 5)
agent.fit(500, 10)
agent.buy()
As you can see, it is being used for stock prediction and it only uses the Close column, but I would like to try it with more parameters, let's say High and Low.
I'm struggling when I need to change it to use this 2 dimensional list. I've tried a simple change:
close = df.loc[:,['Close','Open']].values.tolist()
Which adds one more property at every row of the list. But when I run the code I start to see errors when I execute the agent.fit() call:
agent.fit(iterations = 500, checkpoint = 10)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-225-d97697984016> in <module>()
----> 1 agent.fit(iterations = 500, checkpoint = 10)
<ipython-input-223-35d9fbba5756> in fit(self, iterations, checkpoint)
66
67 def fit(self, iterations, checkpoint):
---> 68 self.es.train(iterations, print_every = checkpoint)
69
70 def buy(self):
<ipython-input-220-84ca345091f4> in train(self, epoch, print_every)
33 self.weights, population[k]
34 )
---> 35 rewards[k] = self.reward_function(weights_population)
36 rewards = (rewards - np.mean(rewards)) / np.std(rewards)
37
<ipython-input-223-35d9fbba5756> in get_reward(self, weights)
36
37 self.model.weights = weights
---> 38 state = get_state(self.close, 0, self.window_size + 1)
39 inventory = []
40 quantity = 0
<ipython-input-219-0df8d8be24a9> in get_state(data, t, n)
4 res = []
5 for i in range(n - 1):
----> 6 res.append(block[i + 1] - block[i])
7 return np.array([res])
TypeError: unsupported operand type(s) for -: 'list' and 'list'
I assume that the first step is that I need to update my Model class to use a different input_size parameter right?
Any help would be appreciated! Thanks
python-3.x deep-learning artificial-intelligence reinforcement-learning evolutionary-algorithm
I've just started to play around with Reinforcement Learning these days and I found the Natural Evolution Strategy, I kind of understand how it works, but I'm very new with Python and I found this code which basically implements the NES algorithm
https://github.com/huseinzol05/Stock-Prediction-Models/blob/master/agent/updated-NES-google.ipynb
import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
import seaborn as sns
import random
sns.set()
# CSV containing the TSLA stock predictions in the form of
# [Date, Open, High, Low, Close, Adj Close, Volume] from
# Yahoo! Finance
df = pd.read_csv('TSLA.csv')
df.head()
def get_state(data, t, n):
d = t - n + 1
block = data[d : t + 1] if d >= 0 else -d * [data[0]] + data[0 : t + 1]
res = []
for i in range(n - 1):
res.append(block[i + 1] - block[i])
return np.array([res])
close = df.Close.values.tolist()
window_size = 30
skip = 1
l = len(close) - 1
class Deep_Evolution_Strategy:
inputs = None
def __init__(
self, weights, reward_function, population_size, sigma, learning_rate
):
self.weights = weights
self.reward_function = reward_function
self.population_size = population_size
self.sigma = sigma
self.learning_rate = learning_rate
def _get_weight_from_population(self, weights, population):
weights_population = []
for index, i in enumerate(population):
jittered = self.sigma * i
weights_population.append(weights[index] + jittered)
return weights_population
def get_weights(self):
return self.weights
def train(self, epoch = 100, print_every = 1):
lasttime = time.time()
for i in range(epoch):
population = []
rewards = np.zeros(self.population_size)
for k in range(self.population_size):
x = []
for w in self.weights:
x.append(np.random.randn(*w.shape))
population.append(x)
for k in range(self.population_size):
weights_population = self._get_weight_from_population(self.weights, population[k])
rewards[k] = self.reward_function(weights_population)
rewards = (rewards - np.mean(rewards)) / np.std(rewards)
for index, w in enumerate(self.weights):
A = np.array([p[index] for p in population])
self.weights[index] = (
w
+ self.learning_rate
/ (self.population_size * self.sigma)
* np.dot(A.T, rewards).T
)
class Model:
def __init__(self, input_size, layer_size, output_size):
self.weights = [
np.random.randn(input_size, layer_size),
np.random.randn(layer_size, output_size),
np.random.randn(layer_size, 1),
np.random.randn(1, layer_size),
]
def predict(self, inputs):
feed = np.dot(inputs, self.weights[0]) + self.weights[-1]
decision = np.dot(feed, self.weights[1])
buy = np.dot(feed, self.weights[2])
return decision, buy
def get_weights(self):
return self.weights
def set_weights(self, weights):
self.weights = weights
class Agent:
POPULATION_SIZE = 15
SIGMA = 0.1
LEARNING_RATE = 0.03
def __init__(self, model, money, max_buy, max_sell):
self.model = model
self.initial_money = money
self.max_buy = max_buy
self.max_sell = max_sell
self.es = Deep_Evolution_Strategy(
self.model.get_weights(),
self.get_reward,
self.POPULATION_SIZE,
self.SIGMA,
self.LEARNING_RATE,
)
def act(self, sequence):
decision, buy = self.model.predict(np.array(sequence))
return np.argmax(decision[0]), int(buy[0])
def get_reward(self, weights):
initial_money = self.initial_money
starting_money = initial_money
self.model.weights = weights
state = get_state(close, 0, window_size + 1)
inventory = []
quantity = 0
for t in range(0, l, skip):
action, buy = self.act(state)
next_state = get_state(close, t + 1, window_size + 1)
if action == 1 and initial_money >= close[t]:
if buy < 0:
buy = 1
if buy > self.max_buy:
buy_units = self.max_buy
else:
buy_units = buy
total_buy = buy_units * close[t]
initial_money -= total_buy
inventory.append(total_buy)
quantity += buy_units
elif action == 2 and len(inventory) > 0:
if quantity > self.max_sell:
sell_units = self.max_sell
else:
sell_units = quantity
quantity -= sell_units
total_sell = sell_units * close[t]
initial_money += total_sell
state = next_state
return ((initial_money - starting_money) / starting_money) * 100
def fit(self, iterations, checkpoint):
self.es.train(iterations, print_every = checkpoint)
def buy(self):
initial_money = self.initial_money
state = get_state(close, 0, window_size + 1)
starting_money = initial_money
states_sell = []
states_buy = []
inventory = []
quantity = 0
for t in range(0, l, skip):
action, buy = self.act(state)
next_state = get_state(close, t + 1, window_size + 1)
if action == 1 and initial_money >= close[t]:
if buy < 0:
buy = 1
if buy > self.max_buy:
buy_units = self.max_buy
else:
buy_units = buy
total_buy = buy_units * close[t]
initial_money -= total_buy
inventory.append(total_buy)
quantity += buy_units
states_buy.append(t)
elif action == 2 and len(inventory) > 0:
bought_price = inventory.pop(0)
if quantity > self.max_sell:
sell_units = self.max_sell
else:
sell_units = quantity
if sell_units < 1:
continue
quantity -= sell_units
total_sell = sell_units * close[t]
initial_money += total_sell
states_sell.append(t)
try:
invest = ((total_sell - bought_price) / bought_price) * 100
except:
invest = 0
state = next_state
invest = ((initial_money - starting_money) / starting_money) * 100
model = Model(window_size, 500, 3)
agent = Agent(model, 10000, 5, 5)
agent.fit(500, 10)
agent.buy()
As you can see, it is being used for stock prediction and it only uses the Close column, but I would like to try it with more parameters, let's say High and Low.
I'm struggling when I need to change it to use this 2 dimensional list. I've tried a simple change:
close = df.loc[:,['Close','Open']].values.tolist()
Which adds one more property at every row of the list. But when I run the code I start to see errors when I execute the agent.fit() call:
agent.fit(iterations = 500, checkpoint = 10)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-225-d97697984016> in <module>()
----> 1 agent.fit(iterations = 500, checkpoint = 10)
<ipython-input-223-35d9fbba5756> in fit(self, iterations, checkpoint)
66
67 def fit(self, iterations, checkpoint):
---> 68 self.es.train(iterations, print_every = checkpoint)
69
70 def buy(self):
<ipython-input-220-84ca345091f4> in train(self, epoch, print_every)
33 self.weights, population[k]
34 )
---> 35 rewards[k] = self.reward_function(weights_population)
36 rewards = (rewards - np.mean(rewards)) / np.std(rewards)
37
<ipython-input-223-35d9fbba5756> in get_reward(self, weights)
36
37 self.model.weights = weights
---> 38 state = get_state(self.close, 0, self.window_size + 1)
39 inventory = []
40 quantity = 0
<ipython-input-219-0df8d8be24a9> in get_state(data, t, n)
4 res = []
5 for i in range(n - 1):
----> 6 res.append(block[i + 1] - block[i])
7 return np.array([res])
TypeError: unsupported operand type(s) for -: 'list' and 'list'
I assume that the first step is that I need to update my Model class to use a different input_size parameter right?
Any help would be appreciated! Thanks
python-3.x deep-learning artificial-intelligence reinforcement-learning evolutionary-algorithm
python-3.x deep-learning artificial-intelligence reinforcement-learning evolutionary-algorithm
edited Mar 8 at 16:22
lucaswerner
asked Mar 8 at 14:30
lucaswernerlucaswerner
547
547
add a comment |
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55065285%2fmultivariate-natural-evolution-strategy%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55065285%2fmultivariate-natural-evolution-strategy%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown