Multivariate Natural Evolution Strategy

I've recently started experimenting with reinforcement learning and came across the Natural Evolution Strategy (NES). I roughly understand how it works, but I'm very new to Python. I found this code, which basically implements the NES algorithm:



https://github.com/huseinzol05/Stock-Prediction-Models/blob/master/agent/updated-NES-google.ipynb



import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
import seaborn as sns
import random
sns.set()

# CSV containing the TSLA stock predictions in the form of
# [Date, Open, High, Low, Close, Adj Close, Volume] from
# Yahoo! Finance
df = pd.read_csv('TSLA.csv')
df.head()


def get_state(data, t, n):
    d = t - n + 1
    block = data[d : t + 1] if d >= 0 else -d * [data[0]] + data[0 : t + 1]
    res = []
    for i in range(n - 1):
        res.append(block[i + 1] - block[i])
    return np.array([res])

close = df.Close.values.tolist()
window_size = 30
skip = 1
l = len(close) - 1

class Deep_Evolution_Strategy:

    inputs = None

    def __init__(
        self, weights, reward_function, population_size, sigma, learning_rate
    ):
        self.weights = weights
        self.reward_function = reward_function
        self.population_size = population_size
        self.sigma = sigma
        self.learning_rate = learning_rate

    def _get_weight_from_population(self, weights, population):
        weights_population = []
        for index, i in enumerate(population):
            jittered = self.sigma * i
            weights_population.append(weights[index] + jittered)
        return weights_population

    def get_weights(self):
        return self.weights

    def train(self, epoch = 100, print_every = 1):
        lasttime = time.time()
        for i in range(epoch):
            population = []
            rewards = np.zeros(self.population_size)
            for k in range(self.population_size):
                x = []
                for w in self.weights:
                    x.append(np.random.randn(*w.shape))
                population.append(x)
            for k in range(self.population_size):
                weights_population = self._get_weight_from_population(self.weights, population[k])
                rewards[k] = self.reward_function(weights_population)
            rewards = (rewards - np.mean(rewards)) / np.std(rewards)
            for index, w in enumerate(self.weights):
                A = np.array([p[index] for p in population])
                self.weights[index] = (
                    w
                    + self.learning_rate
                    / (self.population_size * self.sigma)
                    * np.dot(A.T, rewards).T
                )


class Model:
    def __init__(self, input_size, layer_size, output_size):
        self.weights = [
            np.random.randn(input_size, layer_size),
            np.random.randn(layer_size, output_size),
            np.random.randn(layer_size, 1),
            np.random.randn(1, layer_size),
        ]

    def predict(self, inputs):
        feed = np.dot(inputs, self.weights[0]) + self.weights[-1]
        decision = np.dot(feed, self.weights[1])
        buy = np.dot(feed, self.weights[2])
        return decision, buy

    def get_weights(self):
        return self.weights

    def set_weights(self, weights):
        self.weights = weights


class Agent:

    POPULATION_SIZE = 15
    SIGMA = 0.1
    LEARNING_RATE = 0.03

    def __init__(self, model, money, max_buy, max_sell):
        self.model = model
        self.initial_money = money
        self.max_buy = max_buy
        self.max_sell = max_sell
        self.es = Deep_Evolution_Strategy(
            self.model.get_weights(),
            self.get_reward,
            self.POPULATION_SIZE,
            self.SIGMA,
            self.LEARNING_RATE,
        )

    def act(self, sequence):
        decision, buy = self.model.predict(np.array(sequence))
        return np.argmax(decision[0]), int(buy[0])

    def get_reward(self, weights):
        initial_money = self.initial_money
        starting_money = initial_money
        self.model.weights = weights
        state = get_state(close, 0, window_size + 1)
        inventory = []
        quantity = 0
        for t in range(0, l, skip):
            action, buy = self.act(state)
            next_state = get_state(close, t + 1, window_size + 1)
            if action == 1 and initial_money >= close[t]:
                if buy < 0:
                    buy = 1
                if buy > self.max_buy:
                    buy_units = self.max_buy
                else:
                    buy_units = buy
                total_buy = buy_units * close[t]
                initial_money -= total_buy
                inventory.append(total_buy)
                quantity += buy_units
            elif action == 2 and len(inventory) > 0:
                if quantity > self.max_sell:
                    sell_units = self.max_sell
                else:
                    sell_units = quantity
                quantity -= sell_units
                total_sell = sell_units * close[t]
                initial_money += total_sell

            state = next_state
        return ((initial_money - starting_money) / starting_money) * 100

    def fit(self, iterations, checkpoint):
        self.es.train(iterations, print_every = checkpoint)

    def buy(self):
        initial_money = self.initial_money
        state = get_state(close, 0, window_size + 1)
        starting_money = initial_money
        states_sell = []
        states_buy = []
        inventory = []
        quantity = 0
        for t in range(0, l, skip):
            action, buy = self.act(state)
            next_state = get_state(close, t + 1, window_size + 1)
            if action == 1 and initial_money >= close[t]:
                if buy < 0:
                    buy = 1
                if buy > self.max_buy:
                    buy_units = self.max_buy
                else:
                    buy_units = buy
                total_buy = buy_units * close[t]
                initial_money -= total_buy
                inventory.append(total_buy)
                quantity += buy_units
                states_buy.append(t)
            elif action == 2 and len(inventory) > 0:
                bought_price = inventory.pop(0)
                if quantity > self.max_sell:
                    sell_units = self.max_sell
                else:
                    sell_units = quantity
                if sell_units < 1:
                    continue
                quantity -= sell_units
                total_sell = sell_units * close[t]
                initial_money += total_sell
                states_sell.append(t)
                try:
                    invest = ((total_sell - bought_price) / bought_price) * 100
                except:
                    invest = 0
            state = next_state

        invest = ((initial_money - starting_money) / starting_money) * 100

model = Model(window_size, 500, 3)
agent = Agent(model, 10000, 5, 5)
agent.fit(500, 10)
agent.buy()


As you can see, it is used for stock prediction and only reads the Close column. I would like to try it with more features, say High and Low.



I'm struggling to change it to work with a two-dimensional list. I've tried a simple change:



close = df.loc[:,['Close','Open']].values.tolist()


This adds one more value to every row of the list. But when I run the code, I get an error from the agent.fit() call:



agent.fit(iterations = 500, checkpoint = 10)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-225-d97697984016> in <module>()
----> 1 agent.fit(iterations = 500, checkpoint = 10)

<ipython-input-223-35d9fbba5756> in fit(self, iterations, checkpoint)
66
67 def fit(self, iterations, checkpoint):
---> 68 self.es.train(iterations, print_every = checkpoint)
69
70 def buy(self):

<ipython-input-220-84ca345091f4> in train(self, epoch, print_every)
33 self.weights, population[k]
34 )
---> 35 rewards[k] = self.reward_function(weights_population)
36 rewards = (rewards - np.mean(rewards)) / np.std(rewards)
37

<ipython-input-223-35d9fbba5756> in get_reward(self, weights)
36
37 self.model.weights = weights
---> 38 state = get_state(self.close, 0, self.window_size + 1)
39 inventory = []
40 quantity = 0

<ipython-input-219-0df8d8be24a9> in get_state(data, t, n)
4 res = []
5 for i in range(n - 1):
----> 6 res.append(block[i + 1] - block[i])
7 return np.array([res])

TypeError: unsupported operand type(s) for -: 'list' and 'list'
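The TypeError above comes from subtracting two Python lists: once every row holds several values, `block[i + 1] - block[i]` becomes `list - list`, which Python does not define. Below is a minimal sketch of one possible workaround, assuming the window is converted to a NumPy array before differencing. The name `get_state_multi` and the choice to flatten the result into a single row are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def get_state_multi(data, t, n):
    """Per-feature differences over an n-step window ending at index t.

    `data` is a list of rows, each row a list of feature values
    (for example [close, open]). Hypothetical helper for illustration.
    """
    d = t - n + 1
    # Pad with copies of the first row when the window starts before index 0.
    block = data[d : t + 1] if d >= 0 else -d * [data[0]] + data[0 : t + 1]
    block = np.asarray(block, dtype=float)   # shape (n, n_features)
    diffs = np.diff(block, axis=0)           # shape (n - 1, n_features)
    return diffs.reshape(1, -1)              # shape (1, (n - 1) * n_features)

rows = [[10.0, 9.0], [11.0, 10.5], [13.0, 11.0], [12.0, 12.5]]
state = get_state_multi(rows, 3, 3)
print(state.shape)
```

Flattening keeps the state a single row vector, which is the same layout the single-column `get_state` returns, only wider.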


I assume the first step is to update my Model class to use a different input_size parameter, right?
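As a rough check of that assumption, the sketch below works out what `input_size` would have to be if each state were the flattened per-feature differences of a window. It assumes two features and reuses the layer shapes from the `Model` class above; the variable names (`n_features`, `w_in`, `w_decision`) and the random stand-in state are hypothetical.

```python
import numpy as np

window_size = 30
n_features = 2     # assumed: e.g. Close and Open
layer_size = 500

# A window of window_size + 1 rows leaves window_size rows after differencing;
# flattening them yields window_size * n_features inputs.
input_size = window_size * n_features

rng = np.random.default_rng(0)
w_in = rng.standard_normal((input_size, layer_size))
bias = rng.standard_normal((1, layer_size))
w_decision = rng.standard_normal((layer_size, 3))

# Random stand-in for one flattened state, only to check the shapes line up.
state = rng.standard_normal((1, input_size))
feed = state @ w_in + bias
decision = feed @ w_decision
print(decision.shape)
```

Under these assumptions the model would be built as `Model(window_size * n_features, 500, 3)`. The trading logic still needs a single price per step, so expressions such as `close[t]` would have to select one column (for example the Close value) instead of the whole row.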



Any help would be appreciated! Thanks










share|improve this question




























    1















    I've just started to play around with Reinforcement Learning these days and I found the Natural Evolution Strategy, I kind of understand how it works, but I'm very new with Python and I found this code which basically implements the NES algorithm



    https://github.com/huseinzol05/Stock-Prediction-Models/blob/master/agent/updated-NES-google.ipynb



    import numpy as np
    import pandas as pd
    import time
    import matplotlib.pyplot as plt
    import seaborn as sns
    import random
    sns.set()

    # CSV containing the TSLA stock predictions in the form of
    # [Date, Open, High, Low, Close, Adj Close, Volume] from
    # Yahoo! Finance
    df = pd.read_csv('TSLA.csv')
    df.head()


    def get_state(data, t, n):
    d = t - n + 1
    block = data[d : t + 1] if d >= 0 else -d * [data[0]] + data[0 : t + 1]
    res = []
    for i in range(n - 1):
    res.append(block[i + 1] - block[i])
    return np.array([res])

    close = df.Close.values.tolist()
    window_size = 30
    skip = 1
    l = len(close) - 1

    class Deep_Evolution_Strategy:

    inputs = None

    def __init__(
    self, weights, reward_function, population_size, sigma, learning_rate
    ):
    self.weights = weights
    self.reward_function = reward_function
    self.population_size = population_size
    self.sigma = sigma
    self.learning_rate = learning_rate

    def _get_weight_from_population(self, weights, population):
    weights_population = []
    for index, i in enumerate(population):
    jittered = self.sigma * i
    weights_population.append(weights[index] + jittered)
    return weights_population

    def get_weights(self):
    return self.weights

    def train(self, epoch = 100, print_every = 1):
    lasttime = time.time()
    for i in range(epoch):
    population = []
    rewards = np.zeros(self.population_size)
    for k in range(self.population_size):
    x = []
    for w in self.weights:
    x.append(np.random.randn(*w.shape))
    population.append(x)
    for k in range(self.population_size):
    weights_population = self._get_weight_from_population(self.weights, population[k])
    rewards[k] = self.reward_function(weights_population)
    rewards = (rewards - np.mean(rewards)) / np.std(rewards)
    for index, w in enumerate(self.weights):
    A = np.array([p[index] for p in population])
    self.weights[index] = (
    w
    + self.learning_rate
    / (self.population_size * self.sigma)
    * np.dot(A.T, rewards).T
    )


    class Model:
    def __init__(self, input_size, layer_size, output_size):
    self.weights = [
    np.random.randn(input_size, layer_size),
    np.random.randn(layer_size, output_size),
    np.random.randn(layer_size, 1),
    np.random.randn(1, layer_size),
    ]

    def predict(self, inputs):
    feed = np.dot(inputs, self.weights[0]) + self.weights[-1]
    decision = np.dot(feed, self.weights[1])
    buy = np.dot(feed, self.weights[2])
    return decision, buy

    def get_weights(self):
    return self.weights

    def set_weights(self, weights):
    self.weights = weights


    class Agent:

    POPULATION_SIZE = 15
    SIGMA = 0.1
    LEARNING_RATE = 0.03

    def __init__(self, model, money, max_buy, max_sell):
    self.model = model
    self.initial_money = money
    self.max_buy = max_buy
    self.max_sell = max_sell
    self.es = Deep_Evolution_Strategy(
    self.model.get_weights(),
    self.get_reward,
    self.POPULATION_SIZE,
    self.SIGMA,
    self.LEARNING_RATE,
    )

    def act(self, sequence):
    decision, buy = self.model.predict(np.array(sequence))
    return np.argmax(decision[0]), int(buy[0])

    def get_reward(self, weights):
    initial_money = self.initial_money
    starting_money = initial_money
    self.model.weights = weights
    state = get_state(close, 0, window_size + 1)
    inventory = []
    quantity = 0
    for t in range(0, l, skip):
    action, buy = self.act(state)
    next_state = get_state(close, t + 1, window_size + 1)
    if action == 1 and initial_money >= close[t]:
    if buy < 0:
    buy = 1
    if buy > self.max_buy:
    buy_units = self.max_buy
    else:
    buy_units = buy
    total_buy = buy_units * close[t]
    initial_money -= total_buy
    inventory.append(total_buy)
    quantity += buy_units
    elif action == 2 and len(inventory) > 0:
    if quantity > self.max_sell:
    sell_units = self.max_sell
    else:
    sell_units = quantity
    quantity -= sell_units
    total_sell = sell_units * close[t]
    initial_money += total_sell

    state = next_state
    return ((initial_money - starting_money) / starting_money) * 100

    def fit(self, iterations, checkpoint):
    self.es.train(iterations, print_every = checkpoint)

    def buy(self):
    initial_money = self.initial_money
    state = get_state(close, 0, window_size + 1)
    starting_money = initial_money
    states_sell = []
    states_buy = []
    inventory = []
    quantity = 0
    for t in range(0, l, skip):
    action, buy = self.act(state)
    next_state = get_state(close, t + 1, window_size + 1)
    if action == 1 and initial_money >= close[t]:
    if buy < 0:
    buy = 1
    if buy > self.max_buy:
    buy_units = self.max_buy
    else:
    buy_units = buy
    total_buy = buy_units * close[t]
    initial_money -= total_buy
    inventory.append(total_buy)
    quantity += buy_units
    states_buy.append(t)
    elif action == 2 and len(inventory) > 0:
    bought_price = inventory.pop(0)
    if quantity > self.max_sell:
    sell_units = self.max_sell
    else:
    sell_units = quantity
    if sell_units < 1:
    continue
    quantity -= sell_units
    total_sell = sell_units * close[t]
    initial_money += total_sell
    states_sell.append(t)
    try:
    invest = ((total_sell - bought_price) / bought_price) * 100
    except:
    invest = 0
    state = next_state

    invest = ((initial_money - starting_money) / starting_money) * 100

    model = Model(window_size, 500, 3)
    agent = Agent(model, 10000, 5, 5)
    agent.fit(500, 10)
    agent.buy()


    As you can see, it is being used for stock prediction and it only uses the Close column, but I would like to try it with more parameters, let's say High and Low.



    I'm struggling when I need to change it to use this 2 dimensional list. I've tried a simple change:



    close = df.loc[:,['Close','Open']].values.tolist()


    Which adds one more property at every row of the list. But when I run the code I start to see errors when I execute the agent.fit() call:



    agent.fit(iterations = 500, checkpoint = 10)

    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    <ipython-input-225-d97697984016> in <module>()
    ----> 1 agent.fit(iterations = 500, checkpoint = 10)

    <ipython-input-223-35d9fbba5756> in fit(self, iterations, checkpoint)
    66
    67 def fit(self, iterations, checkpoint):
    ---> 68 self.es.train(iterations, print_every = checkpoint)
    69
    70 def buy(self):

    <ipython-input-220-84ca345091f4> in train(self, epoch, print_every)
    33 self.weights, population[k]
    34 )
    ---> 35 rewards[k] = self.reward_function(weights_population)
    36 rewards = (rewards - np.mean(rewards)) / np.std(rewards)
    37

    <ipython-input-223-35d9fbba5756> in get_reward(self, weights)
    36
    37 self.model.weights = weights
    ---> 38 state = get_state(self.close, 0, self.window_size + 1)
    39 inventory = []
    40 quantity = 0

    <ipython-input-219-0df8d8be24a9> in get_state(data, t, n)
    4 res = []
    5 for i in range(n - 1):
    ----> 6 res.append(block[i + 1] - block[i])
    7 return np.array([res])

    TypeError: unsupported operand type(s) for -: 'list' and 'list'


    I assume that the first step is that I need to update my Model class to use a different input_size parameter right?



    Any help would be appreciated! Thanks










    share|improve this question


























      1












      1








      1








      I've just started to play around with Reinforcement Learning these days and I found the Natural Evolution Strategy, I kind of understand how it works, but I'm very new with Python and I found this code which basically implements the NES algorithm



      https://github.com/huseinzol05/Stock-Prediction-Models/blob/master/agent/updated-NES-google.ipynb



      import numpy as np
      import pandas as pd
      import time
      import matplotlib.pyplot as plt
      import seaborn as sns
      import random
      sns.set()

      # CSV containing the TSLA stock predictions in the form of
      # [Date, Open, High, Low, Close, Adj Close, Volume] from
      # Yahoo! Finance
      df = pd.read_csv('TSLA.csv')
      df.head()


      def get_state(data, t, n):
      d = t - n + 1
      block = data[d : t + 1] if d >= 0 else -d * [data[0]] + data[0 : t + 1]
      res = []
      for i in range(n - 1):
      res.append(block[i + 1] - block[i])
      return np.array([res])

      close = df.Close.values.tolist()
      window_size = 30
      skip = 1
      l = len(close) - 1

      class Deep_Evolution_Strategy:

      inputs = None

      def __init__(
      self, weights, reward_function, population_size, sigma, learning_rate
      ):
      self.weights = weights
      self.reward_function = reward_function
      self.population_size = population_size
      self.sigma = sigma
      self.learning_rate = learning_rate

      def _get_weight_from_population(self, weights, population):
      weights_population = []
      for index, i in enumerate(population):
      jittered = self.sigma * i
      weights_population.append(weights[index] + jittered)
      return weights_population

      def get_weights(self):
      return self.weights

      def train(self, epoch = 100, print_every = 1):
      lasttime = time.time()
      for i in range(epoch):
      population = []
      rewards = np.zeros(self.population_size)
      for k in range(self.population_size):
      x = []
      for w in self.weights:
      x.append(np.random.randn(*w.shape))
      population.append(x)
      for k in range(self.population_size):
      weights_population = self._get_weight_from_population(self.weights, population[k])
      rewards[k] = self.reward_function(weights_population)
      rewards = (rewards - np.mean(rewards)) / np.std(rewards)
      for index, w in enumerate(self.weights):
      A = np.array([p[index] for p in population])
      self.weights[index] = (
      w
      + self.learning_rate
      / (self.population_size * self.sigma)
      * np.dot(A.T, rewards).T
      )


      class Model:
      def __init__(self, input_size, layer_size, output_size):
      self.weights = [
      np.random.randn(input_size, layer_size),
      np.random.randn(layer_size, output_size),
      np.random.randn(layer_size, 1),
      np.random.randn(1, layer_size),
      ]

      def predict(self, inputs):
      feed = np.dot(inputs, self.weights[0]) + self.weights[-1]
      decision = np.dot(feed, self.weights[1])
      buy = np.dot(feed, self.weights[2])
      return decision, buy

      def get_weights(self):
      return self.weights

      def set_weights(self, weights):
      self.weights = weights


      class Agent:

      POPULATION_SIZE = 15
      SIGMA = 0.1
      LEARNING_RATE = 0.03

      def __init__(self, model, money, max_buy, max_sell):
      self.model = model
      self.initial_money = money
      self.max_buy = max_buy
      self.max_sell = max_sell
      self.es = Deep_Evolution_Strategy(
      self.model.get_weights(),
      self.get_reward,
      self.POPULATION_SIZE,
      self.SIGMA,
      self.LEARNING_RATE,
      )

      def act(self, sequence):
      decision, buy = self.model.predict(np.array(sequence))
      return np.argmax(decision[0]), int(buy[0])

      def get_reward(self, weights):
      initial_money = self.initial_money
      starting_money = initial_money
      self.model.weights = weights
      state = get_state(close, 0, window_size + 1)
      inventory = []
      quantity = 0
      for t in range(0, l, skip):
      action, buy = self.act(state)
      next_state = get_state(close, t + 1, window_size + 1)
      if action == 1 and initial_money >= close[t]:
      if buy < 0:
      buy = 1
      if buy > self.max_buy:
      buy_units = self.max_buy
      else:
      buy_units = buy
      total_buy = buy_units * close[t]
      initial_money -= total_buy
      inventory.append(total_buy)
      quantity += buy_units
      elif action == 2 and len(inventory) > 0:
      if quantity > self.max_sell:
      sell_units = self.max_sell
      else:
      sell_units = quantity
      quantity -= sell_units
      total_sell = sell_units * close[t]
      initial_money += total_sell

      state = next_state
      return ((initial_money - starting_money) / starting_money) * 100

      def fit(self, iterations, checkpoint):
      self.es.train(iterations, print_every = checkpoint)

      def buy(self):
      initial_money = self.initial_money
      state = get_state(close, 0, window_size + 1)
      starting_money = initial_money
      states_sell = []
      states_buy = []
      inventory = []
      quantity = 0
      for t in range(0, l, skip):
      action, buy = self.act(state)
      next_state = get_state(close, t + 1, window_size + 1)
      if action == 1 and initial_money >= close[t]:
      if buy < 0:
      buy = 1
      if buy > self.max_buy:
      buy_units = self.max_buy
      else:
      buy_units = buy
      total_buy = buy_units * close[t]
      initial_money -= total_buy
      inventory.append(total_buy)
      quantity += buy_units
      states_buy.append(t)
      elif action == 2 and len(inventory) > 0:
      bought_price = inventory.pop(0)
      if quantity > self.max_sell:
      sell_units = self.max_sell
      else:
      sell_units = quantity
      if sell_units < 1:
      continue
      quantity -= sell_units
      total_sell = sell_units * close[t]
      initial_money += total_sell
      states_sell.append(t)
      try:
      invest = ((total_sell - bought_price) / bought_price) * 100
      except:
      invest = 0
      state = next_state

      invest = ((initial_money - starting_money) / starting_money) * 100

      model = Model(window_size, 500, 3)
      agent = Agent(model, 10000, 5, 5)
      agent.fit(500, 10)
      agent.buy()


      As you can see, it is being used for stock prediction and it only uses the Close column, but I would like to try it with more parameters, let's say High and Low.



      I'm struggling when I need to change it to use this 2 dimensional list. I've tried a simple change:



      close = df.loc[:,['Close','Open']].values.tolist()


      Which adds one more property at every row of the list. But when I run the code I start to see errors when I execute the agent.fit() call:



      agent.fit(iterations = 500, checkpoint = 10)

      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-225-d97697984016> in <module>()
      ----> 1 agent.fit(iterations = 500, checkpoint = 10)

      <ipython-input-223-35d9fbba5756> in fit(self, iterations, checkpoint)
      66
      67 def fit(self, iterations, checkpoint):
      ---> 68 self.es.train(iterations, print_every = checkpoint)
      69
      70 def buy(self):

      <ipython-input-220-84ca345091f4> in train(self, epoch, print_every)
      33 self.weights, population[k]
      34 )
      ---> 35 rewards[k] = self.reward_function(weights_population)
      36 rewards = (rewards - np.mean(rewards)) / np.std(rewards)
      37

      <ipython-input-223-35d9fbba5756> in get_reward(self, weights)
      36
      37 self.model.weights = weights
      ---> 38 state = get_state(self.close, 0, self.window_size + 1)
      39 inventory = []
      40 quantity = 0

      <ipython-input-219-0df8d8be24a9> in get_state(data, t, n)
      4 res = []
      5 for i in range(n - 1):
      ----> 6 res.append(block[i + 1] - block[i])
      7 return np.array([res])

      TypeError: unsupported operand type(s) for -: 'list' and 'list'


      I assume that the first step is that I need to update my Model class to use a different input_size parameter right?



      Any help would be appreciated! Thanks










      share|improve this question
















      I've just started to play around with Reinforcement Learning these days and I found the Natural Evolution Strategy, I kind of understand how it works, but I'm very new with Python and I found this code which basically implements the NES algorithm



      https://github.com/huseinzol05/Stock-Prediction-Models/blob/master/agent/updated-NES-google.ipynb



      import numpy as np
      import pandas as pd
      import time
      import matplotlib.pyplot as plt
      import seaborn as sns
      import random
      sns.set()

      # CSV containing the TSLA stock predictions in the form of
      # [Date, Open, High, Low, Close, Adj Close, Volume] from
      # Yahoo! Finance
      df = pd.read_csv('TSLA.csv')
      df.head()


      def get_state(data, t, n):
      d = t - n + 1
      block = data[d : t + 1] if d >= 0 else -d * [data[0]] + data[0 : t + 1]
      res = []
      for i in range(n - 1):
      res.append(block[i + 1] - block[i])
      return np.array([res])

      close = df.Close.values.tolist()
      window_size = 30
      skip = 1
      l = len(close) - 1

      class Deep_Evolution_Strategy:

      inputs = None

      def __init__(
      self, weights, reward_function, population_size, sigma, learning_rate
      ):
      self.weights = weights
      self.reward_function = reward_function
      self.population_size = population_size
      self.sigma = sigma
      self.learning_rate = learning_rate

      def _get_weight_from_population(self, weights, population):
      weights_population = []
      for index, i in enumerate(population):
      jittered = self.sigma * i
      weights_population.append(weights[index] + jittered)
      return weights_population

      def get_weights(self):
      return self.weights

      def train(self, epoch = 100, print_every = 1):
      lasttime = time.time()
      for i in range(epoch):
      population = []
      rewards = np.zeros(self.population_size)
      for k in range(self.population_size):
      x = []
      for w in self.weights:
      x.append(np.random.randn(*w.shape))
      population.append(x)
      for k in range(self.population_size):
      weights_population = self._get_weight_from_population(self.weights, population[k])
      rewards[k] = self.reward_function(weights_population)
      rewards = (rewards - np.mean(rewards)) / np.std(rewards)
      for index, w in enumerate(self.weights):
      A = np.array([p[index] for p in population])
      self.weights[index] = (
      w
      + self.learning_rate
      / (self.population_size * self.sigma)
      * np.dot(A.T, rewards).T
      )


      class Model:
      def __init__(self, input_size, layer_size, output_size):
      self.weights = [
      np.random.randn(input_size, layer_size),
      np.random.randn(layer_size, output_size),
      np.random.randn(layer_size, 1),
      np.random.randn(1, layer_size),
      ]

      def predict(self, inputs):
      feed = np.dot(inputs, self.weights[0]) + self.weights[-1]
      decision = np.dot(feed, self.weights[1])
      buy = np.dot(feed, self.weights[2])
      return decision, buy

      def get_weights(self):
      return self.weights

      def set_weights(self, weights):
      self.weights = weights


      class Agent:

      POPULATION_SIZE = 15
      SIGMA = 0.1
      LEARNING_RATE = 0.03

      def __init__(self, model, money, max_buy, max_sell):
      self.model = model
      self.initial_money = money
      self.max_buy = max_buy
      self.max_sell = max_sell
      self.es = Deep_Evolution_Strategy(
      self.model.get_weights(),
      self.get_reward,
      self.POPULATION_SIZE,
      self.SIGMA,
      self.LEARNING_RATE,
      )

          def act(self, sequence):
              decision, buy = self.model.predict(np.array(sequence))
              return np.argmax(decision[0]), int(buy[0])

          def get_reward(self, weights):
              initial_money = self.initial_money
              starting_money = initial_money
              self.model.weights = weights
              state = get_state(close, 0, window_size + 1)
              inventory = []
              quantity = 0
              for t in range(0, l, skip):
                  action, buy = self.act(state)
                  next_state = get_state(close, t + 1, window_size + 1)
                  if action == 1 and initial_money >= close[t]:
                      if buy < 0:
                          buy = 1
                      if buy > self.max_buy:
                          buy_units = self.max_buy
                      else:
                          buy_units = buy
                      total_buy = buy_units * close[t]
                      initial_money -= total_buy
                      inventory.append(total_buy)
                      quantity += buy_units
                  elif action == 2 and len(inventory) > 0:
                      if quantity > self.max_sell:
                          sell_units = self.max_sell
                      else:
                          sell_units = quantity
                      quantity -= sell_units
                      total_sell = sell_units * close[t]
                      initial_money += total_sell

                  state = next_state
              return ((initial_money - starting_money) / starting_money) * 100

          def fit(self, iterations, checkpoint):
              self.es.train(iterations, print_every = checkpoint)

          def buy(self):
              initial_money = self.initial_money
              state = get_state(close, 0, window_size + 1)
              starting_money = initial_money
              states_sell = []
              states_buy = []
              inventory = []
              quantity = 0
              for t in range(0, l, skip):
                  action, buy = self.act(state)
                  next_state = get_state(close, t + 1, window_size + 1)
                  if action == 1 and initial_money >= close[t]:
                      if buy < 0:
                          buy = 1
                      if buy > self.max_buy:
                          buy_units = self.max_buy
                      else:
                          buy_units = buy
                      total_buy = buy_units * close[t]
                      initial_money -= total_buy
                      inventory.append(total_buy)
                      quantity += buy_units
                      states_buy.append(t)
                  elif action == 2 and len(inventory) > 0:
                      bought_price = inventory.pop(0)
                      if quantity > self.max_sell:
                          sell_units = self.max_sell
                      else:
                          sell_units = quantity
                      if sell_units < 1:
                          continue
                      quantity -= sell_units
                      total_sell = sell_units * close[t]
                      initial_money += total_sell
                      states_sell.append(t)
                      try:
                          invest = ((total_sell - bought_price) / bought_price) * 100
                      except:
                          invest = 0
                  state = next_state

              invest = ((initial_money - starting_money) / starting_money) * 100

      model = Model(window_size, 500, 3)
      agent = Agent(model, 10000, 5, 5)
      agent.fit(500, 10)
      agent.buy()


      As you can see, it is used for stock prediction and only reads the Close column. I would like to try it with more inputs, for example High and Low.



      I'm struggling to change it to work with a two-dimensional list. I tried a simple change:



      close = df.loc[:,['Close','Open']].values.tolist()


      This adds one more value to every row of the list. But when I run the code, the agent.fit() call fails with the following error:



      agent.fit(iterations = 500, checkpoint = 10)

      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-225-d97697984016> in <module>()
      ----> 1 agent.fit(iterations = 500, checkpoint = 10)

      <ipython-input-223-35d9fbba5756> in fit(self, iterations, checkpoint)
      66
      67 def fit(self, iterations, checkpoint):
      ---> 68 self.es.train(iterations, print_every = checkpoint)
      69
      70 def buy(self):

      <ipython-input-220-84ca345091f4> in train(self, epoch, print_every)
      33 self.weights, population[k]
      34 )
      ---> 35 rewards[k] = self.reward_function(weights_population)
      36 rewards = (rewards - np.mean(rewards)) / np.std(rewards)
      37

      <ipython-input-223-35d9fbba5756> in get_reward(self, weights)
      36
      37 self.model.weights = weights
      ---> 38 state = get_state(self.close, 0, self.window_size + 1)
      39 inventory = []
      40 quantity = 0

      <ipython-input-219-0df8d8be24a9> in get_state(data, t, n)
      4 res = []
      5 for i in range(n - 1):
      ----> 6 res.append(block[i + 1] - block[i])
      7 return np.array([res])

      TypeError: unsupported operand type(s) for -: 'list' and 'list'
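
      The traceback ends in `block[i + 1] - block[i]`: with two columns, each element of `block` is itself a Python list, and lists do not define `-`. A minimal sketch of a multi-column state function follows, assuming the data is converted to a NumPy array and that the model expects the per-column differences flattened into one row. The name `get_state_multi` and the padding scheme (repeating the first row) are assumptions for illustration, not taken from the original `get_state`.

```python
import numpy as np

def get_state_multi(data, t, n):
    """Differences between consecutive rows in the n-row window ending at index t.

    `data` is a sequence of rows, one row per time step and one column per
    feature (e.g. Close, Open). Returns an array of shape (1, (n - 1) * features).
    """
    data = np.asarray(data, dtype=float)        # lists -> array, so `-` works row-wise
    d = t - n + 1
    if d >= 0:
        block = data[d : t + 1]
    else:
        # Not enough history yet: pad by repeating the first row (assumed scheme).
        pad = np.repeat(data[:1], -d, axis=0)
        block = np.vstack([pad, data[: t + 1]])
    diffs = np.diff(block, axis=0)              # shape (n - 1, features)
    return diffs.reshape(1, -1)                 # flatten to a single input row


rows = [[1.0, 10.0], [2.0, 12.0], [4.0, 15.0]]
state = get_state_multi(rows, 2, 3)
```

      With the three sample rows above, `state` holds the two Close/Open differences side by side in one row. Whether to flatten or keep a 2-D window depends on what the model's first layer expects.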


      I assume the first step is to update my Model class to use a different input_size parameter, right?
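
      On the input size: if each state row carries one difference per column per step, the first layer would need `window_size * n_features` inputs rather than `window_size`. The sketch below only checks shapes. It assumes the first argument of `Model` is the input size and that its first weight matrix is laid out as (input_size, layer_size), neither of which is shown in this part of the post; `window_size = 30` and the random data are placeholders.

```python
import numpy as np

window_size = 30                          # placeholder window length
rng = np.random.default_rng(0)
prices = rng.random((100, 2))             # stand-in for df[['Close', 'Open']].values

n_features = prices.shape[1]
input_size = window_size * n_features     # one difference per column per step

# Stand-in for the first-layer weights (assumed layout: input_size x layer_size).
w1 = rng.standard_normal((input_size, 500))

# One flattened state row built from window_size + 1 consecutive rows.
state = np.diff(prices[: window_size + 1], axis=0).reshape(1, -1)
hidden = state @ w1                       # only works if input_size matches the state width

# The trading logic compares money against a single price, so it still
# needs a 1-D series (here: the first column, assumed to be Close).
trade_prices = prices[:, 0]
```

      Note that comparisons such as `initial_money >= close[t]` need a scalar price, so the multi-column array would be used for the state while a single column is kept for the buy and sell arithmetic.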



      Any help would be appreciated! Thanks







      python-3.x deep-learning artificial-intelligence reinforcement-learning evolutionary-algorithm






      edited Mar 8 at 16:22







      lucaswerner

















      asked Mar 8 at 14:30









      lucaswerner






















