TensorFlow premade estimator is much slower than custom?

I'm benchmarking general TF operations, and so to establish a baseline I'm trying to figure out how quickly I can train a simple logistic regression with a single pass of the training data. My input is a TFRecord file containing 860,000 sparse rows, with 164,000 one-hot encoded features. Data processing at bottom.



A premade estimator (tf.estimator.LinearClassifier), configured as follows, fits one pass of the data in 932 seconds:



    import os
    import tensorflow as tf

    # MODELDIR and currtime() are not shown here.
    feature_columns = [tf.feature_column.numeric_column(key='features', shape=164000)]

    # Disable summaries and checkpoints so they don't affect the timing.
    custom_config = tf.estimator.RunConfig(save_summary_steps=None,
                                           save_checkpoints_steps=None)

    estimator = tf.estimator.LinearClassifier(
        feature_columns=feature_columns,
        model_dir=os.path.join(MODELDIR, f'PremadeLinearClassifier_{currtime()}'),
        config=custom_config
    )


If I read the data into numpy arrays and create a dataset with from_tensor_slices(), I can get that down to 467 seconds.



If I build my own training functions, I can perform a single pass of the data in 66.6 seconds, reading from disk.



    class LogisticModel(object):
        def __init__(self):
            # One weight per one-hot feature, plus a scalar bias.
            self.W = tf.Variable(tf.random_normal([164000, 1], mean=0.0, stddev=0.1))
            self.B = tf.Variable(tf.random_normal([], mean=0.0, stddev=0.1))

        def __call__(self, x):
            # x is a SparseTensor batch; returns logits of shape [batch, 1].
            return tf.sparse_tensor_dense_matmul(x, self.W) + self.B


    def grad_fn(model, inputs, targets):
        with tf.GradientTape() as t:
            loss_val = loss_fn(model, inputs, targets)
        return t.gradient(loss_val, [model.W, model.B])


    def loss_fn(model, inputs, targets):
        target_size = targets.shape.as_list()[0]
        return tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(
                labels=tf.reshape(targets, [target_size, 1]),
                logits=model(inputs)))


    def perform_train(model, optim, dataset):
        step = 0
        for x, y in dataset:
            # Use the arguments passed in rather than module-level globals.
            grads = grad_fn(model, x, y)
            optim.apply_gradients(zip(grads, [model.W, model.B]),
                                  global_step=tf.train.get_or_create_global_step())

            if step % 20 == 0:
                print(f"Step {step}: {loss_fn(model, x, y)}")
            step += 1
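As a reference for what the custom model computes per row, here is a minimal pure-Python sketch of the sparse logit and the sigmoid cross-entropy loss. It is not the TensorFlow implementation; the function names and the toy weights are hypothetical, and the loss uses the numerically stable form given in the TensorFlow documentation for sigmoid_cross_entropy_with_logits.

```python
import math


def sparse_logit(indices, values, weights, bias):
    """Dot product of one sparse row with a dense weight vector, plus a bias."""
    return sum(v * weights[i] for i, v in zip(indices, values)) + bias


def sigmoid_cross_entropy(logit, label):
    """Stable form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# Hypothetical toy row: features 1 and 3 are "hot" out of 4 features.
weights = [0.1, 0.2, 0.3, 0.4]
logit = sparse_logit([1, 3], [1.0, 1.0], weights, bias=0.5)
loss = sigmoid_cross_entropy(logit, label=1.0)
print(f"logit={logit:.3f} loss={loss:.4f}")
```

Only the non-zero entries of a row contribute to the logit, which is why the sparse matmul touches a small fraction of the 164,000 weights per example.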


1) What could account for this huge speed difference? Is overhead in the estimators that significant?
2) Without reading into memory, can my custom function be improved further? An in-house C++-based library is still an order of magnitude faster.
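To put the three timings on a common scale, the following sketch converts them to rows per second and to a ratio against the custom loop. It uses only the numbers quoted above; the labels and helper names are hypothetical, and it assumes the 467-second run used the same premade estimator.

```python
ROWS = 860_000  # rows in the TFRecord file, as stated above

# Wall-clock seconds for one pass over the data, as reported above.
timings = {
    "premade estimator, TFRecord input": 932.0,
    "premade estimator, from_tensor_slices input": 467.0,
    "custom training loop, TFRecord input": 66.6,
}


def rows_per_second(seconds, rows=ROWS):
    """Average throughput for one pass over the data."""
    return rows / seconds


def slowdown(seconds, baseline):
    """How many times longer a run took than the baseline run."""
    return seconds / baseline


baseline = timings["custom training loop, TFRecord input"]
for name, secs in timings.items():
    print(f"{name}: {rows_per_second(secs):,.0f} rows/s, "
          f"{slowdown(secs, baseline):.1f}x the custom loop's time")
```

By these numbers the premade estimator reading TFRecords takes roughly 14 times as long as the custom loop, and roughly 7 times as long when fed from memory.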



Data generation:



    def ex_to_tensors(ex, tensor_size):
        # Parse one serialized tf.Example into a (sparse features, label) pair.
        feature_spec = {
            'sparse': tf.SparseFeature(index_key='indices',
                                       value_key='values',
                                       dtype=tf.int64,
                                       size=tensor_size),
            'label': tf.FixedLenFeature([], tf.int64, default_value=0)
        }

        parsed_dict = tf.parse_single_example(ex, feature_spec)

        return (tf.cast(parsed_dict['sparse'], tf.float32),
                tf.cast(parsed_dict['label'], tf.float32))


    def ex_input_fn(*filenames, batch_size=1000, feature_size=int(1e6)):

        def parseTensors(x):
            return ex_to_tensors(x, feature_size)

        dataset = (tf.data.TFRecordDataset(filenames)
                   .map(parseTensors)
                   .batch(batch_size))

        return dataset









      python tensorflow sparse-matrix tensorflow-estimator sparse-file
















      asked 2 days ago









      Patrick McCarthy






















