What is causing a high F1-score but low accuracy in a deep learning model?
































I'm using the BERT base-uncased model to train NER on the CoNLL-2003 dataset. I used BertForTokenClassification (from Hugging Face), which takes the final sequence output and adds a linear classification layer on top. With it I get the results below.



With 6 epochs and a train/dev data size of 6973/1739:

    Test F1-Score: 0.8455102584598987
    'test_loss': 0.18759359930737468, 'test_accuracy': 0.42335164835164835, 'global_step': 1308, 'loss': 0.03054473980611891
    Validation F1-Score: 0.8771035676507356
    'eval_loss': 0.13038920708013477, 'eval_accuracy': 0.4910168195718655, 'global_step': 1308, 'loss': 0.03054473980611891


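To compute the accuracy, I'm using the function and loop below.

    import numpy as np

    def flat_accuracy(preds, labels):
        # preds: logits of shape (batch, seq_len, num_labels); labels: ids of shape (batch, seq_len)
        pred_flat = np.argmax(preds, axis=2).flatten()
        labels_flat = labels.flatten()
        # fraction of all positions where the predicted id equals the label id
        return np.sum(pred_flat == labels_flat) / len(labels_flat)

    eval_accuracy, nb_eval_steps = 0, 0
    # eval_batches is a placeholder name for the per-batch (logits, label ids) pairs
    for pred_xx, label_ids_xx in eval_batches:
        tmp_eval_accuracy = flat_accuracy(pred_xx, label_ids_xx)
        eval_accuracy += tmp_eval_accuracy
        nb_eval_steps += 1
    eval_accuracy = eval_accuracy / nb_eval_steps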


As the results above show, the accuracy is really bad. My question is: is the method I'm using to compute accuracy right or wrong? I believe it's right, because it just counts the labels that match out of the total number of labels, and then the sum of the per-batch accuracies is divided by the number of batches.
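One thing that can be checked with this kind of accuracy is whether padded positions are being counted. The sketch below is only an illustration under assumptions: the post does not show how labels are padded, so `PAD_LABEL_ID`, `masked_accuracy`, and the toy arrays are all hypothetical. It compares accuracy over every position with accuracy over real tokens only.

```python
import numpy as np

PAD_LABEL_ID = -100  # assumed padding marker; not taken from the original post


def flat_accuracy(preds, labels):
    """Accuracy over every position, padding included (as in the post)."""
    pred_flat = np.argmax(preds, axis=2).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)


def masked_accuracy(preds, labels, pad_label_id=PAD_LABEL_ID):
    """Accuracy over real tokens only, skipping positions marked as padding."""
    pred_ids = np.argmax(preds, axis=2)
    mask = labels != pad_label_id
    return np.sum((pred_ids == labels) & mask) / np.sum(mask)


# Toy batch: 2 sequences, 4 positions, 3 label classes.
labels = np.array([[1, 2, PAD_LABEL_ID, PAD_LABEL_ID],
                   [0, 1, 2, PAD_LABEL_ID]])
# Predicted class per position; every real token is predicted correctly.
pred_ids = np.array([[1, 2, 0, 0],
                     [0, 1, 2, 0]])
preds = np.eye(3)[pred_ids]  # one-hot "logits" with shape (2, 4, 3)

flat = flat_accuracy(preds, labels)      # 5 of 8 positions match
masked = masked_accuracy(preds, labels)  # 5 of 5 real tokens match
print(flat, masked)
```

In this toy case the flat accuracy is 0.625 even though every real token is correct, so a gap of this kind would be consistent with padding being counted. Whether that is what happens in the setup above depends on how the labels were padded. Separately, averaging per-batch accuracies equals the overall accuracy only when all batches contain the same number of positions.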



Yet the F1-score comes out high. For the F1 score I used seqeval (from seqeval.metrics import f1_score).
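For context, seqeval scores entity spans rather than individual positions, so the two metrics count different things and need not move together. The following is a simplified pure-Python sketch of span-level F1 next to per-token accuracy. It is not seqeval's implementation, and the helper names (`extract_entities`, `entity_f1`, `token_accuracy`) and the toy tags are made up for illustration.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from a BIO tag sequence."""
    entities, start, ent_type = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and tag[2:] == ent_type
        if not inside:
            if ent_type is not None:
                entities.add((ent_type, start, i - 1))
            if tag.startswith("B-") or tag.startswith("I-"):
                start, ent_type = i, tag[2:]
            else:
                start, ent_type = None, None
    return entities


def entity_f1(y_true, y_pred):
    """F1 over exact-match entity spans, pooled across sentences."""
    true_ents, pred_ents = set(), set()
    for idx, (t, p) in enumerate(zip(y_true, y_pred)):
        true_ents |= {(idx,) + e for e in extract_entities(t)}
        pred_ents |= {(idx,) + e for e in extract_entities(p)}
    correct = len(true_ents & pred_ents)
    precision = correct / len(pred_ents) if pred_ents else 0.0
    recall = correct / len(true_ents) if true_ents else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def token_accuracy(y_true, y_pred):
    """Fraction of positions whose predicted tag equals the gold tag."""
    pairs = [(t, p) for ts, ps in zip(y_true, y_pred) for t, p in zip(ts, ps)]
    return sum(t == p for t, p in pairs) / len(pairs)


y_true = [["B-PER", "I-PER", "O", "O", "O", "B-LOC"]]
y_pred = [["B-PER", "O", "O", "O", "O", "B-LOC"]]

acc = token_accuracy(y_true, y_pred)  # 5 of 6 tags match
f1 = entity_f1(y_true, y_pred)        # 1 of 2 spans matches exactly
print(acc, f1)
```

Here one wrong tag costs a whole entity, so the span-level F1 is 0.5 while token accuracy is about 0.83. The point is only that the two numbers are computed over different units; positions outside any entity span, such as padding, add nothing to a span-level score.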



Please tell me the possible causes or meaning behind this.
And how can I know whether my model learned properly, for example whether it ran into a bias-variance trade-off?



Please let me know if you need more information for clarity.
Thanks in advance.

















Tags: deep-learning, ner
















asked Mar 8 at 8:06 by DONDON





























