What is causing a high F1-score but low accuracy in a deep learning model?


I'm using the BERT base-uncased model to train NER on the CoNLL-2003 dataset. I used BertForTokenClassification (from Hugging Face) for training, which, as I understand it, takes the final-layer sequence output and adds a linear classification layer on top. With it I get the results below.

After 6 epochs, with train/dev data sizes of 6973/1739:
Test F1-Score: 0.8455102584598987
'test_loss': 0.18759359930737468, 'test_accuracy': 0.42335164835164835, 'global_step': 1308, 'loss': 0.03054473980611891
Validation F1-Score: 0.8771035676507356
'eval_loss': 0.13038920708013477, 'eval_accuracy': 0.4910168195718655, 'global_step': 1308, 'loss': 0.03054473980611891

To compute accuracy, I'm using the functions below.

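import numpy as np

def flat_accuracy(preds, labels):
    # preds: logits of shape (batch, seq_len, num_labels); labels: ids of shape (batch, seq_len)
    pred_flat = np.argmax(preds, axis=2).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

eval_accuracy, nb_eval_steps = 0.0, 0
# `batches` is a placeholder name for an iterable of (logits, label ids) pairs
for pred_xx, label_ids_xx in batches:
    tmp_eval_accuracy = flat_accuracy(pred_xx, label_ids_xx)
    eval_accuracy += tmp_eval_accuracy
    nb_eval_steps += 1
eval_accuracy = eval_accuracy / nb_eval_steps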

As the results above show, the accuracy is really bad. My question is: is the method I'm using to compute accuracy right or wrong? I believe it's right, because it just counts the labels that match out of the total number of labels, then sums the per-batch accuracies and divides by the number of batches.
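One thing that may be worth checking, sketched here under assumptions the question does not confirm: if the label arrays contain padded or ignored positions, flat_accuracy counts them in the denominator, and averaging per-batch accuracies gives a small final batch the same weight as a full one. The sketch below accumulates correct and total counts over real tokens only. `PAD_LABEL_ID`, `masked_counts`, and `dataset_accuracy` are illustrative names, and the value -100 is an assumption, not something taken from the original code.

```python
import numpy as np

PAD_LABEL_ID = -100  # assumption: label id marking padded / ignored positions


def masked_counts(preds, labels, pad_label_id=PAD_LABEL_ID):
    """Return (n_correct, n_total) over positions whose label is not padding."""
    pred_ids = np.argmax(preds, axis=2)        # shape (batch, seq_len)
    mask = labels != pad_label_id              # True only for real tokens
    n_correct = int(np.sum((pred_ids == labels) & mask))
    n_total = int(np.sum(mask))
    return n_correct, n_total


def dataset_accuracy(batches, pad_label_id=PAD_LABEL_ID):
    """Accumulate counts across batches instead of averaging batch accuracies."""
    correct = total = 0
    for preds, labels in batches:
        c, t = masked_counts(preds, labels, pad_label_id)
        correct += c
        total += t
    return correct / total if total else 0.0


# Toy batch: one sentence with two real tokens followed by two padded positions.
logits = np.zeros((1, 4, 3))
logits[0, 0, 1] = 1.0   # position 0 predicts label 1
logits[0, 1, 2] = 1.0   # position 1 predicts label 2
labels = np.array([[1, 2, PAD_LABEL_ID, PAD_LABEL_ID]])

# Accuracy over every position, padding included (what flat_accuracy computes).
flat = float(np.sum(np.argmax(logits, axis=2).flatten() == labels.flatten()) / labels.size)
# Accuracy over real tokens only.
masked = dataset_accuracy([(logits, labels)])
print(flat, masked)
```

In this toy case both real tokens are predicted correctly, yet the flat figure is halved by the two padded positions. Whether this applies here depends on how the labels were padded in the actual pipeline.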

However, the F1-score is high. For the F1-score I used seqeval (from seqeval.metrics import f1_score).
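For context, seqeval scores whole entity spans rather than individual tokens, so its F1 and a flat token accuracy measure different things and need not agree in either direction. The following is a rough pure-Python sketch of span-level F1 for well-formed BIO tags. It is not seqeval's implementation, and `extract_spans` and `span_f1` are hypothetical helper names.

```python
def extract_spans(tags):
    """Collect (type, start, end) spans from one well-formed BIO-tagged sentence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel closes a trailing span
        inside = tag.startswith("I-") and kind == tag[2:]
        if not inside:
            if kind is not None:
                spans.add((kind, start, i))        # end index is exclusive
            if tag.startswith("B-"):
                start, kind = i, tag[2:]
            else:
                start, kind = None, None
    return spans


def span_f1(y_true, y_pred):
    """Micro-averaged F1 over exactly matching entity spans."""
    tp = n_true = n_pred = 0
    for true_tags, pred_tags in zip(y_true, y_pred):
        t, p = extract_spans(true_tags), extract_spans(pred_tags)
        tp += len(t & p)
        n_true += len(t)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_true
    return 2 * precision * recall / (precision + recall)


y_true = [["O", "O", "O", "B-MISC", "I-MISC", "I-MISC", "O"], ["B-PER", "I-PER", "O"]]
y_pred = [["O", "O", "B-MISC", "I-MISC", "I-MISC", "I-MISC", "O"], ["B-PER", "I-PER", "O"]]

score = span_f1(y_true, y_pred)

# Plain token-level accuracy on the same data, for comparison.
n_tokens = sum(len(sentence) for sentence in y_true)
n_match = sum(t == p for ts, ps in zip(y_true, y_pred) for t, p in zip(ts, ps))
token_acc = n_match / n_tokens
print(score, token_acc)
```

Here one of the two predicted spans has the wrong boundary, so the span-level F1 is 0.5 while 8 of the 10 tokens still match. The two metrics only become comparable when they are computed over the same set of tokens.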

What are the possible causes of this, and what does it mean?
And how can I tell whether my model has learned properly, for example whether it has run into a bias-variance trade-off?

Please let me know if you need more information.
Thanks in advance.

asked Mar 8 at 8:06

DON

206

deep-learning ner
