What is causing a high F1 score but low accuracy in a deep learning model?
I'm using the BERT base-uncased model to train NER on the CoNLL-2003 dataset. I used BertForTokenClassification (from Hugging Face) for training, which takes the final sequence output and adds a linear classification layer on top. With it I get the results below.
After 6 epochs, with train/dev data sizes of 6973/1739:
Test F1-Score: 0.8455102584598987
'test_loss': 0.18759359930737468, 'test_accuracy': 0.42335164835164835, 'global_step': 1308, 'loss': 0.03054473980611891
Validation F1-Score: 0.8771035676507356
'eval_loss': 0.13038920708013477, 'eval_accuracy': 0.4910168195718655, 'global_step': 1308, 'loss': 0.03054473980611891
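To compute accuracy I'm using the function below.
    import numpy as np

    def flat_accuracy(preds, labels):
        # preds: (batch, seq_len, num_labels) scores; labels: (batch, seq_len) label ids
        pred_flat = np.argmax(preds, axis=2).flatten()
        labels_flat = labels.flatten()
        return np.sum(pred_flat == labels_flat) / len(labels_flat)

    # eval_batches is a placeholder for the per-batch (predictions, label ids) pairs
    eval_accuracy, nb_eval_steps = 0, 0
    for pred_xx, label_ids_xx in eval_batches:
        tmp_eval_accuracy = flat_accuracy(pred_xx, label_ids_xx)
        eval_accuracy += tmp_eval_accuracy
        nb_eval_steps += 1
    # mean of the per-batch accuracies
    eval_accuracy = eval_accuracy / nb_eval_steps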
As the results above show, the accuracy is really bad. Is the method I'm using to compute accuracy right or wrong? I believe it's right, because it just counts the labels that match out of the total number of labels, and then averages the per-batch accuracies over the number of batches.
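One thing worth ruling out, as an assumption rather than something the numbers above confirm: if the label arrays contain padded positions, a flat comparison counts those positions too. The sketch below works on already-argmaxed label ids and compares accuracy over every position with accuracy over real tokens only. The names `flat_accuracy_ids`, `masked_accuracy_ids`, and `mask`, and the use of id 0 for padding, are hypothetical choices for this toy example.

```python
def flat_accuracy_ids(pred_ids, label_ids):
    """Token accuracy over every position, padding included."""
    pairs = [(p, t) for ps, ts in zip(pred_ids, label_ids) for p, t in zip(ps, ts)]
    return sum(p == t for p, t in pairs) / len(pairs)


def masked_accuracy_ids(pred_ids, label_ids, mask):
    """Token accuracy over positions where mask == 1 only."""
    kept = [
        (p, t)
        for ps, ts, ms in zip(pred_ids, label_ids, mask)
        for p, t, m in zip(ps, ts, ms)
        if m == 1
    ]
    return sum(p == t for p, t in kept) / len(kept)


# Toy batch: 2 sequences padded to length 6. Label id 0 marks padding here
# (an assumption for this sketch); real tokens use ids 1..3.
label_ids = [[1, 2, 1, 0, 0, 0],
             [1, 1, 3, 1, 0, 0]]
# Predictions are correct on real tokens but arbitrary on padded positions.
pred_ids = [[1, 2, 1, 1, 1, 1],
            [1, 1, 3, 1, 1, 1]]
mask = [[1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 0, 0]]

flat = flat_accuracy_ids(pred_ids, label_ids)
masked = masked_accuracy_ids(pred_ids, label_ids, mask)
print(flat, masked)
```

In this toy batch every real token is predicted correctly, yet the flat figure is pulled down by the five padded positions. If the real evaluation shows a similar gap between the two numbers, padding is a likely contributor to the low accuracy.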
However, the F1 score comes out high. For the F1 score I used seqeval (from seqeval.metrics import f1_score).
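The two metrics also count different things: the accuracy above is per token position, while an entity-level F1 is computed over whole entity spans. The sketch below is a simplified pure-Python stand-in for that idea, not seqeval's implementation. It assumes BIO tags, ignores an `I-` tag that has no preceding `B-` of the same type, and micro-averages over all sentences. The helper names `extract_spans` and `span_f1` are hypothetical.

```python
def extract_spans(tags):
    """Collect (type, start, end) entity spans from one BIO tag sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not inside:
            spans.add((kind, start, i))  # end index is exclusive
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def span_f1(true_seqs, pred_seqs):
    """Micro-averaged F1 over exact-match entity spans."""
    true_spans, pred_spans = set(), set()
    for n, (t, p) in enumerate(zip(true_seqs, pred_seqs)):
        true_spans |= {(n, *s) for s in extract_spans(t)}
        pred_spans |= {(n, *s) for s in extract_spans(p)}
    correct = len(true_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(true_spans)
    return 2 * precision * recall / (precision + recall)


# One sentence with two gold entities; the prediction misses the LOC entity.
y_true = [["B-PER", "I-PER", "O", "O", "B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O", "O", "O"]]

f1 = span_f1(y_true, y_pred)
token_acc = sum(t == p for t, p in zip(y_true[0], y_pred[0])) / len(y_true[0])
print(f1, token_acc)
```

Here five of six tokens match, but only one of two entities is recovered, so the two numbers differ even on the same predictions. Because the metrics are not computed over the same units, a large gap between them is not by itself proof that either calculation is wrong.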
What are the possible causes of this, and what does it mean?
Also, how can I tell whether my model has learned properly? For example, how can I check whether it is underfitting or overfitting (the bias-variance trade-off)?
Please let me know if you need more information.
Thanks in advance.
deep-learning ner
asked Mar 8 at 8:06
DONDON