Why do the results of reading hardware counters with PAPI depend on the position of PAPI_library_init?
I am using the PAPI library to read hardware counters. I have noticed that where I call PAPI_library_init(PAPI_VER_CURRENT) influences the results I get. My initialization and read of the array look like this:
    int retval;
    /*
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI library init error!\n");
        exit(1);
    }
    */
    for (int i = 0; i < arr_size; i++) {
        array[i].value = 1;
        //_mm_clflush(&array[i]); flushing does not make a difference.
    }
    _mm_mfence();
    for (int i = 0; i < arr_size; i++) {
        temp = array[i].value;
    }
    _mm_mfence();
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI library init error!\n");
        exit(1);
    }
I believe the second loop, which reads the array, is needed because of the coherence protocol, but it should not be a big deal here. After this, I add the MEM_LOAD_RETIRED native events to the event set I want to read, and I call PAPI_read around this third loop (once before and once after the loop, printing the difference at the end):
    for (int i = 0; i < arr_size; i++) {
        temp = array[i].value;
    }
Here arr_size is 1000 and each element of the array is 64 bytes (one cache line). I have disabled all the prefetchers. I compile with gcc using -O3 and link with -lpapi. With this code, for the third loop I get:
L1_HIT: 64, L1_MISS: 1011, L2_HIT: 15, L2_MISS: 996.
However, if I uncomment the PAPI_library_init call before the array initialization and comment out the one after it, I get:
L1_HIT: 73, L1_MISS: 1004, L2_HIT: 990, L2_MISS: 14.
I am testing this on a Skylake server; the cache sizes are:
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 22528K
Now I am a bit confused about why PAPI initialization would influence these results. It is the L2 hit and miss counts that change. All I need is the third loop, and I believe the effect of the first two loops on the counters is not taken into account.
So any hint would be helpful, as all the documentation says is this: "PAPI_library_init() initializes the PAPI library. It must be called before any low level PAPI functions can be used. If your application is making use of threads PAPI_thread_init (3) must also be called prior to making any calls to the library other than PAPI_library_init()."
c caching x86 papi
Can you check without _mm_clflush(&array[i]);? Can you check smaller array sizes, such as 500 and 300 elements instead of 1000? Did you qualify the array declaration with volatile so that the compiler won't optimize away the loads at -O3?
– Hadi Brais
Mar 7 at 18:20
@HadiBrais Yes, the array is volatile, so the reads into temp won't be optimized away. I will check without the flush. I am sure I will get the same result, but I will try.
– Ana Khorguani
Mar 7 at 18:30
@HadiBrais Yes, I see the same behavior without clflush. Just to make sure: in the first loop, when I initialize the array, it writes to the element for the first time and then evicts that cache line, right? I did not suspect this before, but I tested it today and, surprisingly, this is what I observed. I read about on-demand zeroing in another post, which, as I understood it, was the reason for the RFO case, right? So is it somehow related to cache line eviction too?
– Ana Khorguani
Mar 7 at 18:37
For 500, initializing PAPI after the loops gives L1_HIT: 62, L1_MISS: 513, L2_HIT: 16, L2_MISS: 497, and initializing before gives L1_HIT: 67, L1_MISS: 510, L2_HIT: 377, L2_MISS: 133. It seems to be the same behavior. It's the same for 300. Initializing after: L1_HIT: 83, L1_MISS: 304, L2_HIT: 6, L2_MISS: 298. Initializing before the array: L1_HIT: 82, L1_MISS: 302, L2_HIT: 117, L2_MISS: 185.
– Ana Khorguani
Mar 7 at 18:48
It is as if PAPI_library_init is causing all the L2 lines to be evicted. Looking at the source code, I don't see why this would happen. What about MEM_LOAD_RETIRED.L3_MISS and MEM_LOAD_RETIRED.L3_HIT?
– Hadi Brais
Mar 7 at 18:58
edited Mar 7 at 18:50
Ana Khorguani
asked Mar 7 at 10:35
Ana Khorguani