AVX load instruction fails on cygwin
When I run the code below on my machine, the program crashes with a segmentation fault.
#include <immintrin.h>
#include <stdint.h>

static inline __m256i load_vector(__m256i const * addr)
{
    __m256i res = _mm256_load_si256(addr);
    return res;
}

void test2()
{
    int32_t *src;
    src = _mm_malloc(sizeof(__m256i), 32);
    __m256i vec = load_vector((__m256i const * )src);
    _mm_free(src);
}

int main(int argc,char *argv[])
{
    test2();
    return 0;
}
I tried to debug this with gdb, and the segmentation fault occurs when _mm256_load_si256 is called.
I am running the code built with Cygwin gcc on an AMD 2990WX CPU.
How can this happen?
c gcc cygwin x86-64 avx
Works on my machine; I don't see anything wrong there. You might try looking more closely with gdb to see what went wrong. What instruction generated the segfault?
– Jason R
Mar 8 at 0:02
Is cygwin gcc's _mm_malloc broken and not returning 32-byte aligned memory?
– Peter Cordes
Mar 8 at 8:11
Reading uninitialized memory is Undefined Behavior: stackoverflow.com/a/37184840
– chtz
Mar 8 at 13:05
@chtz Technically it's UB, but we can do better than that. I don't see how that can cause the OP's segfault. @OP since you're using cygwin, that probably means Windows. What compiler flags are you using? If it's -O0 then it's possible that res is being put on the stack. And GCC has a stack alignment problem that has made AVX unusable on Windows since antiquity.
– Mysticial
Mar 8 at 20:23
@Mysticial I agree that this is unlikely the cause of the segfault. I therefore just posted it as a comment (of course, I could have made it more clear that this is likely unrelated).
– chtz
Mar 9 at 20:38
edited Mar 11 at 2:18 by Peter Cordes
asked Mar 7 at 23:43 by Kuroda
1 Answer
I debugged further. _mm_malloc wasn't the problem; it was the alignment of local variables.
At the second vmovdqa, which stores the vector through the caller's pointer, RAX was not 32-byte aligned. vec in test2 seems not to be aligned. (Cygwin/MinGW return the __m256i vector by reference, with the caller passing a hidden pointer, unlike the standard Windows x64 calling convention, which returns it by value.)
This is the known Cygwin bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412) that Mysticial linked in comments: Cygwin GCC can't safely use AVX because it doesn't properly align the stack for __m256i locals that get stored to memory. (Cygwin/MinGW GCC will properly align alignas(32) int arr[8] = {0};, but it does so by aligning a separate pointer, not RSP or RBP. Apparently there's some SEH limitation on stack frame manipulation.)
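The "separate pointer" idea mentioned above can be illustrated in plain C. This is only a sketch of the technique, not what GCC actually emits: the names align_up, is_aligned, sum_via_aligned_slot and VEC_ALIGN are made up for illustration, and it cannot fix compiler-generated __m256i spills, which the programmer does not control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_ALIGN 32u   /* alignment an aligned 256-bit move requires */

/* Round p up to the next multiple of `alignment` (must be a power of two). */
static void *align_up(void *p, uintptr_t alignment)
{
    uintptr_t addr = (uintptr_t)p;
    return (void *)((addr + (alignment - 1)) & ~(alignment - 1));
}

/* Returns 1 if p is a multiple of `alignment`, 0 otherwise. */
static int is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: sums eight int32_t values after staging them in a
   32-byte-aligned slot carved out of an over-sized local buffer. The stack
   pointer itself is never realigned; only the derived pointer is. */
static int32_t sum_via_aligned_slot(const int32_t in[8])
{
    /* Over-allocate by VEC_ALIGN - 1 bytes so an aligned slot always fits. */
    unsigned char raw[8 * sizeof(int32_t) + VEC_ALIGN - 1];
    unsigned char *slot = align_up(raw, VEC_ALIGN);

    assert(is_aligned(slot, VEC_ALIGN));
    assert(slot + 8 * sizeof(int32_t) <= raw + sizeof raw);

    /* memcpy in and out keeps the accesses free of aliasing problems. */
    memcpy(slot, in, 8 * sizeof(int32_t));
    int32_t out[8];
    memcpy(out, slot, sizeof out);

    int32_t sum = 0;
    for (size_t i = 0; i < 8; i++)
        sum += out[i];
    return sum;
}
```

The buffer is 31 bytes larger than the payload, so whatever address raw happens to get, rounding up still leaves room for the full 32 bytes.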
Clang, MSVC, and ICC all support __m256i properly.
With optimization enabled, GCC often won't generate faulting code, but sometimes even optimized code will store/reload a 32-byte vector on the stack.
_ZL11load_vectorPKDv4_x:
.LFB3671:
.file 2 "min_case.c"
.loc 2 4 0
.cfi_startproc
pushq %rbp
.seh_pushreg %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.seh_setframe %rbp, 0
.cfi_def_cfa_register 6
subq $16, %rsp
.seh_stackalloc 16
.seh_endprologue
movq %rcx, 16(%rbp)
movq %rdx, 24(%rbp)
movq 24(%rbp), %rax
movq %rax, -8(%rbp)
.LBB4:
.LBB5:
.file 3 "/usr/lib/gcc/x86_64-pc-cygwin/7.4.0/include/avxintrin.h"
.loc 3 909 0
movq -8(%rbp), %rax
vmovdqa (%rax), %ymm0
.LBE5:
.LBE4:
.loc 2 5 0
movq 16(%rbp), %rax
vmovdqa %ymm0, (%rax)
.loc 2 6 0
movq 16(%rbp), %rax
addq $16, %rsp
popq %rbp
.cfi_restore 6
.cfi_def_cfa 7, 8
ret
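Read as C, the listing above behaves roughly like the second function below. This is a hand-written approximation for illustration, not compiler output: vec256 is a hypothetical 32-byte stand-in for __m256i, and load_by_value and load_hidden_ptr are invented names. In the listing, RCX carries the hidden result pointer (ret here), RDX carries addr, and the final vmovdqa %ymm0, (%rax) is the store through ret, which faults when the caller's slot for vec is not 32-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte stand-in for __m256i, so no AVX support is needed. */
typedef struct { int32_t lane[8]; } vec256;

/* What the source code says: return the vector by value. */
static vec256 load_by_value(const vec256 *addr)
{
    return *addr;
}

/* Roughly what the listing does: the caller passes a hidden pointer to its
   own result slot, the callee stores through it and returns that pointer. */
static vec256 *load_hidden_ptr(vec256 *ret, const vec256 *addr)
{
    assert(ret != NULL && addr != NULL);
    *ret = *addr;   /* corresponds to the second vmovdqa, the faulting store */
    return ret;     /* corresponds to reloading the hidden pointer into RAX */
}
```

With a real __m256i the callee assumes ret is 32-byte aligned, so the correctness of the store depends entirely on how the caller laid out its stack frame.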
__m256i was not aligned in this test-case:
#include <immintrin.h>
#include <stdint.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

const char* check_alignment(const void *ptr, uintptr_t alignment)
{
    return (((uintptr_t)ptr) & (alignment - 1)) == 0 ? "aligned" : "NOT aligned";
}

static inline __m256i load_vector(__m256i const * addr)
{
    printf("addr:%s\n", check_alignment(addr, 32));
    __m256i res;
    printf("&res:%s\n", check_alignment(&res, 32));
    res = _mm256_load_si256(addr);
    return res;
}

void test2()
{
    int32_t *src;
    src = (int32_t *)_mm_malloc(sizeof(__m256i), 32);
    src[0] = 0; src[1] = 1; src[2] = 2; src[3] = 3;
    src[4] = 4; src[5] = 5; src[6] = 6; src[7] = 7;
    __m256i vec = load_vector((__m256i const * )src);
    _mm_free(src);
}

int main(int argc,char *argv[])
{
    test2();
    return 0;
}

// results
// addr:aligned
// &res:NOT aligned
// Segmentation fault
edited Mar 11 at 2:25 by Peter Cordes
answered Mar 10 at 17:28 by Kuroda