AVX load instruction fails on cygwin

When I run the following code on my machine, the program crashes with a segmentation fault.



#include <immintrin.h>
#include <stdint.h>

static inline __m256i load_vector(__m256i const * addr) {
    __m256i res = _mm256_load_si256(addr);
    return res;
}

void test2() {
    int32_t *src;
    src = _mm_malloc(sizeof(__m256i), 32);
    __m256i vec = load_vector((__m256i const * )src);
    _mm_free(src);
}

int main(int argc,char *argv[]) {
    test2();
    return 0;
}



I debugged this with gdb, and the segmentation fault occurs when _mm256_load_si256 is called.



I compiled the code with Cygwin gcc and ran it on an AMD 2990WX CPU.
How can this happen?










c gcc cygwin x86-64 avx






edited Mar 11 at 2:18
Peter Cordes

asked Mar 7 at 23:43
Kuroda

  • Works on my machine; I don't see anything wrong there. You might try looking more closely with gdb to see what went wrong. What instruction generated the segfault?

    – Jason R
    Mar 8 at 0:02











  • Is cygwin gcc's _mm_malloc broken and not returning 32-byte aligned memory?

    – Peter Cordes
    Mar 8 at 8:11











  • Reading uninitialized memory is Undefined Behavior: stackoverflow.com/a/37184840

    – chtz
    Mar 8 at 13:05






  • @chtz Technically it's UB, but we can do better than that. I don't see how that can cause the OP's segfault. @OP since you're using cygwin, that probably means Windows. What compiler flags are you using? If it's -O0 then it's possible that res is being put on the stack. And GCC has a stack alignment problem that has made AVX unusable on Windows since antiquity.

    – Mysticial
    Mar 8 at 20:23












  • @Mysticial I agree that this is unlikely the cause of the segfault. I therefore just posted it as a comment (of course, I could have made it more clear that this is likely unrelated).

    – chtz
    Mar 9 at 20:38












1 Answer
I debugged this further. _mm_malloc wasn't the problem; it was the alignment of local variables.



At the second vmovdqa, which stores the vector through the caller's pointer, RAX was not 32-byte aligned, so vec in test2 appears not to be aligned. (Cygwin/MinGW return the __m256i vector by reference, with the caller passing a hidden pointer, unlike the standard Windows x64 calling convention, which returns it by value.)



This is the known Cygwin bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412) that Mysticial linked in comments: Cygwin GCC can't safely use AVX because it doesn't properly align the stack for __m256i locals that get stored to memory. (Cygwin/MinGW gcc will properly align alignas(32) int arr[8] = {0};, but it does so by aligning a separate pointer, not RSP or RBP. Apparently there's some SEH limitation on stack frame manipulation.)
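As a rough illustration of what "aligning a separate pointer" means, here is a minimal sketch in plain C. It is not the code GCC emits; align_up and is_aligned are hypothetical helper names introduced only for this example, and the sketch assumes the alignment is a power of two. The idea is to reserve alignment - 1 extra bytes and round the start address up, leaving the frame or stack pointer itself untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round p up to the next multiple of `alignment`.
 * Assumes `alignment` is a non-zero power of two. */
static void *align_up(void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)alignment - 1;
    return (void *)((addr + mask) & ~mask);
}

/* Hypothetical helper: returns 1 if p is a multiple of `alignment`
 * (again assumed to be a power of two), 0 otherwise. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & ((uintptr_t)alignment - 1)) == 0;
}
```

A caller that needs 32 aligned bytes would reserve 32 + 31 bytes and pass the start of that region to align_up(region, 32); the returned pointer is 32-byte aligned and the 32 bytes after it still lie inside the region.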



Clang, MSVC, and ICC all support __m256i properly.



With optimization enabled, gcc often won't generate faulting code, but sometimes even optimized code will store/reload a 32-byte vector to the stack.



_ZL11load_vectorPKDv4_x:
.LFB3671:
    .file 2 "min_case.c"
    .loc 2 4 0
    .cfi_startproc
    pushq %rbp
    .seh_pushreg %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .seh_setframe %rbp, 0
    .cfi_def_cfa_register 6
    subq $16, %rsp
    .seh_stackalloc 16
    .seh_endprologue
    movq %rcx, 16(%rbp)
    movq %rdx, 24(%rbp)
    movq 24(%rbp), %rax
    movq %rax, -8(%rbp)
.LBB4:
.LBB5:
    .file 3 "/usr/lib/gcc/x86_64-pc-cygwin/7.4.0/include/avxintrin.h"
    .loc 3 909 0
    movq -8(%rbp), %rax
    vmovdqa (%rax), %ymm0
.LBE5:
.LBE4:
    .loc 2 5 0
    movq 16(%rbp), %rax
    vmovdqa %ymm0, (%rax)
    .loc 2 6 0
    movq 16(%rbp), %rax
    addq $16, %rsp
    popq %rbp
    .cfi_restore 6
    .cfi_def_cfa 7, 8
    ret
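In the listing above, RCX is saved to 16(%rbp) and later used as the store address, while RDX is saved to 24(%rbp) and used as the load address. So RCX carries the hidden result pointer and RDX carries addr. The following plain C sketch shows roughly what that lowering amounts to. vec256_like, load_by_value, and load_via_hidden_pointer are hypothetical names used only for illustration, and a 32-byte struct stands in for __m256i so the example does not need AVX.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte stand-in for a vector type. */
typedef struct { int32_t lane[8]; } vec256_like;

/* What the source code expresses: the function returns a value. */
static vec256_like load_by_value(const int32_t *src)
{
    vec256_like res;
    memcpy(res.lane, src, sizeof res.lane);
    return res;
}

/* Roughly what the compiled function does: the caller owns the result
 * storage and passes its address as an extra first argument. The callee
 * stores through that pointer and hands the same pointer back. */
static vec256_like *load_via_hidden_pointer(vec256_like *ret, const int32_t *src)
{
    memcpy(ret->lane, src, sizeof ret->lane);
    return ret;
}
```

Because the callee only sees the pointer, the alignment of the result storage is entirely the caller's responsibility. That fits the observation that the faulting store is the one through RAX into vec in test2.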



__m256i was not aligned in this test-case:



#include <immintrin.h>
#include <stdint.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Report whether ptr is a multiple of alignment (a power of two).
const char* check_alignment(const void *ptr, uintptr_t alignment) {
    return (((uintptr_t)ptr) & (alignment - 1)) == 0 ? "aligned" : "NOT aligned";
}

static inline __m256i load_vector(__m256i const * addr) {
    printf("addr:%s\n", check_alignment(addr, 32));
    __m256i res;
    printf("&res:%s\n", check_alignment(&res, 32));
    res = _mm256_load_si256(addr);
    return res;
}

void test2() {
    int32_t *src;
    src = (int32_t *)_mm_malloc(sizeof(__m256i), 32);
    // Initialize all eight lanes so the load does not read uninitialized memory.
    src[0] = 0; src[1] = 1; src[2] = 2; src[3] = 3;
    src[4] = 4; src[5] = 5; src[6] = 6; src[7] = 7;
    __m256i vec = load_vector((__m256i const * )src);
    _mm_free(src);
}

int main(int argc,char *argv[]) {
    test2();
    return 0;
}

// results
// addr:aligned
// &res:NOT aligned
// Segmentation fault





edited Mar 11 at 2:25
Peter Cordes

answered Mar 10 at 17:28
Kuroda