Shuffle a DataFrame while keeping internal order


I have a dataframe that contains pre-processed data, such that every 4 rows form a sequence (later to be reshaped and used for LSTM training).

I want to shuffle the dataframe, but I want to keep every sequence of rows untouched. For example:
a = [1,2,3,4,10,11,12,13,20,21,22,23] will turn into something like: a = [20,21,22,23,1,2,3,4,10,11,12,13].

df.sample(frac=1) is not enough since it will break the sequences.

Solution, thanks to @Wen-Ben:

    import numpy as np
    import pandas as pd

    seq_length = 4
    # One position per row, trimmed to a whole number of sequences
    length_array = np.arange((df.shape[0] // seq_length) * seq_length)
    trunc_data = df.head((df.shape[0] // seq_length) * seq_length)
    # Map each sequence number to its block of rows
    d = {x: y for x, y in trunc_data.groupby(length_array // seq_length)}
    yourdf = pd.concat([d.get(x) for x in np.random.choice(len(d), len(d.keys()), replace=False)])
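
For reuse, the solution above could be wrapped in a small helper. This is only a sketch: the function name `shuffle_sequences`, the `seed` parameter, and the sample frame are assumptions added for illustration, and it uses a seeded `Generator.permutation` in place of `np.random.choice(..., replace=False)`, which likewise draws every block exactly once.

```python
import numpy as np
import pandas as pd


def shuffle_sequences(df, seq_length, seed=None):
    """Shuffle blocks of `seq_length` consecutive rows, keeping each block intact."""
    rng = np.random.default_rng(seed)
    n_blocks = len(df) // seq_length
    # Drop trailing rows that do not fill a complete sequence.
    trunc = df.head(n_blocks * seq_length)
    # Rows 0..3 get key 0, rows 4..7 get key 1, and so on.
    keys = np.arange(len(trunc)) // seq_length
    blocks = {key: block for key, block in trunc.groupby(keys)}
    # A random permutation of the block keys decides the new block order.
    return pd.concat([blocks[key] for key in rng.permutation(n_blocks)])


# Hypothetical sample data: three full sequences plus one leftover row (99).
df = pd.DataFrame({"a": [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23, 99]})
shuffled = shuffle_sequences(df, seq_length=4, seed=0)
print(shuffled["a"].tolist())
```

Passing a fixed `seed` makes the shuffle repeatable between runs. A frame with fewer than `seq_length` rows is not handled here and would need its own guard.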






python pandas shuffle






asked Mar 8 at 23:33 by M.F
edited Mar 9 at 10:31 by M.F

  • Is there any other column in the frame that has one unique value per row sequence? For example, the column could have value 1 for the sequence 1,2,3,4 and 2 for 10,11,12,13. If not, is it OK to add such a column?
    – entropy Mar 8 at 23:47

  • Question has nothing to do with machine-learning - kindly do not spam the tag (removed).
    – desertnaut Mar 9 at 0:19

  • @suicidalteddy 1,2,3,4 represent 4 rows of a certain column, not a row. I can add another column, but how will it help? Keep in mind that many values will repeat.
    – M.F Mar 9 at 9:34
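
One way to read the extra-column suggestion, as a sketch: give every row the number of the sequence it belongs to, then reorder whole sequence numbers. The column name `seq_id`, the temporary `_rank` column, the seed, and the sample data are assumptions for illustration. Because the id is derived from row position rather than from the data, repeated values in the other columns do not matter.

```python
import numpy as np
import pandas as pd

seq_length = 4
df = pd.DataFrame({"a": [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]})

# Hypothetical helper column: one id per sequence (0,0,0,0,1,1,1,1,...).
df["seq_id"] = np.arange(len(df)) // seq_length

rng = np.random.default_rng(0)
ids = df["seq_id"].unique()
# Give every sequence id a random rank; all rows of a sequence share that rank.
rank = pd.Series(rng.permutation(len(ids)), index=ids)

# mergesort is stable, so rows keep their original order inside each sequence.
shuffled = (
    df.assign(_rank=df["seq_id"].map(rank))
      .sort_values("_rank", kind="mergesort")
      .drop(columns="_rank")
)
print(shuffled["a"].tolist())
```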













3 Answers
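Is this what you need? Use np.random.choice:

    # Split the frame into consecutive groups of 4 rows
    d = {x: y for x, y in df.groupby(np.arange(len(df)) // 4)}
    # Concatenate the groups in a random order (2 groups in this example)
    yourdf = pd.concat([d.get(x) for x in np.random.choice(len(d), 2, replace=False)])
    yourdf
    Out[986]:
       col1 col2
    4     5    e
    5     6    f
    6     7    g
    7     8    h
    0     1    a
    1     2    b
    2     3    c
    3     4    d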
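
A short sketch of how this grouping behaves when the frame length is not a multiple of 4: the integer-division key simply yields a shorter last group, which then moves as one unit. The 6-row frame and the seed are made up for illustration, and `Generator.permutation` stands in for `np.random.choice(n, n, replace=False)` so that the group count is not hard-coded.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6], "col2": list("abcdef")})

# Group key per row: [0, 0, 0, 0, 1, 1]; the last group has only two rows.
keys = np.arange(len(df)) // 4
groups = {key: group for key, group in df.groupby(keys)}
print({int(key): len(group) for key, group in groups.items()})  # {0: 4, 1: 2}

# permutation(n) returns all n group keys in random order.
rng = np.random.default_rng(1)
shuffled = pd.concat([groups[key] for key in rng.permutation(len(groups))])
print(shuffled)
```

If a partial group is not wanted, the frame can be trimmed with `df.head()` first, as in the edited question.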






answered Mar 9 at 0:13 by Wen-Ben
edited Mar 9 at 0:51

  • This assumes the index is numeric and continuous. Check out my answer, which makes no such assumptions. (Though this is an improvement on your original answer, which used np.roll.)
    – P Maschhoff Mar 9 at 0:50

  • @PMaschhoff your answer is OK, but it only works when the length of df is n*4. Assuming the df has length 6, ` np.arange(len(df)-2).reshape(-1, 4)`, reshape will fail.
    – Wen-Ben Mar 9 at 0:54

  • @Wen-Ben this is it! I tweaked your code a little bit and added it as an edit to the question. Thanks!
    – M.F Mar 9 at 10:29


















You can reshuffle in groups of 4 by grouping the index into groups of four and then shuffling those groups.

Example:

    df = pd.DataFrame(np.random.randint(10, size=(12, 2)), columns=['a', 'b'])

        a  b
    0   5  4
    1   7  7
    2   7  8
    3   8  4
    4   9  4
    5   9  0
    6   1  5
    7   4  1
    8   0  1
    9   5  6
    10  1  3
    11  9  2

    new_index = np.array(df.index).reshape(-1, 4)
    np.random.shuffle(new_index) # shuffles array in-place
    df = df.loc[new_index.reshape(-1)]

        a  b
    8   0  1
    9   5  6
    10  1  3
    11  9  2
    4   9  4
    5   9  0
    6   1  5
    7   4  1
    0   5  4
    1   7  7
    2   7  8
    3   8  4
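
The `.loc` lookup above relies on index labels identifying rows one-to-one. A positional variant with `.iloc` is sketched below for frames whose labels repeat. The sample frame, its string index, and the seed are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Repeated string labels: label-based .loc would not pick rows one-to-one here.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 10, 11, 12, 13]},
    index=["x", "y", "x", "y", "x", "y", "x", "y"],
)

rng = np.random.default_rng(0)
# Work with row positions 0..n-1 instead of index labels.
positions = np.arange(len(df)).reshape(-1, 4)
rng.shuffle(positions)  # shuffles the rows of the 2-D array in place
shuffled = df.iloc[positions.reshape(-1)]
print(shuffled)
```

As in the answer, the reshape requires the number of rows to be a multiple of 4.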






answered Mar 9 at 0:18 by P Maschhoff

Since your data comes in sequences of 4, the length of the data frame should be a multiple of 4. If your data is in sequences of 3, change 4 to 3 in the code.

    >>> import pandas as pd
    >>> import numpy as np

Creating the table:

    >>> df = pd.DataFrame({'col1':[1,2,3,4,5,6,7,8],'col2':['a','b','c','d','e','f','g','h']})
    >>> df
       col1 col2
    0     1    a
    1     2    b
    2     3    c
    3     4    d
    4     5    e
    5     6    f
    6     7    g
    7     8    h
    >>> df.shape[0]
    8

Creating the list for shuffling:

    >>> np_range = np.arange(0,df.shape[0])
    >>> np_range
    array([0, 1, 2, 3, 4, 5, 6, 7])

Reshaping and shuffling:

    >>> np_range1 = np.reshape(np_range,(df.shape[0]//4,4))  # integer division: the shape must be ints
    >>> np_range1
    array([[0, 1, 2, 3],
           [4, 5, 6, 7]])
    >>> np.random.shuffle(np_range1)
    >>> np_range1
    array([[4, 5, 6, 7],
           [0, 1, 2, 3]])
    >>> np_range2 = np.reshape(np_range1,(df.shape[0],))
    >>> np_range2
    array([4, 5, 6, 7, 0, 1, 2, 3])

Selecting the data:

    >>> new_df = df.loc[np_range2]
    >>> new_df
       col1 col2
    4     5    e
    5     6    f
    6     7    g
    7     8    h
    0     1    a
    1     2    b
    2     3    c
    3     4    d

I hope this helps! Thank you!







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Mar 9 at 1:12

























            answered Mar 9 at 0:14









            sambasiva rao

            1386












            • @M.F I hope this helps

              – sambasiva rao
              Mar 9 at 0:16











            • @Wen-Ben roll will rotate the array, i.e., bring the last n indexes to the beginning. It clearly does not perform any shuffle. Thank you!

              – sambasiva rao
              Mar 9 at 0:19











            • When the length of df is not n*4, for example when it has 6 rows, reshape will fail. Any thoughts?

              – Wen-Ben
              Mar 9 at 0:55











            • @Wen-Ben Yes, it will fail, but that depends on his data. He is trying to shuffle data that is already in sequences of 4, so the length of the whole data set should be a multiple of 4. If his data is in sequences of three, he has to replace 4 with 3. Is my reasoning correct, or did I miss something?

              – sambasiva rao
              Mar 9 at 1:06











            • I am not sure about that, which is why I am not using reshape.

              – Wen-Ben
              Mar 9 at 1:12
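
On the point raised in the comments about a length that is not a multiple of 4: one possible way around the reshape failure is to slice the row positions into blocks instead of reshaping, so the last block may be shorter. This is a rough sketch, not the method from the answer; the name `shuffle_blocks_any_length` and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def shuffle_blocks_any_length(df, block_size, seed=None):
    """Shuffle consecutive blocks of rows; the last block may be shorter.

    Hypothetical helper, not part of pandas or NumPy.
    """
    if len(df) == 0:
        return df.copy()
    rng = np.random.default_rng(seed)
    positions = np.arange(len(df))
    # Slice the positions into consecutive blocks, e.g. [0,1,2,3] and [4,5].
    blocks = [positions[i:i + block_size] for i in range(0, len(df), block_size)]
    # Shuffle the order of the blocks, not the rows inside them.
    order = rng.permutation(len(blocks))
    return df.iloc[np.concatenate([blocks[i] for i in order])]


df6 = pd.DataFrame({"col1": range(1, 7), "col2": list("abcdef")})
print(shuffle_blocks_any_length(df6, block_size=4, seed=1))
```

Whether a shorter trailing block should be allowed to move to the front depends on the data; if it should stay last, it can be held out of the permutation.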

































