Does integer encoding of strings, used as input to a decision tree (sklearn), make the splitting attributes discrete or continuous?


I have to use a decision tree classifier to classify certain data. However, the attribute values are strings, and according to https://datascience.stackexchange.com/questions/5226/strings-as-features-in-decision-tree-random-forest, strings cannot be used as input. Hence I used integer encoding for the strings.

In the post "Passing categorical data to Sklearn Decision Tree", I found that passing integer-encoded data may give a wrong answer, since sklearn assumes an ordering among the encoded values. So the only way out seems to be the OneHotEncoder module.
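To make the ordering issue concrete, here is a minimal sketch on made-up toy data (the `price` values and labels are assumptions for illustration, not the asker's data). It integer-encodes one string column with `OrdinalEncoder` and prints the fitted tree, which shows that sklearn treats the codes as numbers and splits on thresholds of the form `price <= t`:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (hypothetical): one categorical feature and a binary label.
X_raw = np.array([["high"], ["med"], ["low"], ["high"], ["med"], ["low"]])
y = np.array([1, 0, 1, 1, 0, 1])

# OrdinalEncoder sorts the categories alphabetically: high -> 0, low -> 1, med -> 2.
encoder = OrdinalEncoder()
X_int = encoder.fit_transform(X_raw)

tree = DecisionTreeClassifier(random_state=0).fit(X_int, y)

# Internal nodes have feature index >= 0; leaves are marked with a negative index.
thresholds = tree.tree_.threshold[tree.tree_.feature >= 0]

# Each split is a numeric comparison on the integer code, not a category test.
report = export_text(tree, feature_names=["price"])
print(report)
print(thresholds)
```

In other words, the integer codes are handled like a continuous attribute: a split such as `price <= 1.5` groups categories by where their codes happen to fall, which is only meaningful if that order is meaningful.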

Using the OneHotEncoder module increases the number of features. For example, if there is an attribute 'price' with values ['high', 'med', 'low'], one-hot encoding replaces it with 3 attributes, which can be interpreted as ['price-high', 'price-med', 'price-low'], each taking the value 1 or 0 depending on the data. I don't want this, since I have to print the decision tree in a certain format that requires the original features (e.g. I need 'price').
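As a small illustration of that expansion, the sketch below one-hot encodes a single made-up `price` column. The `price-<category>` column labels are built by hand here to mirror the naming used above; they are an assumption of this example, not something OneHotEncoder produces in that exact form.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single-column input with three distinct categories.
prices = np.array([["high"], ["med"], ["low"], ["high"]])

encoder = OneHotEncoder()
# The encoder returns a sparse matrix by default, so convert it for display.
one_hot = encoder.fit_transform(prices).toarray()

# categories_ holds the learned categories per input column, in sorted order.
columns = [f"price-{category}" for category in encoder.categories_[0]]
print(columns)
print(one_hot)
```

One input column becomes three 0/1 columns, and exactly one of them is set in each row.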

Is there a way out of this?

asked Mar 8 at 11:26

Sarthak Chakraborty


python scikit-learn decision-tree


1 Answer
I think pd.get_dummies would be useful here: since you want to keep track of the original feature names, it creates the one-hot columns with the original name as a prefix.

Example:

import pandas as pd
df = pd.DataFrame({'price': ['high', 'medium', 'high', 'low'], 'some_feature': ['b', 'a', 'c', 'a']})
# dtype=int gives 0/1 columns (pandas >= 2.0 defaults to bool)
pd.get_dummies(df, columns=['price', 'some_feature'], dtype=int)

   price_high  price_low  price_medium  some_feature_a  some_feature_b  some_feature_c
0           1          0             0               0               1               0
1           0          0             1               1               0               0
2           1          0             0               0               0               1
3           0          1             0               1               0               0

When you feed this DataFrame to the decision tree, the splits should be easier to understand, since every column name carries the original feature name.
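A minimal sketch of that last step, assuming a made-up label vector `y` (the answer does not include labels, so these are hypothetical). It fits a tree on the dummy columns and prints it with `export_text`, passing the DataFrame's column names as feature names:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.DataFrame({'price': ['high', 'medium', 'high', 'low'],
                   'some_feature': ['b', 'a', 'c', 'a']})
X = pd.get_dummies(df, columns=['price', 'some_feature'], dtype=int)

# Hypothetical labels: only the row with price == 'low' belongs to class 0.
y = [1, 1, 1, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each printed split names a dummy column such as price_low.
report = export_text(tree, feature_names=list(X.columns))
print(report)
```

With these labels the only column that separates the classes perfectly is `price_low`, so the printed tree has a single split on it.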

answered Mar 9 at 12:38

AI_Learning


Sure. That would convert the data to one-hot-encoded form. But the decision tree will then be built on the new features (e.g. price_high, price_low, etc.). So when the tree is printed, the features would not be "price" or "some_feature", but "price_high", "price_low", etc.

– Sarthak Chakraborty
Mar 10 at 16:42

Yes. Why do you want to see just price as the feature name when we have already created dummies for it? I think a name like price_high better explains how the split was made in the decision tree.

– AI_Learning
Mar 10 at 17:59
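One possible middle ground between the two positions in these comments, offered only as a sketch and not as something either poster proposed: keep the one-hot columns for training, but translate each split back to the original feature when printing. The helper `describe_rules` and the labels `y` below are hypothetical; the sketch also assumes the dummies were created with `prefix_sep="="` so that a column name such as `some_feature=a` can be split unambiguously into feature and category.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def describe_rules(clf, columns, sep="=", node=0, depth=0):
    """Return the tree's rules as text lines phrased in terms of the original features."""
    t = clf.tree_
    indent = "    " * depth
    if t.children_left[node] == -1:  # -1 marks a leaf in sklearn's tree arrays
        label = clf.classes_[t.value[node][0].argmax()]
        return [f"{indent}class: {label}"]
    # A dummy column "price=low" encodes the test: is price equal to low?
    feature, _, category = columns[t.feature[node]].partition(sep)
    # Left branch: dummy <= 0.5, i.e. the dummy is 0, so the category does not match.
    lines = [f"{indent}{feature} != {category}"]
    lines += describe_rules(clf, columns, sep, t.children_left[node], depth + 1)
    # Right branch: dummy > 0.5, i.e. the dummy is 1, so the category matches.
    lines.append(f"{indent}{feature} == {category}")
    lines += describe_rules(clf, columns, sep, t.children_right[node], depth + 1)
    return lines


df = pd.DataFrame({'price': ['high', 'medium', 'high', 'low'],
                   'some_feature': ['b', 'a', 'c', 'a']})
X = pd.get_dummies(df, columns=['price', 'some_feature'], prefix_sep="=", dtype=int)
y = [1, 1, 1, 0]  # hypothetical labels for illustration

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
rules = describe_rules(clf, list(X.columns))
print("\n".join(rules))
```

The tree is still trained on the expanded columns, so this changes only how the result is displayed. It relies on every feature being a 0/1 dummy; numeric features would need their thresholds printed as usual.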
