Dummy/One-Hot Encode an Array/Set of Columns?

driscoll42 Registered Posts: 6

My data comes in two different shapes that I basically want to treat the same way. In the first, I have a column containing arrays, like:

ColumnA
[A,B]
[A]
[B,C]

I want to dummy encode these to make something like:

ColumnA_A  ColumnA_B  ColumnA_C
1          1          0
1          0          0
0          1          1
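
For reference, the first transformation can be sketched in plain Python. This is only a minimal sketch, assuming each cell has already been parsed into a list of string labels; `dummy_encode`, `rows`, and `prefix` are hypothetical names made up for the illustration, not part of any library.

```python
def dummy_encode(rows, prefix):
    """One-hot encode a list of label lists into 0/1 rows.

    `rows` and `prefix` are hypothetical names used only in this sketch.
    """
    # Collect every distinct label and fix a stable (sorted) column order.
    labels = sorted({label for row in rows for label in row})
    columns = [f"{prefix}_{label}" for label in labels]
    # One output row per input row: 1 if the label is present, else 0.
    encoded = [[int(label in row) for label in labels] for row in rows]
    return columns, encoded


columns, encoded = dummy_encode([["A", "B"], ["A"], ["B", "C"]], "ColumnA")
print(columns)
for row in encoded:
    print(row)
```

The number of output columns equals the number of distinct labels, which is exactly the column growth described in this post.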

And then in another case I have a set of columns like:

ColumnA  ColumnB  ColumnC
A
A        B
B
B        C        D

Ideally I'd like to merge these together to make:

A  B  C  D
1  0  0  0
1  1  0  0
0  1  0  0
0  1  1  1
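
The second transformation can be sketched the same way. This is a rough sketch, assuming empty cells arrive as `None` (or an empty string) and that it does not matter which of the columns held a value; `merge_and_encode` and `records` are hypothetical names.

```python
def merge_and_encode(records):
    """Merge the columns of each record into one label set, then one-hot encode.

    `records` is a hypothetical list of tuples, one tuple per data row.
    """
    # Keep only the non-empty cells; the column a value came from is ignored.
    label_sets = [{value for value in record if value} for record in records]
    # Union of all labels, sorted for a stable column order.
    labels = sorted(set().union(*label_sets))
    encoded = [[int(label in s) for label in labels] for s in label_sets]
    return labels, encoded


records = [
    ("A", None, None),
    ("A", "B", None),
    ("B", None, None),
    ("B", "C", "D"),
]
labels, encoded = merge_and_encode(records)
print(labels)
for row in encoded:
    print(row)
```

Once each row is reduced to a set of labels, this is the same problem as the array column case.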

I think the two scenarios are effectively the same (I could convert from one to the other easily enough), but I'm not sure of the best way to do this. I could write some Python code to do it, but in an ideal world I wouldn't add a few hundred extra columns to my data. I also rather like that the models show that ColumnA accounts for 5% (or whatever) of the importance. If I break it up, the models would instead show that ColumnA_A is maybe 0.5% important, and I don't want to spread the importance that thin if I don't have to.
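
One possible workaround for the importance concern is to sum the per-dummy importances back into their source column after training. The sketch below assumes the importances are additive, that dummy columns are named `<source>_<label>`, and that labels contain no underscore; `group_importance` is a hypothetical helper and the numbers are made up purely for illustration.

```python
from collections import defaultdict


def group_importance(importances):
    """Sum importances of dummy columns back into their source column.

    Assumes dummy columns are named '<source>_<label>' (an assumption of
    this sketch); names without an underscore are kept unchanged.
    """
    totals = defaultdict(float)
    for column, value in importances.items():
        # Everything before the last underscore is treated as the source column.
        source = column.rsplit("_", 1)[0]
        totals[source] += value
    return dict(totals)


# Illustrative, made-up importances (not real model output).
example = {"ColumnA_A": 0.5, "ColumnA_B": 0.25, "ColumnA_C": 0.125, "Other": 0.125}
grouped = group_importance(example)
print(grouped)
```

Whether a summed importance is meaningful depends on how the model computes importance, so this is a reporting aid rather than a way to avoid the extra columns.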

Any suggestions on how to handle this?
