
read.dataset: avoid reading as factor?

Dataiker
Hi,

When reading a dataset with an R recipe, I find myself struggling with types. In this specific case I would like to know how to prevent DSS from reading my string columns as factors. In R, the function "data.frame" has an option "stringsAsFactors = F". Is there an equivalent in DSS?

My column is stored as a string but when using a recipe it is read as a factor. I need to compare the value with a specific string and it cannot be done with factors.

Alternatively, would you be able to suggest any function to convert the type? A naive way to do this would be to use as.character but it doesn't work (the output is a number, not my word as a string).

Many thanks,

Raphaëlle
Dataiker Alumni
Hi Raphaëlle,

The trick might be to apply as.character on individual columns, not on the dataframe. Does this page answer your question? http://stackoverflow.com/questions/19204729/how-to-change-factor-labels-into-string-in-a-data-frame
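
A minimal sketch of that per-column conversion, in case it helps. The data frame `df` and its columns `Z` and `n` are made up for illustration and are not taken from the actual dataset:

```r
# Hypothetical data frame with a factor column, standing in for what
# read.dataset() returns when strings are read as factors.
df <- data.frame(Z = factor(c("Aaaa", "Bbbb", "Cccc")), n = 1:3)

# Find the factor columns and convert each one to character individually.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

str(df)                      # Z is now a character column, n is untouched
matches <- df$Z == "Bbbb"    # plain string comparison on the converted column
```

Applying as.character to each column, rather than to the data frame as a whole, is what keeps the labels instead of the underlying integer codes.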
Level 1
Thank you for your reply, but I am afraid that it doesn't. My question wasn't clear, so allow me to rephrase it.

I am actually trying to apply as.character() on an individual column.
Here is an example (I cannot share the actual data) of the vector Z:
Aaaa
Bbbb
Cccc
So indeed I can use as.character(Z) but it doesn't always work for some reason. (number output instead of "Aaaa", "Bbbb", "Cccc").

The point of my question was that it would probably be more efficient to use an option similar to stringsAsFactors = F when reading the whole dataset.
This page points out the difference: http://stackoverflow.com/questions/2851015/convert-data-frame-columns-from-factors-to-characters
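
For reference, here is a small sketch of how numbers can show up in place of factor labels. It uses the example vector Z from above; the ifelse() case is only one common way the integer codes leak out, and it is an assumption that something similar is happening in the real recipe:

```r
# Example vector from the post, stored as a factor.
Z <- factor(c("Aaaa", "Bbbb", "Cccc"))

labels <- as.character(Z)   # the labels: "Aaaa" "Bbbb" "Cccc"
codes  <- as.integer(Z)     # the underlying integer codes: 1 2 3

# ifelse() strips attributes, so the factor's integer code leaks into the result.
leaky <- ifelse(Z == "Aaaa", Z, "other")

# Converting to character first keeps the label.
safe <- ifelse(Z == "Aaaa", as.character(Z), "other")
```

Here `leaky` holds "1" where "Aaaa" was expected, while `safe` holds the label itself.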
Dataiker
Hi Raphaëlle,

What version of DSS are you using? In DSS 2.1, we switched from stringsAsFactors=T to stringsAsFactors=F, so all of your character columns should now be read as characters, not factors.
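
To illustrate what that switch changes, here is a small sketch in plain R, outside DSS. It uses base R's read.csv on inline text as a stand-in for reading a dataset; the column names are made up:

```r
# Inline CSV standing in for a dataset with one string column.
csv <- "Z,n\nAaaa,1\nBbbb,2\nCccc,3"

# Old behaviour: string columns come back as factors.
as_factor <- read.csv(text = csv, stringsAsFactors = TRUE)

# New behaviour: string columns come back as plain character vectors.
as_string <- read.csv(text = csv, stringsAsFactors = FALSE)

class(as_factor$Z)   # "factor"
class(as_string$Z)   # "character"
```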

On a related note, would you like the ability to specify stringsAsFactors or is having stringsAsFactors=F sufficient?

Thanks,

Eric
Dataiker
One other thing:

Because of this change, we deprecated read.dataset(). dkuReadDataset() is now the preferred function.
Level 1
Hello Eric,

I am using DSS 2.0.4a! Glad to see that this is handled in 2.1.

In my opinion it would be useful to be able to specify stringsAsFactors for specific cases. However, if the default is F, it should be fine in most situations.

Many thanks!
Regards,
Raphaëlle