data type and value changed

omar21
Level 2
I have a string column in a table that contains values like "00123456". When I load the table into Jupyter with get_dataframe, the column type becomes integer and I get the value 123456, without the leading 00. How can I fix this so that I keep the same value and data type?

17 Replies
Turribeach

What is the data type of that column in the dataset's schema (Dataset => Settings => Schema)? Where is the dataset stored (what type of dataset is it)?

omar21
Level 2
Author

@Turribeach 

The type is string and the meaning is text.

Turribeach

What do you get if you run this in Jupyter on your dataframe object:

dataset_df.info()
omar21
Level 2
Author

I get this:

For the column concerned, the type after import into Jupyter is object, but when I check the values I find them without the leading 00.

AdrienL
Dataiker

If you want to disable pandas' type inference, try passing infer_with_pandas=False to the get_dataframe call
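
To illustrate what the inference does, here is a minimal sketch that reproduces the effect with plain pandas on an in-memory CSV. The column name `code` and the sample values are made up for illustration. The Dataiku call appears only as a comment because it needs a DSS environment, and `my_dataset` is a placeholder name.

```python
import io

import pandas as pd

# Hypothetical sample data: a text column whose values carry leading zeros.
CSV_TEXT = "code\n00123456\n00000042\n"

# Default behaviour: pandas infers an integer column and the zeros are lost.
inferred = pd.read_csv(io.StringIO(CSV_TEXT))

# Forcing the column to text keeps the values exactly as stored.
forced = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"code": str})

print(inferred["code"].tolist())  # [123456, 42]
print(forced["code"].tolist())    # ['00123456', '00000042']

# In a DSS notebook the equivalent switch would be (not run here):
#   import dataiku
#   df = dataiku.Dataset("my_dataset").get_dataframe(infer_with_pandas=False)
```

With infer_with_pandas=False the dtypes are taken from the dataset's schema instead of being guessed from the values, so a column declared as string should stay text.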

omar21
Level 2
Author

I tested this together with the parameter keep_default_na=True, but I received an error:
ValueError: Integer column has NA values in column 15

AdrienL
Dataiker
Ah yes, well, pandas will not let you have NAs in integer columns.
omar21
Level 2
Author

My table has 2 million rows. When I add the parameters infer_with_pandas=False, keep_default_na=True, limit=10000 to import just the first 10,000 rows, it works fine and the data type is not changed, but with all the rows it gives an error.

AdrienL
Dataiker
What is the error? If it's the NA error above, then as I mentioned, NAs are not supported in an int column. Not getting the error for the first 10k rows just means that there are no NAs in those 10k rows, but there are some in later rows.
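
The behaviour can be reproduced outside DSS with plain pandas. The sketch below uses made-up data (columns `id` and `code`, with one empty cell) and a hypothetical helper `read_codes`. It forces an integer dtype the way a schema-driven read would, then shows that reading only the first row succeeds while the full read fails.

```python
import io

import pandas as pd

# Hypothetical data: the second row has an empty "code" value.
CSV_TEXT = "id,code\n1,00123456\n2,\n"


def read_codes(dtype, nrows=None):
    """Parse the sample with a forced dtype for the 'code' column."""
    return pd.read_csv(io.StringIO(CSV_TEXT), dtype={"code": dtype}, nrows=nrows)


# The first row alone parses as a plain integer: no missing value is seen yet.
head_only = read_codes("int64", nrows=1)

# The full data fails, because int64 cannot represent a missing value.
try:
    read_codes("int64")
    full_read_error = None
except ValueError as exc:
    full_read_error = str(exc)

# A text column accepts the empty cell (it becomes NaN) and keeps the zeros.
as_text = read_codes(str)

print(head_only["code"].tolist())
print(full_read_error)
print(as_text["code"].tolist())
```

Since the error message mentions column 15, the failing column is presumably one whose schema type is an integer, not necessarily the string column from the original question.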
omar21
Level 2
Author

How do I specify the dtype option on import with get_dataframe?

AdrienL
Dataiker

For get_dataframe, the dtypes are either inferred by pandas or forced from the dataset's schema.

If you want to force the dtypes, you can use iter_dataframes_forced_types.

omar21
Level 2
Author

Thank you for your answer. If I want to load my data into a dataframe with iter_dataframes_forced_types, how can I do it? I didn't understand the code.

AdrienL
Dataiker
iter_dataframes_forced_types iterates over chunks of your dataset, e.g. chunks of 10k rows, each given as a dataframe. If you want only one big dataframe (you will need to make sure there is enough memory to hold it), you can for instance pass a very large chunk size, so that your loop iterates only once because it fetches all available data.
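
As a rough sketch of that idea, the helper below (`load_as_text`, a hypothetical name) forces every column to text and concatenates the chunks into one dataframe, which avoids having to guess a chunk size larger than the dataset. It assumes the dataset object exposes read_schema() returning column descriptions with a "name" key, and iter_dataframes_forced_types(names, dtypes, parse_date_columns, chunksize=...); please check both against the API documentation of your DSS version. `FakeDataset` is only a stand-in so the example runs outside DSS.

```python
import io

import pandas as pd


def load_as_text(dataset, chunksize=10_000):
    """Read every column of `dataset` as text and return one dataframe.

    Assumption: `dataset` behaves like dataiku.Dataset for the two
    methods used below.
    """
    names = [column["name"] for column in dataset.read_schema()]
    dtypes = {name: str for name in names}  # force text everywhere
    chunks = dataset.iter_dataframes_forced_types(
        names, dtypes, False, chunksize=chunksize  # False: parse no dates
    )
    # Each chunk is a dataframe; concatenating them gives the full table.
    return pd.concat(chunks, ignore_index=True)


class FakeDataset:
    """Stand-in with made-up data, used only to exercise the function."""

    CSV_TEXT = "code,label\n00123456,a\n00000042,b\n00000007,c\n"

    def read_schema(self):
        return [{"name": "code", "type": "string"},
                {"name": "label", "type": "string"}]

    def iter_dataframes_forced_types(self, names, dtypes, parse_date_columns,
                                     chunksize=10000):
        # Yields dataframes of at most `chunksize` rows with the forced dtypes.
        return pd.read_csv(io.StringIO(self.CSV_TEXT), header=0, names=names,
                           dtype=dtypes, chunksize=chunksize)


df = load_as_text(FakeDataset(), chunksize=2)
print(df["code"].tolist())  # ['00123456', '00000042', '00000007']
```

In a DSS notebook the call would presumably look like `df = load_as_text(dataiku.Dataset("my_dataset"))`, with `my_dataset` as a placeholder name.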
omar21
Level 2
Author

Yes, that's what I did, and it gave me the result shown in the image. Now, how do I get it in the form of a dataframe table?

AdrienL
Dataiker
In this example, the df variable contains the dataframe
omar21
Level 2
Author

Each column has a storage type and a meaning. I want to force the meaning of a column to text with a formula or Python code in a Prepare recipe. How can I do this?

AdrienL
Dataiker
You can't do that with a formula or Python step. You can force the meaning and storage type by clicking the column's header (on the meaning or on the storage type).

(Side note: this topic is starting to digress. At some point it would be better to open new topics if the questions are only very loosely related to the original question 🙂)