data type and value changed
I have a string column in a table that contains values like "00123456". When I import the table into Jupyter with get_dataframe, the column type becomes integer and I get the value 123456, without the leading zeros. How can I fix this and keep the original value and data type?
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,978 Neuron
What is the data type of that column in the schema of the dataset? (Dataset => Settings => Schema). Where is the dataset stored? (What type of dataset is it?)
-
The type is string and the meaning is text.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,978 Neuron
What do you get if you run this in Jupyter on your dataframe object:
dataset_df.info()
-
I get this: [screenshot of the dataset_df.info() output]
For the column concerned, the type after import into Jupyter is object, but when I check the values they are missing the leading 00.
-
If you want to disable pandas' type inference, try passing infer_with_pandas=False to the get_dataframe call
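To illustrate what type inference does to values like "00123456", here is a minimal pandas-only sketch. It does not call Dataiku; the CSV text and the column names (account_id, amount) are made up for the example, and the DSS call in the trailing comment uses a placeholder dataset name.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for the dataset's contents.
csv_text = "account_id,amount\n00123456,10\n00004567,20\n"

# Default behaviour: pandas infers an integer dtype, so the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing a string dtype for the column keeps the text exactly as stored.
forced = pd.read_csv(io.StringIO(csv_text), dtype={"account_id": str})

print(inferred["account_id"].tolist())
print(forced["account_id"].tolist())

# In a DSS notebook, the suggestion above would look like this
# ("my_dataset" is a placeholder name):
#   import dataiku
#   df = dataiku.Dataset("my_dataset").get_dataframe(infer_with_pandas=False)
```

With inference disabled, the dtypes come from the dataset's schema instead, so a column declared as string stays a string.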
-
I tested this with the parameter keep_default_na=True, but I received an error:
ValueError: Integer column has NA values in column 15
-
Ah yes, well, pandas will not let you have NAs in integer columns.
-
My table has 2 million rows. When I add the parameters infer_with_pandas=False, keep_default_na=True, limit=10000 to import just the first 10,000 rows, it works fine and the data type is not changed, but with all the rows it gives an error.
-
What is the error? If it's the NA error above, as I mentioned, NAs are not supported in an int column. Not getting the error for the first 10k rows just means that there are no NAs in those 10k rows, but there are in other rows.
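The same behaviour can be reproduced with pandas alone. This is a small sketch on made-up data (the columns code and quantity are hypothetical): reading only the first rows succeeds, while reading every row fails as soon as an empty value shows up in a column forced to int64.

```python
import io
import pandas as pd

# Hypothetical sample: the third row has an empty value in the integer column.
csv_text = "code,quantity\n00123456,5\n00004567,7\n00009999,\n"
dtypes = {"code": str, "quantity": "int64"}

# Reading only the first two rows succeeds because neither contains an NA.
first_rows = pd.read_csv(io.StringIO(csv_text), dtype=dtypes, nrows=2)

# Reading every row hits the empty value, and pandas refuses the int64 dtype.
try:
    pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
    full_read_error = None
except ValueError as exc:
    full_read_error = str(exc)

print(first_rows.dtypes)
print(full_read_error)
```

So a limit that happens to stop before the first empty value hides the problem rather than solving it.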
-
How do I specify a dtype option on import with get_dataframe?
-
For get_dataframe, the dtypes are either inferred by pandas or forced from the dataset's schema.
If you want to force the dtypes, you can use iter_dataframes_forced_types.
-
Thank you for your answer. If I want to import my data into a dataframe with iter_dataframes_forced_types, how can I do it? I didn't understand the code.
-
iter_dataframes_forced_types will iterate over chunks of your dataset, e.g. chunks of 10k rows, each given as a dataframe. If you want only one big dataframe (you will need to make sure there is enough memory to hold it), you can for instance pass a very large chunk size, and your loop will then iterate only once since it fetches all available data in a single chunk.
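An alternative to a single huge chunk is to collect the chunks and concatenate them. Below is a rough sketch of that idea: concat_chunks is a helper written for this example (not part of Dataiku), and the runnable part uses a chunked pandas CSV read as a stand-in for the dataset. The DSS usage in the trailing comment is an assumption based on my reading of the Dataiku Python API (get_dataframe_schema_st returning names, dtypes and date columns), with a placeholder dataset name, so please check it against the documentation for your version.

```python
import io
import pandas as pd

def concat_chunks(chunk_iter):
    """Combine an iterator of DataFrame chunks into a single DataFrame."""
    chunks = list(chunk_iter)
    if not chunks:
        return pd.DataFrame()
    return pd.concat(chunks, ignore_index=True)

# Stand-in for a chunked dataset read: pandas yields DataFrames of 2 rows each.
csv_text = "code,label\n00123456,a\n00004567,b\n00009999,c\n"
chunk_iter = pd.read_csv(io.StringIO(csv_text), dtype=str, chunksize=2)

df = concat_chunks(chunk_iter)
print(df)

# Assumed DSS usage ("my_dataset" is a placeholder; verify the signatures):
#   import dataiku
#   ds = dataiku.Dataset("my_dataset")
#   names, dtypes, parse_dates = dataiku.Dataset.get_dataframe_schema_st(ds.read_schema())
#   df = concat_chunks(
#       ds.iter_dataframes_forced_types(names, dtypes, parse_dates, chunksize=100000)
#   )
```

Either way, the whole dataset still ends up in memory at once, so the memory caveat above applies equally.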
-
Yes, that's what I did, and it gave me the output shown in the image. Now, how do I get it in the form of a dataframe table?
-
In this example, the df variable contains the dataframe
-
Each column has a storage type and a meaning. I want to force the meaning of a column to text with a formula or Python code in a Prepare recipe. How can I do this?
-
You can't with a formula or python step. You can force the meaning and storage type by clicking the column's header (on the meaning or storage type).
(Side note: this topic is starting to digress, at some point it would be better to open new topics if the questions are only very loosely related to the original question)