data type and value changed

omar21
omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

I have a string column in a table that contains values like "00123456". When I import the table into Jupyter with get_dataframe, the column type becomes integer and I get the value 123456 without the leading 00. How can I fix this and keep the same value and data type?

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,987 Neuron

    What is the data type of that column in the dataset's schema (Dataset => Settings => Schema)? Where is the dataset stored (what type of dataset is it)?

  • omar21
    omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

    @Turribeach

    The type is string and the meaning is text.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,987 Neuron
    edited July 17

    What do you get if you run this in Jupyter on your dataframe object:

    dataset_df.info()
  • omar21
    omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

    I get this:

    For the column concerned, its type after import into Jupyter is object, but when I check the values they are missing the leading 00.

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker

    If you want to disable pandas' type inference, try passing infer_with_pandas=False to the get_dataframe call
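
As a rough illustration of what type inference does to a column like this, here is a pandas-only sketch. It uses read_csv on an in-memory sample as a stand-in for the dataset read; the column names and values are made up. The DSS call itself appears only as a comment because it cannot run outside DSS, and "my_dataset" is a placeholder name.

```python
import io

import pandas as pd

# Made-up sample: "code" holds digit-only text with leading zeros.
CSV_TEXT = "code,qty\n00123456,3\n00000042,7\n"

# With inference, pandas guesses the type from the values, so the
# digit-only column becomes an integer column and the zeros are lost.
inferred = pd.read_csv(io.StringIO(CSV_TEXT))

# Forcing the column to str keeps the text exactly as stored.
forced = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"code": str})

print(inferred["code"].tolist())  # [123456, 42]
print(forced["code"].tolist())    # ['00123456', '00000042']

# In a DSS notebook, the suggestion above would look like this
# (placeholder dataset name, not runnable outside DSS):
#   import dataiku
#   df = dataiku.Dataset("my_dataset").get_dataframe(infer_with_pandas=False)
```

With inference turned off, the dtypes should come from the dataset's schema, so a column declared as string in the schema is expected to stay a string.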

  • omar21
    omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

    I tested this with the parameter keep_default_na=True, but I received an error:
    ValueError: Integer column has NA values in column 15

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    Ah yes, well, pandas will not let you have NAs in integer columns.
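
The limitation can be reproduced with pandas alone. The sketch below is only an illustration on made-up data, not the DSS read path: it forces a plain integer dtype on a column with an empty cell, then shows two dtypes that can hold the gap. Whether the nullable "Int64" dtype can be used through the Dataiku API is not something this thread confirms.

```python
import io

import pandas as pd

# Made-up sample: the second row has an empty "id".
CSV_TEXT = "id,name\n1,a\n,b\n3,c\n"

# A plain NumPy integer dtype has no way to represent a missing value,
# so forcing it on a column with an empty cell raises a ValueError.
try:
    pd.read_csv(io.StringIO(CSV_TEXT), dtype={"id": "int64"})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# pandas' nullable integer dtype (capital "I") stores the gap as <NA>.
nullable = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"id": "Int64"})

# Reading the column as text also works; the gap becomes a missing value.
as_text = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"id": str})

print(nullable["id"].isna().sum(), as_text["id"].isna().sum())
```
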
  • omar21
    omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

    My table has 2 million rows. When I add the parameters infer_with_pandas=False, keep_default_na=True, limit=10000 to import just the first 10,000 rows, it works well and the data type is not changed, but with all the rows it gives an error.

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    What is the error? If it's the NA error above, as I mentioned it will not be supported for an int column. Not having it for the first 10k rows just means that there is no NA in those 10k rows, but there are in other rows.
  • omar21
    omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

    How do I specify the dtype option on import with get_dataframe?

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker

    For get_dataframe, the dtypes are either inferred by pandas or forced from the dataset's schema.

    If you want to force the dtypes, you can use iter_dataframes_forced_types.

  • omar21
    omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

    Thank you for your answer. If I want to import my data into a dataframe with iter_dataframes_forced_types, how can I do it? I didn't understand the code.

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    iter_dataframes_forced_types will iterate on chunks of your dataset, e.g. chunks of 10k rows, each given as a dataframe. If you want to have only one big dataframe (you will need to make sure there is enough memory to hold it), you can for instance pass a very large chunk size; your loop would then iterate only once, since it will fetch all available data.
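
A minimal sketch of getting a single dataframe from the chunks. Instead of the very large chunk size described above, it concatenates the chunks, which is a different route to the same result. The helper name load_with_forced_types is hypothetical. The argument order (names, dtypes, parse_date_columns, chunksize) and the use of Dataset.get_dataframe_schema_st in the commented usage are assumptions based on my reading of the Dataiku Python API; check them against the API reference for your DSS version.

```python
import pandas as pd


def load_with_forced_types(dataset, names, dtypes, parse_date_columns,
                           chunksize=100_000):
    """Read every chunk with forced dtypes and return one dataframe.

    `dataset` is expected to expose iter_dataframes_forced_types, as a
    dataiku.Dataset does. The full result must fit in memory.
    """
    frames = list(
        dataset.iter_dataframes_forced_types(
            names, dtypes, parse_date_columns, chunksize=chunksize
        )
    )
    if not frames:
        # No rows at all: return an empty frame with the expected columns.
        return pd.DataFrame(columns=names)
    return pd.concat(frames, ignore_index=True)


# Possible usage in a DSS notebook (placeholder dataset and column names,
# not runnable outside DSS):
#   import dataiku
#   ds = dataiku.Dataset("my_dataset")
#   names, dtypes, parse_dates = dataiku.Dataset.get_dataframe_schema_st(
#       ds.read_schema())
#   dtypes["code"] = str  # force the column to text
#   df = load_with_forced_types(ds, names, dtypes, parse_dates)
```
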
  • omar21
    omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

    Yes, that's what I did, and it gave me the result shown in the image. Now, how do I get it in the form of a dataframe table?

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    In this example, the df variable contains the dataframe
  • omar21
    omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

    Each column has a storage type and a meaning. I want to force the meaning of a column to text with a formula or Python code in a Prepare recipe. How can I do this?

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    You can't with a formula or python step. You can force the meaning and storage type by clicking the column's header (on the meaning or storage type).

    (Side note: this topic is starting to digress; at some point it would be better to open new topics if the questions are only very loosely related to the original question.)