Python recipe output is written incorrectly to the output dataset

afostor
afostor Registered Posts: 6 ✭✭✭✭

I checked my output dataframe in a Python recipe right before the dataiku.Dataset("output_dataset").write_with_schema(df) call, and it looked correct. But when I check the output dataset, some rows are omitted and replaced with duplicates. What could have happened?

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,352 Dataiker
    edited July 2024

    Hi,

    This type of issue can be due to an unsupported version of pandas being used in your code env.

    Can you please confirm the exact Python version and the pandas version you are using? You can quickly check by running:

    import pandas as pd
    from platform import python_version
    
    print(python_version())
    pd.show_versions()  # prints the report itself; wrapping it in print() only adds a trailing "None"

    You should be using one of the pandas versions listed for your code env under Packages to install -> Core package versions.
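
Separately from the version check, one way to narrow down where the rows change is to compare row and duplicate counts of the dataframe before writing with the dataset read back afterwards. The following is only a minimal sketch: `summarize` is a hypothetical helper, the small dataframe stands in for the real one, and the Dataiku calls are left as comments because they only run inside DSS.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return row and duplicate counts, for comparing a dataframe
    before writing with the dataset read back afterwards."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Small stand-in for the recipe's real dataframe.
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
before = summarize(df)
print(before)

# Inside DSS the comparison would look roughly like this (not run here).
# In the recipe:
#   dataiku.Dataset("output_dataset").write_with_schema(df)
# Later, for example from a notebook:
#   after = summarize(dataiku.Dataset("output_dataset").get_dataframe())
#   print(before, after)
```

If the two summaries differ, the change happens during or after the write rather than in the code that builds the dataframe.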

  • afostor
    afostor Registered Posts: 6 ✭✭✭✭
    edited July 2024

    These are my versions and, as far as I can see, they match the ones supported by Dataiku.

    3.6.8
    
    INSTALLED VERSIONS
    ------------------
    commit           : b5958ee1999e9aead1938c0bba2b674378807b3d
    python           : 3.6.8.final.0
    python-bits      : 64
    OS               : Linux
    OS-release       : 3.10.0-1160.59.1.el7.x86_64
    Version          : #1 SMP Wed Feb 16 12:17:35 UTC 2022
    machine          : x86_64
    processor        : x86_64
    byteorder        : little
    LC_ALL           : en_US.UTF-8
    LANG             : en_US.UTF-8
    LOCALE           : en_US.UTF-8
    
    pandas           : 1.1.5
    numpy            : 1.19.5
    pytz             : 2020.5
    dateutil         : 2.8.1
    pip              : 21.3.1
    setuptools       : 51.3.3
    Cython           : None
    pytest           : None
    hypothesis       : None
    sphinx           : None
    blosc            : None
    feather          : None
    xlsxwriter       : None
    lxml.etree       : None
    html5lib         : None
    pymysql          : None
    psycopg2         : None
    jinja2           : 3.0.1
    IPython          : 7.16.1
    pandas_datareader: None
    bs4              : None
    bottleneck       : None
    fsspec           : 2021.08.1
    fastparquet      : None
    gcsfs            : 2021.08.1
    matplotlib       : 3.3.4
    numexpr          : 2.7.3
    odfpy            : None
    openpyxl         : None
    pandas_gbq       : 0.14.1
    pyarrow          : 5.0.0
    pytables         : None
    pyxlsb           : None
    s3fs             : None
    scipy            : 1.5.4
    sqlalchemy       : 1.4.23
    tables           : None
    tabulate         : 0.8.9
    xarray           : None
    xlrd             : 2.0.1
    xlwt             : None
    numba            : 0.53.1
    None

    However, I switched to a different, already configured code environment and it worked. I think some of the other libraries in the original environment may not be compatible with Dataiku.

    Thanks
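
For anyone who wants to find out which packages differ between a failing and a working code environment, a small sketch that compares two `pip freeze` style listings may help. The function names and the contents of `working_env` are made up for illustration; the thread does not say what the working environment contained.

```python
def parse_freeze(text: str) -> dict:
    """Map package name -> version from `name==version` lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything not pinned with "==".
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.strip().lower()] = version.strip()
    return packages


def diff_envs(failing: str, working: str) -> dict:
    """Return {package: (version_in_failing, version_in_working)}
    for every package whose version differs or is missing on one side."""
    a, b = parse_freeze(failing), parse_freeze(working)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# The failing list uses versions quoted in the thread; the working list
# is invented for illustration and is not a diagnosis of the actual cause.
failing_env = "pandas==1.1.5\nnumpy==1.19.5\npyarrow==5.0.0\n"
working_env = "pandas==1.1.5\nnumpy==1.19.5\n"
print(diff_envs(failing_env, working_env))
```

Packages that appear only in the failing environment, or with a different version there, are the first candidates to look at.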
