Python output is written incorrectly to the output dataset.
I checked my output dataframe in Python right before the dataiku.Dataset("output_dataset").write_with_schema(df) command, and it looked correct. But when I check the output dataset, some rows are omitted and replaced with duplicates of other rows. What could have happened?
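One way to pin down this kind of mismatch is to compare the dataframe as it was before the write with the data read back afterwards. The following is a minimal sketch in plain pandas; compare_frames is a hypothetical helper name, the sample frames are made up, and the sketch assumes both frames have the same columns with compatible dtypes (a dataset read back from storage may come back with different dtypes, which would need normalising first).

```python
import pandas as pd


def compare_frames(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Summarise how `after` differs from `before`, ignoring row order."""
    # Outer merge on all shared columns; the indicator column records
    # which side each distinct row came from.
    merged = before.drop_duplicates().merge(
        after.drop_duplicates(), how="outer", indicator=True
    )
    return {
        "rows_before": len(before),
        "rows_after": len(after),
        "duplicates_before": int(before.duplicated().sum()),
        "duplicates_after": int(after.duplicated().sum()),
        "missing_after": int((merged["_merge"] == "left_only").sum()),
        "unexpected_after": int((merged["_merge"] == "right_only").sum()),
    }


# Made-up data imitating the symptom: one row lost, another duplicated.
before = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
after = pd.DataFrame({"id": [1, 1, 3], "value": ["a", "a", "c"]})
report = compare_frames(before, after)
print(report)

# Assumed usage in DSS (not run here), for example from a notebook:
#   written = dataiku.Dataset("output_dataset").get_dataframe()
#   print(compare_frames(df, written))
```

If duplicates_after and missing_after are non-zero while the dataframe itself was clean before the write, the problem lies in the write step or the environment rather than in the transformation code.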
Answers
-
Alexandru (Dataiker)
Hi,
This type of issue can be due to an unsupported version of pandas being used in your code env.
Can you please confirm the exact Python version and pandas version you are using? You can quickly check by running:
    import pandas as pd
    from platform import python_version
    print(python_version())
    print(pd.show_versions())
You should be using one of the pandas versions listed for your code env under Code env -> Packages to install -> Core package versions.
-
These are my versions and, as far as I can tell, they match the ones supported by Dataiku.
    3.6.8

    INSTALLED VERSIONS
    ------------------
    commit : b5958ee1999e9aead1938c0bba2b674378807b3d
    python : 3.6.8.final.0
    python-bits : 64
    OS : Linux
    OS-release : 3.10.0-1160.59.1.el7.x86_64
    Version : #1 SMP Wed Feb 16 12:17:35 UTC 2022
    machine : x86_64
    processor : x86_64
    byteorder : little
    LC_ALL : en_US.UTF-8
    LANG : en_US.UTF-8
    LOCALE : en_US.UTF-8

    pandas : 1.1.5
    numpy : 1.19.5
    pytz : 2020.5
    dateutil : 2.8.1
    pip : 21.3.1
    setuptools : 51.3.3
    Cython : None
    pytest : None
    hypothesis : None
    sphinx : None
    blosc : None
    feather : None
    xlsxwriter : None
    lxml.etree : None
    html5lib : None
    pymysql : None
    psycopg2 : None
    jinja2 : 3.0.1
    IPython : 7.16.1
    pandas_datareader: None
    bs4 : None
    bottleneck : None
    fsspec : 2021.08.1
    fastparquet : None
    gcsfs : 2021.08.1
    matplotlib : 3.3.4
    numexpr : 2.7.3
    odfpy : None
    openpyxl : None
    pandas_gbq : 0.14.1
    pyarrow : 5.0.0
    pytables : None
    pyxlsb : None
    s3fs : None
    scipy : 1.5.4
    sqlalchemy : 1.4.23
    tables : None
    tabulate : 0.8.9
    xarray : None
    xlrd : 2.0.1
    xlwt : None
    numba : 0.53.1
    None
However, I switched to a different, already configured code environment and it worked. I suspect that some of the other libraries in the original environment are not compatible with Dataiku's functionality.
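To narrow down which library might be responsible, the package lists of the failing and the working environment could be compared. Below is a rough sketch that diffs two `pip freeze` style listings; parse_freeze and diff_envs are hypothetical helper names, and the package lists are invented for illustration (only the pandas and numpy pins come from the output above).

```python
def parse_freeze(text):
    """Parse `pip freeze` style text into {package_name: version}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries not pinned as name==version.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def diff_envs(freeze_a, freeze_b):
    """Return {package: (version_in_a, version_in_b)} for every mismatch."""
    a, b = parse_freeze(freeze_a), parse_freeze(freeze_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Illustrative listings; "example-lib" and "other-lib" are made-up names.
failing_env = "pandas==1.1.5\nnumpy==1.19.5\nexample-lib==2.0\n"
working_env = "pandas==1.1.5\nnumpy==1.19.5\nexample-lib==1.4\nother-lib==0.3\n"

differences = diff_envs(failing_env, working_env)
for name, (old, new) in differences.items():
    print(name, old, "->", new)
```

A value of None on one side means the package is only present in the other environment. The packages that differ are the first candidates to look at.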
Thanks