Python output is read incorrectly in the output dataset

afostor · May 2022

I checked my output dataframe in the Python recipe just before the dataiku.Dataset("output_dataset").write_with_schema(df) call. But when I check the output dataset, some rows are missing and duplicates appear in their place. What could have happened?
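
One way to narrow down this kind of discrepancy is to compare the rows that were written with the rows that come back from the dataset, treating both as multisets so that dropped and duplicated rows show up separately. The sketch below is a minimal illustration: the helper name diff_rows is hypothetical, the sample rows are stand-in data, and the Dataiku calls shown in the comments have not been run against a real instance.

```python
from collections import Counter


def diff_rows(expected_rows, actual_rows):
    """Compare two collections of rows as multisets.

    Returns (missing, extra): rows that occur more often in expected_rows
    than in actual_rows, and vice versa, each mapped to the surplus count.
    """
    expected = Counter(expected_rows)
    actual = Counter(actual_rows)
    missing = expected - actual  # written, but not (fully) present on read-back
    extra = actual - expected    # present on read-back more often than written
    return dict(missing), dict(extra)


# Hypothetical usage inside the recipe (needs the dataiku package):
#   out = dataiku.Dataset("output_dataset")
#   out.write_with_schema(df)
#   written = df.itertuples(index=False, name=None)
#   stored = out.get_dataframe().itertuples(index=False, name=None)
#   missing, extra = diff_rows(written, stored)

# Stand-in data: row (2, "b") was dropped and row (1, "a") was duplicated.
written = [(1, "a"), (2, "b"), (3, "c")]
stored = [(1, "a"), (1, "a"), (3, "c")]
missing, extra = diff_rows(written, stored)
print("missing:", missing)
print("extra:", extra)
```

Rows containing NaN, or values whose type changes between writing and reading back, may be reported as differences even when the data is effectively the same, so the result is a starting point rather than a verdict.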

Alexandru · May 2022

Hi,

This type of issue can be due to an unsupported version of pandas being used in your code env.

Can you please confirm the exact Python and pandas versions you are using? You can quickly check by running:

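import pandas as pd
from platform import python_version

# Python interpreter version, e.g. "3.6.8"
print(python_version())
# show_versions() prints its report directly and returns None,
# so wrapping it in print() would only add a trailing "None".
pd.show_versions()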

You should be using one of the pandas versions listed for your code env under Packages to install -> Core package versions:

[Screenshot: Screenshot 2022-05-04 at 20.52.50.png]
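
The same check can be scripted once the supported versions are known. The sketch below is only an illustration: the function names are hypothetical, and SUPPORTED_PANDAS holds placeholder values, not Dataiku's actual list, which should be taken from the code env settings.

```python
def parse_version(text):
    """Turn '1.1.5' into (1, 1, 5); stops at non-numeric suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def is_supported(installed, supported_prefixes):
    """True if the installed version starts with one of the supported prefixes."""
    version = parse_version(installed)
    for prefix_text in supported_prefixes:
        prefix = parse_version(prefix_text)
        if version[: len(prefix)] == prefix:
            return True
    return False


# PLACEHOLDER values for illustration only; replace them with the versions
# shown under "Core package versions" for your code env.
SUPPORTED_PANDAS = ["1.0", "1.1"]

print(is_supported("1.1.5", SUPPORTED_PANDAS))
print(is_supported("1.3.0", SUPPORTED_PANDAS))
```

In a real environment the installed version string would come from pd.__version__.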

afostor · May 2022

These are my versions and, as far as I can tell, they match the ones supported by Dataiku.

3.6.8

INSTALLED VERSIONS
------------------
commit           : b5958ee1999e9aead1938c0bba2b674378807b3d
python           : 3.6.8.final.0
python-bits      : 64
OS               : Linux
OS-release       : 3.10.0-1160.59.1.el7.x86_64
Version          : #1 SMP Wed Feb 16 12:17:35 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.5
numpy            : 1.19.5
pytz             : 2020.5
dateutil         : 2.8.1
pip              : 21.3.1
setuptools       : 51.3.3
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.1
IPython          : 7.16.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 2021.08.1
fastparquet      : None
gcsfs            : 2021.08.1
matplotlib       : 3.3.4
numexpr          : 2.7.3
odfpy            : None
openpyxl         : None
pandas_gbq       : 0.14.1
pyarrow          : 5.0.0
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.4
sqlalchemy       : 1.4.23
tables           : None
tabulate         : 0.8.9
xarray           : None
xlrd             : 2.0.1
xlwt             : None
numba            : 0.53.1
None

However, I switched to a different, already configured code environment and it worked. Perhaps some of the other libraries in the original environment are not compatible with Dataiku.
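
If anyone wants to find out which library made the difference, comparing the package lists of the two environments is a reasonable first step. This is a rough sketch that assumes each list is available as `pip freeze` style text; the function names and the package names and versions in the sample data are invented stand-ins, not my real environments.

```python
def parse_freeze(text):
    """Parse `pip freeze` style text into {package_name: version}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and editable/URL requirements
        name, version = line.split("==", 1)
        packages[name.strip().lower()] = version.strip()
    return packages


def diff_envs(broken, working):
    """Return {name: (version_in_broken, version_in_working)} for each difference."""
    names = set(broken) | set(working)
    return {
        name: (broken.get(name), working.get(name))
        for name in sorted(names)
        if broken.get(name) != working.get(name)
    }


# Invented stand-in package lists, for illustration only.
broken_txt = "alpha==1.0\nbeta==2.1\ngamma==0.9\n"
working_txt = "alpha==1.0\nbeta==2.0\ndelta==3.2\n"

differences = diff_envs(parse_freeze(broken_txt), parse_freeze(working_txt))
for name, (old, new) in differences.items():
    print(name, old, "->", new)
```

A value of None means the package is present in only one of the two environments.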

Thanks
