Add support for Pandas 2.0

Turribeach · 07-26-2023

Pandas 2.0 can bring great performance improvements when using the pyarrow backend:

https://towardsdatascience.com/pandas-2-0-a-game-changer-for-data-scientists-3cd281fcc4b4

crunis · 08-21-2023

It also allows nullable integer data types. It is sometimes really annoying that my integer field becomes a float just because there's a single NA.
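
To illustrate the NA problem, here is a small sketch that assumes pandas 2.0 or later. The CSV content is invented. By default, one missing value turns an integer column into float64. Passing `dtype_backend="numpy_nullable"` to `read_csv` keeps the column as the nullable `Int64` type:

```python
import io

import pandas as pd

# Invented CSV: "qty" holds integers except for one missing value.
CSV_DATA = "sku,qty\nA,1\nB,\nC,3\n"

# Default NumPy backend: the single NA forces the whole column to float64.
numpy_df = pd.read_csv(io.StringIO(CSV_DATA))

# pandas >= 2.0: ask for nullable extension dtypes, so "qty" stays an
# integer column and the gap is stored as pd.NA instead of NaN.
nullable_df = pd.read_csv(io.StringIO(CSV_DATA), dtype_backend="numpy_nullable")

print(numpy_df["qty"].dtype)     # float64
print(nullable_df["qty"].dtype)  # Int64
```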

AshleyW · 11-09-2023

Thanks for your idea, @Turribeach. Your idea meets the criteria for submission; we'll reach out should we require more information.

If you’re reading this post and think this would be a great capability to add to DSS, be sure to kudos the original post! Feel free to leave a comment in the discussion about how this capability would help you or your team.

Take care,
Ashley

Turribeach · 11-28-2023

It is also worth pointing out that we are already seeing the impact of not having a recent pandas version supported. In our Dataiku v12 environments it takes more than 6 minutes to build any code environment that includes pandas. For Python 3.9 envs pip downloads pandas-1.1.5-cp39-cp39-manylinux1_x86_64.whl, but for Python 3.11 envs pip gets pandas-1.3.5.tar.gz. This is because there are no pre-compiled pandas 1.3.5 wheels for Python 3.11! So on Python 3.11 we are basically downloading the pandas source and building it from scratch, including its C extensions. This is risky, since building from source is a much more complex and error-prone process than installing a pre-compiled package. Sooner or later this build will break because of either OS or package dependencies.
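
To make the wheel-versus-source distinction concrete, here is a simplified sketch of how a wheel file name encodes the Python version it was built for. The helper names and the `RELEASE_FILES` listing are hypothetical and illustrative only. The listing is not the real pandas 1.3.5 file list. The matching also ignores the abi3, generic `py3` and platform tags that pip takes into account:

```python
import sys
from typing import Iterable, Optional

# Hypothetical, illustrative release listing (NOT the real pandas 1.3.5 files).
RELEASE_FILES = [
    "pandas-1.3.5-cp310-cp310-manylinux2014_x86_64.whl",
    "pandas-1.3.5.tar.gz",
]


def python_tag(filename: str) -> Optional[str]:
    """Return the Python tag of a wheel file name, or None for a source archive.

    Wheel names follow name-version[-build]-pythontag-abitag-platform.whl, so the
    Python tag is always the third field from the end.
    """
    if not filename.endswith(".whl"):
        return None
    fields = filename[: -len(".whl")].split("-")
    return fields[-3]


def is_prebuilt_for(filename: str, interpreter_tag: str) -> bool:
    """True if `filename` is a wheel tagged for `interpreter_tag` (e.g. "cp39").

    Simplified: only exact CPython tags are matched.
    """
    tag = python_tag(filename)
    return tag is not None and interpreter_tag in tag.split(".")


def must_build_from_source(files: Iterable[str], interpreter_tag: str) -> bool:
    """True if no file in the release is a wheel for this interpreter,
    in which case pip falls back to compiling the source archive."""
    return not any(is_prebuilt_for(name, interpreter_tag) for name in files)


def current_interpreter_tag() -> str:
    """CPython-style tag of the running interpreter, e.g. "cp311"."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    print(python_tag("pandas-1.1.5-cp39-cp39-manylinux1_x86_64.whl"))  # cp39
    print(must_build_from_source(RELEASE_FILES, "cp310"))  # False
    print(must_build_from_source(RELEASE_FILES, "cp311"))  # True
    print(must_build_from_source(RELEASE_FILES, current_interpreter_tag()))
```

Outside DSS, pip's `--only-binary=:all:` option makes an install fail fast instead of silently falling back to a source build, which at least surfaces the problem early.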

To add more complexity, our Python developers who work on our internal Python data libraries also struggle to create code envs with Python 3.11 and pandas 1.3.5. There are no pre-compiled binaries for Windows, and building from source is even harder there because Windows doesn't come with the necessary build tools.

AsishM · 12-01-2023

There is a decent amount of SQLAlchemy functionality introduced in 2.0+ that we require. However, pandas < 2 is not compatible with it at all, so any project that needs it has to skip using Dataiku. Pandas 1.3.5 is almost 2 years old by now.
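
Until this is supported, a code env can at least detect the combination early. Below is a hypothetical guard. The function names are illustrative and not a Dataiku or pandas API. It only encodes the rule stated above, that pandas < 2 does not work with SQLAlchemy 2.x. It makes no promise that every pandas 2.x release works with every SQLAlchemy 2.x release:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Leading integer of a version string, e.g. "2.0.23" -> 2."""
    return int(version_string.split(".")[0])


def is_known_incompatible(pandas_version: str, sqlalchemy_version: str) -> bool:
    """Hypothetical rule from the post: pandas < 2 cannot be used with SQLAlchemy 2.x."""
    return major_version(sqlalchemy_version) >= 2 and major_version(pandas_version) < 2


def installed_version(distribution: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    pandas_v = installed_version("pandas")
    sqlalchemy_v = installed_version("SQLAlchemy")
    if pandas_v and sqlalchemy_v and is_known_incompatible(pandas_v, sqlalchemy_v):
        print(f"pandas {pandas_v} cannot be used with SQLAlchemy {sqlalchemy_v}")
    else:
        print("No known pandas/SQLAlchemy conflict in this environment")
```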

Turribeach · 01-22-2024

Great to see this product idea gathering some momentum. If you haven't done so, please raise a support ticket with Dataiku and express your desire to get pandas 2.0 or above supported. Then also speak to your Customer Success Manager / Account Manager and reiterate your request. The last time I checked with Dataiku, they said there was no interest from other customers in pandas 2.x support, so you need to make yourself heard if you want Dataiku to look at this request. Thanks!

Turribeach · 01-23-2024

No pandas 2.0 yet, but Dataiku v12.5 added support for pandas 1.4 and 1.5:

https://doc.dataiku.com/dss/latest/release_notes/12.html#apis-and-coding-experience

This basically covers every pandas version from 0.23 up to, but not including, 2.x.

Labels

Data Exploration and Preparation

Designer Experience

Ecosystem and Integrations

platform and infrastructure