Add support for Pandas 2.0

Pandas 2.0 can bring great performance improvements when using the pyarrow backend.
 
crunis
Level 2

It also allows nullable integer data types. It is sometimes really annoying that my integer field becomes float just because there's a single NA.
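
A minimal sketch of the behaviour described above, assuming pandas is installed: a plain integer column that receives a single missing value is upcast to `float64`, while the nullable `Int64` extension dtype keeps the integers and marks the gap as `<NA>`. Note that this masked `Int64` dtype already exists in pandas 1.x; the pyarrow-backed types such as `int64[pyarrow]`, which pandas 2.0 promotes, additionally require pyarrow and are not shown here.

```python
import pandas as pd

values = [1, 2, None]

# Default inference: NaN cannot be stored in int64, so the whole column is upcast.
plain = pd.Series(values)
print(plain.dtype)  # float64
print(plain.tolist())  # [1.0, 2.0, nan]

# Nullable extension dtype: integers stay integers, the gap becomes pd.NA.
nullable = pd.Series(values, dtype="Int64")
print(nullable.dtype)  # Int64
print(nullable.isna().sum())  # 1
```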

AshleyW
Dataiker

Thanks for your idea, @Turribeach. Your idea meets the criteria for submission; we'll reach out should we require more information.

If you’re reading this post and think [add more optional details] would be a great capability to add to DSS, be sure to kudos the original post! Feel free to leave a comment in the discussion about how this capability would help you or your team.

Take care,
Ashley

Status changed to: In the Backlog

It's also worth pointing out that we are already seeing the impact of not having a recent pandas version supported. In our Dataiku v12 environments it takes more than 6 minutes to build any code environment with pandas. For Python 3.9 environments pip downloads pandas-1.1.5-cp39-cp39-manylinux1_x86_64.whl, but for Python 3.11 it gets pandas-1.3.5.tar.gz. This is because there are no pre-compiled pandas 1.3.5 packages for Python 3.11! So on Python 3.11 we are basically downloading the pandas source and building it from scratch, including its C extensions. This is risky, since building from source is a much more complex and error-prone process than installing a pre-compiled package, and sooner or later this build will break due to either OS or package dependencies.
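
To make the mechanism concrete: pip prefers a pre-built wheel whose Python tag matches the interpreter (e.g. `cp39` for CPython 3.9) and only falls back to the source archive (`.tar.gz`) when none exists. The sketch below, using only the standard library, checks a release's file list for a matching tag. The helper names and the file list are hypothetical, and real tag matching (as done by pip or the `packaging` library) also considers ABI, platform and generic tags such as `py3`.

```python
from __future__ import annotations

from typing import Iterable


def wheel_python_tags(filename: str) -> set[str]:
    """Return the Python tags of a wheel filename, or an empty set for an sdist.

    Wheel names follow {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the Python tag is always the third field from the end.
    """
    if not filename.endswith(".whl"):
        # e.g. pandas-1.3.5.tar.gz: pip has to compile this locally.
        return set()
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"not a valid wheel filename: {filename!r}")
    # Compressed tag sets such as "cp39.cp310" list several interpreters.
    return set(parts[-3].split("."))


def has_prebuilt_wheel(files: Iterable[str], python_tag: str) -> bool:
    """True if any file is a wheel built for exactly this interpreter tag.

    Simplified: ignores generic tags (py3, abi3) and platform compatibility.
    """
    return any(python_tag in wheel_python_tags(f) for f in files)


# Hypothetical, abbreviated file list for one release (not the real PyPI index).
release_files = [
    "pandas-1.3.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "pandas-1.3.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "pandas-1.3.5.tar.gz",
]

for tag in ("cp39", "cp310", "cp311"):
    source = "wheel" if has_prebuilt_wheel(release_files, tag) else "build from sdist"
    print(tag, "->", source)
```

In practice, `pip install --only-binary=:all: pandas==1.3.5` makes pip fail fast with "No matching distribution found" instead of silently compiling when no compatible wheel exists.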

To add more complexity, our Python developers who work on our internal Python data libraries also struggle to create code envs with Python 3.11 and pandas 1.3.5, since there are no pre-compiled binaries for Windows, and building from source is even harder on Windows because it doesn't come with the necessary build tools.

AsishM
Level 2

There is a decent amount of SQLAlchemy functionality introduced in SQLAlchemy 2.0+ that we require, but pandas < 2 is not compatible with SQLAlchemy 2.x at all, so any project that needs it has to skip using Dataiku. Pandas 1.3.5 is almost 2 years old by now.
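
One way to fail fast in a code environment is a small guard that runs before either library is imported. The sketch below is hypothetical (the function names are not part of pandas, SQLAlchemy or Dataiku) and encodes only the simplified rule stated in this comment: pandas < 2 does not work with SQLAlchemy 2.x. It uses only the standard library.

```python
from __future__ import annotations

import re
from importlib import metadata


def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as '1.3.5' or '2.0.0rc1'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def pandas_works_with_sqlalchemy(pandas_version: str, sqlalchemy_version: str) -> bool:
    """Simplified rule from the comment: SQLAlchemy 2.x needs pandas 2.x."""
    if major_minor(sqlalchemy_version) >= (2, 0):
        return major_minor(pandas_version) >= (2, 0)
    return True


def installed_combination_ok() -> bool:
    """Apply the rule to the packages installed in the current environment."""
    try:
        pandas_version = metadata.version("pandas")
        sqlalchemy_version = metadata.version("SQLAlchemy")
    except metadata.PackageNotFoundError:
        # If either package is missing, there is no pairing to conflict.
        return True
    return pandas_works_with_sqlalchemy(pandas_version, sqlalchemy_version)


print(pandas_works_with_sqlalchemy("1.3.5", "2.0.1"))   # False
print(pandas_works_with_sqlalchemy("1.3.5", "1.4.46"))  # True
print(pandas_works_with_sqlalchemy("2.0.3", "2.0.1"))   # True
```

Calling `installed_combination_ok()` at the top of a notebook or recipe turns a confusing runtime error deep inside `read_sql` into an explicit check.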

Great to see this product idea gathering some momentum. If you haven't done so, please raise a support ticket with Dataiku and express your desire to get pandas 2.0 or above supported. Then also speak to your Customer Success Manager / Account Manager and reiterate your request. Last time I checked with Dataiku, they said there was no interest from other customers in support for pandas 2.x, so you need to make yourself heard if you want Dataiku to look at this request. Thanks!

No pandas 2.0 yet, but Dataiku v12.5 added support for pandas 1.4/1.5:

https://doc.dataiku.com/dss/latest/release_notes/12.html#apis-and-coding-experience

This basically completes support for all pandas versions from 0.23 up to (but not including) 2.x.
