Add support for Pandas 2.0
Comments
-
It also allows the nullable integer data type. It is sometimes really annoying that my integer field becomes float only because there's a single NA.
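To illustrate the annoyance described above, here is a minimal sketch in plain pandas (not Dataiku-specific; the variable names are made up for illustration). It assumes a pandas version that ships the nullable `Int64` extension dtype. How Dataiku itself maps dataset columns to dtypes is not shown here.

```python
import pandas as pd

# Default behaviour: one missing value forces the whole column to float64,
# because NumPy integer arrays cannot hold a missing value.
s_default = pd.Series([1, 2, None])

# Nullable integer dtype (note the capital "I"): values stay integers
# and the missing entry is stored as pd.NA.
s_nullable = pd.Series([1, 2, None], dtype="Int64")

print(s_default.dtype)
print(s_nullable.dtype)
print(s_nullable.isna().sum())  # number of missing entries
```

The first series ends up as float64 while the second keeps an integer dtype with a single missing entry.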
-
Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 162 Dataiker
Thanks for your idea, @Turribeach. Your idea meets the criteria for submission, we'll reach out should we require more information.
If you’re reading this post and think this would be a great capability to add to DSS, be sure to kudos the original post! Feel free to leave a comment in the discussion about how this capability would help you or your team.
Take care,
Ashley
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
It is also worth pointing out that we are already seeing the impact of not having a recent pandas version supported. In our Dataiku v12 environments it takes more than 6 minutes to build any code environment with pandas. For Python 3.9 envs pip downloads pandas-1.1.5-cp39-cp39-manylinux1_x86_64.whl, but for 3.11 pip gets pandas-1.3.5.tar.gz. This is because there are no pre-compiled pandas 1.3.5 wheels for Python 3.11! So on Python 3.11 we are basically downloading the pandas source and building it from scratch, including the C extensions. This is risky, since building from source is a much more complex and error-prone process than installing a pre-compiled package. So sooner or later this build will break, either due to OS or package dependencies.
And to add more complexity, our Python developers who work on our internal Python data libraries also struggle to create code envs using Python 3.11 and pandas 1.3.5, since there are no pre-compiled binaries for Windows, and building from source is even harder on Windows as it doesn't come with the necessary software to do so.
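For anyone wondering why pip picks the wheel on 3.9 but falls back to the source tarball on 3.11, here is a rough sketch of the idea. It only reads the tags out of a wheel filename (name-version-pythontag-abitag-platformtag.whl) and checks the Python tag. The helper names are hypothetical, and real pip matching is more involved (it also considers ABI and platform tags, abi3 wheels and pure-Python wheels), so treat this as an illustration only.

```python
def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    # Layout: name-version[-build]-python-abi-platform.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def built_for_cpython(filename, major, minor):
    """Simplified check: does the wheel's Python tag name this CPython version?"""
    python_tag, _, _ = wheel_tags(filename)
    # A wheel can list several tags separated by dots, e.g. "py2.py3".
    return "cp{}{}".format(major, minor) in python_tag.split(".")


wheel = "pandas-1.1.5-cp39-cp39-manylinux1_x86_64.whl"
print(wheel_tags(wheel))
print(built_for_cpython(wheel, 3, 9))   # wheel is tagged for CPython 3.9
print(built_for_cpython(wheel, 3, 11))  # no match, so pip has to look elsewhere
```

When no wheel matches the interpreter, pip falls back to the .tar.gz source distribution and has to compile it locally, which is the slow and fragile build described above.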
-
There is a decent amount of SQLAlchemy functionality introduced in 2.0+ that we require, but pandas < 2 is not compatible with it at all, so any project that needs it has to skip using Dataiku. Pandas 1.3.5 is almost 2 years old by now.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
Great to see this product idea gathering some momentum. If you haven't done so, please raise a support ticket with Dataiku and express your desire to get pandas 2.0 or above supported. Then also speak to your Customer Success Manager / Account Manager and reiterate your request. Last time I checked with Dataiku, they said there was no interest from other customers in support for pandas 2.x, so you need to make yourself heard if you want Dataiku to look at this request. Thanks!
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
No pandas 2.0 yet but Dataiku v12.5 added support for pandas 1.4/1.5:
https://doc.dataiku.com/dss/latest/release_notes/12.html#apis-and-coding-experience
This basically completes support for all pandas versions from 0.23 up to, but not including, 2.x.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
Dataiku v13.0.0 added support for Pandas 2.x:
However, the PyArrow backend is still not supported, nor is Polars, so the benefits to the product are limited.
-
Would love to see Polars and the PyArrow backend supported.