Remove the dependency between DSS pandas and Python recipe pandas

Dataiku demands pandas>=1.1,<1.2.
Bokeh 3.* demands pandas>=1.2
geopandas 0.14.* demands pandas>=1.4

So they can't be used within Dataiku. I think there are other examples.
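
To make the conflict concrete, here is a minimal sketch in plain Python that checks whether several pandas version ranges can be satisfied at the same time. The helper names (parse, feasible) are made up for this illustration, and comparing (major, minor) tuples is a simplification, not a full version-specifier implementation.

```python
# Minimal sketch: do several ">=lower,<upper" ranges for one package overlap?
# Versions are compared as (major, minor) tuples, which is a simplification.

def parse(version):
    """Turn a string such as '1.4' into the tuple (1, 4)."""
    return tuple(int(part) for part in version.split("."))


def feasible(constraints):
    """Return True if all ranges share at least one (major, minor) version.

    Each constraint is (lower_inclusive, upper_exclusive or None).
    """
    lower = max(parse(low) for low, _ in constraints)
    uppers = [parse(up) for _, up in constraints if up is not None]
    upper = min(uppers) if uppers else None
    return upper is None or lower < upper


# Ranges as quoted in the post above.
dss_pandas = ("1.1", "1.2")      # pandas>=1.1,<1.2
bokeh_3 = ("1.2", None)          # pandas>=1.2
geopandas_014 = ("1.4", None)    # pandas>=1.4

print(feasible([dss_pandas, bokeh_3]))                 # False
print(feasible([dss_pandas, bokeh_3, geopandas_014]))  # False
```

Both checks come out False, which is the problem described here: no single pandas version falls inside all of the ranges.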

I imagine there are several solutions. One that seems simple is to create a fork of pandas, e.g. dsspandas, an exact copy of pandas 1.1, and replace every import pandas as pd with import dsspandas as pd (and likewise every from pandas import ...).


4 Comments

What you are asking for is technically unfeasible, and probably the wrong thing to ask, because it is not really a requirement but your idea of how a requirement should be met. Reading your idea, it appears that you want to use Bokeh 3.* and geopandas 0.14.*, presumably because you want some functionality provided in those versions. That is your requirement.

First of all, let's clarify that Dataiku does not demand pandas>=1.1,<1.2. If you have Python 3.7/3.8/3.9/3.10/3.11 code environments you can move to pandas>=1.3,<1.4, which gives you pandas 1.3.5, the latest 1.3.x release. So while pandas 1.4 is not supported, newer pandas versions are available. Bokeh <3.1.1 is supported, which means you can go up to Bokeh 3.1.0, which needs Python >=3.8 and pandas >=1.2. So clearly the issue is with geopandas 0.14.*, which as you said needs pandas >=1.4.0 but also Python >=3.9.

Having said that, geopandas 0.13.2, which is only a few months older than 0.14.*, allows pandas >=1.1.0, so if you can live with geopandas 0.13.2 then happy days. Below is a Python 3.9 code environment I was able to create using Bokeh 3.1.0, geopandas 0.13.2 and pandas 1.3.5.

If that is not good enough and you really need geopandas 0.14.*, then what you really need to ask Dataiku for is support for a higher version of pandas. There is already a Product Idea to support pandas 2.0, so I suggest you vote on that idea by clicking the up arrow. The more votes the idea gets, the better the chances that Dataiku will implement it in a future release.

Screenshot 2024-01-20 at 13.02.11.png
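
For anyone who cannot see the screenshot, one way to confirm what a code environment actually resolved to is to print the installed versions from a notebook or recipe that uses it. This is a small sketch using only the standard library (Python 3.8+); the helper name installed_versions is hypothetical and not part of any Dataiku API.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # The three packages discussed in this thread.
    for name, ver in installed_versions(["pandas", "bokeh", "geopandas"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

In the environment described above you would expect this to report pandas 1.3.5, Bokeh 3.1.0 and geopandas 0.13.2.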

 


Pierre_Ceteaud
Level 1

What I understand from your answer is that the pandas version limit is not intrinsic to Dataiku but specific to our installation.
The fact remains that this requires all users of a Dataiku installation to use the same version of pandas, which can be annoying in a professional environment.

Hi, not really. What I meant is that Dataiku does give you some flexibility in the pandas versions you can use inside Dataiku code environments. In the latest Dataiku versions you can use anything from pandas 0.23 to pandas 1.3.x, including all the releases in between (1.0.x, 1.1.x and 1.2.x). To have those options available you will need at least Python 3.6 and Python 3.7 installed on your system. This is a fairly simple task for any competent system administrator.

Additionally, these Python versions are not a Dataiku requirement but a requirement of managing virtual code environments with Python and moving to the relevant pandas versions. For example, if and when Dataiku adds support for pandas 2.0.x you will need Python 3.8 installed, as that is what that version of pandas requires (see requires on this page). And when they support pandas 2.1.x you will need Python 3.9 installed, as that is what that version of pandas requires (see requires on this page).
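
As a rough illustration of that dependency, the sketch below looks up the minimum Python version for a pandas series and compares it with the interpreter in use. The table holds only the two pairs mentioned in this thread, and the names MIN_PYTHON and python_ok are hypothetical.

```python
import sys

# Minimum Python per pandas series, limited to the pairs stated in this thread.
MIN_PYTHON = {
    "2.0": (3, 8),
    "2.1": (3, 9),
}


def python_ok(pandas_series, python=sys.version_info[:2]):
    """Return True/False if the interpreter meets the minimum, None if unknown."""
    required = MIN_PYTHON.get(pandas_series)
    if required is None:
        # The thread gives no figure for this pandas series.
        return None
    return tuple(python) >= required


print(python_ok("2.0", (3, 8)))  # True
print(python_ok("2.1", (3, 8)))  # False
```

Called without the second argument, python_ok checks the interpreter that is running the code, which is the one the code environment was built on.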

Finally, the range of pandas versions you get is intrinsic to Dataiku, because Dataiku uses pandas DataFrames to handle all its Python interactions. It therefore has to code its interfaces against specific versions of pandas to work correctly. This is how software development works: you make things compatible with contemporary software. It is not possible to have forward compatibility.

Seems to be your lucky day. Dataiku v12.5 added support for pandas 1.4/1.5:

https://doc.dataiku.com/dss/latest/release_notes/12.html#apis-and-coding-experience

This basically completes the set of pandas versions from 0.23 up to, but not including, 2.x. You now need to convince your sysadmins to do the upgrade. 😉
