Google Sheets Plugin import bug

Eldiias Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 ✭✭✭

Hi there!

I encountered a bug (or possibly a hardcoded limitation) while using the Google Sheets plugin. I have the following flow:

  1. A dataset is transformed and stored in GSheets.
  2. Another dataset is transformed, then the dataset from GSheets is loaded and appended to it. The result is stored as a Dataiku dataset.

Sounds simple. But it didn't work. After careful review of the flow, I found out the following:

  1. The first dataset has long column names. There is no limitation on the column name length, so not a problem.
  2. Dataset in GSheets is OK, the column names are correct.
  3. While importing the dataset from GSheets, the column names are cropped: total_transport_cost_per_ton becomes total_transport_cost_per.

I believe the issue is related to the plugin, though I haven't found any similar report yet.

I could simply rename the columns (the second dataset has the same ones), but I would prefer to retrieve the column names correctly.
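For reference, the renaming workaround mentioned above could look roughly like this in a Python recipe. This is only a sketch: `restore_column_names` is a hypothetical helper (not part of Dataiku or the plugin), and it assumes each cropped name is a prefix of exactly one full name in the second dataset.

```python
def restore_column_names(imported, expected):
    """Map each imported (possibly cropped) column name to its full name.

    A cropped name is restored only when it is a prefix of exactly one
    expected name; otherwise it is left unchanged.
    """
    mapping = {}
    for name in imported:
        if name in expected:
            # Already a full, known column name.
            mapping[name] = name
            continue
        candidates = [full for full in expected if full.startswith(name)]
        # Restore only when the match is unambiguous.
        mapping[name] = candidates[0] if len(candidates) == 1 else name
    return mapping


if __name__ == "__main__":
    imported = ["total_transport_cost_per", "site"]
    expected = ["total_transport_cost_per_ton", "site"]
    print(restore_column_names(imported, expected))
```

The resulting mapping can then be applied as a rename step before appending the two datasets.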

Answers

  • HarizoR
    HarizoR Dataiker, Alpha Tester, Registered Posts: 138 Dataiker

    Hi Eldiias,

    After reviewing the plugin's code, it appears that your issue is indeed due to a hardcoded limit of 25 characters on the slugified column names (see here: https://github.com/dataiku/dataiku-contrib/blob/master/googlesheets/python-connectors/googlesheets-sheet/connector.py). In your case, the easiest workaround is to switch the plugin to development mode and edit that value manually.
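    To illustrate the effect, here is a minimal sketch of a slugifier with a length cap. It is not the plugin's actual code: `slugify_column` and its `max_length` parameter are made up for the example, and the assumption that a trailing underscore is stripped after truncation is a guess that happens to match the name reported above.

```python
import re


def slugify_column(name, max_length=25):
    """Hypothetical slugifier, NOT the plugin's implementation.

    Lowercases the name, collapses non-alphanumeric runs into "_",
    cuts the result to max_length characters, then strips any leading
    or trailing underscore left over by the cut.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower())
    return slug[:max_length].strip("_")


if __name__ == "__main__":
    # With a 25-character cap the last word is lost.
    print(slugify_column("total_transport_cost_per_ton"))
    # Raising the cap keeps the full name.
    print(slugify_column("total_transport_cost_per_ton", max_length=63))
```

    Under these assumptions, raising the cap is enough to keep the full column name, which is what editing the value in development mode amounts to.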

    Hope this helps!

    Best,

    Harizo

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron

    @HarizoR

    Is there any likelihood that the plugin will be updated to support longer column names? I have also run into this limitation.

    I work at a small non-profit with limited software development skills. We would prefer not to turn a Dataiku plugin into a local version that we would then have to keep supporting. What is the likelihood that it will be enhanced to accept much longer column names, say at least as long as a PostgreSQL server accepts, which is 63 characters by default?

  • Eldiias
    Eldiias Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 ✭✭✭

    Hey Tom!

    I would suggest using a different flow.

    1. Store the credentials of your Google user account in a Dataiku folder.
    2. Import and export data to and from GSheets with a Python recipe. This is more robust, and if the credentials change you only need to update them once in the folder instead of in every single GSheets dataset.
    3. In Python, you can use the gspread library. Its API is quite simple and allows a pretty smooth workflow.
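    The steps above could be sketched as follows. Treat this as a rough outline rather than a tested recipe: it assumes the credentials are a service-account JSON key (the post only says "Google user account"), and the folder name, file path, spreadsheet key and worksheet name are placeholders. The `dataiku` and `gspread` imports are kept inside the functions so that the pure helper can be tried outside DSS.

```python
import json


def values_to_records(values):
    """Turn raw sheet rows (header row first) into a list of dicts.

    The header cells are used as-is, so long column names are kept whole.
    """
    if not values:
        return []
    header, *rows = values
    return [dict(zip(header, row)) for row in rows]


def load_credentials(folder_name, path):
    """Read a JSON credentials file from a Dataiku managed folder."""
    import dataiku  # only available inside DSS

    folder = dataiku.Folder(folder_name)
    with folder.get_download_stream(path) as stream:
        return json.load(stream)


def read_sheet(credentials, spreadsheet_key, worksheet_name):
    """Fetch one worksheet with gspread and return it as records."""
    import gspread  # assumed to be installed in the code environment

    client = gspread.service_account_from_dict(credentials)
    worksheet = client.open_by_key(spreadsheet_key).worksheet(worksheet_name)
    return values_to_records(worksheet.get_all_values())


if __name__ == "__main__":
    # Offline demonstration of the pure helper only.
    sample = [["total_transport_cost_per_ton", "site"], ["12.5", "A"]]
    print(values_to_records(sample))
```

    In a recipe, the records returned by `read_sheet` could be loaded into a pandas DataFrame and written to the output dataset, with the credentials living in a single folder shared by every recipe.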

    Best!

    Eldiias
