How to lower all column name in the analysis module ?

UserBird
UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
Hi guys,

Is there a possibility to lower all the column name in the analysis module ? I have 30 columns in my file , i synced it into a postgresql table and after that, i would lower all the column name without renaming these manually or with a python recipe. I wondered if there is a special feature or a python snippet into the analysis module to make it.

Thanks.

Answers

  • jrouquie
    jrouquie Dataiker Alumni Posts: 87 ✭✭✭✭✭✭✭

    I am not aware of a possibility to do so directly in the GUI, but if too tedious to do it by hand, you might want to edit



    DATA_DIR/config/projects/PROJECTFOO/datasets/datasetbar.json



    through a script. Here, datasetbar is the input dataset to the sync recipe. Then

    • open this sync recipe, click “resync schema” so that the schemas of both DSS datasets are lowercase.
    • rerun this recipe, so that at the end DSS writes the new schema to postgres.







    If your dataset fits in memory, I would replace the sync recipe by a Python recipe that change the column names and then copy its input to its output. This way there is no unnecessary duplication of data.

  • Thomas
    Thomas Dataiker Alumni Posts: 19 ✭✭✭✭✭
    edited July 17

    Hi,

    Here is a solution that might work in Visual Preparation scripts (under Analysis):

    • Create a Custom python script that outputs a JSON object (Python dict) storing the lowered column names:

    import json

    def process(row):
    columns = row.keys()
    o = {}
    for column in columns:
    new_name = str(column).lower()
    o[new_name] = row[column]
    return json.dumps(o)
    • via the Columns view, delete all the columns except the newly created JSON object
    • Flatten the JSON object
    • Get rid again of the unwanted columns, then, again under the Columns view, mass rename the columns by removing the prefix generated by flattening the JSON object

    That's it. You end up with a view of your dataset where all the columns are lowered.

    NOTE: this is not the best solution since you still need to build your dataset at the end, so this may create a not-so-necessary copy of your base dataset.

  • jereze
    jereze Alpha Tester, Dataiker Alumni Posts: 190 ✭✭✭✭✭✭✭✭
    edited July 17

    If you want to lower the column name because of PostegreSQL, the solution might be to double quote the column name in your SQL code:


    select "MyColomn1", mycolumn2 from "MyTable";

    More info here.

  • tassosv
    tassosv Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 1 ✭✭✭

    Hi,

    If you would like to do it using a recipe and not code scripts. You can change:

    1) from table view to column view

    2) Select all columns or those you would like to rename

    3) Select actions and from their menu, rename

    4) From rename menu choose convert to lowercase

Setup Info
    Tags
      Help me…