
Change column names with Plugin

Level 3

I have my final dataset, but I want to change the column names based on a naming convention at the end of the code. Right now I'm using

output_dataset.write_schema(columns, dropAndCreate=True)

There are no errors in the code, but when I go to explore the data, I get 'the relation does not exist'.

After going to Settings > 'Test Connection' and 'Create Table Now', it works fine... but the table has 0 records, with the correct column names.

Is there a better function to do this?

3 Replies
Dataiker

Hi,

Are you using the write_dataframe method after write_schema? https://doc.dataiku.com/dss/latest/python-api/datasets.html#dataiku.Dataset.write_dataframe
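To illustrate the suggestion, here is a minimal sketch of that two-step pattern, assuming a DSS Python recipe where the `dataiku` package is available; `rename_and_write`, `output_name`, and `new_columns` are hypothetical names for this example:

```python
def rename_and_write(output_name, df, new_columns):
    """Write a pandas DataFrame to a DSS dataset with renamed columns.

    Sketch only: assumes this runs inside a DSS Python recipe,
    where the `dataiku` package is importable.
    """
    import dataiku  # available inside DSS only

    # Apply the naming convention before writing
    df = df.rename(columns=new_columns)

    output = dataiku.Dataset(output_name)
    # write_schema_from_dataframe only (re)creates the empty table;
    # write_dataframe then actually fills it with the rows
    output.write_schema_from_dataframe(df)
    output.write_dataframe(df)
```

Calling `write_schema` alone, as in the original snippet, creates the table but never populates it, which matches the "0 records with the correct column names" symptom.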

Hope it helps,

Alex

Level 3
Author

The query has the potential of returning hundreds of millions of rows. Knowing this, I was avoiding loading into DataFrames (that requires loading the data through DSS memory, right?).

Dataiker

Hi,

If you do not want to go through pandas.DataFrame, you can use another part of the Dataiku API:

- SQLExecutor2: https://doc.dataiku.com/dss/latest/python-api/sql.html#dataiku.core.sql.SQLExecutor2.exec_recipe_fra...

- HiveExecutor: https://doc.dataiku.com/dss/latest/python-api/sql.html#dataiku.core.sql.HiveExecutor.exec_recipe_fra...

- ImpalaExecutor: https://doc.dataiku.com/dss/latest/python-api/sql.html#dataiku.core.sql.ImpalaExecutor.exec_recipe_f...
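As a concrete sketch of the SQL route, the rename can be pushed entirely into the database with `SQLExecutor2.exec_recipe_fragment`, so no rows ever flow through DSS memory. This assumes a DSS Python recipe; `rename_via_sql` and the example query are hypothetical names for illustration:

```python
def rename_via_sql(output_name, query):
    """Populate a DSS output dataset from a SQL SELECT, in-database.

    Sketch only: assumes a DSS Python recipe whose input and output
    datasets live on the same SQL connection. The query does the
    renaming with "old_name AS new_name" aliases, e.g.:

        SELECT cust_id AS customer_id, ord_dt AS order_date
        FROM my_input_table
    """
    import dataiku
    from dataiku import SQLExecutor2

    output_ds = dataiku.Dataset(output_name)
    # The SELECT runs inside the database; DSS only orchestrates it
    # and sets the output schema from the query's result columns.
    SQLExecutor2.exec_recipe_fragment(output_ds, query)
```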

This is assuming your input and output datasets are in SQL or HDFS connections. Otherwise, you will be required to go through the "regular" dataiku API using pandas.DataFrame. Note that you can do chunked reading and writing: https://doc.dataiku.com/dss/latest/python-api/datasets.html#chunked-reading-and-writing-with-pandas
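If you do have to fall back to the regular API, the chunked pattern keeps memory bounded: only one chunk of rows is in DSS memory at a time. A minimal sketch, again assuming a DSS Python recipe (`copy_with_renamed_columns` and `new_columns` are hypothetical names):

```python
def copy_with_renamed_columns(input_name, output_name, new_columns,
                              chunksize=100000):
    """Stream a dataset through DSS in fixed-size chunks, renaming columns.

    Sketch only: assumes a DSS Python recipe; `new_columns` maps
    old column names to new ones.
    """
    import dataiku

    input_ds = dataiku.Dataset(input_name)
    output_ds = dataiku.Dataset(output_name)

    with output_ds.get_writer() as writer:
        first = True
        for chunk in input_ds.iter_dataframes(chunksize=chunksize):
            chunk = chunk.rename(columns=new_columns)
            if first:
                # Set the output schema once, from the first renamed chunk
                output_ds.write_schema_from_dataframe(chunk)
                first = False
            writer.write_dataframe(chunk)
```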

Hope it helps,

Alex
