Problem when trying to create a new dataset in project with a Hive table (from Python API)

esteban23
Level 2

Hi! I'd like to import a Hive table into a project (from a notebook) using Dataiku's Python API. The idea is to replicate the process done through the UI, which succeeds, as you can see in the picture below:

[Screenshot: hiveok.PNG]

After doing this through the UI, the table appears as a dataset on the project's 'Datasets' page (this is what I need).

However, when I try to do the same in a notebook, I get an error. I have tried two approaches:

1) First approach:

import dataiku

client = dataiku.api_client()
project = client.get_project('MYPROJECT')

# Declare the Hive table to import: (Hive database, table name)
import_definition = project.init_tables_import()
import_definition.add_hive_table("referenciales", "sbl_tipo_identificacion")

# Prepare the import, then launch it
prepared_import = import_definition.prepare()
future = prepared_import.execute()

# Block until the import job finishes
import_result = future.wait_for_result()

This gives the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-52-432f67d71601> in <module>
      2 import_definition.add_hive_table("referenciales", "sbl_tipo_identificacion")
      3 
----> 4 prepared_import = import_definition.prepare()
      5 future = prepared_import.execute()
      6 

~/Dataiku_install/dataiku-dss-8.0.4/python/dataikuapi/dss/project.py in prepare(self)
   1327 
   1328         future = self.client.get_future(ret["jobId"])
-> 1329         future.wait_for_result()
   1330         return TablesPreparedImport(self.client, self.project_key, future.get_result())
   1331 

~/Dataiku_install/dataiku-dss-8.0.4/python/dataikuapi/dss/future.py in wait_for_result(self)
     73         Wait and get the future result
     74         """
---> 75         if self.state.get('hasResult', False):
     76             return self.result_wrapper(self.state.get('result', None))
     77         if self.state is None or not self.state.get('hasResult', False) or self.state_is_peek:

AttributeError: 'NoneType' object has no attribute 'get'
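The traceback points at the cause: `wait_for_result` calls `self.state.get('hasResult', False)` on line 75 before line 77 checks whether `self.state is None`, and `prepare()` calls it on a future it has just obtained with `get_future(...)`, whose state is still `None`. The sketch below is a hypothetical, self-contained illustration of a polling future that performs the `None` check first. `SketchFuture` and its `fetch_state` callable are made-up names; this is neither the dataikuapi `DSSFuture` code nor the actual fix shipped in a later DSS version.

```python
import time
from typing import Any, Callable, Optional


class SketchFuture:
    """Hypothetical stand-in for a job future; NOT dataikuapi's DSSFuture."""

    def __init__(self, fetch_state: Callable[[], dict], state: Optional[dict] = None):
        self.fetch_state = fetch_state  # polls the backend for the job state
        self.state = state              # None until the first poll, as in the traceback

    def wait_for_result(self, poll_interval: float = 0.0) -> Any:
        # Check for a missing state BEFORE calling .get() on it; the traceback
        # above fails because the .get() call comes first.
        while self.state is None or not self.state.get("hasResult", False):
            if self.state is not None:
                time.sleep(poll_interval)  # job still running: wait before polling again
            self.state = self.fetch_state()
        return self.state.get("result")


# Usage with a fake backend that finishes on the second poll (illustrative data only)
states = iter([{"hasResult": False}, {"hasResult": True, "result": {"imported": 1}}])
future = SketchFuture(lambda: next(states))
print(future.wait_for_result())
```

Starting from `state=None` is exactly the situation in the traceback; with the guard first, the future simply polls instead of raising `AttributeError`.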

2) Second approach:

import dataiku
from dataiku.core.sql import SQLExecutor2

client = dataiku.api_client()
project = client.get_project('MYPROJECT')

# Build the managed dataset where the result of the query will be stored
builder = project.new_managed_dataset_creation_helper("temp_dataset")
builder.with_store_into("hdfs_connection", format_option_id="PARQUET_HIVE")
builder.create()

# exec_recipe_fragment needs a dataiku.Dataset handle for the output dataset
output_dataset = dataiku.Dataset("temp_dataset")
executor = SQLExecutor2(connection="referenciales")
executor.exec_recipe_fragment(output_dataset, "select * from sbl_tipo_identificacion", overwrite_output_schema=True)

When trying this, the following error is printed:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-33-13217d6dde60> in <module>
      2 
      3 #output_dataset = dataiku.Dataset("temp_outputDataset2")
----> 4 SQLExecutor2.exec_recipe_fragment(output_dataset, streamed_query)

~/Dataiku_install/dataiku-dss-8.0.4/python/dataiku/core/sql.py in exec_recipe_fragment(output_dataset, query, pre_queries, post_queries, overwrite_output_schema, drop_partitioned_on_schema_mismatch)
    181             data={
    182                 "outputDataset": output_dataset.full_name,
--> 183                 "activityId" : spec["currentActivityId"],
    184                 "query" : query,
    185                 "preQueries" : json.dumps(pre_queries),

TypeError: 'NoneType' object is not subscriptable
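The second traceback shows `exec_recipe_fragment` building its request from `spec["currentActivityId"]`, and `spec` is `None` when the call runs in a notebook. This is consistent with the function being usable only inside a recipe, where a running activity exists. The sketch below is hypothetical: `build_fragment_request`, `NotInRecipeError`, and the `spec` dict are illustrative names that mimic the request building seen in the traceback, with an explicit guard added. None of them comes from the dataiku package.

```python
from typing import Optional


class NotInRecipeError(RuntimeError):
    """Hypothetical error for recipe-only calls made outside a recipe."""


def build_fragment_request(spec: Optional[dict], output_name: str, query: str) -> dict:
    # `spec` stands in for the recipe context read in the traceback; in a
    # notebook there is no running recipe activity, so it is None.
    if spec is None:
        raise NotInRecipeError(
            "exec_recipe_fragment needs a running recipe; it cannot be used from a notebook"
        )
    return {
        "outputDataset": output_name,
        "activityId": spec["currentActivityId"],  # only available inside a recipe run
        "query": query,
    }


# Inside a recipe, DSS would supply the context (the values here are made up)
print(build_fragment_request({"currentActivityId": "act_1"}, "MYPROJECT.temp_dataset", "select 1"))

# In a notebook there is no context, so the guard raises a clear error
try:
    build_fragment_request(None, "MYPROJECT.temp_dataset", "select 1")
except NotInRecipeError as err:
    print(err)
```

Without the guard, indexing `None` produces the same `TypeError: 'NoneType' object is not subscriptable` seen above.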

Does anyone know why these errors occur, or are there other methods I could try? Thanks!

fchataigner2
Dataiker

Hi,

For the first issue, you should update to 9.0.4 to get the fix for this bug. The second issue comes from using exec_recipe_fragment, which can only be used in a recipe (as the name implies). Also, even if it had worked, your code would have extracted the table from Hive and reloaded it into another dataset, effectively duplicating the data; this is probably not what you're looking for.

esteban23
Level 2
Author

Hi! Got it. Thanks.

However, it's very unlikely that I'll get to use version 9, since I'm not the admin and therefore can't run the update. Do you know any other way to solve this problem?
