Problem creating a new dataset in a project from a Hive table (Python API)

esteban23
edited July 16 in Using Dataiku

Hi! I'd like to import a Hive table into a project (from a notebook) using the Dataiku Python API. The idea is to replicate the process done through the UI, which succeeds, as you can see in the picture below:

[Screenshot: hiveok.PNG, the Hive table import succeeding in the UI]

After importing it through the UI, the table appears as a dataset on the project's 'Datasets' page (this is what I need).

However, when I try to do the same thing from a notebook, I get an error. I have tried two approaches:

1) First approach:

import dataiku

client = dataiku.api_client()
project = client.get_project('MYPROJECT')

# Declare the table to import: Hive database first, then table name
import_definition = project.init_tables_import()
import_definition.add_hive_table("referenciales", "sbl_tipo_identificacion")

# Prepare the import (this is the call that fails below), then run it
prepared_import = import_definition.prepare()
future = prepared_import.execute()

import_result = future.wait_for_result()

This gives the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-52-432f67d71601> in <module>
      2 import_definition.add_hive_table("referenciales", "sbl_tipo_identificacion")
      3 
----> 4 prepared_import = import_definition.prepare()
      5 future = prepared_import.execute()
      6 

~/Dataiku_install/dataiku-dss-8.0.4/python/dataikuapi/dss/project.py in prepare(self)
   1327 
   1328         future = self.client.get_future(ret["jobId"])
-> 1329         future.wait_for_result()
   1330         return TablesPreparedImport(self.client, self.project_key, future.get_result())
   1331 

~/Dataiku_install/dataiku-dss-8.0.4/python/dataikuapi/dss/future.py in wait_for_result(self)
     73         Wait and get the future result
     74         """
---> 75         if self.state.get('hasResult', False):
     76             return self.result_wrapper(self.state.get('result', None))
     77         if self.state is None or not self.state.get('hasResult', False) or self.state_is_peek:

AttributeError: 'NoneType' object has no attribute 'get'
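Lines 75-77 of this traceback show why the first approach fails: wait_for_result() calls self.state.get(...) before checking whether self.state is None, and prepare() calls it on a future whose state has not been fetched yet. For anyone who cannot upgrade, the sketch below wraps the method so the state is fetched first. It is an untested workaround resting on assumptions: that the DSSFuture class in your installed dataikuapi/dss/future.py has a get_state() method that populates self.state (check the file first), and that this check order is the only problem. StubFuture is a hypothetical stand-in that copies the failing check order so the patch can be exercised without DSS.

```python
import functools


def patch_wait_for_result(future_cls):
    """Wrap future_cls.wait_for_result so that, when self.state is still None,
    the state is fetched with get_state() before the original method runs.
    Assumes instances expose a .state attribute and a get_state() method."""
    original = future_cls.wait_for_result

    @functools.wraps(original)
    def wait_for_result(self):
        if self.state is None:
            # Populate the state so the original .get() calls have a dict to read
            self.get_state()
        return original(self)

    future_cls.wait_for_result = wait_for_result
    return future_cls


class StubFuture(object):
    """Hypothetical stand-in copying the check order from future.py lines 75-77."""

    def __init__(self):
        self.state = None  # not fetched yet, as when prepare() creates the future
        self.state_is_peek = True
        self.fetches = 0

    def get_state(self):
        # A real future would query the DSS backend; the stub pretends the job finished
        self.fetches += 1
        self.state = {"hasResult": True, "result": "import prepared"}
        self.state_is_peek = False
        return self.state

    def wait_for_result(self):
        if self.state.get("hasResult", False):  # AttributeError when state is None
            return self.state.get("result", None)
        if self.state is None or not self.state.get("hasResult", False) or self.state_is_peek:
            self.get_state()
        return self.state.get("result", None)


try:
    StubFuture().wait_for_result()
except AttributeError as err:
    print("before patch:", err)

patch_wait_for_result(StubFuture)
print("after patch:", StubFuture().wait_for_result())
```

In a notebook, the same function would be applied as patch_wait_for_result(DSSFuture) after from dataikuapi.dss.future import DSSFuture, before calling import_definition.prepare(). The patch only affects the current Python process; upgrading, as suggested in the answer, remains the proper fix.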

2) Second approach:

import dataiku
from dataiku.core.sql import SQLExecutor2

# Build the dataset where the result of the query will be stored
builder = project.new_managed_dataset_creation_helper("temp_dataset")
builder.with_store_into("hdfs_connection", format_option_id="PARQUET_HIVE")
dataset = builder.create()

# exec_recipe_fragment reads output_dataset.full_name, so pass a dataiku.Dataset handle
executor = SQLExecutor2(connection="referenciales")
executor.exec_recipe_fragment(dataiku.Dataset("temp_dataset"), "select * from sbl_tipo_identificacion", overwrite_output_schema=True)

When trying this, the following error is printed:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-33-13217d6dde60> in <module>
      2 
      3 #output_dataset = dataiku.Dataset("temp_outputDataset2")
----> 4 SQLExecutor2.exec_recipe_fragment(output_dataset, streamed_query)

~/Dataiku_install/dataiku-dss-8.0.4/python/dataiku/core/sql.py in exec_recipe_fragment(output_dataset, query, pre_queries, post_queries, overwrite_output_schema, drop_partitioned_on_schema_mismatch)
    181             data={
    182                 "outputDataset": output_dataset.full_name,
--> 183                 "activityId" : spec["currentActivityId"],
    184                 "query" : query,
    185                 "preQueries" : json.dumps(pre_queries),

TypeError: 'NoneType' object is not subscriptable
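This traceback fails before any query runs: at line 183, exec_recipe_fragment reads spec["currentActivityId"], and spec is None. Given the answer below (the method only works inside a recipe), spec is presumably the recipe execution context that DSS provides when the code runs as a recipe, which a notebook does not have. The sketch below reproduces that lookup with a hypothetical build_fragment_request helper and adds an explicit guard, so the failure reads as "no recipe context" rather than a bare TypeError. The helper name, error message, and activity id are illustrative only; none of them are part of the dataiku package.

```python
def build_fragment_request(spec, output_full_name, query):
    """Hypothetical mirror of the request built at sql.py line 183.

    spec stands in for the recipe context; outside a recipe it is None.
    """
    if spec is None:
        raise RuntimeError("exec_recipe_fragment needs a recipe context; "
                           "none is available (running in a notebook?)")
    return {
        "outputDataset": output_full_name,
        "activityId": spec["currentActivityId"],
        "query": query,
    }


query = "select * from sbl_tipo_identificacion"

# What the unguarded lookup does when spec is None
try:
    None["currentActivityId"]
except TypeError as err:
    print("unguarded:", err)

# Notebook situation: no recipe context, so the guard raises a clearer error
try:
    build_fragment_request(None, "MYPROJECT.temp_dataset", query)
except RuntimeError as err:
    print("notebook:", err)

# Recipe situation: a context carrying an activity id (made-up value)
fake_spec = {"currentActivityId": "compute_temp_dataset_NP"}
print("recipe:", build_fragment_request(fake_spec, "MYPROJECT.temp_dataset", query))
```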

Does anyone know why these errors occur, or is there another method I could try? Thanks!

Answers

  • fchataigner2 (Dataiker)

    Hi,

    For the first issue, you should update to 9.0.4 to get the fix for this bug. The second issue comes from using exec_recipe_fragment, which can only be used in a recipe (as the name implies). Also, even if it had worked, your code would have extracted the table from Hive and reloaded it into another dataset, effectively duplicating the data; this is probably not what you're looking for.

  • esteban23

    Hi! Got it. Thanks.

    However, it's very unlikely I'll get to use version 9, since I'm not the admin and can't run the update. Do you know any other way to solve this problem?
