-
How to save a Keras model from a Python recipe in a folder?
I would like to save a Keras model in a folder. I cannot figure out how to save the weights of my model because I cannot find the correct filepath. The code needed to achieve this is: model.save_weights(filepath). Even with this syntax: path = str(trained_LSTM_info['accessInfo']['root'])…
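A minimal sketch of building that filepath, assuming the DSS `dataiku.Folder` API, where `Folder("my_folder").get_path()` returns the filesystem path of a local managed folder; the path and the weights filename below are hypothetical stand-ins:

```python
import os

# Hedged sketch: in a DSS Python recipe, dataiku.Folder("my_folder").get_path()
# returns the filesystem path of a *local* managed folder. The path and
# filename below are hypothetical stand-ins for that call.
folder_path = "/data/dss/managed_folders/my_folder"  # stand-in for Folder.get_path()
filepath = os.path.join(folder_path, "lstm_weights.h5")

# model.save_weights(filepath)  # Keras would then write the weights into the folder
print(filepath)
```

Note that `get_path()` typically only applies to folders hosted on the local filesystem; non-local folders would need the stream-based APIs instead.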
-
How to read file with Python from HDFS managed folder
Hello, could you give an example of how to read a CSV file with Python/pandas from an HDFS managed folder? Thanks, Milko
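A minimal sketch, assuming the DSS `dataiku.Folder` API, where `get_download_stream("data.csv")` returns a file-like stream for files in a non-local (e.g. HDFS) managed folder; an in-memory `BytesIO` stands in for that stream here so the snippet is self-contained, and the folder and file names are hypothetical:

```python
import io
import pandas as pd

# Hedged sketch: for a non-local (e.g. HDFS) managed folder, the DSS call
# dataiku.Folder("hdfs_folder").get_download_stream("data.csv") returns a
# file-like stream. An in-memory stream stands in for it here; the folder
# and file names are hypothetical.
stream = io.BytesIO(b"name,score\nalice,10\nbob,20\n")
df = pd.read_csv(stream)
print(df.shape)
```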
-
Spark can’t read my HDFS datasets
Hello, Spark won't see hdfs:/// and just looks for file:/// when I'm trying to process an HDFS managed dataset. I followed the How-To link on: https://www.dataiku.com/learn/guide/spark/tips-and-troubleshooting.html However, I couldn't figure out what to edit. Here is my env-spark.sh in DATA_DIR/bin/ ``` export…
-
Spark IllegalArgumentException using partitioning
I am using Dataiku to create partitions on an HDFS dataset as the result of a Spark recipe. I noticed that if the dataframe in the preceding recipe contains the column you are partitioning by, Spark throws an IllegalArgumentException (here the column in the dataframe, and the one I am partitioning by, is called…
-
Error Running HDFS Command in Python Recipe
I have some code where I need to run an HDFS command in Python to check if a file is present. See below for an example:
```
import subprocess
command = 'hdfs dfs -ls /sandbox'
ssh = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE).communicate()
print(ssh)
```
When I run this in a Jupyter notebook in Dataiku, the…
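For a pure existence check, `hdfs dfs -test -e <path>` exits with code 0 when the path exists, so inspecting the return code avoids parsing `-ls` output. A minimal sketch (the real command would be `["hdfs", "dfs", "-test", "-e", "/sandbox/myfile"]`; a portable stand-in is used so the snippet runs anywhere):

```python
import subprocess
import sys

# Hedged sketch: `hdfs dfs -test -e <path>` returns exit code 0 when the
# path exists, so the return code alone answers the question. A portable
# stand-in command is used here instead of the real hdfs call.
command = [sys.executable, "-c", "raise SystemExit(0)"]  # stand-in for the hdfs call
result = subprocess.run(command, capture_output=True)
exists = (result.returncode == 0)
print(exists)
```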
-
DSS unable to connect to Hiveserver2 (MapR)
Hi, I'm unable to get started with establishing connectivity between DSS and HiveServer2. HDFS integration works; I have added the jars required for the Hive client to a folder owned by the dss user and added that folder's path to the classpath. I'm specifying auth=maprsasl;saslQop=auth-conf in the extra URL, but the connection isn't…
-
Available engines
Hi, what defines the list of available engines for data-processing recipes such as Prepare? I have an HDFS dataset created by Impala, then a Prepare or Sync to another HDFS dataset, but only Spark/MR (or LocalStream) is available. Why is DSS not allowing SQL-based engines? The source dataset has a hive synced…
-
Can changes in HDFS datasets be automatically tracked?
Hi, I am using HDFS datasets in my workflow which update on a daily basis, and I would like to find out whether these daily changes can be tracked by DSS and saved to a separate "delta" file through a scenario or some other automation capability. Thanks!
-
[CDH Cluster] Unable to start HiveServer2 Connection
Hi, I am evaluating DSS, so I installed it on my server and added a 2-week enterprise trial license. I am connecting to a kerberized Cloudera CDH 5.12 cluster. I can connect to and browse HDFS, but the Hive connection is not working. This is the error in the log file backlog.log: [2018/05/30-07:47:19.991] [qtp1440621772-171] [INFO]…
-
validation failed: Cannot insert into target table because number/types are different.
Hi, I get this message from a Hive recipe on a partitioned dataset stored on HDFS: validation failed: Cannot insert into target table because number/types are different "2018-02": Table inclause-0 has 27 columns, but query has 28 columns. My query is: SELECT * FROM MyTable