LDA Mallet model in Dataiku (Python)
Hi,
I have a Python script that uses the gensim LdaMallet wrapper (https://radimrehurek.com/gensim/models/wrappers/ldamallet.html) for topic modeling. I would like to integrate this script into my flow in Dataiku, but I can't find the right path to pass as an argument to the gensim.models.wrappers.LdaMallet function. I have put all the files downloaded from the LDA Mallet module into a folder called "LDA_model" in my Dataiku project, and I try to access the file via the path 'LDA_model/mallet_folder/bin/mallet'.
When I pass this path to the gensim.models.wrappers.LdaMallet function I get the following error:
CalledProcessError: Command '/opt/dataiku/data/managed_folders/TWEETS/6WUpy7CI/mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input /tmp/d941ed_corpus.txt --output /tmp/d941ed_corpus.mallet' returned non-zero exit status 127.
Does anyone know how to fix this problem?
EDIT: I have imported the necessary files and the paths exist. I am assuming that my problem is linked to setting up environment variables.
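One way to test the environment-variable hypothesis is a small pre-flight check. In a POSIX shell, exit status 127 conventionally means "command not found" (126 means the file was found but is not executable). The sketch below is only a diagnostic aid: `diagnose_mallet` is a hypothetical helper name, and the idea that a missing `java` on PATH could also surface as 127 is an assumption based on the MALLET launcher being a shell script that starts Java.

```python
import os
import shutil


def diagnose_mallet(mallet_path, env_path=None):
    """Return a list of problems that would keep the MALLET launcher from running.

    env_path: PATH-style string to search for java; None means the current PATH.
    """
    problems = []
    if not os.path.isfile(mallet_path):
        # A missing file is what a shell reports as exit status 127
        problems.append("no file at %s" % mallet_path)
    elif not os.access(mallet_path, os.X_OK):
        # A non-executable file would normally give exit status 126 instead
        problems.append("%s is not executable (try chmod +x)" % mallet_path)
    # Assumption: the launcher script calls `java`, so it must be on PATH
    if shutil.which("java", path=env_path) is None:
        problems.append("java not found on PATH")
    return problems


# Path copied from the error message above
for problem in diagnose_mallet("/opt/dataiku/data/managed_folders/TWEETS/6WUpy7CI/mallet"):
    print(problem)
```

Running this inside the same Dataiku recipe or notebook that raises the error matters, because the PATH seen there may differ from the one in a local Jupyter session.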
Best Answer
-
Hi,
In that case, we will need a job diagnosis to understand the root cause of the error.
Could you please follow the steps described on this page and then send us the resulting ZIP file?
Thanks,
Alex
Answers
-
Hi,
To access a file stored in a Dataiku managed folder, you need to use the Dataiku API. You can read more in the documentation.
Assuming your folder is on the local filesystem, you can get the folder path using the Folder.get_path method.
Hope it helps,
Alex
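As a rough illustration of the suggestion above, the managed folder's filesystem path can be joined with the location of the launcher inside it and passed to LdaMallet. This is a minimal sketch, assuming the folder is named "LDA_model" and contains 'mallet_folder/bin/mallet' as described in the question, that the folder lives on the local filesystem, and that a gensim version older than 4.0 is installed (the wrappers module was removed in 4.0). The helper names `mallet_binary_path` and `train_lda_mallet` are made up for this example.

```python
import os


def mallet_binary_path(folder_path, *relative_parts):
    """Join the managed folder's filesystem path with the path inside the folder."""
    return os.path.join(folder_path, *relative_parts)


def train_lda_mallet(corpus, id2word, num_topics=10):
    """Sketch only: must run inside Dataiku with gensim < 4.0 available."""
    import dataiku  # only importable inside a Dataiku code environment
    from gensim.models.wrappers import LdaMallet  # not present in gensim >= 4.0

    folder = dataiku.Folder("LDA_model")  # folder name taken from the question
    mallet_path = mallet_binary_path(
        folder.get_path(), "mallet_folder", "bin", "mallet"
    )
    print("Using MALLET launcher at:", mallet_path)  # compare with the error message
    return LdaMallet(
        mallet_path, corpus=corpus, num_topics=num_topics, id2word=id2word
    )
```

Printing the joined path is worth doing here: the command in the error message ends in '/6WUpy7CI/mallet' rather than '.../mallet_folder/bin/mallet', which may mean the path that reached LdaMallet was not the one intended.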
-
Hi Alex,
Thank you for your message! Yes, this is what I have done to access the path of the folder where my model is stored. However, it is when I pass this path as an argument to my function that I get the CalledProcessError message.
I have tested this model in a regular Jupyter notebook, and it works when I use an OS path to access the model saved on my computer.
Oda