Link to a git repo in plugin requirements.txt
Hello,
I'm used to loading code from a private git repo into project libraries, but now I need to get this code into a plugin.
Is it possible to add the repo link to the requirements.txt?
I tried writing:
git+ssh://git@github.com:chiktika/dss.git#egg=oncrawl_api
but it raises an error:
[2020/11/09-17:47:13.310] [FT--5jW0vzAe-2121] [INFO] [dip.code-envs.package-systems] - [ct: 1] Installing from py requirements : git+https://github.com/chiktika/dss.git#egg=my_lib
[2020/11/09-17:47:13.310] [FT--5jW0vzAe-2121] [INFO] [dip.code-envs.package-systems] - [ct: 1] Completed requirements : git+https://github.com/chiktika/dss.git#egg=my_lib pandas==0.23.4 python-dateutil==2.8.0 pytz==2019.2 requests==2.22.0
[2020/11/09-17:47:13.395] [qtp506775047-1666] [DEBUG] [dku.tracing] - [ct: 1] Start call: /api/futures/get-update [GET] user=elodie [futureId=5jW0vzAe]
[2020/11/09-17:47:13.395] [qtp506775047-1666] [DEBUG] [dku.tracing] - [ct: 1] Done call: /api/futures/get-update [GET] time=1ms user=elodie [futureId=5jW0vzAe]
[2020/11/09-17:47:13.990] [qtp506775047-2118] [DEBUG] [dku.tracing] - [ct: 1] Start call: /api/futures/get-update [GET] user=elodie [futureId=5jW0vzAe]
[2020/11/09-17:47:13.992] [qtp506775047-2118] [DEBUG] [dku.tracing] - [ct: 3] Done call: /api/futures/get-update [GET] time=3ms user=elodie [futureId=5jW0vzAe]
[2020/11/09-17:47:14.065] [null-err-2126] [INFO] [dku.utils] - ERROR: Command errored out with exit status 128: git clone -q https://github.com/chiktika/dss.git /tmp/pip-install-xbi2sq3w/my-lib Check the logs for full command output.
[2020/11/09-17:47:14.065] [null-out-2124] [INFO] [dku.utils] - Collecting my_lib
[2020/11/09-17:47:14.066] [null-out-2124] [INFO] [dku.utils] - Cloning https://github.com/chiktika/dss.git to /tmp/pip-install-xbi2sq3w/my-lib
[2020/11/09-17:47:14.066] [Thread-1082] [INFO] [dku.utils] - Done waiting for return value, got 1
[2020/11/09-17:47:14.066] [FT--5jW0vzAe-2121] [ERROR] [dku.code.envs] - Env update failed
com.dataiku.dip.exceptions.ProcessDiedException: /home/dataiku/dss_data/code-envs/python/plugin_oncrawl-projects_managed/bin/python failed (exit code: 1)
    at com.dataiku.dip.exceptions.ProcessDiedException.getExceptionOnProcessDeath(ProcessDiedException.java:59)
    at com.dataiku.dip.utils.DKUtils$SimpleExceptionExecCompletionHandler.handle(DKUtils.java:1063)
    at com.dataiku.dip.utils.DKUtils$ExecBuilder.exec(DKUtils.java:918)
    at com.dataiku.dip.utils.DKUtils.execAndLogThrowsMirror(DKUtils.java:1244)
    at com.dataiku.dip.code.CodeEnvPackageSystems$PipPackageSystemMeta.install(CodeEnvPackageSystems.java:132)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService.updateEnvAccordingToSpec(DesignNodeCodeEnvsService.java:1114)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService.access$200(DesignNodeCodeEnvsService.java:91)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService$21.compute(DesignNodeCodeEnvsService.java:1052)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService$21.compute(DesignNodeCodeEnvsService.java:1041)
    at com.dataiku.dip.futures.SimpleFutureThread.execute(SimpleFutureThread.java:36)
    at com.dataiku.dip.futures.FutureThreadBase.run(FutureThreadBase.java:88)
Can someone help me, please?
Many thanks.
Best Answer
-
Hi @tim-wright, @Ignacio_Toledo,
Many thanks for your help, I finally got it working with your advice!!!
The first step was replacing :<user> with /<user>:
git+ssh://git@github.com/chiktika/dss.git#egg=oncrawl_api
Then I had to create a setup.py and a readme.md, and, not to forget, an empty __init__.py in each folder.
And it works!
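For readers reproducing this, here is a minimal sketch of the layout described above, written as a small Python script. Most of it is assumed for illustration: the package name oncrawl_api comes from the #egg= fragment, while the subpackage name, the version number and the helpers scaffold() and missing_init_dirs() are hypothetical. It writes a setup.py that calls setuptools.find_packages(), a README.md, and an empty __init__.py in every package folder, then reports folders that contain .py files but no __init__.py, since find_packages() skips such folders.

```python
import os
import tempfile

# Hypothetical package name, matching the #egg=oncrawl_api fragment in the thread.
PACKAGE_NAME = "oncrawl_api"

# Template for a minimal setup.py; name= should match the #egg= fragment.
SETUP_PY = '''\
from setuptools import setup, find_packages

setup(
    name="{name}",
    version="0.1.0",
    packages=find_packages(),
)
'''


def scaffold(root, name, subpackages=()):
    """Write setup.py, README.md and an empty __init__.py in every package folder."""
    pkg_dir = os.path.join(root, name)
    for folder in [pkg_dir] + [os.path.join(pkg_dir, sub) for sub in subpackages]:
        os.makedirs(folder, exist_ok=True)
        # An empty __init__.py marks the folder as a package for find_packages().
        open(os.path.join(folder, "__init__.py"), "w").close()
    with open(os.path.join(root, "setup.py"), "w") as f:
        f.write(SETUP_PY.format(name=name))
    with open(os.path.join(root, "README.md"), "w") as f:
        f.write("# " + name + "\n")


def missing_init_dirs(pkg_dir):
    """Return folders under pkg_dir (relative paths) with .py files but no __init__.py."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(pkg_dir):
        has_py = any(fn.endswith(".py") for fn in filenames)
        if has_py and "__init__.py" not in filenames:
            missing.append(os.path.relpath(dirpath, pkg_dir))
    return sorted(missing)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        scaffold(root, PACKAGE_NAME, subpackages=["client"])
        print(sorted(os.listdir(root)))
        print(missing_init_dirs(os.path.join(root, PACKAGE_NAME)))
```

Committed at the repository root, this is the kind of layout pip needs in order to build a package from a git URL; keeping name= in setup() equal to the #egg= value avoids a second source of confusion.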
With all my thanks!
C.
Answers
-
tim-wright
@Chiktika, are you sure you tried the following (git+ssh) in your pip install?
git+ssh://git@github.com:chiktika/dss.git#egg=oncrawl_api
I only ask because I see the following in your error trace (git+https):
git+https://github.com/chiktika/dss.git#egg=my_lib
I had an earlier question about git over HTTPS and was told that the preferred way was to use SSH (as you have above: https://community.dataiku.com/t5/Setup-Configuration/Git-over-HTTPS/m-p/9961).
Have you successfully managed to connect to your repo from DSS using SSH before? If not, see here: https://doc.dataiku.com/dss/latest/collaboration/git.html#setup
I have not tried doing what you are asking about, but in a quick Google search I found others noting that you should replace the ":" with "/" in the requirements.txt (https://stackoverflow.com/questions/4830856/is-it-possible-to-use-pip-to-install-a-package-from-a-private-github-repository), so possibly try:
## replace :<user> with /<user> ##
git+ssh://git@github.com/chiktika/dss.git#egg=oncrawl_api
I wish I could say I am confident this will solve your issue, but I am not. If you make sure you can connect to your repo over SSH, add the last line to your requirements.txt, and still have an issue, I'm happy to try to help you sort through it.
@Chiktika, if you do figure it out, I'd love to learn how you did it, because that seems like a really useful thing to do. I encourage you to post on the community so others can leverage your hard work!
-
Ignacio_Toledo
Hi @Chiktika. Apparently you are doing the right thing, except that you should add a `-e` at the beginning, so the line in the requirements.txt should look like:
-e git+ssh://git@github.com:chiktika/dss.git#egg=oncrawl_api
However, there might be another reason for the failure: your repo is apparently private. Does your DSS machine have the credentials to clone the repo?
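One way to answer the credentials question from the DSS machine itself is to run git ls-remote against the repository as the account DSS runs under (presumably dataiku, judging by the /home/dataiku paths in the logs above), since the SSH key has to belong to that account. The helper below, can_reach_repo(), is hypothetical. It sets GIT_TERMINAL_PROMPT=0 and GIT_SSH_COMMAND="ssh -o BatchMode=yes" so that a missing key produces a failure rather than a hanging prompt; note that this overrides any custom SSH command already configured in that environment.

```python
import os
import subprocess


def can_reach_repo(url, timeout=30):
    """Return True if `git ls-remote <url>` succeeds without any interactive prompt."""
    env = dict(os.environ)
    # Fail instead of asking for a username/password (HTTPS remotes)...
    env["GIT_TERMINAL_PROMPT"] = "0"
    # ...and instead of asking for a passphrase or host-key confirmation (SSH remotes).
    env["GIT_SSH_COMMAND"] = "ssh -o BatchMode=yes"
    try:
        result = subprocess.run(
            ["git", "ls-remote", url],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            env=env,
            timeout=timeout,
        )
    except FileNotFoundError:
        print("git is not installed or not on PATH")
        return False
    except subprocess.TimeoutExpired:
        print("no answer from %s within %s seconds" % (url, timeout))
        return False
    if result.returncode != 0:
        # git explains the failure on stderr (permission denied, unknown host, ...).
        print(result.stderr.decode(errors="replace").strip())
    return result.returncode == 0
```

Run from a shell or script under that account, can_reach_repo("ssh://git@github.com/chiktika/dss.git") should return True only if GitHub accepts the key; when it returns False, git's own error message is printed.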
I hope this helps!
I.
-
Thanks for your answer.
Yeah, I tried adding a `-e` but I still get the same error:
[2020/11/10-09:34:00.859] [null-out-332] [INFO] [dku.utils] - Obtaining oncrawl_api from git+ssh://****@github.com:chiktika/dss.git#egg=oncrawl_api (from -r /home/dataiku/dss_data/tmp/pip-requirements-install/req13902208873652758081.txt (line 1))
[2020/11/10-09:34:00.860] [null-out-332] [INFO] [dku.utils] - Cloning ssh://****@github.com:chiktika/dss.git to /home/dataiku/dss_data/code-envs/python/plugin_oncrawl-projects_managed/src/oncrawl-api
[2020/11/10-09:34:00.860] [null-err-334] [INFO] [dku.utils] - ERROR: Command errored out with exit status 128: git clone -q 'ssh://****@github.com:chiktika/dss.git' /home/dataiku/dss_data/code-envs/python/plugin_oncrawl-projects_managed/src/oncrawl-api Check the logs for full command output.
[2020/11/10-09:34:00.861] [FT--j21MXpP5-329] [ERROR] [dku.code.envs] - Env update failed
com.dataiku.dip.exceptions.ProcessDiedException: /home/dataiku/dss_data/code-envs/python/plugin_oncrawl-projects_managed/bin/python failed (exit code: 1)
    at com.dataiku.dip.exceptions.ProcessDiedException.getExceptionOnProcessDeath(ProcessDiedException.java:59)
    at com.dataiku.dip.utils.DKUtils$SimpleExceptionExecCompletionHandler.handle(DKUtils.java:1063)
    at com.dataiku.dip.utils.DKUtils$ExecBuilder.exec(DKUtils.java:918)
    at com.dataiku.dip.utils.DKUtils.execAndLogThrowsMirror(DKUtils.java:1244)
    at com.dataiku.dip.code.CodeEnvPackageSystems$PipPackageSystemMeta.install(CodeEnvPackageSystems.java:132)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService.updateEnvAccordingToSpec(DesignNodeCodeEnvsService.java:1114)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService.access$200(DesignNodeCodeEnvsService.java:91)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService$21.compute(DesignNodeCodeEnvsService.java:1052)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService$21.compute(DesignNodeCodeEnvsService.java:1041)
    at com.dataiku.dip.futures.SimpleFutureThread.execute(SimpleFutureThread.java:36)
    at com.dataiku.dip.futures.FutureThreadBase.run(FutureThreadBase.java:88)
The DSS machine can access this private repo: its SSH key is registered with the repo, and I'm able to connect with it in project libraries.
-
Ignacio_Toledo
I'm not an expert in this particular area, but this part of the error message:
ERROR: Command errored out with exit status 128: git clone -q 'ssh://****@github.com:chiktika/dss.git' /home/dataiku/dss_data/code-envs/python/plugin_oncrawl-projects_managed/src/oncrawl-api Check the logs for full command output.
looks like a permissions problem. To double-check, you could make your repository public for a while and then change the line to:
-e git+https://github.com/chiktika/dss.git#egg=oncrawl_api
If the error persists, then the problem is elsewhere (for example, in a test I made, my repo didn't have a setup.py file, and the command failed).
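The setup.py point can be turned into a rough pre-flight check before pointing requirements.txt at a repository. This is a hypothetical helper, not something pip or DSS provides: it looks for a setup.py or pyproject.toml at the repository root and, when setup.py passes a literal name= to setup(), compares it with the #egg= fragment after PEP 503 name normalization (lowercase, runs of "-", "_" and "." collapsed to "-"). Depending on its version, pip may warn about or reject a mismatched #egg= name, so this only catches the obvious cases.

```python
import ast
import os
import re
import tempfile


def normalize(name):
    """PEP 503 normalization: lowercase, collapse runs of '-', '_', '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def setup_py_name(root):
    """Return the literal name= passed to setup() in root/setup.py, or None."""
    path = os.path.join(root, "setup.py")
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both setup(...) and setuptools.setup(...).
        called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if called != "setup":
            continue
        for kw in node.keywords:
            value = kw.value
            if kw.arg == "name" and isinstance(value, ast.Constant) and isinstance(value.value, str):
                return value.value
    return None


def repo_problems(root, egg):
    """List reasons why pip might fail to install root as '#egg=<egg>'."""
    problems = []
    has_setup = os.path.isfile(os.path.join(root, "setup.py"))
    has_pyproject = os.path.isfile(os.path.join(root, "pyproject.toml"))
    if not (has_setup or has_pyproject):
        problems.append("no setup.py or pyproject.toml at the repository root")
    name = setup_py_name(root)
    if name is not None and normalize(name) != normalize(egg):
        problems.append("setup.py name %r does not match #egg=%s" % (name, egg))
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(repo_problems(root, "oncrawl_api"))  # no packaging file yet
        with open(os.path.join(root, "setup.py"), "w") as f:
            f.write('from setuptools import setup\nsetup(name="oncrawl_api", version="0.1.0")\n')
        print(repo_problems(root, "oncrawl_api"))  # nothing to report
```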
Cheers!
I.
-
tim-wright
@Chiktika, have you tried replacing the ":" with a "/"? I have not done what you are trying to do before, but I did see this link: https://stackoverflow.com/questions/4830856/is-it-possible-to-use-pip-to-install-a-package-from-a-private-github-repository, which seems to state that the ":" could be problematic.
The error trace in your original post shows you did do that, but it appears you used git+https. Have you tried git+ssh with the ":" replaced? With and without the -e flag?
I'm not confident that this will do it, but I wanted to suggest a possible debugging step.
-
Hi all,
Many thanks for your time.
I can confirm that my repo does not have a setup.py file.
Give me some time to look into how to do this, and I will let you know.
I created a public repo to try it out: https://github.com/chiktika/test/
Just to let you know that I'm not giving up: I will be OOO until next Monday and will hopefully come back with a solution.
Many thanks again.
-
Ignacio_Toledo
Cool! Thanks for the update, and happy to know it worked!
-
tim-wright
@Chiktika, that's awesome. Would you mind marking your last response (explaining how you managed to solve the issue) as the "accepted answer", so that others will have an easier time finding the answer without having to read through @Ignacio_Toledo's and my long-winded responses?