Link to a git repo in plugin requirements.txt

Chiktika
Chiktika Registered Posts: 24 ✭✭✭✭

Hello,

I usually load code from a private git repo into project libraries, but now I need to use this code within a plugin.

Is it possible to add the repo link to the plugin's requirements.txt?

I tried writing:

git+ssh://git@github.com:chiktika/dss.git#egg=oncrawl_api

but it raises an error:

[2020/11/09-17:47:13.310] [FT--5jW0vzAe-2121] [INFO] [dip.code-envs.package-systems]  - [ct: 1] Installing from py requirements :
git+https://github.com/chiktika/dss.git#egg=my_lib
[2020/11/09-17:47:13.310] [FT--5jW0vzAe-2121] [INFO] [dip.code-envs.package-systems]  - [ct: 1] Completed requirements :
git+https://github.com/chiktika/dss.git#egg=my_lib
pandas==0.23.4
python-dateutil==2.8.0
pytz==2019.2
requests==2.22.0

[2020/11/09-17:47:13.395] [qtp506775047-1666] [DEBUG] [dku.tracing]  - [ct: 1] Start call: /api/futures/get-update [GET] user=elodie [futureId=5jW0vzAe]
[2020/11/09-17:47:13.395] [qtp506775047-1666] [DEBUG] [dku.tracing]  - [ct: 1] Done call: /api/futures/get-update [GET] time=1ms user=elodie [futureId=5jW0vzAe]
[2020/11/09-17:47:13.990] [qtp506775047-2118] [DEBUG] [dku.tracing]  - [ct: 1] Start call: /api/futures/get-update [GET] user=elodie [futureId=5jW0vzAe]
[2020/11/09-17:47:13.992] [qtp506775047-2118] [DEBUG] [dku.tracing]  - [ct: 3] Done call: /api/futures/get-update [GET] time=3ms user=elodie [futureId=5jW0vzAe]
[2020/11/09-17:47:14.065] [null-err-2126] [INFO] [dku.utils]  - ERROR: Command errored out with exit status 128: git clone -q https://github.com/chiktika/dss.git /tmp/pip-install-xbi2sq3w/my-lib Check the logs for full command output.
[2020/11/09-17:47:14.065] [null-out-2124] [INFO] [dku.utils]  - Collecting my_lib
[2020/11/09-17:47:14.066] [null-out-2124] [INFO] [dku.utils]  -   Cloning https://github.com/chiktika/dss.git to /tmp/pip-install-xbi2sq3w/my-lib
[2020/11/09-17:47:14.066] [Thread-1082] [INFO] [dku.utils]  - Done waiting for return value,  got 1
[2020/11/09-17:47:14.066] [FT--5jW0vzAe-2121] [ERROR] [dku.code.envs]  - Env update failed
com.dataiku.dip.exceptions.ProcessDiedException: /home/dataiku/dss_data/code-envs/python/plugin_oncrawl-projects_managed/bin/python failed (exit code: 1)
    at com.dataiku.dip.exceptions.ProcessDiedException.getExceptionOnProcessDeath(ProcessDiedException.java:59)
    at com.dataiku.dip.utils.DKUtils$SimpleExceptionExecCompletionHandler.handle(DKUtils.java:1063)
    at com.dataiku.dip.utils.DKUtils$ExecBuilder.exec(DKUtils.java:918)
    at com.dataiku.dip.utils.DKUtils.execAndLogThrowsMirror(DKUtils.java:1244)
    at com.dataiku.dip.code.CodeEnvPackageSystems$PipPackageSystemMeta.install(CodeEnvPackageSystems.java:132)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService.updateEnvAccordingToSpec(DesignNodeCodeEnvsService.java:1114)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService.access$200(DesignNodeCodeEnvsService.java:91)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService$21.compute(DesignNodeCodeEnvsService.java:1052)
    at com.dataiku.dip.code.DesignNodeCodeEnvsService$21.compute(DesignNodeCodeEnvsService.java:1041)
    at com.dataiku.dip.futures.SimpleFutureThread.execute(SimpleFutureThread.java:36)
    at com.dataiku.dip.futures.FutureThreadBase.run(FutureThreadBase.java:88)

Can someone help me, please?

Many thanks.

Best Answer

  • Chiktika
    edited July 17 Answer ✓

    Hi @tim-wright, @Ignacio_Toledo,

    Many thanks for your help, I finally got it working thanks to your advice!

    The first step was replacing :<user> with /<user>:

    git+ssh://git@github.com/chiktika/dss.git#egg=oncrawl_api

    Then I had to create a setup.py and a readme.md, and not forget an empty __init__.py in each folder.

    And it works.

    With all my thanks!

    C.
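
For readers who want to reproduce the packaging step described in the accepted answer, here is a minimal sketch, not the poster's actual files. It assumes the importable code lives in a folder named after the package (here `oncrawl_api`) at the root of the repo; the thread does not show the real layout. The helper `scaffold_package`, the template and the version number are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Hypothetical minimal setup.py; the thread does not show the real one.
SETUP_PY_TEMPLATE = '''\
from setuptools import setup, find_packages

setup(
    name="{name}",  # assumed to match the #egg= fragment in requirements.txt
    version="0.1.0",  # placeholder version
    packages=find_packages(),  # only picks up folders containing __init__.py
)
'''


def scaffold_package(repo_root, package_name):
    """Add setup.py, readme.md and empty __init__.py files without overwriting anything."""
    repo_root = Path(repo_root)
    package_dir = repo_root / package_name
    package_dir.mkdir(parents=True, exist_ok=True)

    setup_py = repo_root / "setup.py"
    if not setup_py.exists():
        setup_py.write_text(SETUP_PY_TEMPLATE.format(name=package_name))

    readme = repo_root / "readme.md"
    if not readme.exists():
        readme.write_text(f"# {package_name}\n")

    # One empty __init__.py per folder, so find_packages() sees every sub-package.
    folders = [package_dir] + [
        p for p in package_dir.rglob("*")
        if p.is_dir() and p.name != "__pycache__"
    ]
    for folder in folders:
        (folder / "__init__.py").touch()
```

Calling `scaffold_package(".", "oncrawl_api")` from a local clone of the repo, then committing and pushing the new files, should give pip something it can build when it clones the repo.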

Answers

  • tim-wright
    tim-wright Partner, L2 Designer, Snowflake Advanced, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 77 Partner
    edited July 17

    @Chiktika are you sure you tried the following in your pip install (git+ssh)?

    git+ssh://git@github.com:chiktika/dss.git#egg=oncrawl_api

    I only ask because I see the following in your error trace (git+https):

    git+https://github.com/chiktika/dss.git#egg=my_lib

    I had an earlier question about git over HTTPS (https://community.dataiku.com/t5/Setup-Configuration/Git-over-HTTPS/m-p/9961) and was told that the preferred way was to use SSH, as you have above.

    Have you successfully managed to connect to your repo from DSS using SSH before? If not, see here: https://doc.dataiku.com/dss/latest/collaboration/git.html#setup

    I have not tried doing what you are asking about, but a quick Google search turned up others noting that you should replace the ":" with "/" in the requirements.txt (https://stackoverflow.com/questions/4830856/is-it-possible-to-use-pip-to-install-a-package-from-a-private-github-repository), so possibly try:

    ## replace :<user> with /<user> ##
    git+ssh://git@github.com/chiktika/dss.git#egg=oncrawl_api

    I wish I could say I am confident this will solve your issue, but I am not. If you make sure you can connect to your repo over SSH and then add the last command to your requirements.txt and still have an issue, I'm happy to try to help you sort through it.

    @Chiktika if you do figure it out, I'd love to learn how you did it, because that seems like a really useful thing to do. I encourage you to post on the community so others can leverage your hard work!
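
The ":" to "/" substitution suggested above can be sketched in a few lines of Python. In the `ssh://` URL form, whatever follows the ":" after the host is read as a port number, which is the likely reason a clone of `ssh://git@github.com:chiktika/dss.git` fails. The function name `to_pip_requirement` and the regular expression are illustrative assumptions, not part of pip or DSS, and the input is assumed to be a bare remote without an `#egg=` fragment.

```python
import re

# scp-style remote: user@host:path, where the part after ":" is not a numeric port.
SCP_STYLE = re.compile(
    r"^(?P<user>[^@/\s]+)@(?P<host>[^:/\s]+):(?!\d+(?:/|$))(?P<path>\S+)$"
)


def to_pip_requirement(remote, egg):
    """Turn a git remote such as git@host:owner/repo.git into a pip VCS requirement line."""
    # Drop a scheme prefix if the remote already carries one.
    for prefix in ("git+ssh://", "ssh://"):
        if remote.startswith(prefix):
            remote = remote[len(prefix):]
            break

    match = SCP_STYLE.match(remote)
    if match:
        # Replace the ":" separating host and path with "/".
        remote = "{user}@{host}/{path}".format(**match.groupdict())

    return f"git+ssh://{remote}#egg={egg}"


print(to_pip_requirement("git@github.com:chiktika/dss.git", "oncrawl_api"))
```

A remote that already uses "/" after the host, or one with a genuine numeric port, is left unchanged apart from the `git+ssh://` prefix and the `#egg=` suffix.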

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
    edited July 17

    Hi @Chiktika. You seem to be doing the right thing, except that you need to add a `-e` at the beginning, so the line in the requirements.txt should look like:

    -e git+ssh://git@github.com:chiktika/dss.git#egg=oncrawl_api

    However, there might be another reason for the failure: your repo is apparently private. Does your DSS machine have the credentials to clone the content of the repo?

    I hope this helps!

    I.
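
One way to check whether a machine has the credentials to read the repo is a read-only `git ls-remote` call. The sketch below is only an illustration: the function names are hypothetical, and it assumes the script is run on the DSS machine as the same Unix user that runs DSS, since SSH keys are per user.

```python
import subprocess


def ls_remote_command(repo_url):
    """Build a read-only git command that asks the remote for its HEAD."""
    return ["git", "ls-remote", repo_url, "HEAD"]


def can_reach(repo_url, timeout=30):
    """Return True if git can read the remote with the current user's credentials."""
    try:
        result = subprocess.run(
            ls_remote_command(repo_url),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is not installed, or the connection hung (for example on a prompt).
        return False
    if result.returncode != 0:
        # Show git's own error message to help tell access problems from URL problems.
        print(result.stderr.strip())
    return result.returncode == 0
```

For example, `can_reach("ssh://git@github.com/chiktika/dss.git")` returning False together with a "Permission denied" message would point to a credentials problem rather than a packaging one.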

  • Chiktika
    edited July 17

    Hi @Ignacio_Toledo

    Thanks for your answer.

    Yes, I tried adding a `-e` but I still get the same error.

    [Screenshot: Capture d’écran 2020-11-10 103605.png]

    [2020/11/10-09:34:00.859] [null-out-332] [INFO] [dku.utils]  - Obtaining oncrawl_api from git+ssh://****@github.com:chiktika/dss.git#egg=oncrawl_api (from -r /home/dataiku/dss_data/tmp/pip-requirements-install/req13902208873652758081.txt (line 1))
    [2020/11/10-09:34:00.860] [null-out-332] [INFO] [dku.utils]  -   Cloning ssh://****@github.com:chiktika/dss.git to /home/dataiku/dss_data/code-envs/python/plugin_oncrawl-projects_managed/src/oncrawl-api
    [2020/11/10-09:34:00.860] [null-err-334] [INFO] [dku.utils]  - ERROR: Command errored out with exit status 128: git clone -q 'ssh://****@github.com:chiktika/dss.git' /home/dataiku/dss_data/code-envs/python/plugin_oncrawl-projects_managed/src/oncrawl-api Check the logs for full command output.
    [2020/11/10-09:34:00.861] [FT--j21MXpP5-329] [ERROR] [dku.code.envs]  - Env update failed
    com.dataiku.dip.exceptions.ProcessDiedException: /home/dataiku/dss_data/code-envs/python/plugin_oncrawl-projects_managed/bin/python failed (exit code: 1)
       at com.dataiku.dip.exceptions.ProcessDiedException.getExceptionOnProcessDeath(ProcessDiedException.java:59)
       at com.dataiku.dip.utils.DKUtils$SimpleExceptionExecCompletionHandler.handle(DKUtils.java:1063)
       at com.dataiku.dip.utils.DKUtils$ExecBuilder.exec(DKUtils.java:918)
       at com.dataiku.dip.utils.DKUtils.execAndLogThrowsMirror(DKUtils.java:1244)
       at com.dataiku.dip.code.CodeEnvPackageSystems$PipPackageSystemMeta.install(CodeEnvPackageSystems.java:132)
       at com.dataiku.dip.code.DesignNodeCodeEnvsService.updateEnvAccordingToSpec(DesignNodeCodeEnvsService.java:1114)
       at com.dataiku.dip.code.DesignNodeCodeEnvsService.access$200(DesignNodeCodeEnvsService.java:91)
       at com.dataiku.dip.code.DesignNodeCodeEnvsService$21.compute(DesignNodeCodeEnvsService.java:1052)
       at com.dataiku.dip.code.DesignNodeCodeEnvsService$21.compute(DesignNodeCodeEnvsService.java:1041)
       at com.dataiku.dip.futures.SimpleFutureThread.execute(SimpleFutureThread.java:36)
       at com.dataiku.dip.futures.FutureThreadBase.run(FutureThreadBase.java:88)

    The DSS machine can access this private repo: its SSH key is registered with the repo, and I'm able to connect to it from project libraries.

  • Ignacio_Toledo
    edited July 17

    I'm not an expert in this particular area, but this part of the error message:

    ERROR: Command errored out with exit status 128: git clone -q 'ssh://****@github.com:chiktika/dss.git' /home/dataiku/dss_data/code-envs/python/plugin_oncrawl-projects_managed/src/oncrawl-api Check the logs for full command output.

    looks like a permissions problem. To double-check, you could make your repository public for a while, and then change the line to:

    -e git+https://github.com/chiktika/dss.git#egg=oncrawl_api

    If the error persists, then the problem is elsewhere (for example, in a test I made, my repo was not configured with a setup.py file, and the command failed).

    Cheers!

    I.

  • tim-wright

    @Chiktika Have you tried replacing the ":" with a "/"? I have not done what you are trying to do before, but I did see this link, which seems to state that the ":" could be problematic: https://stackoverflow.com/questions/4830856/is-it-possible-to-use-pip-to-install-a-package-from-a-private-github-repository

    The error trace in your original post shows that you did do that, but with git+https. Have you tried git+ssh with the ":" replaced, both with and without the -e flag?

    I'm not confident that this will do it, but wanted to help with a possible debugging step if possible.

  • Chiktika

    Hi all,

    Many thanks for your time.

    I confirm that my repo is not configured with a setup.py file.
    Give me some time to look into how to do this and I will let you know.

    I created a public repo to try: https://github.com/chiktika/test/

    Just to let you know that I'm not giving up: I will be out of office until next Monday and will hopefully come back with a solution.

    Many thanks again.

  • Ignacio_Toledo

    Cool! Thanks for the update, and happy to know it worked!

  • tim-wright

    @Chiktika That's awesome. Would you mind marking your last response (explaining how you managed to solve the issue) as the "accepted answer" so that others will have an easier time finding the answer without having to read through my and @Ignacio_Toledo's long-winded responses?
