Remap connection to a locally-stored dataset

Solved!
RicSpd
Level 2

I would like to export a project as a bundle to the Automation node. However, since my initial files are stored in the DSS default data directory (whose exact location I don't actually know), when I export the bundle without including the datasets I get this error on the Automation node:

[Screenshot: error message shown when exploring the "train" dataset on the Automation node]
Of course, I know that if I export my datasets too, everything works; however, my initially uploaded files are quite large, and exporting plus re-importing them takes quite some time.

So I thought of remapping the connection to the initially uploaded files, but the only connection I can remap involves filesystem_managed, which is not where the data is actually stored.

[Screenshot: bundle settings page showing filesystem_managed as the only remappable connection]
Are there any best practices for remapping locally uploaded files from a Design node to an Automation node, without exporting them inside bundles?

1 Solution
Liev
Dataiker Alumni

Hi @RicSpd,

Thanks for your question. 

If I understand correctly, you have uploaded large files into your Design instance and now want to reuse them in Automation, but without having to include them in the bundle for deployment.

As you can see, this is not as straightforward as it might seem, and it boils down to a design decision. If you intend to use large files from multiple places, I would recommend hosting them in S3 buckets, SFTP servers, or other such services; this will make access easier. Normally, uploading files directly into DSS should be reserved for smaller files that you won't mind bundling should you need to.

If you REALLY need to keep your setup as is, you'd need to create a new FS connection on both servers that points to DESIGN_DSS_HOST/uploads/PROJECT_KEY/datasets/DATASET_NAME. But, as I hope is clear, this is inadvisable.
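To illustrate the path layout the workaround above relies on, here is a minimal sketch. The directory structure (an `uploads/<PROJECT_KEY>/datasets/<DATASET_NAME>` tree under the DSS data directory) is taken from the answer; the data-directory path, project key, and dataset name in the example are hypothetical placeholders, not values from your instance.

```python
from pathlib import Path

def uploads_root(dss_data_dir: str, project_key: str, dataset_name: str) -> Path:
    """Build the path where DSS stores files uploaded to a dataset,
    following the uploads/PROJECT_KEY/datasets/DATASET_NAME layout
    described in the answer above."""
    return Path(dss_data_dir) / "uploads" / project_key / "datasets" / dataset_name

# Hypothetical example values:
print(uploads_root("/opt/dataiku-dss", "MYPROJECT", "train"))
# → /opt/dataiku-dss/uploads/MYPROJECT/datasets/train
```

A new Filesystem connection on each node would then use that directory as its root; but again, pointing the Automation node at a path on the Design host couples the two instances together, which is why this is discouraged.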

Good luck!


2 Replies

RicSpd
Level 2
Author

Hi @Liev, your explanation was very clear. Thanks a lot!
