
Read Excel/CSV from Git library

pietvoerman
Level 1

Situation: I am building on an existing Python code base in which an Excel file with several sheets serves as a settings file. Excel is used mainly because it is easy for the experts who have to maintain the settings. The Excel file is stored in Git (I know that we shouldn't store data in Git). The idea is that the Excel file gets updated and that I read the settings from each sheet in a Python recipe for use in modelling. I can upload each sheet separately by importing the Excel file manually, but then there is no automated connection to the file.

Question: Can I read an Excel (or CSV/text) file from the 'Library', which I have imported from Git?

If not, does anybody have a workaround in mind that lets me update the Excel file (while keeping it under version control) and access its contents in Python recipes?

Thanks in advance!


Operating system used: Windows

1 Reply
AlexT
Dataiker

Hi,

Is the file included in the Project Library directly in DSS? If it is, you can read it from the path DATA_DIR/config/projects/PROJECT_KEY/lib/file_name.xls

Once you have the file, simply read the various sheets using read_excel:

https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html
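As a rough sketch of the two steps above (build the path, then read every sheet), something like the following could work in a Python recipe. The helper names, the example data directory, the project key, and the file name are all hypothetical placeholders, and it assumes the recipe's code environment has pandas plus an Excel engine installed (xlrd for .xls, openpyxl for .xlsx).

```python
from pathlib import Path


def library_file_path(data_dir, project_key, file_name):
    """Build DATA_DIR/config/projects/PROJECT_KEY/lib/file_name."""
    return Path(data_dir) / "config" / "projects" / project_key / "lib" / file_name


def read_settings_sheets(path):
    """Return a dict mapping each sheet name to a DataFrame."""
    # Imported here so the path helper stays usable without pandas installed.
    import pandas as pd

    # sheet_name=None tells pandas to load every sheet in the workbook.
    return pd.read_excel(path, sheet_name=None)


# Hypothetical values; replace them with your own data directory,
# project key, and file name.
settings_path = library_file_path("/data/dss", "MYPROJECT", "settings.xls")
print(settings_path.as_posix())

# In the recipe you would then do something like:
# sheets = read_settings_sheets(settings_path)
# for name, df in sheets.items():
#     print(name, df.shape)
```

Passing sheet_name=None means newly added sheets are picked up without changing the recipe; pass a sheet name or a list of names instead if you only need some of them.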

That said, in your case it makes more sense to use a managed folder.
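A minimal sketch of the managed folder route, assuming the Excel file has been uploaded to a managed folder: the folder name "settings_folder" and the file path "/settings.xlsx" are hypothetical, and the helper names are just for illustration. The file is copied into an in-memory buffer first, since read_excel wants a seekable file-like object.

```python
import io


def download_to_buffer(folder, path_in_folder):
    """Copy a file from a managed folder into a seekable in-memory buffer."""
    # get_download_stream is the managed folder read method of dataiku.Folder.
    with folder.get_download_stream(path_in_folder) as stream:
        return io.BytesIO(stream.read())


def read_settings_from_folder(folder, path_in_folder):
    """Return a dict mapping each sheet name to a DataFrame."""
    # Imported here so the download helper stays usable without pandas installed.
    import pandas as pd

    buffer = download_to_buffer(folder, path_in_folder)
    # sheet_name=None loads every sheet in the workbook.
    return pd.read_excel(buffer, sheet_name=None)


# Usage inside a DSS Python recipe (folder and file names are hypothetical):
# import dataiku
# folder = dataiku.Folder("settings_folder")
# sheets = read_settings_from_folder(folder, "/settings.xlsx")
```

Because the helpers only rely on get_download_stream, the same code should work whether the folder lives on the local filesystem or on a remote connection.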
