Read Excel/CSV from Git library
Situation: I am building on an existing Python code base that uses an Excel file with several sheets as a settings file. Excel is used mainly because it is easy for the experts who must set the settings. The Excel file is stored in Git (I know that we shouldn't store data in Git). The idea is that the Excel file gets updated and that I can read the settings from each sheet in a Python recipe for use in modelling. I can upload each sheet separately by importing the Excel file manually, but then there is no automated connection with the Excel file.
Question: Can I read an Excel (or CSV/text) file from the 'Library' that I have imported from Git?
If not, does anybody have a workaround in mind where I can update the Excel (while being under version control) and access the contents in Python recipes?
Thanks in advance!
Operating system used: Windows
Answers
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi,
Is the file included in the Project Library directly in DSS? If it is, you can read it from the path DATA_DIR/config/projects/PROJECT_KEY/lib/file_name.xls
Once you have the file, simply read the various sheets using read_excel:
https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html
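As a rough sketch of that approach, the snippet below builds the library path and loads every sheet in one call. The helper names and the use of the DIP_HOME environment variable to locate DATA_DIR are assumptions for illustration, not something confirmed in this thread; adjust them to your instance. Reading .xls files also typically needs the xlrd package installed, and .xlsx needs openpyxl.

```python
from pathlib import Path


def project_lib_path(data_dir, project_key, file_name):
    """Build DATA_DIR/config/projects/PROJECT_KEY/lib/file_name (hypothetical helper)."""
    return Path(data_dir) / "config" / "projects" / project_key / "lib" / file_name


def read_all_sheets(path):
    """Return a dict mapping each sheet name to a DataFrame (hypothetical helper)."""
    # Imported here so the path helper above can be used without pandas installed.
    import pandas as pd

    # sheet_name=None asks pandas to load every sheet in the workbook.
    return pd.read_excel(path, sheet_name=None)


# Example use inside a Python recipe.
# Assumption: DIP_HOME points at the DSS data directory on your instance.
#
#   import os
#   path = project_lib_path(os.environ["DIP_HOME"], "PROJECT_KEY", "file_name.xls")
#   settings = read_all_sheets(path)
#   for sheet_name, frame in settings.items():
#       print(sheet_name, frame.shape)
```

Each value in the returned dict is an ordinary DataFrame, so a single sheet can be picked out by name, for example settings["Sheet1"] if a sheet with that name exists.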
That said, in your case it makes more sense to use a managed folder.
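A minimal sketch of the managed folder route might look like the following. It assumes the Excel file has been uploaded to a managed folder and that the folder object exposes get_download_stream, as dataiku.Folder does in the DSS Python API; the folder name and the helper names are hypothetical.

```python
import io


def download_bytes(folder, path):
    """Read a whole file from a managed folder into memory (hypothetical helper)."""
    # `folder` is expected to behave like dataiku.Folder: get_download_stream
    # returns a file-like object that can be used as a context manager.
    with folder.get_download_stream(path) as stream:
        return stream.read()


def read_all_sheets_from_folder(folder, path):
    """Return a dict mapping each sheet name to a DataFrame (hypothetical helper)."""
    # Imported here so download_bytes can be used without pandas installed.
    import pandas as pd

    # Wrap the bytes in a seekable buffer; sheet_name=None loads every sheet.
    return pd.read_excel(io.BytesIO(download_bytes(folder, path)), sheet_name=None)


# Example use inside a Python recipe ("settings_folder" is a made-up name):
#
#   import dataiku
#   folder = dataiku.Folder("settings_folder")
#   settings = read_all_sheets_from_folder(folder, "file_name.xls")
```

Loading the file into memory first keeps the sketch simple; for a small settings workbook that should be fine.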