Dataiku Automation For Excel File Refresh
Hello,
I have an excel file which is connected to SQL DWH. Normally everyday, I go in this file to refresh data manually by clicking ctrl + alt + f5. I want to make it automatically with Dataiku.
pip install pywin32 import win32com.client as win32 def refresh_excel_file(file_path):
# Open Excel
excel = win32.Dispatch("Excel.Application")
excel.Visible = False # Keep Excel in the background # Open the workbook
workbook = excel.Workbooks.Open(file_path)
# Refresh all data connections
workbook.RefreshAll()
# Wait for the refresh to finish
excel.CalculateUntilAsyncQueriesDone()
# Save and close
workbook.Save()
workbook.Close()
# Quit Excel
excel.Quit() #Path to your Excel file file_path = 'XXX'
refresh_excel_file(file_path)
Is it possible to make dataiku to run excel application as given above?
As far as I understand
Automating the refresh of an Excel file with data connections using pywin32 relies on automating the Excel application through COM interfaces, which requires a Windows environment with Excel installed. And It is not possible to execute application with Dataiku to automate like this.
If not could you please direct me how i can do this task?
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,000 Neuron
You have some options to automate this from Dataiku but they may or may not work depending on your Windows configuration. One of them is using Ansible and WinRM which runs on Linux and can interface with Windows servers to execute remote commands assuming you can use with one of the supported WinRM authentication options. Powershell can be installed under Linux and supports remote command execution but again you may find that the authentication options provided don't work for you. Probably the most modern and likely to work option is PowerShell remoting over SSH which allows true multiplatform PowerShell remoting as obviously SSH is trivial to handle from Linux. On the Dataiku side you could use a Shell recipe to execute SSH commands.
In any case you should be aware that your solution is a big kludge and that you really should convince your project stakeholders to get access to the source data directly rather than via an Excel file.