Connecting to Redshift in a Python recipe
I want to create Python plugins/recipes that connect to Redshift. Is there any way to do this? I've seen posts about HDFS, Spark, etc. but no documentation or sample code for Python > Redshift.
Answers
Hi,
Are you looking to actually connect to Redshift from within your Python code itself? If you simply want to use existing Redshift tables in your plugin or recipe, you can create a dataset in DSS pointing to the table(s) and use our existing dataset APIs, similar to the examples in our documentation here:
https://doc.dataiku.com/dss/latest/code_recipes/python.html
https://doc.dataiku.com/dss/latest/python-api/datasets.html
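For example, a recipe that reads from and writes to Redshift-backed datasets can look like the sketch below. This is a minimal illustration; the dataset names "my_redshift_input" and "my_redshift_output" are placeholders for datasets you would have already created in your project, pointing to your Redshift connection:

```python
# Minimal sketch: read/write Redshift-backed DSS datasets via the dataiku
# package. DSS handles the Redshift connection details for you.
# "my_redshift_input" / "my_redshift_output" are placeholder dataset names.
import dataiku

# Read the input dataset into a pandas DataFrame
input_ds = dataiku.Dataset("my_redshift_input")
df = input_ds.get_dataframe()

# ... transform df with pandas as needed ...

# Write the result back to another Redshift-backed dataset
output_ds = dataiku.Dataset("my_redshift_output")
output_ds.write_with_schema(df)
```

With this approach you never touch Redshift credentials in code; DSS resolves them from the connection configured by your admin.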
Otherwise, if you are looking to connect directly from Python, it's important to keep in mind that "Python in DSS" is essentially the same as "Python outside of DSS". So this is really the same as asking how to connect to Redshift from Python generically. One common approach is to use the psycopg2 package (which you would first need to install in a code environment): since Redshift speaks the PostgreSQL wire protocol, psycopg2 can connect to it directly. Once connected, you can pull data in, submit queries, write data, and so on.
If you do decide to go this route, you may find the following links helpful as well:
https://www.blendo.co/blog/access-your-data-in-amazon-redshift-and-postgresql-with-python-and-r/
https://stackoverflow.com/questions/53891593/python-write-dateframe-to-aws-redshift-using-psycopg2
https://gist.github.com/jaychoo/4e3effdeed3672173b67
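To make the psycopg2 route concrete, here is a minimal sketch. The host, port, database, user, password, and table names are all placeholders you would replace with your own cluster's values:

```python
# Minimal sketch: connect to Redshift directly with psycopg2
# (install psycopg2 in a DSS code environment first).
# All connection parameters and table names below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,                 # Redshift's default port
    dbname="mydb",
    user="myuser",
    password="mypassword",
)
try:
    with conn.cursor() as cur:
        # Run an arbitrary query against a placeholder table
        cur.execute("SELECT col_a, col_b FROM my_schema.my_table LIMIT 10;")
        for row in cur.fetchall():
            print(row)
    conn.commit()
finally:
    conn.close()
```

Note that with this approach you manage credentials yourself, so consider pulling them from DSS project variables or a secrets store rather than hard-coding them as shown here.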
Thanks,
Andrew