Hi there,
I want to:
- Use PySpark (NOT SnowSQL or anything else - Spark is the best solution for the work I'm doing) to query a table in Snowflake (something like this would be perfect):
https://docs.snowflake.com/en/user-guide/spark-connector-use.html#configuring-the-pyspark-script
- From a notebook (NOT a recipe)
- Specifying tables as query strings, e.g. "select * from <table_name>" (NOT using a manually defined DSS dataset)
I was able to create the SparkSession etc. and got this far:
spark.sql("select * from my_db.my_schema.my_table limit 3")
AnalysisException: The namespace in session catalog must have exactly one name part: my_db.my_schema.my_table;
It didn't throw a "Table or view not found" error, so it can see that the table is there; beyond that I don't know what the error means, and a Google search didn't return much that was useful.
So then I turned to the DSS docs and came across this:
https://knowledge.dataiku.com/latest/kb/data-prep/pyspark/index.html
Which has this:
# Point to the DSS datasets
users_ds = dataiku.Dataset("users")
movies_ds = dataiku.Dataset("movies")
ratings_ds = dataiku.Dataset("ratings")
# And read them as Spark dataframes
users = dkuspark.get_dataframe(sqlContext, users_ds)
movies = dkuspark.get_dataframe(sqlContext, movies_ds)
ratings = dkuspark.get_dataframe(sqlContext, ratings_ds)
This is not a viable solution, as I don't want to have to define a Dataiku dataset for every table in our database I might want to query (there are hundreds of them). I just want to run a query string and get the results back as a Spark dataframe, exactly as I've done above and as I usually would outside DSS.
Is this possible?
Thanks
Operating system used: Linux
Hi,
The Spark-Snowflake integration is documented on Snowflake's site at https://docs.snowflake.com/en/user-guide/spark-connector-use.html#using-the-connector-with-python . In DSS you'll also need the Spark integration checkbox enabled on the Snowflake connection.
In essence, instead of doing df = spark.sql(....) like you planned, you need to do df = spark.read.format("snowflake").option("query", ....).load(). The spark.sql() call runs against Spark's own session catalog, which only understands two-part names (database.table), which is why the three-part Snowflake identifier my_db.my_schema.my_table raised that AnalysisException.
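Putting that together, a minimal sketch of the connector-based read might look like the following. The connection parameters (URL, user, password, warehouse) are placeholders you'd substitute with your own values; in DSS they normally come from the Snowflake connection settings rather than being hard-coded in the notebook. This assumes the Snowflake Spark connector JAR is available to your Spark session.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # in a DSS notebook, the session may already exist

# Fully-qualified connector name; recent connector versions also register
# the short alias "snowflake" for spark.read.format(...)
SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"

# Placeholder connection options -- replace with your own account details
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "my_db",
    "sfSchema": "my_schema",
    "sfWarehouse": "<warehouse>",
}

# Push the query string down to Snowflake and get the result as a Spark dataframe
df = (
    spark.read.format(SNOWFLAKE_SOURCE)
    .options(**sf_options)
    .option("query", "select * from my_db.my_schema.my_table limit 3")
    .load()
)
df.show()
```

Because the "query" option ships the SQL text to Snowflake itself, you can query any table your Snowflake role can see without defining a DSS dataset for it.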