
Scraping Pyspark jobs without input data sets



I am currently creating PySpark jobs that do not have defined input datasets in my PySpark notebooks. The tables are queried with Spark SQL inside the notebook itself. I want to see if there is a way to identify all the tables referenced in the Spark SQL across multiple projects. See the screenshot below: the 'SELECT * FROM DB.TABLE' statement is where I am trying to capture the datasets being used. As you can see in the screenshot, there are no inputs defined in the PySpark notebook.



Hi @zjacobs23 ,
In DSS you should ideally add the required datasets as inputs to your recipe wherever possible. If they live in other projects, you should use the shared datasets feature:

Once added, you can read them as per the steps described here:
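Once the shared dataset is declared as a recipe input, a minimal sketch of reading it into a Spark DataFrame with Dataiku's PySpark integration could look like this (the dataset name `MY_DATASET` is a placeholder, and this only runs inside a DSS Spark recipe or notebook):

```python
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Get (or create) the Spark context provided by the DSS Spark environment
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# "MY_DATASET" is a placeholder for the shared dataset declared as input
dataset = dataiku.Dataset("MY_DATASET")

# Read the DSS dataset as a Spark DataFrame
df = dkuspark.get_dataframe(sqlContext, dataset)
df.printSchema()
```

Because the dataset is now a declared input, the dependency is visible in the Flow, which is what makes it discoverable across projects.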

If you want to execute Spark SQL, you can use a Spark SQL recipe:
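If you would rather stay in PySpark, one way to keep the SQL while still making the dependency explicit is to read the declared input dataset into a DataFrame, register it as a temporary view, and run the query against that view instead of `SELECT * FROM DB.TABLE` directly. A sketch under the same assumptions as above (`MY_DATASET` and `my_table` are placeholder names, DSS runtime required):

```python
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# "MY_DATASET" is a placeholder; it must be declared as a recipe input
df = dkuspark.get_dataframe(sqlContext, dataiku.Dataset("MY_DATASET"))

# Register the DataFrame as a temporary view so Spark SQL can query it
df.createOrReplaceTempView("my_table")
result = sqlContext.sql("SELECT * FROM my_table")
```

This way the table the SQL touches is always a tracked dataset rather than an opaque reference inside a query string.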

