Starting a project with a Python script that generates data: is it possible?

gto Registered Posts: 2

Hello everyone, I'm new to Dataiku.

I developed a Python script that collects data from different sources (web scraping, local files...) and generates a pandas DataFrame, on which I then perform my analysis.

I would like to move this project into Dataiku. BUT when I start a project, I need a dataset, and I don't have one yet.

Question 1: is it possible to start a flow with my Python script, so that the script generates the dataset?
Question 2: if not, can I start my project with an empty dataset, then add Python code that fills the dataset, then reload the dataset?

Thank you for your help!


Operating system used: Ubuntu 22.04


Best Answer

  • PaulK
    PaulK Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 1 Dataiker
    edited July 17 Answer ✓

    Hello @gto,

    Yes, it is possible to start a flow with a Python script.
    In your project, at the top right of the flow view, select +RECIPE > CODE > Python to create a new Python recipe. The recipe can be created with no input and with one output dataset (or more, if you want several output datasets).

    Once your code recipe is created, it will contain a Python code sample, which should end with something like this:

    # Write recipe outputs
    outputDataset = dataiku.Dataset("outputDataset")
    outputDataset.write_with_schema(outputDataset_df)

    You will need to assign the pandas DataFrame produced by your script to outputDataset_df before it is written.
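
As a rough illustration of the point above, here is a minimal sketch of what a recipe with no input could look like. The function collect_data() and its hard-coded rows are hypothetical placeholders for your own scraping and file-reading code, and the dataset name "outputDataset" is assumed to match the output you chose when creating the recipe. The import guard is there only so the snippet can also be tried outside Dataiku; inside a recipe a plain "import dataiku" is enough.

```python
import pandas as pd

try:
    import dataiku  # only importable inside a Dataiku DSS environment
except ImportError:
    dataiku = None  # lets the data-collection part run outside DSS


def collect_data():
    """Hypothetical stand-in for your own collection script."""
    # Placeholder rows standing in for web-scraped records
    scraped = [{"source": "web", "value": 1.5}, {"source": "web", "value": 2.0}]
    # Placeholder rows standing in for records read from local files
    local = [{"source": "file", "value": 3.25}]
    return pd.DataFrame(scraped + local)


# Build the DataFrame that the recipe will write
outputDataset_df = collect_data()

# Write recipe outputs (skipped when running outside Dataiku)
if dataiku is not None:
    outputDataset = dataiku.Dataset("outputDataset")
    outputDataset.write_with_schema(outputDataset_df)
```

Keeping the collection logic in its own function means it can be tested locally first and then pasted into the recipe unchanged.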

    If you need more information on the Python API, please have a look at the Dataiku documentation. Here is the link to the documentation on how to write the output schema.

    Please let me know if that works for you.

    Best regards,

    Paul

