
Help request: partitioning pattern

Solved!
utilisateurrand
Level 1

Hello,

I am reaching out because I am trying to build a partitioning pattern to use with the "Sync" recipe.
However, I have the impression that I am doing something wrong.
Here is my starting data:

utilisateurrand_0-1664955085856.png

Here is the pattern I am using:

utilisateurrand_1-1664955085999.png

And it produces this error:

utilisateurrand_2-1664955086562.png

The result I would like is:
folders named "dt=YYYY-MM-DD"
and, in each folder, the data from files with the same time grouped into a single file named with that time (see the "date3" column of my starting data).
Do you know how I can do this, please? I am really stuck.
Thank you in advance for your help.

Best regards,

1 Solution
utilisateurrand
Level 1
Author

Hello,

Yes, exactly: you resolved my issue through support.

However, I will explain the solution in case other users need to do the same thing:

Solution 1:

If you need a specific path that corresponds to some columns of your dataset, create a partitioning on your original dataset with the path you want (in my case dt=YYYY-MM-DD/YYYYMMDD_HHmmSS.csv).

With a "Sync" recipe, this creates a dataset split on column 1 (dt=YYYY-MM-DD) and column 2 (YYYYMMDD_HHmmSS), and the files are named out*.csv.

Solution 2:

If you need to specify the CSV filename, create a "Python" recipe instead. It lets you export to a specific folder under a specific name.

Example code:

# -*- coding: utf-8 -*-
import dataiku

# Read the recipe input dataset into a pandas DataFrame
input_dataset = dataiku.Dataset("NAME_OF_YOUR_DATASET")
df = input_dataset.get_dataframe()

# Managed folder that will receive the file
managed_folder_id = "NAME_FOLDER_YOU_WANT"
output_folder = dataiku.Folder(managed_folder_id)

# Write the DataFrame as a CSV file under the name you choose
filename = "NAME_FILE_YOU_WANT.csv"
output_folder.upload_data(filename, df.to_csv(index=False).encode("utf-8"))
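
To get the layout asked for in the original question (one "dt=YYYY-MM-DD" folder per day, one CSV per timestamp), the same idea can probably be extended by grouping the DataFrame before uploading. This is only a sketch under assumptions: the helper name build_partitioned_csvs is hypothetical, and it assumes the "date3" column holds values that pandas can parse as datetimes. It only builds the paths and file contents, so it can be tried outside Dataiku.

```python
import pandas as pd


def build_partitioned_csvs(df, ts_col="date3"):
    """Return {relative_path: csv_bytes}, one entry per distinct timestamp.

    Hypothetical helper: assumes `ts_col` can be parsed by pandas as datetimes.
    """
    timestamps = pd.to_datetime(df[ts_col])
    files = {}
    # Group rows that share the same timestamp value
    for stamp, chunk in df.groupby(timestamps):
        # e.g. dt=2022-10-05/20221005_090000.csv
        path = stamp.strftime("dt=%Y-%m-%d/%Y%m%d_%H%M%S.csv")
        files[path] = chunk.to_csv(index=False).encode("utf-8")
    return files


# Small made-up sample, only to show the resulting paths
sample = pd.DataFrame(
    {
        "date3": [
            "2022-10-05 09:00:00",
            "2022-10-05 09:00:00",
            "2022-10-06 10:30:00",
        ],
        "value": [1, 2, 3],
    }
)
files = build_partitioned_csvs(sample)
for path in files:
    print(path)
```

In a Python recipe, each entry could then be sent to the managed folder with output_folder.upload_data(path, data), as in the example above. I have not checked how every storage backend handles the "/" in the path, so please test it on your own folder first.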

 


3 Replies
AlexT
Dataiker

Hi @utilisateurrand ,

I believe we resolved this issue through support.

The NPE (java.lang.NullPointerException at com.dataiku.dip.dataflow.pdep.EqualsEvaluator.getDependent(EqualsEvaluator.java:12)) occurred because you were trying to perform a redispatch on an already partitioned dataset. Removing the partitioning from the input dataset resolved the issue.

Redispatch works from a non-partitioned dataset to a partitioned one:

https://knowledge.dataiku.com/latest/kb/data-prep/partitions/partitioning-redispatch.html

 

Thanks,

 

 

 


CoreyS
Dataiker Alumni

Thank you for sharing your solution with the Community, @utilisateurrand!
