Hello,
I am reaching out because I am trying to build a partitioning pattern to use with the "Sync" recipe.
However, I have the impression that I am doing something wrong.
Here is my input data,
here is the pattern I am using,
and here is the error it gives me.
The result I would like is:
folders of the form "dt=YYYY-MM-DD",
and within each folder, the rows that share the same hour gathered into a single file named with that hour (see column "date3" of my input data).
Do you know how I can do this, please? I am really stuck on it.
Thanks in advance for your help.
Best regards,
Hi @utilisateurrand ,
I believe we resolved this issue over support.
NPE -> java.lang.NullPointerException at com.dataiku.dip.dataflow.pdep.EqualsEvaluator.getDependent(EqualsEvaluator.java:12)
was due to the fact that you were trying to perform a redispatch on an already partitioned dataset. Removing the partitioning from the input dataset resolved the issue.
Redispatch only works from a non-partitioned dataset to a partitioned one:
https://knowledge.dataiku.com/latest/kb/data-prep/partitions/partitioning-redispatch.html
Thanks,
Hello,
Yes, exactly, you resolved my issue via support.
However, I will explain the solution in case other users need to do the same thing:
Solution 1:
If you need a specific path that corresponds to some columns of your dataset, create a partitioning pattern on your dataset with the path you want (in my case dt=YYYY-MM-DD/YYYYMMDD_HHmmSS.csv).
With a "Sync" recipe, it will create a dataset split on column 1 (dt=YYYY-MM-DD) and column 2 (YYYYMMDD_HHmmSS), and the output files are named out*.csv.
If you need to specify the CSV filename yourself, use solution 2.
Solution 2:
Create a "Python" recipe; it lets you export to a specific folder with a specific filename.
Example of code:
# -*- coding: utf-8 -*-
import dataiku

# Read the recipe input dataset into a pandas dataframe
base_64 = dataiku.Dataset("NAME_OF_YOUR_DATASET")
df = base_64.get_dataframe()

# Write the dataframe as a CSV file, with the name you want,
# into a managed folder (recipe output)
managed_folder_id = "NAME_FOLDER_YOU_WANT"
output_folder = dataiku.Folder(managed_folder_id)
filename = "NAME_FILE_YOU_WANT.csv"
output_folder.upload_data(filename, df.to_csv(index=False).encode("utf-8"))
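The snippet above writes a single file. The original goal was one folder per day with one file per hour, so here is a minimal sketch of how the per-partition paths could be built with plain pandas (the column name "date3", the sample dataframe, and the `partition_paths` helper are assumptions for illustration; the upload call is the same one as above, shown commented out):

```python
import pandas as pd

def partition_paths(df, ts_col="date3"):
    # Hypothetical helper: return {path: sub_df} with one entry per
    # (day, hour) group, paths like "dt=YYYY-MM-DD/YYYYMMDD_HH0000.csv".
    ts = pd.to_datetime(df[ts_col])
    out = {}
    for (day, hour), sub in df.groupby([ts.dt.date, ts.dt.hour]):
        path = f"dt={day}/{day.strftime('%Y%m%d')}_{hour:02d}0000.csv"
        out[path] = sub
    return out

# Sample data assumed for the sketch
df = pd.DataFrame({
    "date3": ["2023-05-01 10:15:00", "2023-05-01 10:45:00", "2023-05-02 09:00:00"],
    "value": [1, 2, 3],
})

for path, sub in partition_paths(df).items():
    print(path, len(sub))
    # In the Dataiku recipe, upload each group under its partition path:
    # output_folder.upload_data(path, sub.to_csv(index=False).encode("utf-8"))
```

With the sample data this produces two paths, `dt=2023-05-01/20230501_100000.csv` (2 rows) and `dt=2023-05-02/20230502_090000.csv` (1 row).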
Thank you for sharing your solution with the Community, @utilisateurrand!