SharePoint Files with Different Field Counts but the Same Field Names
Hello everyone,
I am currently importing files from SharePoint that are sent to me via email on a weekly basis.
Each file may have a different number of fields each time it is sent.
I noticed that Dataiku does not match fields by name on each load, which would leave null values in the fields that are missing from a given file.
Instead, it shifts values to the left to fill the columns.
example:
data1.csv
name,lastname,money,activity
jj,mounts,1000,100
data2.csv
name,lastname,activity
jj,mounts,100
SharePoint dataset built in Dataiku
name lastname money activity
jj mounts 1000 100
jj mounts 100
The result above is wrong, but it is what Dataiku produces.
It should have left money blank and put 100 in activity.
How can I fix this?
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,000 Neuron
How exactly are you loading these files? Describe your flow.
-
I built a flow that creates a dataset from the SharePoint plugin, then I use R to pull the data from that dataset, apply transformations, and join it to other datasets.
The issue occurs at the very beginning, though, when Dataiku pulls in the data from SharePoint.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,000 Neuron
It's still not clear to me how your flow works. Do you have 1 dataset per file? Are you re-using the same dataset for all files? If so, that's the problem. Datasets have a defined schema. You can't just change the files behind a dataset without updating its schema. This might be a case for a Python recipe, where you can handle different schemas programmatically.
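To illustrate why a fixed schema produces the shifted values, here is a minimal sketch in plain Python. It is not Dataiku's actual parser, only an assumption about the effect: the schema is taken from the first file and later rows are mapped onto it by position rather than by header name. The variable names are made up for the example.

```python
import csv
import io

# Schema fixed from the first file (data1.csv in the example above).
schema = ["name", "lastname", "money", "activity"]

# Contents of data2.csv, which has no "money" column.
data2 = "name,lastname,activity\njj,mounts,100\n"

rows = list(csv.reader(io.StringIO(data2)))
header, values = rows[0], rows[1]

# Positional mapping: ignore the file's own header and fill the
# schema left to right, padding whatever is left over with None.
padded = values + [None] * (len(schema) - len(values))
positional = dict(zip(schema, padded))

# Name-based mapping: look each schema column up in the file's header.
lookup = dict(zip(header, values))
by_name = {column: lookup.get(column) for column in schema}

print(positional)  # "100" ends up under money
print(by_name)     # "100" stays under activity, money is None
```

The second mapping is the behaviour the question asks for, and it is what a code recipe can implement explicitly.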
-
I am reusing the same dataset for all files.
I thought the process would give me an option to append the files based on the field names, which seems like a very simple ask.
Do you have any documentation on the Python solution? I would love to explore that option.
Thanks!
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,000 Neuron
This will guide you on how to write a Python recipe:
https://doc.dataiku.com/dss/latest/code_recipes/python.html#python-recipes
And this on how to merge the files with different structure:
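As a rough illustration of merging by column name, here is a minimal pandas sketch. It assumes each file can be read into its own DataFrame; the in-memory strings stand in for the weekly files, and `expected_columns` is a hypothetical list of every field you want in the output. `pd.concat` aligns on column names, so a column missing from one file comes out empty (NaN) instead of shifting values to the left.

```python
import io

import pandas as pd

# Stand-ins for the weekly files (hypothetical; in a real recipe the
# content would be read from wherever the SharePoint files land).
files = {
    "data1.csv": "name,lastname,money,activity\njj,mounts,1000,100\n",
    "data2.csv": "name,lastname,activity\njj,mounts,100\n",
}

# Assumption: the full set of fields wanted in the output, in order.
expected_columns = ["name", "lastname", "money", "activity"]

# Read every file with its own header; dtype=str keeps values untouched.
frames = [
    pd.read_csv(io.StringIO(content), dtype=str)
    for content in files.values()
]

# concat matches columns by name and fills missing ones with NaN.
merged = pd.concat(frames, ignore_index=True, sort=False)

# reindex enforces the column order and adds any expected column
# that happened to be absent from every file.
merged = merged.reindex(columns=expected_columns)

print(merged)
```

Inside a Dataiku Python recipe, the merged DataFrame could then be written to the output dataset, for example with `write_with_schema`, so that the schema is set from the merged result.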
-
Thank you very much, I will look into this!