Added on November 6, 2017 1:33PM
Hi,
I am trying some Python code in a recipe to read a CSV / Excel file, first from an HTTP link and then from a local CSV file. The code is below:
# read from html
import dataiku
import pandas

# link_share is the share URL, defined elsewhere (not shown in this post)
link_down = link_share + '/download'
df_proj_list = pandas.read_excel(link_down, skiprows=1)  # skip 1 empty row
# Recipe outputs
digi_PAY_Master_Data_original = dataiku.Dataset("DIGI-PAY_Master_Data_original")
digi_PAY_Master_Data_original.write_with_schema(df_proj_list)
But when I explore the result, the header row is missing, and the first data row is used as the header instead.
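One thing that may be worth ruling out, as a minimal sketch and not a diagnosis: pandas applies skiprows before it picks the header row, so if the file being read does not actually start with an empty row, skiprows=1 drops the real header and promotes the first data row. The sample data and column names below are made up for illustration, and read_csv on an in-memory string stands in for read_excel on the download link.

```python
import io

import pandas as pd

# Hypothetical sample data: the header is on the very first line,
# i.e. there is NO leading empty row.
sample = "id,name\n1,a\n2,b\n"

# Without skiprows, the first line becomes the header.
df_ok = pd.read_csv(io.StringIO(sample))

# With skiprows=1, the real header line is skipped and the
# first data row ("1,a") is used as the header instead.
df_skipped = pd.read_csv(io.StringIO(sample), skiprows=1)

print(list(df_ok.columns))
print(list(df_skipped.columns))
```

If the second pattern matches what shows up in the dataset, the file seen by the recipe may differ from the one seen by the notebook.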
I tried this code in my Jupyter notebook and it works well.
So I tried again: I used my local Jupyter notebook to read the file from the URL and export it to a CSV file, then used a Dataiku recipe to read the exported file.
The code in the Dataiku recipe:
# read from csv
import dataiku
import pandas

olap_folder = '/home/giangtm/Work/Projects/DataScience/olap/'
file_master_data = 'DIGI-PAY_Master_Data.csv'
df_proj_list = pandas.read_csv(olap_folder + file_master_data)
# Recipe outputs
digi_PAY_Master_Data_original = dataiku.Dataset("DIGI-PAY_Master_Data_original")
digi_PAY_Master_Data_original.write_with_schema(df_proj_list)
But the output data is missing the header again (that is, the header is lost and the first data row is used as the header).
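To tell whether the header is lost when pandas reads the file or only later when the dataset is written, it may help to inspect the DataFrame just before the write. The sketch below uses a hypothetical helper, summarize_header, and made-up sample data; in the recipe it would be called on df_proj_list right before write_with_schema and compared with the same call in the notebook.

```python
import io

import pandas as pd


def summarize_header(df):
    """Return the column names and the first data row for comparison.

    Hypothetical helper for illustration; not part of pandas or dataiku.
    """
    first_row = df.iloc[0].tolist() if len(df) else []
    return {"columns": [str(c) for c in df.columns], "first_row": first_row}


# Made-up sample standing in for the exported CSV file.
sample_csv = "project,amount\nalpha,10\nbeta,20\n"
df = pd.read_csv(io.StringIO(sample_csv))

summary = summarize_header(df)
print(summary)
```

If the columns printed in the recipe are already the first data row, the problem is in the read step; if they are correct, the write or the dataset settings would be the next place to look.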
Is it a bug?