Python recipe missing first row (header)

tmgiang
tmgiang Registered Posts: 15 ✭✭✭✭

Hi,

I am trying some Python code in a recipe to read a CSV / Excel file, first from a web link and then from a local CSV. The code is below:

# read from html
import dataiku
import pandas

# link_share must be defined earlier in the recipe (not shown in this post)
link_down = link_share + '/download'
df_proj_list = pandas.read_excel(link_down, skiprows=1)  # skip 1 empty row

# Recipe outputs
digi_PAY_Master_Data_original = dataiku.Dataset("DIGI-PAY_Master_Data_original")
digi_PAY_Master_Data_original.write_with_schema(df_proj_list)

But when I explore the result, the header row is missing, and the first data row is used as the header instead.

I tried this code in my Jupyter notebook and it works well.

So I tried again: I used a local Jupyter notebook to read the file from the URL and export it to a CSV file, then used a Dataiku recipe to read the exported file.

The code in the Dataiku recipe:

# read from csv
import dataiku
import pandas

olap_folder = '/home/giangtm/Work/Projects/DataScience/olap/'
file_master_data = 'DIGI-PAY_Master_Data.csv'
df_proj_list = pandas.read_csv(olap_folder + file_master_data)

# Recipe outputs
digi_PAY_Master_Data_original = dataiku.Dataset("DIGI-PAY_Master_Data_original")
digi_PAY_Master_Data_original.write_with_schema(df_proj_list)

But the output data is missing the header again :) (that is, the header is lost and the first data row is used as the header).

Is it a bug?
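
For reference on the first snippet, here is a minimal sketch of how skiprows=1 interacts with header detection in pandas. It is not a diagnosis of the problem above. The sample data is made up, and read_csv on in-memory text stands in for read_excel so that the example runs without a file; the point is only that the first row remaining after the skip becomes the header.

```python
import io

import pandas as pd

# Hypothetical sample: one extra line before the real header.
with_extra_row = "report title\nid,name\n1,a\n2,b\n"
# Hypothetical sample: the header is already the first line.
header_first = "id,name\n1,a\n2,b\n"

# skiprows=1 drops the first line; the next line is used as the header.
ok = pd.read_csv(io.StringIO(with_extra_row), skiprows=1)
shifted = pd.read_csv(io.StringIO(header_first), skiprows=1)

print(list(ok.columns))       # ['id', 'name']
print(list(shifted.columns))  # ['1', 'a'] (first data row became the header)
```

If the source file has no extra row at the top, skiprows=1 produces exactly the "first data row as header" symptom, so it is worth checking what the first line of the downloaded file really is.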

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Hi,

    It looks like an issue with your reading code. I would suggest that you print the column names and the data in the df after using read_excel or read_csv, before writing it to a dataset.
  • tmgiang
    tmgiang Registered Posts: 15 ✭✭✭✭
    Actually, I copied the code into my Jupyter notebook and it still works well.

    Back to your comment: I tried printing the column names after reading, and it prints the correct column names. :) But when I explore the data in the table view, the first row (the column names) is still missing.
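
The check suggested in the answer above (inspect the column names and the data before writing) could be sketched as follows. The helper name summarize_frame and the sample data are hypothetical; in the recipe, the helper would be called on df_proj_list right before write_with_schema.

```python
import io

import pandas as pd


def summarize_frame(df):
    """Return the column names, shape, and first data row of a DataFrame."""
    first_row = df.iloc[0].tolist() if len(df) else []
    return {
        "columns": [str(c) for c in df.columns],
        "shape": df.shape,
        "first_row": first_row,
    }


# Stand-in data; in the recipe this would be df_proj_list.
sample = pd.read_csv(io.StringIO("id,name\n1,a\n2,b\n"))
summary = summarize_frame(sample)
print(summary)
```

If the printed columns are the expected header names and the first row holds real data, the DataFrame itself is fine, and the next thing to compare is what the output dataset contains after the write, for example by reading it back with get_dataframe() and running the same summary.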