Modify an excel file using Python

me2
me2 Registered Posts: 54 ✭✭✭✭✭
edited July 16 in Using Dataiku

Dataikuers, I need your help!

Thanks to @AlexB
I was able to convert a dataset into an excel file using Python.

Now I want to modify that file in the same script. In this case I'm just looking to add a few cells to new Excel Sheets but also looking to expand and do more. So I added some code that I thought would work. I don't get any errors but I also don't get any of the changes.

What can I do to fix the code?

Also, it seems like the none of the openpxl functions work. I had to use Pandas Excelwriter to make it work. Even with that, some of the features within Excelwriter don't seem to be recognized. Can someone explain why?

import dataiku
import pandas as pd, numpy as np
import openpyxl
from dataiku import pandasutils as pdu
from io import BytesIO

# Read recipe inputs
data = dataiku.Dataset("data")
data_df = data.get_dataframe()

# Get a sample of records
tester_df = data_df.head(n=100)

# Save the data as excel format into a bytes string
stream = BytesIO()
excel_writer = pd.ExcelWriter(stream, engine='openpyxl')
tester_df.to_excel(excel_writer, sheet_name='data')
excel_writer.save()

# Rewind to the begining of the bytes string
stream.seek(0)

# Write recipe outputs
target_folder = dataiku.Folder("folder")
with target_folder.get_writer("excelfile.xlsx") as writer:
    writer.write(stream.read())
    
# Add worksheet to same file
target_folder = dataiku.Folder("sample")
df1 = pd.DataFrame([["AAA", "BBB"]], columns=["Spam", "Egg"]) 
df2 = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
with pd.ExcelWriter("excelfile.xlsx") as writer:
    df1.to_excel(writer, sheet_name="Sheet1")  
    df2.to_excel(writer, sheet_name="Sheet2") 


Operating system used: Microsoft Windows

Tagged:

Best Answer

  • me2
    me2 Registered Posts: 54 ✭✭✭✭✭
    Answer ✓

    I was able to do the changes to an excel file using Microsoft's TypeScript. Dataiku sends the file to MS 365, PowerAutomate is triggered and runs the script.

    MS made coding in TypeScript was pretty easy. There is a record function (although not available to some Enterprise users currently) and many examples some can copy/paste the code to do simple Excel file manipulations.

Answers

  • me2
    me2 Registered Posts: 54 ✭✭✭✭✭
    edited July 17

    I was able to fix the code and create one file with the 3 worksheets through trial and error.

    What options can I explore to take the next step including:

    1) Format column headers on a sheet? In this example, Sheet1 Column "Spam" I would like to bold and center top.

    2) Add a column to a sheet with a formula that extends throughout the dataset? In this example, Sheet2 add Column "Count" with CountIf(A2:A3,"ABC")?

    3) Convert the data in a sheet into an Excel Table with a Table name? Convert data in Sheet data into a table and name it "tbl_data"?

    # Read recipe inputs
    data_name = dataiku.Dataset("data_name")
    data_name_df = data_name.get_dataframe()
    
    # Get a sample of records
    tester_df = data_name_df.head(n=100)
    
    df1 = pd.DataFrame([["AAA", "BBB"]], columns=["Spam", "Egg"]) 
    df2 = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"]) 
    
    # Save the data as excel format into a bytes string
    stream = BytesIO()
    
    with pd.ExcelWriter(stream) as writer:
            tester_df.to_excel(writer, sheet_name='data')
            df1.to_excel(writer, sheet_name="Sheet1") 
            df2.to_excel(writer, sheet_name="Sheet2")  
    writer.save()
    
    # Rewind to the begining of the bytes string
    stream.seek(0)
    
    # Write recipe outputs, with the Dataiku API in case the folder is remote
    target_folder = dataiku.Folder("sample2")
    with target_folder.get_writer("excelfile.xlsx") as writer:
        writer.write(stream.read())

    Any help or guidance would be greatly appreciated!

  • me2
    me2 Registered Posts: 54 ✭✭✭✭✭

    @AlexB
    , @Alex_Combessie
    , @VitaliyD

    I read previous posts dealing with "Excel with Python" where you provided great insight. Could you look at these questions and give us your thoughts?

    What options can I explore to use Python to modify the output of an excel file that I created in Python.

    1) Format column headers on a sheet? In this example, Sheet1 Column "Spam" I would like to bold and center top.

    2) Add a column to a sheet with a formula that extends throughout the dataset? In this example, Sheet2 add Column "Count" with CountIf(A2:A3,"ABC")?

    3) Convert the data in a sheet into an Excel Table with a Table name? Convert data in Sheet data into a table and name it "tbl_data"?

    Any suggestions would be greatly appreciated.

    Thank you.

Setup Info
    Tags
      Help me…