Encrypted Excel files

Gwosa-Sweden Registered Posts: 1

Does anyone have experience loading an encrypted Excel file?


Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,757 Neuron

    The msoffcrypto-tool Python package may be one approach, but it may not support every Excel format or encryption method, so check its README first. What format is your Excel file in?

    https://github.com/nolze/msoffcrypto-tool
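
To help answer the format question, here is a minimal sketch that classifies a file by its first bytes. A plain .xlsx is a ZIP archive, while a password-protected .xlsx is wrapped in an OLE compound file, the same container that legacy .xls files use. The function name `sniff_container` and the constant names are hypothetical, chosen only for illustration.

```python
# File signatures ("magic bytes") at the start of the file.
ZIP_MAGIC = b"PK\x03\x04"                        # plain OOXML (.xlsx, .xlsm) is a ZIP archive
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE compound file


def sniff_container(head: bytes) -> str:
    """Classify a file by its first bytes: 'zip', 'ole', or 'unknown'."""
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE_MAGIC):
        return "ole"
    return "unknown"
```

Passing the first 8 bytes of a file is enough. A result of "ole" for a file named .xlsx suggests it is encrypted; for a .xls file the signature alone cannot tell you, because unencrypted .xls files are OLE containers too.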

  • greistaen
    greistaen Dataiku DSS Core Designer, Registered Posts: 2

    Could you provide Python code to:

    1. Read Excel files from non-local source folders.

    2. Test whether they are password-protected.

    3. If so, decrypt them with the password using msoffcrypto-tool.

    4. Save the unprotected Excel files back to the source folder.

    Thanks.
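
The four steps above could be sketched roughly as follows. This is an untested outline, not a definitive implementation: `decrypt_folder` is a hypothetical helper name, the extension filter is an assumption, and every file is buffered fully in memory. It relies on the Dataiku managed-folder methods `list_paths_in_partition`, `get_download_stream` and `upload_stream`, and on `OfficeFile`, `is_encrypted`, `load_key` and `decrypt` from msoffcrypto-tool.

```python
import io

EXCEL_SUFFIXES = (".xlsx", ".xlsm", ".xls")  # assumption: adjust to your files


def decrypt_folder(source, target, password):
    """Copy Excel files from source to target, decrypting protected ones.

    source and target are expected to behave like dataiku.Folder objects.
    """
    import msoffcrypto  # third-party package: msoffcrypto-tool

    for path in source.list_paths_in_partition():
        if not path.lower().endswith(EXCEL_SUFFIXES):
            continue  # skip anything that is not an Excel file

        # Step 1: download streams are not seekable, so buffer the file.
        with source.get_download_stream(path) as stream:
            raw = io.BytesIO(stream.read())

        office_file = msoffcrypto.OfficeFile(raw)
        if office_file.is_encrypted():            # Step 2: test for protection
            out = io.BytesIO()
            office_file.load_key(password=password)
            office_file.decrypt(out)              # Step 3: decrypt into a new buffer
            out.seek(0)                           # rewind before uploading
        else:
            raw.seek(0)
            out = raw                             # already readable, copy as-is

        target.upload_stream(path, out)           # Step 4: save under the same path


# Hypothetical usage inside DSS (folder names are placeholders):
#   import dataiku
#   decrypt_folder(dataiku.Folder("input"), dataiku.Folder("output"), "xxxxx")
```

Passing the same folder as both `source` and `target` would overwrite the files in place, but whether writing back to the source folder is allowed depends on how the folder is used in your flow.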

  • greistaen
    greistaen Dataiku DSS Core Designer, Registered Posts: 2
    edited July 17

    I tried the code below. However, the output Excel file could not be read by Create Dataset. The error message was:

    • Used /NEW_SPREADSHEET.xlsx (244.18 KB) to parse data
    • Failed to detect file format. Please manually fix

    Even when I manually selected Excel as the format, it still could not be loaded in Preview.

    ==code==

    import io
    import shutil
    import dataiku
    import msoffcrypto
    import openpyxl
    
    
    # Read recipe inputs
    source = dataiku.Folder("input")
    source_info = source.get_info()
    
    paths = source.list_paths_in_partition()
    
    # Write recipe outputs
    target = dataiku.Folder("output")
    target_info = target.get_info()
    
    
    for path in paths:
        decrypted = io.BytesIO()
        with source.get_download_stream(path) as input_file:
            with io.BytesIO() as seekable:
                shutil.copyfileobj(input_file, seekable)
                file = msoffcrypto.OfficeFile(seekable)
                file.load_key(password="xxxxx")  # Use password
                file.decrypt(decrypted)
                
                xlfile = openpyxl.load_workbook(decrypted)
                xlfile.save(decrypted)
                decrypted.seek(0)
                target.upload_stream("NEW_SPREADSHEET.xlsx", decrypted)
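
One possible cause, not confirmed in this thread: the script saves the workbook back into the same `BytesIO` it has just loaded from. The new archive is then written starting wherever the read position was left, on top of the decrypted bytes, so the uploaded buffer may no longer be a clean .xlsx. Saving to a fresh `io.BytesIO()`, or skipping the openpyxl round trip and uploading the decrypted buffer after `seek(0)`, avoids this. The sketch below is a standard-library sanity check you could run on a buffer before uploading it; `looks_like_xlsx` is a hypothetical name, and the two part names it looks for are the ones .xlsx files typically contain.

```python
import io
import zipfile


def looks_like_xlsx(buf: io.BytesIO) -> bool:
    """Return True if buf holds a readable ZIP with the usual .xlsx parts."""
    buf.seek(0)
    if not zipfile.is_zipfile(buf):
        return False
    buf.seek(0)
    try:
        with zipfile.ZipFile(buf) as archive:
            names = set(archive.namelist())
            # testzip() returns None when every member passes its CRC check.
            intact = archive.testzip() is None
    except zipfile.BadZipFile:
        return False
    finally:
        buf.seek(0)  # leave the buffer rewound, ready for upload_stream
    return intact and "[Content_Types].xml" in names and "xl/workbook.xml" in names
```

If this returns False for the buffer passed to `upload_stream`, the problem is in how the bytes were produced rather than in the Create Dataset format detection.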
