
Selecting the latest file added in a folder using Python

Level 3

Hi All,

I would appreciate it if someone could provide me with a Python script to select (read_csv) the latest CSV file in an SFTP folder.

Currently I am using the following script to read CSV files from an SFTP folder. However, when multiple files have been added on different dates, I would like to select only the latest one.


import dataiku
import pandas as pd
import numpy as np
from dataiku import pandasutils as pdu

FOLDER_NAME = 'folder_1'
FILE_NAME = 'file_1.csv'
DATASET_NAME = 'dataset_1'

folder = dataiku.Folder(FOLDER_NAME)
with folder.get_download_stream(FILE_NAME) as f:
    df = pd.read_csv(f)


Let's assume I have two files in the folder, file_1.csv and file_2.csv, where file_2.csv was added today and file_1.csv was added last month. How can I select file_2.csv?


3 Replies


What you are trying to do is achievable.  There are a number of posts in the Dataiku community where folks are doing this kind of thing. 

There is also documentation like this:

In general, you might do something like this:


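import dataiku

input_folder = dataiku.Folder("AAAAAAAA")
paths = input_folder.list_paths_in_partition()

# Collect the details (size, last-modified time, ...) for each file
path_details = []
for path in paths:
    path_details.append(input_folder.get_path_details(path))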

Note that “AAAAAAAA” is whatever name you gave the FTP-connected folder.

In path_details you have a bunch of data about your files, including modify times.
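
To make the modify times concrete, here is a minimal sketch that orders entries like those in path_details from newest to oldest. It assumes each entry is a dict with fullPath and lastModified keys, and that lastModified is in epoch milliseconds. The helper names newest_first and modified_at and the sample entries are made up for illustration and are not part of the Dataiku API.

```python
from datetime import datetime, timezone


def newest_first(path_details):
    """Return the entries sorted from most to least recently modified."""
    return sorted(path_details, key=lambda d: d["lastModified"], reverse=True)


def modified_at(entry):
    """Convert the (assumed) epoch-millisecond timestamp to a UTC datetime."""
    return datetime.fromtimestamp(entry["lastModified"] / 1000, tz=timezone.utc)


# Hypothetical sample shaped like the dicts collected in path_details.
sample = [
    {"fullPath": "/file_1.csv", "lastModified": 1700000000000},
    {"fullPath": "/file_2.csv", "lastModified": 1702600000000},
]

for entry in newest_first(sample):
    print(entry["fullPath"], modified_at(entry).isoformat())
```

The first entry of the sorted list is then the most recently modified file.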

This seems to work OK for small to moderately sized repositories of fewer than 100,000 files. Beyond that size things start breaking down, and in my experience moving to shell scripts is faster and more reliable.

I hope this helps.   Let us know how you get on with your project.  



Level 3

Hi @tgb417 ,

Thank you for taking the time and answering my question. 

It was really helpful and I'm using the following script now: 

import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
import time

input_folder = dataiku.Folder(FOLDER_NAME)

current_epoch = int(time.time())*1000

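for item in input_folder.get_path_details()["children"]:
    # Print each file's path and last-modified time (epoch milliseconds)
    print(item["fullPath"], item["lastModified"])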


This script lists all the files along with their last modified dates, which is perfect. Now how can I select the latest file (the one with the maximum date)?
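
One possible way to pick the latest file is to take the maximum over the last-modified values. The sketch below assumes each entry in get_path_details()["children"] is a dict with fullPath and lastModified keys (and optionally a directory flag), and that lastModified is in epoch milliseconds. latest_csv_path and read_latest_csv are hypothetical helper names, not part of the Dataiku API.

```python
def latest_csv_path(children):
    """Return the fullPath of the most recently modified CSV, or None."""
    csv_files = [
        c for c in children
        if not c.get("directory", False)
        and c["fullPath"].lower().endswith(".csv")
    ]
    if not csv_files:
        return None
    # max() compares the (assumed) epoch-millisecond timestamps
    return max(csv_files, key=lambda c: c["lastModified"])["fullPath"]


def read_latest_csv(folder):
    """Read the newest CSV from a dataiku.Folder-like object into a DataFrame."""
    import pandas as pd  # imported here so latest_csv_path has no dependencies

    path = latest_csv_path(folder.get_path_details()["children"])
    if path is None:
        raise FileNotFoundError("no CSV files found in the folder")
    with folder.get_download_stream(path) as stream:
        return pd.read_csv(stream)
```

Inside a recipe this could be called as df = read_latest_csv(dataiku.Folder(FOLDER_NAME)). Note that the timestamp is the modification time reported by the connection, which may not be the same as the time the file was added to the folder.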

Level 1


Were you able to find a solution for this, where Dataiku reads the latest uploaded file out of all the uploaded files?

