How to reduce DATA_DIR size?

Solved!
UserBird
Dataiker
The DSS Data Directory is huge (several GB). Is there a way to optimize it without any data loss?
1 Solution
jereze
Dataiker

The DSS data directory (DATA_DIR) stores the configuration, settings, dataset and recipe definitions, logs, and other files of your DSS installation. You can read more about the DATA_DIR in the DSS documentation.



This is a critical directory. You should back it up regularly.



If you really need to free up some space, you have the following options:




  • Look at the managed datasets in DATA_DIR/managed_datasets and check for datasets that are no longer used but have not been deleted.

  • Delete the logs of old jobs that are in DATA_DIR/jobs.



For instance:




# go to the jobs directory (replace DATA_DIR with the path to your data directory)
cd DATA_DIR/jobs
# list the job log directories that were last modified more than 30 days ago
find . -mindepth 2 -maxdepth 2 -type d -mtime +30
# delete these directories
find . -mindepth 2 -maxdepth 2 -type d -mtime +30 -exec rm -rf {} \;
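
The same clean-up can be wrapped in a small function that only lists candidates unless deletion is requested explicitly. This is a minimal sketch, not an official DSS tool: the function name purge_old_jobs and its --delete flag are made up for illustration, and it assumes the same two-level layout of the jobs directory as the commands above.

```shell
# Hypothetical helper: purge job log directories older than N days.
# Usage: purge_old_jobs /path/to/DATA_DIR/jobs [days] [--delete]
purge_old_jobs() {
    jobs_dir="$1"
    days="${2:-30}"
    mode="${3:-}"
    # Refuse to run if the directory does not exist
    [ -d "$jobs_dir" ] || { echo "not a directory: $jobs_dir" >&2; return 1; }
    if [ "$mode" = "--delete" ]; then
        # Remove the matching directories and everything inside them
        find "$jobs_dir" -mindepth 2 -maxdepth 2 -type d -mtime +"$days" -exec rm -rf {} +
    else
        # Dry run: only print what would be deleted
        find "$jobs_dir" -mindepth 2 -maxdepth 2 -type d -mtime +"$days" -print
    fi
}
```

Calling it without --delete first makes it possible to review the list before anything is removed.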


Keep in mind that this is a delicate operation: directories removed this way cannot be recovered without a backup.
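
Before deleting anything, it can help to see which parts of the data directory actually take up the space. The sketch below is one way to do that with standard tools; the function name largest_entries is hypothetical, and it assumes a du that accepts -d (GNU and BSD versions do).

```shell
# Hypothetical helper: show the largest entries directly under a directory.
# Usage: largest_entries /path/to/DATA_DIR [count]
largest_entries() {
    target_dir="$1"
    count="${2:-10}"
    # du -k reports sizes in kilobytes; -d 1 limits the report to one level.
    # The first line of the sorted output is the total for the directory itself.
    du -k -d 1 "$target_dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

Running it on DATA_DIR shows whether managed_datasets, jobs, or another subdirectory is the one worth cleaning.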

Jeremy, Product Manager at Dataiku
