How can I avoid a proliferation of tables in the DB?

HiroshiKawasaki · August 7

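Whenever you run a recipe in DSS, you have to specify an output dataset, so you end up with one dataset per recipe.

As a result, intermediate tables that are not needed in the final output are left behind, and as the number of recipes grows in a complex flow, they eat into the DB's storage.

Is there any way around this, such as turning intermediate tables into views, or running recipes back to back without creating intermediate tables?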

Alexandru · August 7

If I understand correctly, you were asking whether there are solutions such as turning intermediate tables into views or running recipes consecutively without creating intermediate tables.

Yes, you can enable SQL pipelines if your datasets and recipes are compatible with the SQL engine.

https://doc.dataiku.com/dss/latest/sql/pipelines/sql_pipelines.html

If you are using cloud storage, you can use the Spark engine with Spark pipelines.

By default, with pipelines enabled, you will avoid materializing intermediate datasets.

https://doc.dataiku.com/dss/latest/spark/pipelines.html
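
To make "not materializing intermediate datasets" concrete on the SQL side, here is a minimal sketch of the general idea using Python's built-in `sqlite3`. It is only an illustration of the concept, not how DSS implements pipelines; the table names (`orders`, `step1_filtered`, `step2_totals`, `final_totals`) and the helper functions are hypothetical. The first function writes one table per step, like one dataset per recipe; the second folds the same two steps into a single statement, so only the final table is written.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a small, made-up input table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("a", 5), ("a", 20), ("b", 15), ("b", 30)],
    )
    return conn


def run_materialized(conn):
    """Run each step separately, writing every intermediate result as a table."""
    conn.execute(
        "CREATE TABLE step1_filtered AS "
        "SELECT customer, amount FROM orders WHERE amount > 10"
    )
    conn.execute(
        "CREATE TABLE step2_totals AS "
        "SELECT customer, SUM(amount) AS total "
        "FROM step1_filtered GROUP BY customer"
    )
    return conn.execute(
        "SELECT customer, total FROM step2_totals ORDER BY customer"
    ).fetchall()


def run_pipelined(conn):
    """Fold both steps into one statement; the filter step is only a subquery."""
    conn.execute(
        "CREATE TABLE final_totals AS "
        "SELECT customer, SUM(amount) AS total "
        "FROM (SELECT customer, amount FROM orders WHERE amount > 10) "
        "AS step1_filtered "
        "GROUP BY customer"
    )
    return conn.execute(
        "SELECT customer, total FROM final_totals ORDER BY customer"
    ).fetchall()


def table_names(conn):
    """List the tables that physically exist in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Running both functions on fresh databases from `make_db()` gives the same rows, but `table_names` shows the extra `step1_filtered` table only in the first case. That leftover table is the kind of storage cost the question is about.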

https://community.dataiku.com/discussion/45167/db%E4%B8%8A%E3%81%AB%E3%83%86%E3%83%BC%E3%83%96%E3%83%AB%E3%81%8C%E4%B9%B1%E7%AB%8B%E3%81%99%E3%82%8B%E3%81%AE%E3%82%92%E9%81%BF%E3%81%91%E3%81%9F%E3%81%84
