store live data to record data

sasidharp · October 2020

I receive live data through a SQL connection: every 15 minutes I get the rows for the past hour.

I want to store a week of data in my HDFS dataset. What is the best way to do that?

Every 15 minutes, one row disappears and one row gets added. Please suggest the best way.

tgb417 · October 2020

One sort of brute force method would be to use scenarios.
If you are using a full license of DSS, you should have access to the scenario feature. This will allow you to poll your SQL dataset every 15 minutes, or slightly more frequently. You can then see when new data arrives in the SQL dataset by checking it against what you already have in HDFS. If there is new data in SQL, you import it; otherwise you do nothing.
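
To make the "check against what you have in HDFS" step concrete, here is a minimal sketch in plain Python. It assumes each row carries a timestamp under a key I am calling `ts`; the function name and the sample data are made up for illustration, and reading from SQL or writing to HDFS inside DSS is not shown.

```python
from datetime import datetime

def find_new_rows(window, stored_timestamps):
    """Return rows from the current SQL window whose timestamp is not stored yet."""
    seen = set(stored_timestamps)
    fresh = [row for row in window if row["ts"] not in seen]
    # Oldest first, so the rows can be added in chronological order.
    return sorted(fresh, key=lambda row: row["ts"])

# Hypothetical data: timestamps already saved, and the latest one-hour window.
stored = [datetime(2020, 10, 1, 9, m) for m in (0, 15, 30, 45)]
window = [
    {"ts": datetime(2020, 10, 1, 9, 15), "value": 11},
    {"ts": datetime(2020, 10, 1, 9, 30), "value": 12},
    {"ts": datetime(2020, 10, 1, 9, 45), "value": 13},
    {"ts": datetime(2020, 10, 1, 10, 0), "value": 14},
]

new_rows = find_new_rows(window, stored)
```

If the poll runs more often than the data changes, `new_rows` is simply empty and the scenario step can skip the import.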

Just a thought.

sasidharp · November 2020

I want to append the row with the new timestamp to my HDFS dataset. Every 15 minutes one new row is added and one row is deleted. I only need the newly added row to be appended in the flow.

tgb417 · November 2020

@sasidharp

If your recipe can produce just the new rows you need, you can check whether the "Append instead of overwrite" option is available in the recipe's input / output section.

I'm not using HDFS so I don't know if this option is available to you with this data management type.

I hope this helps a bit. If not, and you can say or show a little more about what you are trying to do, I or someone else might be able to help you further.

Does anyone know if partitions will help with this process?

sasidharp · November 2020

Dear @tgb417

HDFS datasets don't come with an Append option.

tgb417 · November 2020

@sasidharp

Here is an earlier discussion thread about a similar issue.

https://community.dataiku.com/t5/Using-Dataiku-DSS/how-can-select-the-append-mode-in-a-dataset/m-p/3367

As you have discovered, it says that HDFS does not have an append function. It does suggest partitioning.

I don't think you have said whether you are using a community edition or the Paid edition of DSS. If you are on a paid edition, you will have the opportunity to use Partitioning. Apparently, Partitioning does work with HDFS.

However, I'm not sure you will want to create a new HDFS partition for a single row of data. If that's a concern, could you save the rows to a temporary dataset and create partitions daily?

In case you have not come across it, here is the learning module on Partitioning.
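
As a rough illustration of the "temporary dataset, then daily partitions" idea, the sketch below groups buffered rows by calendar day in plain Python. The `ts` key, the function name, and the `YYYY-MM-DD` partition label are my assumptions for the example; how each group is actually written to a partitioned HDFS dataset in DSS is not shown.

```python
from collections import defaultdict
from datetime import datetime

def group_by_day(rows):
    """Group buffered rows under a day label such as '2020-11-02'."""
    partitions = defaultdict(list)
    for row in rows:
        day = row["ts"].strftime("%Y-%m-%d")  # assumed day-partition label
        partitions[day].append(row)
    return dict(partitions)

# Hypothetical buffer collected from the 15-minute polls.
buffered = [
    {"ts": datetime(2020, 11, 1, 23, 30), "value": 1},
    {"ts": datetime(2020, 11, 1, 23, 45), "value": 2},
    {"ts": datetime(2020, 11, 2, 0, 0), "value": 3},
]

daily = group_by_day(buffered)
```

Each key in `daily` would then correspond to one partition, written once per day rather than once per row.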

https://academy.dataiku.com/partitioned-models-open

I'm not going to be of much additional help.

@DvMg, where did you end up going with your append issue in the thread linked above? Can you be of any help to @sasidharp?

@tomas, I see you were giving @DvMg some help. Can you be of any help to @sasidharp?
