store live data to record data

sasidharp
Level 3
I receive live data through a SQL connection at 15-minute intervals, covering the past hour.

time.PNG

I want to store a week of data in my HDFS dataset. What is the best way to do that?

Every 15 minutes, one row disappears and a new row is added. Please suggest the best way to handle this.

5 Replies
tgb417

@sasidharp 

One sort of brute force method would be to use scenarios.  
If you are using a full license of DSS, you should have access to the scenario feature. A scenario lets you poll your SQL dataset every 15 minutes, or slightly more frequently, and detect when new data has arrived by comparing it against what you already have in HDFS. If there is new data in SQL, you import it; otherwise you do nothing.
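A minimal sketch of that comparison step, in plain Python and independent of any DSS API. The row layout (a dict with a `ts` timestamp key) and the function name `select_new_rows` are assumptions made for illustration; in a real scenario the rows would come from the SQL dataset and the last stored timestamp from the HDFS dataset.

```python
from datetime import datetime, timedelta


def select_new_rows(window_rows, last_stored_ts):
    """Return the rows of the SQL window that are newer than the last stored timestamp.

    window_rows: list of dicts, each with a "ts" datetime (hypothetical layout).
    last_stored_ts: newest timestamp already saved, or None if nothing is saved yet.
    """
    if last_stored_ts is None:
        candidates = list(window_rows)
    else:
        candidates = [row for row in window_rows if row["ts"] > last_stored_ts]
    # Sort so the rows are appended in chronological order.
    return sorted(candidates, key=lambda row: row["ts"])


# Example: a one-hour window sampled every 15 minutes (made-up values).
base = datetime(2020, 11, 3, 16, 0)
window = [{"ts": base + timedelta(minutes=15 * i), "value": i} for i in range(4)]

# Pretend everything up to 16:30 has already been stored.
last_stored = base + timedelta(minutes=30)
new_rows = select_new_rows(window, last_stored)
```

If `new_rows` is empty, the scenario run has nothing to do; otherwise only those rows need to be written.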

Just a thought.  

 

--Tom
sasidharp
Level 3
Author

I want to append the new timestamp row to my HDFS dataset. Every 15 minutes a new row is added and one row is deleted. I just need the newly added row to be appended in the flow.

tgb417

@sasidharp 

If your recipe can produce just the new results you need, you can check whether the "Append instead of overwrite" option is available in the recipe's Input / Output section.

2020-11-03_16-18-56.jpg

I'm not using HDFS, so I don't know whether this option is available to you with this type of dataset.

I hope this helps a bit. If not, and you can say or show a little more about what you are trying to do, I or someone else might be able to help you further.

Does anyone know if partitions will help with this process?

 

--Tom
sasidharp
Level 3
Author

Dear @tgb417 

HDFS datasets don't come with an Append option.

 

tgb417

@sasidharp 

Here is an earlier discussion thread about a similar issue.

https://community.dataiku.com/t5/Using-Dataiku-DSS/how-can-select-the-append-mode-in-a-dataset/m-p/3...

As you have discovered, this says that HDFS does not have an append function. It does suggest partitioning.

I don't think you have said whether you are using the community edition or a paid edition of DSS. If you are on a paid edition, you will be able to use partitioning. Apparently, partitioning does work with HDFS.

However, I'm not sure you will want to create a new HDFS partition for a single row of data. If that's a concern, could you save the rows to a temporary dataset and create partitions daily?
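To make the daily idea concrete, here is a rough sketch in plain Python of grouping buffered 15-minute rows into one bucket per day. The row layout, the function name `group_by_day`, and the `YYYY-MM-DD` bucket key are assumptions for illustration only; this is not DSS partitioning code, just the grouping logic a daily build could rely on.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_by_day(rows):
    """Group rows by calendar day, keyed by a YYYY-MM-DD string.

    rows: list of dicts, each with a "ts" datetime (hypothetical layout).
    """
    buckets = defaultdict(list)
    for row in rows:
        day_key = row["ts"].strftime("%Y-%m-%d")
        buckets[day_key].append(row)
    return dict(buckets)


# Example: four buffered rows that straddle midnight (made-up values).
start = datetime(2020, 11, 3, 23, 30)
buffer_rows = [{"ts": start + timedelta(minutes=15 * i), "value": i} for i in range(4)]

daily = group_by_day(buffer_rows)
```

Each key in `daily` would then correspond to one daily partition, so a partition is written once per day instead of once per row.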

In case you have not come across it, here is the learning module on partitioning.

https://academy.dataiku.com/partitioned-models-open

I'm not going to be of much additional help.

@DvMg where did you end up going with your append issue in the thread linked above?  Can you be of any help to @sasidharp ?

@tomas I see you were giving @DvMg some help.  Can you be of any help to @sasidharp?

 

 

--Tom