I receive live data through a SQL connection at 15-minute intervals, covering the past hour.
I want to store a week of data in my HDFS dataset; what is the best way to do that?
Every 15 minutes, one row disappears and one row is added. Please suggest the best approach.
One sort of brute-force method would be to use scenarios.
If you are using a full license of DSS, you should have access to the scenario feature. This will allow you to poll your SQL dataset every 15 minutes, or slightly more frequently, and check what has arrived against what you already have in HDFS. If there is new data in SQL, you import it; otherwise you do nothing.
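To illustrate the comparison step, here is a minimal sketch of how a scenario step (for example, a Python step) might detect which rows are new. The data shape is a guess: each row is assumed to be a `(timestamp, value)` pair, and the SQL feed is assumed to hold a sliding one-hour window of four 15-minute rows.

```python
from datetime import datetime, timedelta

def find_new_rows(sql_rows, stored_rows):
    """Return rows present in the SQL feed but not yet stored,
    matching on the timestamp column."""
    stored_keys = {ts for ts, _ in stored_rows}
    return [row for row in sql_rows if row[0] not in stored_keys]

# Simulate one polling cycle with hypothetical data.
base = datetime(2024, 1, 1, 12, 0)
stored = [(base + timedelta(minutes=15 * i), i) for i in range(4)]
# Fifteen minutes later: the oldest row is gone, one new row appeared.
feed = stored[1:] + [(base + timedelta(minutes=60), 4)]
new_rows = find_new_rows(feed, stored)
```

Only the rows returned by `find_new_rows` would then need to be written out, which keeps the weekly dataset growing by one row per cycle.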
Just a thought.
I want to append the new timestamp row to my HDFS dataset. Every 15 minutes a new row is added and one row is deleted; I just need the newly added row to be appended in the Flow.
If your recipe can produce just the new rows that are needed, then you can try the "Append instead of overwrite" option in the recipe's input/output settings.
I'm not using HDFS, so I don't know whether this option is available for that data management type.
I hope this helps a bit. If not, please say or show a little more about what you are trying to do, and I or someone else may be able to help you further.
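One thing to watch with an append approach: if a poll overlaps the previous one, the same 15-minute row could be appended twice. A hypothetical sketch of a rerun-safe merge, keyed on the timestamp, might look like this (column names and row shapes are assumptions, not your actual schema):

```python
def append_new(stored, incoming):
    """Merge incoming (timestamp, value) rows into stored rows,
    ignoring timestamps that were already written."""
    merged = dict(stored)
    for ts, value in incoming:
        merged.setdefault(ts, value)  # keep the first-seen row per timestamp
    return sorted(merged.items())

# Overlapping polls: two of the three incoming rows were already stored.
stored = [("12:00", 1), ("12:15", 2), ("12:30", 3)]
incoming = [("12:15", 2), ("12:30", 3), ("12:45", 4)]
week = append_new(stored, incoming)
```

Running the same merge twice with the same inputs yields the same result, which is the property you want when a scenario may re-fire.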
Does anyone know if partitioning will help with this process?
Here is an earlier discussion thread about a similar issue.
As you have discovered, this says that HDFS does not have an append function; it does suggest partitioning.
I don't think you have said whether you are using the Community edition or a paid edition of DSS. If you are on a paid edition, you will have the option to use partitioning. Apparently, partitioning does work with HDFS.
However, I'm not clear whether you will want to create a new HDFS partition for a single row of data. If that's the case, you could save to a temporary dataset and create partitions daily.
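If you do go with daily partitions, the mapping from a row's timestamp to its partition identifier is simple. A minimal sketch, assuming DSS-style time partitions identified by a `YYYY-MM-DD` string:

```python
from datetime import datetime

def partition_id(ts: datetime) -> str:
    """Map a row timestamp to a daily partition identifier
    (the YYYY-MM-DD form used by DSS time-based partitioning)."""
    return ts.strftime("%Y-%m-%d")

# All 15-minute rows collected on one day land in the same partition,
# so a week of data is only seven partitions, not hundreds.
rows = [datetime(2024, 1, 15, h, 0) for h in (0, 12, 23)]
ids = {partition_id(ts) for ts in rows}
```

That keeps the partition count manageable: 96 rows per day collapse into one partition per day.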
In case you have not come across it: here is the learning module on partitioning.
I'm not going to be of much additional help.