Ready for Dataiku 9? Try out the Crash Course on new features!

# Conundrum 5: Natural Followers

Community Manager
###### Conundrum 5: Natural Followers

Welcome to the fifth Community Conundrum! This week we have a Data Prep puzzle for you - give it a go!

Situation: A twitter user has a dataset (attached) of new followers by date. The problem is the new followers (of type “natural language”) contains values with all the new followers on a particular date.

Objective: Given the dataset "followers_count", create a single dataset output that splits up these natural followers into their own rows and determine the number of weeks since the new followers were counted.

If you would like to share your results with the community feel free to do so here either in the form of screenshots, a text based description,  a project upload, or all three! If you would like to upload your project please refer to Submission guidelines .

I hope I helped! Do you Know that if I was Useful to you or Did something Outstanding you can Show your appreciation by giving me a KUDOS?

Looking for more resources to help you use DSS effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

11 Replies
Neuron

2017-10-01-19

is it

yyyy-MM-dd-HH

Or is this some sort of odd date range?  Or something else?

What is the 4th part of this date column?

I'm going to assume the above.

I'm going to assume local time EDT.

However, when I did this I got an exception on a date with Daylight Saving time there was an invalid hour listed.

So I'm going to go back to UTC for my time.

--Tom
Dataiker

the date format yyyy-MM-dd-HH is correct! 😁

Neuron

The range of values in nb_followers is 643-15540

if we use space as a separator on new_followers.

The range of new_followers is 2590 - 1000701322063204352

with a strange distribution.

So I'm wondering if I'm missing something here.

--Tom
Neuron

What are you asking for when you say

"determine the number of weeks since the new followers were counted."

The latest date in the date column is:

2018-06-07T01:00:00.000Z

If I assume that this is local time in my time zone.  (EDT).

You can find the exact number of weeks days hours since that date at the URL below.

As of the point in time when I was putting this into

However, I'm really wondering if I'm answering the right question.  And If I am answering the right question above if there is a way of using the community edition to calculate these values in DSS.

-------------------

That said I feel like the answer I'm giving is not answering the question being asked.  I did not need to break up data in the column new_followers in order to do the calculation above.  I just needed to do a string to date conversion, and then a Max.

If on the other hand, if the question in weeks, how long has it been since each nb_follower <-> new_followers pair was added.

Then maybe that makes more sense.

--Tom
Dataiker

Yes this is really a separate question. You can use "now" as the time reference, then in the Prepare Recipe, find the time difference between the parsed date and now.

Neuron

You are correct.  In my version, I did not do the right thing.  I got the date.  Not the number of weeks.

let's see what I can do about that.

--Tom
Neuron

So here is the solution I ended up with.

### The Flow.

#### Prepare Steps.

##### Splitting the new_followers.

I used the Split & Fold option.

That was put into the Prepare Recipe this way.

I then ran the prepare script.

Here is what grouping looked like.

And here is an example of the results.

--Tom
Dataiker

Your "split and fold" is the solution! Great job!

Neuron

So, here is a possible step to get to what is needed for the "number of weeks since".

In this case, I defined the week as 7 days and simply rounded.  Maybe there is a bit better way to the date calculations.

--Tom
Neuron

Again I like your answer to the date calculation.  Thanks.  Here is a brief graphic of the result.

Attached is my flow.  See how this opens for you?

--Tom
Level 3

I was just thinking, for the number of weeks since new follower, there is a 'Compute difference between dates' processor which is more straightforward in my opinion since you can specify that you want the difference in weeks and 'now' is one of the options for the end time. But your solution works too!

-Anita