This website uses cookies. By clicking OK, you consent to the use of cookies. Read our cookie policy.

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Conundrum 5: Natural Followers

Welcome to the fifth Community Conundrum! This week we have a Data Prep puzzle for you - give it a go!

**Situation: **A twitter user has a dataset (attached) of new followers by date. The problem is the new followers (of type “natural language”) contains values with *all* the new followers on a particular date.

**Objective: **Given the dataset "followers_count", create a single dataset output that splits up these natural followers into their own rows and determine the number of weeks since the new followers were counted.

If you would like to share your results with the community feel free to do so here either in the form of screenshots, a text based description, a project upload, or all three! If you would like to upload your project please refer to Submission guidelines .

I hope I helped! Do you Know that if I was Useful to you or Did something Outstanding you can Show your appreciation by giving me a KUDOS?

*Looking for more resources to help you use DSS effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base *

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!

11 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

What do we know about this date format?

2017-10-01-19

is it

yyyy-MM-dd-HH

Or is this some sort of odd date range? Or something else?

What is the 4th part of this date column?

I'm going to assume the above.

In what time zone was this date and time recorded.

Without some more information about the data, this can be difficult.

I'm going to assume local time EDT.

However, when I did this I got an exception on a date with Daylight Saving time there was an invalid hour listed.

So I'm going to go back to UTC for my time.

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

the date format yyyy-MM-dd-HH is correct! 😁

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

The range of values in nb_followers is 643-15540

if we use space as a separator on new_followers.

The range of new_followers is 2590 - 1000701322063204352

with a strange distribution.

In looking at the twitter API I found nothing specific about this data column.

So I'm wondering if I'm missing something here.

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

What are you asking for when you say

"determine the number of weeks since the new followers were counted."

The latest date in the **date** column is:

2018-06-07T01:00:00.000Z

If I assume that this is local time in my time zone. (EDT).

You can find the exact number of weeks days hours since that date at the URL below.

As of the point in time when I was putting this into

However, I'm really wondering if I'm answering the right question. And If I am answering the right question above if there is a way of using the community edition to calculate these values in DSS.

-------------------

That said I feel like the answer I'm giving is not answering the question being asked. I did not need to break up data in the column new_followers in order to do the calculation above. I just needed to do a string to date conversion, and then a Max.

If on the other hand, if the question in weeks, how long has it been since each nb_follower <-> new_followers pair was added.

Then maybe that makes more sense.

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

You are correct. In my version, I did not do the right thing. I got the date. Not the number of weeks.

let's see what I can do about that.

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

So here is the solution I ended up with.

I used the Split & Fold option.

That was put into the Prepare Recipe this way.

I then ran the prepare script.

Here is what grouping looked like.

And here is an example of the results.

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Your "split and fold" is the solution! Great job!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

So, here is a possible step to get to what is needed for the "number of weeks since".

In this case, I defined the week as 7 days and simply rounded. Maybe there is a bit better way to the date calculations.

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Again I like your answer to the date calculation. Thanks. Here is a brief graphic of the result.

Attached is my flow. See how this opens for you?

P.S. Will you be able to join us for the discussion on Wednesday?

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

@tgb417 ,

I was just thinking, for the number of weeks since new follower, there is a 'Compute difference between dates' processor which is more straightforward in my opinion since you can specify that you want the difference in weeks and 'now' is one of the options for the end time. But your solution works too!

-Anita