
Constraining Twitter data stream

How can one keep only the tweets that satisfy a criterion? Say, only tweets with a location, or only tweets from users with more than x followers?

I suppose one can listen to everything and filter afterwards, but that is a waste of storage.

Thank you.
Dataiker

Hi,



1) If you use the Twitter REST API:



You can try some of the "query operators" defined in the Twitter REST API documentation (https://dev.twitter.com/rest/public/search). Some operators are not listed there, but you can discover them by running an advanced search on the website (https://twitter.com/search-advanced).



For example:




  • from:atwitteraccountname

  • near:"Paris, France" within:15mi



Selecting users with more than x followers is not an available option.
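
As a rough illustration of how the operators above combine into a single search query, here is a minimal Python sketch. The helper name `build_search_query` is hypothetical, and the sketch only builds and URL-encodes the `q` parameter; it does not call the API or handle authentication.

```python
from urllib.parse import urlencode


def build_search_query(account=None, near=None, within=None):
    """Combine search operators into one query string (hypothetical helper)."""
    parts = []
    if account:
        parts.append(f"from:{account}")
    if near:
        # Quote the place name so that multi-word locations stay together.
        parts.append(f'near:"{near}"')
        if within:
            parts.append(f"within:{within}")
    return " ".join(parts)


query = build_search_query(
    account="atwitteraccountname", near="Paris, France", within="15mi"
)
print(query)
# URL-encoded form, ready to append to a search request.
print(urlencode({"q": query}))
```

Since there is no operator for follower counts, that criterion would still have to be checked on the returned tweets.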



2) If you use the Twitter Streaming API (used by the DSS built-in connector):



https://dev.twitter.com/streaming/overview/request-parameters



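Options look more limited: the main filtering parameters are track (keywords), follow (user IDs) and locations (bounding boxes), and none of them filters on follower count.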
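
To give an idea of what the streaming request parameters look like, here is a minimal Python sketch that builds them as a dictionary. It assumes the `track` and `locations` parameters described on the page linked above; the helper name `build_stream_filter` and the example values are hypothetical, and no request is actually sent.

```python
def build_stream_filter(keywords=(), bounding_boxes=()):
    """Build a parameter dict for a streaming filter request (hypothetical helper).

    Each bounding box is (sw_lon, sw_lat, ne_lon, ne_lat): longitude first,
    south-west corner before north-east corner.
    """
    params = {}
    if keywords:
        # Comma-separated phrases; each phrase is matched independently.
        params["track"] = ",".join(keywords)
    if bounding_boxes:
        # Flatten all boxes into one comma-separated list of coordinates.
        params["locations"] = ",".join(
            str(value) for box in bounding_boxes for value in box
        )
    return params


# Illustrative values only: two keywords and a rough box around Paris.
params = build_stream_filter(
    keywords=["dataiku", "data science"],
    bounding_boxes=[(2.22, 48.81, 2.47, 48.91)],
)
print(params)
```

As far as I recall from the documentation, `track` and `locations` are combined with a logical OR, so a tweet matching either one is delivered and some filtering afterwards may still be needed.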

Jeremy, Product Manager at Dataiku
Author
Precisely: if one uses the full Twitter API, there is no issue, since one can use Python or R to fill a dataset. My question was about the DSS built-in dataset definition where, as far as I can tell, one can only configure a filter on the content of the "text" field. As far as I am concerned, this is both too simple and too much: too simple as a filter, and too much data streamed in as a result.
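
For the Python route mentioned above, the filtering can happen before anything is stored. The sketch below is only an illustration under some assumptions: tweets arrive as dictionaries using the Twitter v1.1 field names `coordinates`, `place` and `user.followers_count`, and the function names `keep_tweet` and `filter_stream` are hypothetical. How the tweets are fetched and written to a DSS dataset is left out.

```python
def keep_tweet(tweet, min_followers):
    """Return True if the tweet is geotagged or its author has enough followers."""
    has_location = bool(tweet.get("coordinates") or tweet.get("place"))
    # Treat a missing user or a missing count as zero followers.
    followers = (tweet.get("user") or {}).get("followers_count") or 0
    return has_location or followers > min_followers


def filter_stream(tweets, min_followers):
    """Lazily yield only the tweets worth storing."""
    for tweet in tweets:
        if keep_tweet(tweet, min_followers):
            yield tweet


# Made-up sample records standing in for a live stream.
sample = [
    {"text": "a", "coordinates": None, "place": None,
     "user": {"followers_count": 12}},
    {"text": "b", "coordinates": None, "place": {"full_name": "Paris, France"},
     "user": {"followers_count": 3}},
    {"text": "c", "coordinates": None, "place": None,
     "user": {"followers_count": 50000}},
]
kept = [t["text"] for t in filter_stream(sample, min_followers=1000)]
print(kept)
```

Because `filter_stream` is a generator, rejected tweets are dropped as they arrive and never reach storage.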