Sunday, April 10, 2016

Downloading Images from Twitter

    So, continuing my exploration of the Twitter API using tweepy, I realized that exploring different users' tweets is ridiculously easy. So, what exciting things can one do with these tweets?

    So here I was, scrolling through Twitter in my free time, when I came across a lovely Özil wallpaper. Arsenal's Twitter feed generally includes some great pictures, and the same goes for movie stars and artists who post their pictures on Twitter. Downloading them manually takes a lot of time. With the power of tweepy, why not just automate the entire process?

Downloading images from a user can be divided into 3 major sub-routines:
  • Downloading the user's tweets and keeping the ones that have media attached.
  • Extracting the links from these tweets.
  • Downloading each of the links.

    Since Twitter limits the number of tweets one can download at a time, we need to keep track of the ID of the oldest tweet received so far and then pass it as max_id to download the next, older batch of tweets.

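raw_tweets = []
last_tweet_id = None  # None is skipped by tweepy, so the first request starts at the newest tweet

while True:
    # Fetch the next batch of tweets that are no newer than last_tweet_id
    temp_raw_tweets = api.user_timeline(screen_name=username, max_id=last_tweet_id, include_rts=False, exclude_replies=True)

    if len(temp_raw_tweets) == 0:
        break
    # max_id is inclusive, so step one below the oldest tweet of this batch
    last_tweet_id = int(temp_raw_tweets[-1].id - 1)
    raw_tweets = raw_tweets + temp_raw_tweets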
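
    To see the max_id bookkeeping in isolation, here is a minimal Python 3 sketch that swaps the real API call for a stand-in. The function fake_user_timeline and the tweet IDs are made up for illustration and are not part of tweepy; the stand-in only mimics how max_id returns tweets with an ID less than or equal to the given value, newest first.

```python
# Stand-in data: tweet IDs, newest first (made up for illustration).
FAKE_TIMELINE = [110, 108, 107, 103, 101, 100, 95]


def fake_user_timeline(max_id=None, count=3):
    """Hypothetical stand-in for api.user_timeline: returns up to `count`
    IDs that are <= max_id, newest first."""
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t <= max_id]
    return eligible[:count]


def fetch_all(fetch_page):
    """Keep requesting pages until an empty one comes back."""
    collected = []
    last_tweet_id = None  # None: start from the newest tweet
    while True:
        page = fetch_page(max_id=last_tweet_id)
        if not page:
            break
        collected.extend(page)
        # max_id is inclusive, so step one below the oldest ID seen
        last_tweet_id = page[-1] - 1
    return collected


all_ids = fetch_all(fake_user_timeline)
print(all_ids)
```

    Without the "- 1", the oldest tweet of each batch would come back again as the first tweet of the next one.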

    Once the tweets are downloaded, extract the ones which have media links. This can be done by checking whether the tweet's entities contain a media value. If there is no media attached to the tweet, an empty list is returned and the loop moves on to the next tweet; otherwise, the link of the first media item is added to the set containing all the links.

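tweets_with_media = set()  # a set, so duplicate links are stored only once

for tweet in raw_tweets:
    # entities has no 'media' key when nothing is attached to the tweet
    media = tweet.entities.get('media', [])
    if media:
        tweets_with_media.add(media[0]['media_url'])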
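
    The loop above keeps only the first media item of each tweet. When a tweet carries several photos, Twitter lists them under extended_entities rather than entities, so a variant that collects every photo could look like the sketch below. It is written for Python 3 and works on plain dictionaries (the parsed JSON of a tweet) rather than tweepy objects; the helper name media_urls and the placeholder URLs are my own.

```python
def media_urls(tweet):
    """Return the URLs of all photos attached to a tweet given as a dict."""
    # Prefer extended_entities, fall back to entities, then to nothing
    entities = tweet.get("extended_entities") or tweet.get("entities") or {}
    return [
        item["media_url"]
        for item in entities.get("media", [])
        if item.get("type") == "photo"
    ]


# Trimmed-down tweets shaped like the API payload (URLs are placeholders)
tweets = [
    {"entities": {"hashtags": []}},  # no media at all
    {
        "entities": {
            "media": [{"type": "photo", "media_url": "http://pbs.twimg.com/media/A.jpg"}]
        },
        "extended_entities": {
            "media": [
                {"type": "photo", "media_url": "http://pbs.twimg.com/media/A.jpg"},
                {"type": "photo", "media_url": "http://pbs.twimg.com/media/B.jpg"},
            ]
        },
    },
]

links = set()
for tweet in tweets:
    links.update(media_urls(tweet))

print(sorted(links))
```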

    After the links have been extracted, download all of them. urllib2 and requests are the primary choices for downloading files and writing the data to a file in binary mode. However, the wget module makes it completely easy and hassle-free. All the downloaded files are stored in the directory "twitter_images", inside a folder named after the user's handle.

import wget

for url in tweets_with_media:
    wget.download(url)
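
    For comparison, the same download can be done with the standard library alone. This is a Python 3 sketch (urllib.request is the successor of urllib2); the helper names target_path and download, and the use of the handle as the folder name, are assumptions based on the folder layout described above.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def target_path(url, handle, root="twitter_images"):
    """Build twitter_images/<handle>/<file name taken from the URL>."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(root, handle, filename)


def download(url, handle):
    """Fetch one URL and write the bytes to disk in binary mode."""
    path = target_path(url, handle)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path


# Only the path is computed here; calling download() needs network access.
example = target_path("http://pbs.twimg.com/media/CeV53HyXEAAjBl7.jpg", "betting_Big")
print(example)
```

    Opening the file with "wb" matters: images are binary data, and text mode could corrupt them on some platforms.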




Ah! Scripting makes life so much easier :)


Link to the code: Github


Friday, April 1, 2016

Streaming tweets with python using tweepy

    So, I recently started experimenting with Twitter's API. Twitter provides several APIs for users to work with, the major ones being the Streaming API, the REST API and the Search API. Twitter requires special tokens before it gives out its data, so you need to generate your own tokens if you want to use Twitter's data.

    Now, there are plenty of modules available for working with Twitter's data. The one I used was tweepy, as it was recommended by many developers and has very good documentation. Another quite useful module is twython, but I decided to stick with tweepy.

    First step: authentication via tokens. Go to apps.twitter.com and create a new app to get your tokens. If you're having trouble, this link shows precisely what to do. Then, import OAuthHandler and API from tweepy and pass the keys and tokens to the handler to create an API object.

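from tweepy import API, OAuthHandler

# t is assumed to be a local module holding the four credentials of your app
auth = OAuthHandler(t.CONSUMER_KEY, t.CONSUMER_SECRET)
auth.set_access_token(t.ACCESS_TOKEN, t.ACCESS_TOKEN_SECRET)
api = API(auth)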

    Tweepy has a simple class designed especially for grabbing real-time streaming data, called StreamListener, which can be inherited and tailored to our requirements. The data sent by Twitter's servers is gigantic: they send every piece of information available about each tweet, so there is a lot of data you might not need.


{"created_at":"Thu Mar 24 21:07:52 +0000 2016","id":713110104418172928,"id_str":"713110104418172928",
"text":"New Offer Bet \u00a310 Get \u00a320 FREE - Bet Now: https:\/\/t.co\/8OLizB0qka #twitter92 #Arsenal https:\/\/t.co\/y2HKha3IFC","source":"\u003ca href=\"https:\/\/www.socialoomph.com\" rel=\"nofollow\"\u003eSocialOomph\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,
"user":{"id":2762185416,"id_str":"2762185416","name":"Cash Offers","screen_name":"betting_Big","location":null,"url":null,"description":"#1 Twitter for Free Money Bets","protected":false,"verified":false,"followers_count":2797,"friends_count":1674,"listed_count":112,"favourites_count":2,"statuses_count":38601,"created_at":"Sun Aug 24 12:04:07 +0000 2014","utc_offset":-25200,"time_zone":"Pacific Time (US & Canada)",
"geo_enabled":false,"lang":"en","contributors_enabled":false,
"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6",
"profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/619882722266357760\/gctGZjOJ_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/619882722266357760\/gctGZjOJ_normal.jpg",
"profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/2762185416\/1436627381",
"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,
"entities":{"hashtags":[{"text":"twitter92","indices":[66,76]},{"text":"Arsenal","indices":[77,85]}],"urls":[{"url":"https:\/\/t.co\/8OLizB0qka","expanded_url":"http:\/\/bit.ly\/Bet10Gt20","display_url":"bit.ly\/Bet10Gt20","indices":[42,65]}],"user_mentions":[],"symbols":[],"media":[{"id":713110103898132480,"id_str":"713110103898132480",
"indices":[86,109],"media_url":"http:\/\/pbs.twimg.com\/media\/CeV53HyXEAAjBl7.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/CeV53HyXEAAjBl7.jpg","url":"https:\/\/t.co\/y2HKha3IFC","display_url":"pic.twitter.com\/y2HKha3IFC","expanded_url":"http:\/\/twitter.com\/betting_Big\/status\/713110104418172928\/photo\/1","type":"photo",
"sizes":{"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":520,"h":214,"resize":"fit"},"small":{"w":340,"h":140,"resize":"fit"},"medium":{"w":520,"h":214,"resize":"fit"}}}]},"extended_entities":{"media":[{"id":713110103898132480,"id_str":"713110103898132480","indices":[86,109],
"media_url":"http:\/\/pbs.twimg.com\/media\/CeV53HyXEAAjBl7.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/CeV53HyXEAAjBl7.jpg","url":"https:\/\/t.co\/y2HKha3IFC","display_url":"pic.twitter.com\/y2HKha3IFC","expanded_url":"http:\/\/twitter.com\/betting_Big\/status\/713110104418172928\/photo\/1",
"type":"photo","sizes":{"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":520,"h":214,"resize":"fit"},"small":{"w":340,"h":140,"resize":"fit"},
"medium":{"w":520,"h":214,"resize":"fit"}}}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1458853672495"}


    This data has a dictionary-like structure, and hence can be easily manipulated using a JSON parser. Twitter's developer page comes in handy when you need to find out which fields to extract from it.


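import json

# data is the raw JSON string received from the stream
j = json.loads(data)
# [:-11] drops the trailing " +0000 2016" from the timestamp
line1 = "@" + str(j['user']['screen_name']) + " on " + j['created_at'][:-11] + ", language= " + j["lang"] + ": "
line2 = '\n' + j['text']
text = line1 + line2
print(text + "\n\n")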
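
    The slice [:-11] above simply cuts the UTC offset and the year off created_at. If a real timestamp is needed, the string can be parsed instead. The sketch below is Python 3 and uses a payload cut down from the sample tweet shown earlier (with the text shortened); the helper name summarize is hypothetical.

```python
import json
from datetime import datetime

# A few fields from the sample tweet; everything else is left out
data = (
    '{"created_at": "Thu Mar 24 21:07:52 +0000 2016", "lang": "en", '
    '"text": "New Offer Bet \\u00a310 Get \\u00a320 FREE", '
    '"user": {"screen_name": "betting_Big"}, '
    '"entities": {"hashtags": [{"text": "twitter92"}, {"text": "Arsenal"}]}}'
)


def summarize(raw):
    """Pick out the handful of fields we care about from one raw tweet."""
    tweet = json.loads(raw)
    return {
        "user": tweet["user"]["screen_name"],
        # Twitter's timestamp format, e.g. "Thu Mar 24 21:07:52 +0000 2016"
        "created": datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"),
        "lang": tweet["lang"],
        "text": tweet["text"],
        "hashtags": [tag["text"] for tag in tweet["entities"]["hashtags"]],
    }


summary = summarize(data)
print(summary["user"], summary["created"].isoformat(), summary["hashtags"])
```

    Keeping only these few fields right after parsing is one way to deal with the size of the payload: the rest of the dictionary can be discarded immediately.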


    The stream can be filtered so that it only delivers tweets relating to a particular topic or keyword, by passing the keywords to the stream's filter (the track parameter).




    The REST API works a bit differently from the Streaming API: it searches for tweets that have already been posted instead of taking them in real time. Twitter's rate limiting can be a real issue here, as a single request returns only up to 100 tweets for a search or 200 for a user timeline, and the number of requests per 15-minute window is capped as well. Just like in the Streaming API, the data sent here is immense and we need to filter out the required data. All of these capabilities allow us to create a pretty powerful Python-based Twitter client. The possibilities are, as they say, endless!


Link to the code: Github