So, continuing my exploration of the Twitter API using tweepy, I realized that exploring other users' tweets is ridiculously easy. So what exciting things can one do with these tweets?
So here I was, scrolling through Twitter in my free time, when I came across a lovely Özil wallpaper. Arsenal's Twitter feed generally includes some great pictures, and the same goes for movie stars and artists who post their pictures on Twitter. Downloading them manually takes a lot of time. With the power of tweepy, why not just automate the entire process?
Downloading a user's images can be divided into three major subroutines:
- Downloading the user's tweets and keeping only the ones that contain media.
- Extracting the links from these tweets.
- Downloading each of the links.
Since Twitter limits the number of tweets returned per request, we need to keep track of the ID of the oldest tweet received so far and pass it as max_id to fetch the next, older batch.
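```python
raw_tweets = []
last_tweet_id = None  # None on the first call: start from the newest tweet
while True:
    temp_raw_tweets = api.user_timeline(screen_name=username, max_id=last_tweet_id,
                                        include_rts=False, exclude_replies=True)
    if len(temp_raw_tweets) == 0:
        break  # no older tweets left to fetch
    # max_id is inclusive, so continue from just below the oldest tweet received
    last_tweet_id = temp_raw_tweets[-1].id - 1
    raw_tweets.extend(temp_raw_tweets)
```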
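The max_id loop above can be checked without touching the network by pulling it into a function that takes the page fetcher as an argument. This is a minimal sketch: fetch_all and fake_page are hypothetical names, and fake_page only imitates the assumed behaviour of user_timeline (tweets served newest first, max_id acting as an inclusive upper bound).

```python
from types import SimpleNamespace


def fetch_all(fetch_page):
    """Collect every item from a max_id-paginated timeline.

    fetch_page(max_id) is a hypothetical stand-in for api.user_timeline:
    it returns a list of objects with an integer .id, newest first,
    all with id <= max_id (or the newest page when max_id is None).
    """
    items = []
    max_id = None  # first request: start from the newest tweet
    while True:
        page = fetch_page(max_id)
        if not page:
            break  # an empty page means nothing older is left
        items.extend(page)
        # max_id is inclusive, so ask only for ids below the oldest one seen
        max_id = page[-1].id - 1
    return items


# Usage with a fake timeline of ids 10..1, served three at a time
timeline = [SimpleNamespace(id=i) for i in range(10, 0, -1)]


def fake_page(max_id, page_size=3):
    eligible = [t for t in timeline if max_id is None or t.id <= max_id]
    return eligible[:page_size]


tweets = fetch_all(fake_page)
print([t.id for t in tweets])  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

With a real tweepy client, the fetcher would be something like `lambda max_id: api.user_timeline(screen_name=username, max_id=max_id, include_rts=False, exclude_replies=True)`. Keeping the paging logic separate from the API call makes the off-by-one on max_id easy to test.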
Once the tweets are downloaded, keep the ones that have media links. This can be done by checking whether the tweet's entities dictionary has a media entry. If no media is attached to the tweet, tweet.entities.get('media', []) returns an empty list and the loop moves on; otherwise, the media URL is added to the set of all links.
```python
tweets_with_media = set()  # a set avoids downloading the same URL twice
for tweet in raw_tweets:
    media = tweet.entities.get('media', [])  # empty list if nothing is attached
    if len(media) > 0:
        tweets_with_media.add(media[0]['media_url'])
```
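The loop above keeps only media[0], and in the v1.1 tweet JSON the entities field lists just the first photo of a multi-photo tweet; all of them appear under extended_entities. The sketch below is a hedged variant that works on raw tweet dictionaries (tweepy exposes a Status object's JSON as status._json), prefers extended_entities when present, and keeps the URLs in first-seen order. The function name and the example.com URLs are made up for illustration.

```python
def extract_media_urls(tweets):
    """Return the unique media URLs found in a list of tweet dicts.

    Assumes each dict follows the Twitter API v1.1 JSON layout.
    "extended_entities" lists every attached photo, while "entities"
    carries only the first one, so the former is preferred.
    """
    urls = []
    seen = set()
    for tweet in tweets:
        container = tweet.get("extended_entities") or tweet.get("entities", {})
        for media in container.get("media", []):
            # prefer the https variant, fall back to the plain one
            url = media.get("media_url_https") or media.get("media_url")
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


sample = [
    {"entities": {}},  # plain text tweet, no media
    {"entities": {"media": [{"media_url_https": "https://example.com/a.jpg"}]}},
    {
        "entities": {"media": [{"media_url_https": "https://example.com/b.jpg"}]},
        "extended_entities": {"media": [
            {"media_url_https": "https://example.com/b.jpg"},
            {"media_url_https": "https://example.com/c.jpg"},
        ]},
    },
]
print(extract_media_urls(sample))
```

With tweepy objects, the call would be `extract_media_urls([t._json for t in raw_tweets])`. Returning a list instead of a set keeps the download order predictable.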
After the links have been extracted, download all of them. urllib2 (urllib.request in Python 3) or requests are the usual choices: fetch each URL and write the response data to a file opened in binary mode. However, the wget module makes this completely easy and hassle-free. All the downloaded files are stored in the "twitter_images" directory, inside a folder named after the user's handle.
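```python
import wget

# By default, wget.download saves each file into the current working directory
for url in tweets_with_media:
    wget.download(url)
```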
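For comparison, the urllib route mentioned above looks roughly like this in Python 3. It is a sketch, not the code from the repository: download_images and its folder layout (root, then handle, then the file name taken from the URL) are assumptions meant to mirror the twitter_images structure described above. Since urlopen also accepts file:// URLs, the demo at the bottom runs offline with a local file standing in for an image.

```python
import os
import pathlib
import shutil
import tempfile
import urllib.parse
import urllib.request


def download_images(urls, handle, root="twitter_images"):
    """Save each URL as root/handle/<file name from the URL>.

    Hypothetical helper: the response body is streamed into a file
    opened in binary mode. Files that already exist are skipped, and
    only the paths written during this call are returned.
    """
    folder = os.path.join(root, handle)
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in urls:
        # e.g. ".../media/abc.jpg" -> "abc.jpg"
        name = os.path.basename(urllib.parse.urlparse(url).path)
        path = os.path.join(folder, name)
        if os.path.exists(path):
            continue  # already downloaded on an earlier run
        with urllib.request.urlopen(url) as response, open(path, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(path)
    return saved


# Offline demo: a local file reached through a file:// URL stands in for an image URL
demo_dir = tempfile.mkdtemp()
source = pathlib.Path(demo_dir, "photo.jpg")
source.write_bytes(b"not really a jpeg")
saved = download_images([source.as_uri()], "some_handle",
                        root=os.path.join(demo_dir, "twitter_images"))
print(saved)
```

Skipping existing files means the script can be rerun after new tweets appear without fetching every image again; swapping urlopen for requests.get(url, stream=True) would follow the same pattern.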
Ah! Scripting makes life so much easier :)
Link to the code: Github