
Extracting links and their page titles from your Twitter Archive

Twitter allows us to download our tweets from the account settings page. Once we request our archive, Twitter takes some time to prepare it and sends us an email with a download link once it is ready. After unpacking the archive, we will find a CSV file that contains our tweets – tweets.csv. The archive also contains an HTML page (index.html) that displays our tweets in a nice UI. While this is nice to look at, our primary objective is to extract the links from our tweets.

If we look at the CSV file closely, we will find a field named expanded_urls, which generally contains the URLs we used in our tweets. We will work with the values in this field. Along with each URL, we also want to fetch the page title. For this we will use Python 3 (I am using 3.5), and we need the requests and beautifulsoup4 packages to download and parse the pages. Let’s install them:
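Something along these lines should do the trick (assuming pip belongs to your Python 3 installation – otherwise use pip3):

    pip install requests beautifulsoup4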

We will follow these steps to extract links and their page titles from the tweets:

  • Open the CSV file and read it row by row
  • Each row contains a tweet; we take its expanded_urls field
  • This field can contain multiple URLs separated by commas, so we iterate over all of them
  • We skip some domains – for example, we don’t want to visit links to Twitter status updates
  • We fetch the HTML content using the requests library; if the page doesn’t return an HTTP 200, we ignore the response
  • We extract the title using Beautiful Soup and display it

Now let’s convert these steps to code. Here’s the final script I came up with:
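The script itself isn’t reproduced here, but a minimal sketch following the steps above might look like this (tweets.csv and the expanded_urls field come from the archive; the skipped domain list and the request timeout are my own assumptions):

    import csv

    import requests
    from bs4 import BeautifulSoup

    # domains we don't want to visit, e.g. links to Twitter status updates (assumed list)
    SKIP_DOMAINS = ("twitter.com",)


    def extract_titles(csv_path="tweets.csv"):
        with open(csv_path, newline="", encoding="utf-8") as csv_file:
            for row in csv.DictReader(csv_file):
                # the expanded_urls field may hold several comma separated URLs
                for url in row.get("expanded_urls", "").split(","):
                    url = url.strip()
                    if not url or any(domain in url for domain in SKIP_DOMAINS):
                        continue
                    try:
                        response = requests.get(url, timeout=10)
                    except requests.RequestException:
                        continue
                    # ignore anything that isn't an HTTP 200
                    if response.status_code != 200:
                        continue
                    title = BeautifulSoup(response.text, "html.parser").title
                    if title and title.string:
                        print(url, "-", title.string.strip())


    if __name__ == "__main__":
        extract_titles()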

I am actually using this for a personal project of mine – https://github.com/masnun/bookmarks – it’s basically a bare-bones Django admin app where I intend to store the links I visit/share. I come across a lot of interesting projects, articles and videos and then later lose track of them. I hope this app will remedy that. This piece of code is part of the Twitter import functionality of that app.


Python 3: Using blocking functions or code with asyncio

We know we can do a lot of async stuff with asyncio, but have you ever wondered how to execute blocking code with it? It’s pretty simple, actually – asyncio allows us to run blocking code using the BaseEventLoop.run_in_executor method. It runs our functions in an executor (a thread pool by default) and gives us Future objects which we can await or yield from.

Let’s see an example with the popular requests library:
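The snippet isn’t included here; a minimal sketch might look like the following (the two httpbin URLs are just placeholders):

    import asyncio

    import requests


    async def main(loop):
        # passing None as the executor uses the default ThreadPoolExecutor,
        # so the blocking requests.get calls run in worker threads
        future1 = loop.run_in_executor(None, requests.get, "http://httpbin.org/get")
        future2 = loop.run_in_executor(None, requests.get, "http://httpbin.org/ip")
        response1, response2 = await asyncio.gather(future1, future2)
        print(response1.status_code, response2.status_code)


    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(loop))

On Python versions older than 3.5, the same thing can be written with @asyncio.coroutine and yield from instead of async/await.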

If you run the code snippet, you can see how the two responses are fetched asynchronously 🙂


Creating a Twitter Retweet Bot in Python

We want to create a bot that will track specific topics and retweet them. We will use the Twitter Streaming API to track the topics and the popular tweepy package to interact with Twitter.

Let’s first install Tweepy:
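Assuming pip points at a Python 3 interpreter, something like this should work:

    pip install tweepy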

We need to create a Twitter app and get the OAuth tokens. We can do that from https://apps.twitter.com/.

Now let’s see the code:
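The original code isn’t reproduced here; a minimal sketch, assuming the pre-4.0 tweepy streaming API (tweepy.StreamListener), placeholder credentials, and an assumed list of topics and words to avoid, could look like this:

    import json

    import tweepy

    # tokens from https://apps.twitter.com/ (placeholders)
    auth = tweepy.OAuthHandler("consumer-key", "consumer-secret")
    auth.set_access_token("access-token", "access-token-secret")
    api = tweepy.API(auth)

    # common words we want to avoid retweeting (assumed list)
    WORDS_TO_AVOID = ["giveaway", "follow back"]


    class RetweetListener(tweepy.StreamListener):
        def on_data(self, raw_data):
            tweet = json.loads(raw_data)
            text = tweet.get("text", "").lower()
            # skip tweets containing words we want to avoid and non-English tweets
            if any(word in text for word in WORDS_TO_AVOID):
                return True
            if tweet.get("lang") != "en":
                return True
            try:
                api.retweet(tweet["id"])
            except tweepy.TweepError as error:
                print("Could not retweet:", error)
            return True

        def on_error(self, status_code):
            print("Stream error:", status_code)
            return True


    listener = RetweetListener()
    stream = tweepy.Stream(auth=api.auth, listener=listener)
    stream.filter(track=["python", "django"])  # topics we are interested in

Note that newer tweepy releases (4.x) removed StreamListener, so the exact class names depend on the installed version.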

The code is pretty much self-explanatory:

  • We create a Twitter API client using the OAuth details we got earlier
  • We subclass StreamListener to implement our own on_data method
  • We create an instance of this class, then create a new Stream by passing the auth handler and the listener
  • We use track to follow a number of topics we are interested in
  • When we start tracking the topics, the stream passes incoming data to the on_data method, where we parse the tweet, skip tweets containing certain common words we want to avoid, check the language, and then retweet