Web scraping lets us download data from websites to our local system. It is a form of data mining that retrieves content from online portals over the Hypertext Transfer Protocol (HTTP) so that we can use it according to our requirements. Many companies use it for data harvesting and for building search engine bots.
Python has a wide variety of packages and modules that help with web scraping, such as Beautiful Soup and Selenium. Some libraries, such as AutoScraper, automate much of the process. These libraries expose different APIs through which we can scrape data and store it in a data frame on our local machine.
Twint is an open-source Python library for Twitter scraping, i.e. we can use twint to extract data from Twitter without using the Twitter API. Certain features make twint more usable than, and distinct from, other Twitter scraping tools, namely:
- The Twitter API limits fetching to a user's last 3,200 tweets, while twint has no such limit and can download almost all tweets.
- Easy to use and very fast.
- No initial Sign-in or Sign-up required for fetching data.
Twint can scrape tweets using different parameters such as hashtags, usernames and topics. It can even extract information such as phone numbers and email IDs from tweets.
In this article, we will explore twint and see what functionality it offers for scraping data from Twitter.
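To make the idea of pulling contact details out of tweet text concrete, here is a minimal sketch using only Python's standard library. It is not twint's implementation: the patterns, the `extract_contacts` helper and the sample tweet are all assumptions made for illustration, and the regular expressions are deliberately simple.

```python
import re

# Deliberately simple patterns for illustration; real emails and phone
# numbers come in many more formats than these cover.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def extract_contacts(text):
    """Return the email addresses and phone-like numbers found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [match.strip() for match in PHONE_RE.findall(text)],
    }


# Hypothetical tweet text, not real scraped data.
sample_tweet = "Reach us at hello@example.com or call +91 98765 43210 today."
contacts = extract_contacts(sample_tweet)
print(contacts)
```

The same helper could be applied to the text of each scraped tweet once the tweets have been collected.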
Implementation:
We will start by installing twint using pip install twint.
- Importing required libraries
Since we will be scraping data from Twitter with twint, we first import twint. We also import nest_asyncio, which patches asyncio so that an event loop can be run inside an already running one, avoiding the runtime errors that otherwise appear in notebooks. We will apply nest_asyncio in this same step.
import twint
import nest_asyncio
nest_asyncio.apply()
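To see the kind of runtime error that nest_asyncio is meant to prevent, the sketch below reproduces it with the standard library alone. A notebook cell already runs inside an event loop, so code that tries to start the loop again fails. The function names here are hypothetical, and this only imitates the situation rather than calling twint.

```python
import asyncio


async def fetch_number():
    return 42


async def call_from_running_loop():
    # We are already inside a running loop here, much like a notebook cell.
    loop = asyncio.get_running_loop()
    coro = fetch_number()
    try:
        # Trying to run the loop again from inside itself is not allowed.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(call_from_running_loop())
print(message)
```

Calling `nest_asyncio.apply()` patches asyncio so that this kind of nested call is permitted.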
- Configuring Twint
Before we scrape data from Twitter, we need to create a twint configuration object. We will set its options and pass it to twint whenever we run a scrape.
t = twint.Config()
Now let us start scraping different types of data from Twitter.
- Scraping Data
- Followers on Twitter
Here, we will see how to download the names of a particular user's followers using their username. I am using my own Twitter username.
t.Username = "Himansh70809561"
twint.run.Followers(t)
The output lists my followers on Twitter because I used my own username. Similarly, you can supply any other user's username to download the names of their followers.
- Storing info to Dataframe
We can also store the information in a data frame. Let us see how to store the followers' details in a data frame.
t.Limit = 30
t.Username = 'Analyticsindiam'
t.Pandas = True
twint.run.Followers(t)
follow_df = twint.storage.panda.User_df
Here the first 30 followers are stored in a data frame. We can set the limit to any desired number of followers.
- Extracting tweets with a particular word
Here we will extract tweets that contain a particular word of our choosing.
t.Search = "analytics"
t.Store_object = True
t.Limit = 10
twint.run.Search(t)
tlist = t.search_tweet_list
print(tlist)
The output contains tweets from different users, showing each username and tweet text along with the date on which the tweet was published.
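Once the tweets are held in a list, they can be written out for later analysis. The sketch below shows one way to do that with the standard library. `TweetStub` is a hypothetical stand-in for a scraped tweet, and its field names (`username`, `datestamp`, `tweet`) are assumptions; check them against the objects you actually get back before relying on them.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TweetStub:
    """Hypothetical stand-in for a scraped tweet; field names are assumed."""
    username: str
    datestamp: str
    tweet: str


def tweets_to_csv(tweets):
    """Serialize tweet-like objects to CSV text, one row per tweet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["username", "date", "tweet"])
    for item in tweets:
        writer.writerow([item.username, item.datestamp, item.tweet])
    return buffer.getvalue()


# Made-up sample rows, not real scraped data.
sample = [
    TweetStub("user_a", "2020-11-01", "Learning analytics with Python"),
    TweetStub("user_b", "2020-11-02", "Analytics, data and more"),
]
csv_text = tweets_to_csv(sample)
print(csv_text)
```

Using the `csv` module rather than joining strings by hand means tweets that contain commas or quotes are escaped correctly.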
- Tweets of a particular User
We can also extract the tweets of a particular user by passing their username as part of the search parameter.
t.Search = "from:@Analyticsindiam"
t.Store_object = True
t.Limit = 10
twint.run.Search(t)
tlist = t.search_tweet_list
Here we can see some recent tweets from Analytics India Magazine, along with the username and the date on which they were published.
These are some of the ways we can scrape data from Twitter using twint. Twint's contributors are actively working to keep improving it.
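The search string above relies on Twitter's `from:` search operator. As a small sketch, a helper like the one below can assemble such strings so that a keyword, a user and a date range can be combined. `build_search_query` is a hypothetical helper, not part of twint. It drops a leading `@` because the operator is usually written as `from:username`, and it assumes the `since:` and `until:` operators take dates in `YYYY-MM-DD` form.

```python
def build_search_query(username=None, keyword=None, since=None, until=None):
    """Combine optional filters into a single Twitter search string."""
    parts = []
    if keyword:
        parts.append(keyword)
    if username:
        # Accept the name with or without a leading "@".
        parts.append("from:" + username.lstrip("@"))
    if since:
        parts.append("since:" + since)
    if until:
        parts.append("until:" + until)
    return " ".join(parts)


query = build_search_query(username="@Analyticsindiam", keyword="analytics")
print(query)
```

The resulting string could then be assigned to the search option of the configuration object in place of a handwritten query.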
Conclusion:
In this article, we saw how to use twint to extract data from Twitter. We started by scraping the followers of a user and then saw how to store them in a data frame. We also saw how to extract tweets that contain a particular string and tweets from a particular user. Twint is easy to use, fast and frequently updated.