BeautifulSoup 4 Python Web Scraping to CSV Excel File

In this tutorial we do some web scraping with Python and Beautiful Soup 4. The results are saved to a CSV file, which can be opened and analyzed in Microsoft Excel or another spreadsheet program. I show you how to select elements from the page, deal with 403 Forbidden errors by spoofing your User-Agent header, and work around cases where the website's markup is poorly suited to scraping. The example site used here is SocialBlade.


$ pip install beautifulsoup4
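
To verify the install, and to see the selection technique used below in isolation, here is a minimal sketch run against an inline HTML snippet (hypothetical sample markup, not the live SocialBlade page): when elements have no ids or classes, you can match them by their exact inline `style` attribute instead.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for SocialBlade's markup: unlabeled divs identified
# only by their inline style attributes (hypothetical sample data).
html = """
<div style="float: right; width: 900px;">
  <div>header</div><div>ad</div><div>spacer</div><div>column titles</div>
  <div>
    <div style="float: left; width: 350px; line-height: 25px;"><a href="#">PewDiePie</a></div>
    <div style="float: left; width: 80px;"><span>3500</span></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Select the container by its exact style string, take only its direct
# children, and skip the first four non-data divs -- the same pattern
# the full scraper uses below.
rows = soup.find('div', attrs={'style': 'float: right; width: 900px;'}).find_all('div', recursive=False)[4:]
for row in rows:
    name = row.find('div', attrs={'style': 'float: left; width: 350px; line-height: 25px;'}).a.text.strip()
    print(name)
```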

Source Code:

import csv
import urllib.request
from bs4 import BeautifulSoup

rank_page = ''  # URL of the SocialBlade ranking page to scrape
request = urllib.request.Request(rank_page, headers={'User-Agent': 'your user-agent'})
page = urllib.request.urlopen(request)
soup = BeautifulSoup(page, 'html.parser')

# SocialBlade's rows have no ids or classes, so match the container's exact
# inline style attribute, take its direct children, and skip the first
# four non-data divs
channels = soup.find('div', attrs={'style': 'float: right; width: 900px;'}).find_all('div', recursive=False)[4:]

with open('topyoutubers.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)

    # write title row
    writer.writerow(['Username', 'Uploads', 'Views'])

    for channel in channels:
        username = channel.find('div', attrs={'style': 'float: left; width: 350px; line-height: 25px;'}).a.text.strip()
        uploads = channel.find('div', attrs={'style': 'float: left; width: 80px;'}).span.text.strip()
        views = channel.find_all('div', attrs={'style': 'float: left; width: 150px;'})[1].span.text.strip()

        print(username, uploads, views)
        writer.writerow([username, uploads, views])
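
If Excel shows garbled or misaligned rows, a quick round trip through the csv module confirms the file itself is well-formed before you blame the spreadsheet. This sketch uses hypothetical rows rather than real scraped SocialBlade data:

```python
import csv

# Hypothetical rows standing in for real scraped values.
rows = [['Username', 'Uploads', 'Views'],
        ['ExampleChannel', '1234', '5678900']]

# Write with newline='' so the csv module controls line endings itself.
with open('topyoutubers.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(rows)

# Read it back to confirm the file parses cleanly.
with open('topyoutubers.csv', newline='', encoding='utf-8') as f:
    readback = list(csv.reader(f))

print(readback)
```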

