Tech Monger

Programming, Web Development and Computer Science.


Configure Fake User Agent in Scrapy Project

If the sites you are crawling with Scrapy do not respond to your requests, they may be blocking Scrapy's default user agent, and sending a randomly chosen user agent with each request can help. scrapy-fake-useragent is an open source extension that does this, which makes it harder for simple bot detection to identify your crawler.


Install Scrapy Fake Useragent

pip install scrapy-fake-useragent

Configure Fake User Agent

A fake user agent is configured in Scrapy by disabling Scrapy's default UserAgentMiddleware and activating RandomUserAgentMiddleware inside DOWNLOADER_MIDDLEWARES.
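
To make the mechanism concrete, here is a minimal sketch of what a random user agent downloader middleware does for each outgoing request. It is not the actual implementation of scrapy-fake-useragent: the class name SimpleRandomUserAgentMiddleware and the USER_AGENTS list are hypothetical, and the strings are only sample values. The one Scrapy convention it relies on is the process_request(request, spider) hook, where returning None lets the request continue through the remaining middlewares.

```python
import random

# Hypothetical pool of user agent strings, sample values for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


class SimpleRandomUserAgentMiddleware:
    """Illustrative downloader middleware (hypothetical, not the library's code)."""

    def __init__(self, user_agents=USER_AGENTS):
        # Copy the pool so later changes to the caller's list do not affect us.
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Overwrite the User-Agent header with a randomly chosen value.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing this request.
        return None
```

A class like this would be referenced in DOWNLOADER_MIDDLEWARES by its import path, in the same way as RandomUserAgentMiddleware. The real extension draws on a much larger, regularly updated set of user agents, so prefer it over a hand-maintained list.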

You can configure the random user agent middleware in two ways: for a single spider or for the whole project.


Spider Level Configuration

To use a random user agent for one site only, override the global settings by defining DOWNLOADER_MIDDLEWARES inside the custom_settings of that site's spider, as shown below.

import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"

    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
            'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
        }
    }

    def start_requests(self):
        urls = ["https://example.com"]

        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Print the headers that were sent, including the chosen User-Agent
        print(response.request.headers)

Project Level Configuration

To use a fake user agent for the whole project, edit the settings.py file in the project directory. Every spider in the project will then send its requests through the fake user agent middleware, unless a spider overrides this in its own custom_settings.

# Enable or disable downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}
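
In both configurations, Scrapy merges your DOWNLOADER_MIDDLEWARES dictionary with its built-in defaults, drops every entry whose value is None, and orders the rest by their numbers. The sketch below imitates that behaviour in plain Python to show why the None entry matters. It is a simplified illustration, not Scrapy's own code: resolve_middlewares is a hypothetical helper, and BASE is an abbreviated subset of the defaults whose values may differ between Scrapy versions.

```python
# Abbreviated, illustrative subset of Scrapy's default downloader middlewares.
BASE = {
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": 400,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
}

# The configuration from this article.
CUSTOM = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "scrapy_fake_useragent.middleware.RandomUserAgentMiddleware": 400,
}


def resolve_middlewares(base, custom):
    """Hypothetical helper: merge, drop disabled entries, sort by order."""
    merged = {**base, **custom}  # custom entries override the defaults
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


active = resolve_middlewares(BASE, CUSTOM)
for path in active:
    print(path)
```

Without the None entry, the default UserAgentMiddleware would stay in the active list next to the random one, which is why the configuration disables it explicitly.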
