HIGH 7.5 PyPI

Scrapy: Arbitrary Module Import via Referrer-Policy Header in RefererMiddleware

GHSA-cwxj-rr6w-m6w7

Description

Impact

Since version 1.4.0, Scrapy respects the Referrer-Policy response header to decide whether and how to set a Referer header on follow-up requests.

If the header value looked like a valid Python import path, Scrapy imported the referenced object and called it, assuming it was a referrer policy class (for example, scrapy.spidermiddlewares.referer.DefaultReferrerPolicy) that had to be instantiated to handle the Referer header.

A malicious site could exploit this by setting Referrer-Policy to a path such as sys.exit, causing Scrapy to import and execute it and potentially terminate the process.
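The difference between the vulnerable pattern and a safer one can be sketched in plain Python. This is an illustration only: it is neither Scrapy's code nor the actual patch, and the names load_object_unsafe, parse_policy_header and STANDARD_POLICIES are hypothetical. The first function resolves any dotted path it is given, which is the pattern described above; the second accepts only the policy tokens defined by the W3C Referrer Policy specification and never imports anything.

```python
import importlib
import sys

# The eight non-empty policy tokens from the W3C Referrer Policy specification.
STANDARD_POLICIES = frozenset({
    "no-referrer",
    "no-referrer-when-downgrade",
    "same-origin",
    "origin",
    "strict-origin",
    "origin-when-cross-origin",
    "strict-origin-when-cross-origin",
    "unsafe-url",
})


def load_object_unsafe(path):
    """Resolve a dotted path like 'pkg.module.attr' to the object it names.

    Feeding untrusted input to a function like this is the dangerous pattern:
    the caller ends up holding (and may then call) an arbitrary object.
    """
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def parse_policy_header(value):
    """Return the last recognized standard policy token in the header, or None.

    The header may hold a comma-separated list; unknown tokens are ignored
    and nothing is ever imported.
    """
    for token in reversed(value.split(",")):
        token = token.strip().lower()
        if token in STANDARD_POLICIES:
            return token
    return None


# A hostile header value resolves to a real callable in the unsafe version...
resolved = load_object_unsafe("sys.exit")
# ...but is simply rejected by the allowlist version.
rejected = parse_policy_header("sys.exit")
accepted = parse_policy_header("no-referrer, Strict-Origin")
```

Here resolved is the sys.exit function itself, so calling it would raise SystemExit, while rejected is None and accepted is the last recognized token in the list.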

Patches

Upgrade to Scrapy 2.14.2 (or later).

Workarounds

If you cannot upgrade to Scrapy 2.14.2, consider the following mitigations.

  • Disable the middleware: If you don't need the Referer header on follow-up requests, set REFERER_ENABLED to False.
  • Set headers manually: If you do need a Referer, disable the middleware and set the header explicitly on the requests that require it.
  • Set referrer_policy in request metadata: If disabling the middleware is not viable, set the referrer_policy request meta key on all requests to prevent evaluating preceding responses' Referrer-Policy. For example:
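from scrapy import Request

# url is whatever URL the spider is about to request.
Request(
    url,
    meta={
        # An explicit policy here stops the middleware from consulting the
        # previous response's Referrer-Policy header.
        "referrer_policy": "scrapy.spidermiddlewares.referer.DefaultReferrerPolicy",
    },
)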
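The first two mitigations in the list above can be sketched as a settings line plus an explicit header. REFERER_ENABLED is the setting named in this advisory; the helper request_kwargs_with_referer is hypothetical and only builds keyword arguments, so the sketch runs without Scrapy installed.

```python
# settings.py (sketch): turn the referrer middleware off entirely.
REFERER_ENABLED = False


def request_kwargs_with_referer(url, referer):
    """Hypothetical helper: keyword arguments for a request that carries
    an explicitly chosen Referer header."""
    return {"url": url, "headers": {"Referer": referer}}


# In a spider callback, the result would be passed on like this:
#     yield scrapy.Request(**request_kwargs_with_referer(next_url, response.url))
example = request_kwargs_with_referer(
    "https://example.com/page/2", "https://example.com/page/1"
)
```

With the middleware disabled, only requests built this way carry a Referer, and its value is always one the spider chose rather than one derived from a response header.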

Instead of editing requests individually, you can:

  • implement a custom spider middleware that runs before the built-in referrer policy middleware and sets the referrer_policy meta key; or
  • set the meta key in start requests and use the scrapy-sticky-meta-params plugin to propagate it to follow-up requests.
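The first option above might look roughly like the following. This is a minimal sketch under several assumptions: the class name PinReferrerPolicyMiddleware and the module path myproject.middlewares are hypothetical, the method uses the long-standing process_spider_output(response, result, spider) signature (newer Scrapy versions may expect a different one), and only synchronous callback output is handled. To stay free of a Scrapy import, it duck-types on a meta dictionary; in a real project an isinstance check against scrapy.Request would be more precise.

```python
# Hypothetical module: myproject/middlewares.py
PINNED_POLICY = "scrapy.spidermiddlewares.referer.DefaultReferrerPolicy"


class PinReferrerPolicyMiddleware:
    """Spider middleware sketch: pin referrer_policy on outgoing requests."""

    def process_spider_output(self, response, result, spider=None):
        for obj in result:
            # Requests expose a dict-like meta attribute; plain dict items
            # and most other items do not, so they pass through untouched.
            meta = getattr(obj, "meta", None)
            if isinstance(meta, dict):
                # setdefault keeps a policy the request already chose.
                meta.setdefault("referrer_policy", PINNED_POLICY)
            yield obj
```

To take effect, the class has to be registered in SPIDER_MIDDLEWARES so that it sees spider output before the built-in middleware does. Assuming the built-in middleware keeps its default order of 700 and that spider output is processed in decreasing order, a value such as 750 should work; check SPIDER_MIDDLEWARES_BASE for your Scrapy version before relying on this.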

If you want to continue respecting legitimate Referrer-Policy headers while protecting against malicious ones, disable the built-in referrer policy middleware by setting it to None in SPIDER_MIDDLEWARES and replace it with the fixed implementation from Scrapy 2.14.2.
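In settings terms, the swap described above could look like this. The path myproject.middlewares.PatchedRefererMiddleware is a hypothetical location for your copy of the fixed implementation, and the order 700 assumes the built-in middleware's default position; verify both against your project and Scrapy version.

```python
# settings.py (sketch)
SPIDER_MIDDLEWARES = {
    # Setting the built-in middleware to None disables it.
    "scrapy.spidermiddlewares.referer.RefererMiddleware": None,
    # Hypothetical path to the copied, patched implementation, registered
    # at the order the built-in middleware is assumed to use by default.
    "myproject.middlewares.PatchedRefererMiddleware": 700,
}
```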

If the Scrapy 2.14.2 implementation is incompatible with your project (for example, because your Scrapy version is older), copy the corresponding middleware from your Scrapy version, apply the same patch, and use that as a replacement.
