Spider metadata
This library allows retrieving spider metadata defined in spider classes.
If a spider class defines spider parameters, their schema will also be included in the retrieved metadata.
Defining spider metadata
You can declare arbitrary metadata in your spider classes as a dictionary attribute named metadata:
from scrapy import Spider


class MySpider(Spider):
    name = "my_spider"
    metadata = {
        "description": "This is my spider.",
        "category": "My basic spiders",
    }
As this attribute is shared between instances of the class and of its subclasses, be careful not to modify it in place. Here is a simple way to add or change some values in a subclass:
from scrapy import Spider


class BaseSpider(Spider):
    metadata = {
        "description": "Base spider.",
        "category": "Base spiders",
    }


class BaseNewsSpider(BaseSpider):
    # Build a new dict from the parent's values, then override "description".
    metadata = {
        **BaseSpider.metadata,
        "description": "Base news spider.",
    }


class CNNSpider(BaseNewsSpider):
    # Later keys win, so these values replace the inherited ones.
    metadata = {
        **BaseNewsSpider.metadata,
        "description": "CNN spider.",
        "category": "Concrete spiders",
        "website": "CNN",
    }
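To see why in-place modification is risky, here is a minimal sketch. It assumes plain Python classes as stand-ins for Spider subclasses so that it runs without Scrapy installed. The names MutatingSpider and SafeSpider are hypothetical and used only for illustration.

```python
class BaseSpider:
    # Stand-in for a scrapy.Spider subclass; only the class attribute matters here.
    metadata = {"description": "Base spider.", "category": "Base spiders"}


class MutatingSpider(BaseSpider):
    pass


# In-place change: MutatingSpider.metadata is the very same dict object as
# BaseSpider.metadata, so the base class is modified as well.
MutatingSpider.metadata["description"] = "Mutating spider."
leaked = BaseSpider.metadata["description"]

# Restore the base value, then override by building a new dict instead.
BaseSpider.metadata["description"] = "Base spider."


class SafeSpider(BaseSpider):
    # A new dict: the parent's dict is read but never written to.
    metadata = {**BaseSpider.metadata, "description": "Safe spider."}
```

Note that the ** unpacking makes a shallow copy: top-level keys are independent, but a nested list or dict stored as a value would still be shared with the parent class.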
Getting spider metadata
scrapy-spider-metadata provides the following function for retrieving the metadata for a specific spider class:
- scrapy_spider_metadata.get_spider_metadata(spider_cls: Type[Spider], *, normalize: bool = False) → Dict[str, Any]
Return the metadata for the spider class.
Return a copy of the metadata dict. If the spider class defines spider parameters, the returned dict will have an additional param_schema key whose value is the JSON Schema for the parameters.
- Parameters:
spider_cls – The spider class.
normalize – Normalize the returned schema.
- Returns:
The complete spider metadata.