Building Blocks

These are the basic nouns of scrapeR. A spider is made of a series of steps that are run sequentially on each item in its queue. A pipeline contains a series of generic steps that can be reused at the end of each spider. A runner runs multiple spiders at once.

- Spider for crawling URLs
- Collection of generic steps to append to a spider
- Runner for running multiple spiders at once
- Steps for spiders and pipelines

Helpers

- Add items to a spider's queue
- Add steps to a pipeline or spider
- Run a spider or a runner
- Rename a spider
- Set the pipeline used by a given spider

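To make the relationship between spiders, queues, steps, pipelines, and runners concrete, here is a minimal base R sketch of the idea. It is not scrapeR's implementation or API: every function name below (`new_spider`, `enqueue`, `append_step`, `apply_pipeline`, `run_spider`, `run_all`) is a hypothetical stand-in chosen for illustration, and the real package may structure these objects quite differently.

```r
# Hypothetical stand-ins for illustration; these are NOT scrapeR functions.

# A spider holds a name, a queue of items, and an ordered list of steps.
new_spider <- function(name) {
  list(name = name, queue = list(), steps = list())
}

# Append one or more items to the spider's queue.
enqueue <- function(spider, ...) {
  spider$queue <- c(spider$queue, list(...))
  spider
}

# Append a single step (a function of one argument) to the spider.
append_step <- function(spider, step) {
  spider$steps <- c(spider$steps, list(step))
  spider
}

# A pipeline is modelled here as a plain list of generic steps
# that is appended after the spider's own steps.
apply_pipeline <- function(spider, pipeline) {
  spider$steps <- c(spider$steps, pipeline)
  spider
}

# Run every step, in order, on each queued item.
run_spider <- function(spider) {
  lapply(spider$queue, function(item) {
    Reduce(function(value, step) step(value), spider$steps, item)
  })
}

# A runner is modelled here as running a list of spiders one after another.
run_all <- function(spiders) {
  lapply(spiders, run_spider)
}

# Usage: strings stand in for URLs, and string functions stand in for steps.
spider <- new_spider("demo")
spider <- enqueue(spider, " Alpha ", " Beta ")
spider <- append_step(spider, trimws)
spider <- apply_pipeline(spider, list(toupper))
results <- run_spider(spider)
```

In this sketch each step receives the output of the previous step, so the spider's own step (`trimws`) runs before the pipeline's generic step (`toupper`). Whether scrapeR passes values between steps in exactly this way is an assumption.
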
Prebuilt Steps

These built-in steps can be added to a spider or pipeline using the add_step function.

- Bind rows
- Clean names
- Write results to AWS S3
- Save output
- Read HTML