Building Blocks

These are the basic nouns of scrapeR. A spider is made of a series of steps that run sequentially on each item in its queue. A pipeline holds a series of generic steps that can be reused at the end of each spider. A runner runs multiple spiders at once.
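A minimal sketch of how the three nouns relate (constructor arguments and the exact signatures are assumptions, not part of this page):

```r
library(scrapeR)

s <- spider()    # steps run sequentially on each item in the spider's queue
p <- pipeline()  # generic steps, reusable at the end of any spider
r <- runner(s)   # runs one or more spiders at once (argument form is assumed)
```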

spider()

Spider for crawling URLs

pipeline()

Collection of generic steps to append to spider

runner()

Runner for executing multiple spiders at once

parser() transformer()

Steps for Spiders and Pipelines
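A sketch of wrapping user functions as steps with parser() and transformer(); the wrapped-function signatures shown here are assumptions based on this page, not the documented API:

```r
library(scrapeR)

# Hypothetical custom steps: parser() wraps a function that extracts data
# from a fetched page, transformer() wraps a function that reshapes results.
get_heading <- parser(function(page) xml2::xml_text(xml2::xml_find_first(page, "//h1")))
to_upper    <- transformer(function(x) toupper(x))
```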

Helpers

add_queue()

Add items to a spider queue

add_step() add_parser() add_transformer()

Add steps to a pipeline or spider.

run()

Run a spider or a runner

set_name()

Rename a spider

set_pipeline()

Set the pipeline used by a spider
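Putting the helpers together, a hedged end-to-end sketch (the pipe-style composition, URLs, and file names are illustrative assumptions):

```r
library(scrapeR)

cleanup <- pipeline() |>
  add_step(t_bind_rows()) |>
  add_step(t_clean_names())

s <- spider() |>
  set_name("example") |>
  add_queue(c("https://example.com/page1",   # placeholder URLs
              "https://example.com/page2")) |>
  add_parser(p_read_html()) |>
  set_pipeline(cleanup)

run(s)  # run() also accepts a runner to execute several spiders
```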

Prebuilt Steps

These built-in steps can be added to a spider or pipeline using the add_step() function.

t_bind_rows()

Bind result rows into a single table

t_clean_names()

Clean column names in the results

t_save_aws()

Write results to AWS S3

t_save_output()

Save output locally

p_read_html()

Read HTML from a fetched page
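A sketch of attaching the prebuilt steps above with add_step(); the step names come from this page, but any arguments shown are assumptions:

```r
library(scrapeR)

s <- spider() |>
  add_queue("https://example.com") |>  # placeholder URL
  add_step(p_read_html()) |>           # parse each response as HTML
  add_step(t_bind_rows()) |>           # combine per-page results into one table
  add_step(t_clean_names()) |>         # normalise column names
  add_step(t_save_output())            # save locally; t_save_aws() targets S3 instead
```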