add script to run all crawls and generate csv
shaneaevans committed Apr 15, 2014
1 parent 64ed962 commit b8803eb
Showing 2 changed files with 18 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -37,6 +37,10 @@ Run a spider:

scrapy crawl us.pycon.org

Run all spiders and generate a data.csv file:

./run.sh

Scrapy Cloud Test Project
-------------------------

14 changes: 14 additions & 0 deletions run.sh
@@ -0,0 +1,14 @@
#!/bin/bash

for spider in $(scrapy list)
do
    # put all the data in separate files to make it easier to trace
    # data back to the spider
    scrapy crawl "$spider" -o "data/$spider.csv" -t csv
done

# dedupe and merge

# should generate a single header (if not, we have inconsistent data)
head -n 1 -q data/*.csv | sort -u > data.csv
tail -q -n +2 data/*.csv | sort -u >> data.csv
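
The comment in the merge step expects every per-spider file to share one header row, but the script itself does not enforce that. The following is a minimal sketch of how the check could be made explicit; `merge_csvs` is a hypothetical helper that is not part of this commit, and it assumes the source directory holds at least one CSV file and that no field contains an embedded newline (the merge is line-based).

```shell
#!/bin/bash

# Hypothetical helper (not part of the commit): merge the per-spider CSV
# files in a directory into one file, refusing to merge when their header
# rows differ.
merge_csvs() {
    local src_dir="$1" out_file="$2"
    local headers

    # collect the distinct header rows across all per-spider files
    headers=$(head -q -n 1 "$src_dir"/*.csv | sort -u)

    # exactly one non-empty header row is expected
    if [ -z "$headers" ] || [ "$(printf '%s\n' "$headers" | grep -c '')" -ne 1 ]; then
        echo "missing or inconsistent headers in $src_dir:" >&2
        printf '%s\n' "$headers" >&2
        return 1
    fi

    # write the header first, then the deduplicated data rows
    printf '%s\n' "$headers" > "$out_file"
    tail -q -n +2 "$src_dir"/*.csv | sort -u >> "$out_file"
}
```

With the layout used by run.sh, the call would be `merge_csvs data data.csv`. A non-zero return status then signals that at least one spider produced different columns, instead of the mismatch silently ending up as extra lines in the merged file.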
