# A Web crawler using asyncio coroutines

This program is built on Python's asyncio library and requires Python 3.

After implementing a crawler that processed URLs serially, I decided to adopt an approach that allows multiple URLs to be fetched concurrently.

This crawler is based on the following tutorial:

http://aosabook.org/en/500L/a-web-crawler-with-asyncio-coroutines.html
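The concurrent-fetch idea can be sketched with asyncio alone. This is a minimal illustration, not the project's actual code: `fetch` simulates a network request with `asyncio.sleep` (a real crawler would issue an HTTP request, e.g. via aiohttp), and a semaphore bounds concurrency the way the `--max_tasks` option does.

```python
import asyncio

async def fetch(url):
    # Simulated fetch: stands in for an HTTP request. asyncio.sleep
    # yields control so other fetches can run while this one "waits".
    await asyncio.sleep(0.1)
    return url, 200

async def crawl(urls, max_tasks=10):
    # Bound the number of in-flight fetches, mirroring --max_tasks.
    sem = asyncio.Semaphore(max_tasks)

    async def bounded_fetch(url):
        async with sem:
            return await fetch(url)

    # gather() runs all fetches concurrently instead of one after another.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))

results = asyncio.run(crawl([f"http://example.com/{i}" for i in range(5)]))
```

With serial processing the five simulated fetches would take ~0.5 s; here they complete in roughly the time of one.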

  • Create a local virtual environment. Change path/to/python3 so that it points to your python3 interpreter.

    virtualenv --python=path/to/python3 env/

  • Install requirements:

    env/bin/pip install -r requirements.txt

  • Run as:

    env/bin/python3 main.py --target='www.bbc.co.uk' --max_redirect=10 --max_tasks=10

  • --target defaults to 'http://www.bbc.co.uk/' if omitted

  • --max_redirect defaults to 10 if omitted

  • --max_tasks defaults to 10 if omitted
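The defaults above can be expressed with argparse. This is a hypothetical sketch of the CLI, assuming main.py uses argparse; the actual option handling may differ.

```python
import argparse

# Sketch of the command-line interface described above (assumed, not
# copied from main.py). Each default matches the README's description.
parser = argparse.ArgumentParser(description="asyncio web crawler")
parser.add_argument("--target", default="http://www.bbc.co.uk/",
                    help="root URL to crawl")
parser.add_argument("--max_redirect", type=int, default=10,
                    help="maximum redirects to follow per URL")
parser.add_argument("--max_tasks", type=int, default=10,
                    help="maximum number of concurrent fetch tasks")

# Parsing an empty argument list exercises all three defaults.
args = parser.parse_args([])
```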