########### A Web crawler using asyncio coroutines ##############

This program is based on the Python asyncio library and requires Python 3.

After implementing a crawler that processed URLs serially, I decided to adopt an approach that allows multiple URLs to be fetched concurrently.

This crawler is based on the following tutorial:

http://aosabook.org/en/500L/a-web-crawler-with-asyncio-coroutines.html
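
The core idea from the tutorial is a shared queue of URLs drained by a fixed pool of worker coroutines, so several fetches are in flight at once. Below is a minimal sketch of that pattern, assuming aiohttp is installed; the names fetch, worker, and crawl are illustrative, not necessarily the identifiers used in this repository.

    import asyncio

    import aiohttp


    async def fetch(session, url):
        # Fetch one page and return its body as text.
        async with session.get(url) as response:
            return await response.text()

    async def worker(queue, session, results):
        # Each worker pulls URLs from the shared queue until cancelled.
        while True:
            url = await queue.get()
            try:
                results[url] = await fetch(session, url)
            except Exception as exc:
                # Record failures so one bad URL doesn't kill the worker.
                results[url] = exc
            finally:
                queue.task_done()

    async def crawl(urls, max_tasks=10):
        queue = asyncio.Queue()
        for url in urls:
            queue.put_nowait(url)
        results = {}
        async with aiohttp.ClientSession() as session:
            # max_tasks workers fetch concurrently instead of one URL at a time.
            workers = [asyncio.create_task(worker(queue, session, results))
                       for _ in range(max_tasks)]
            await queue.join()  # wait until every queued URL is processed
            for task in workers:
                task.cancel()
        return results

    if __name__ == '__main__':
        pages = asyncio.run(crawl(['http://www.bbc.co.uk/']))
        print(sorted(pages))

Raising max_tasks increases concurrency but also the load placed on the target site, which is why it is exposed as a command-line option below.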

  • Create a local virtual environment. Change path/to/python3 so that it points to your Python 3 interpreter.

    virtualenv --python=path/to/python3 env/

  • Install requirements:

    env/bin/pip install -r requirements.txt

  • Run as:

    env/bin/python3 main.py --target='www.bbc.co.uk' --max_redirect=10 --max_tasks=10

  --target defaults to 'http://www.bbc.co.uk/' if omitted
  --max_redirect defaults to 10 if omitted
  --max_tasks defaults to 10 if omitted
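
For reference, these flags could be wired up with argparse roughly as follows. The option names and defaults match the README above, but this is a sketch; the actual argument handling in main.py may differ.

    import argparse


    def parse_args():
        # Defaults mirror the documented behaviour when a flag is omitted.
        parser = argparse.ArgumentParser(description='Async web crawler')
        parser.add_argument('--target', default='http://www.bbc.co.uk/',
                            help='root URL to start crawling from')
        parser.add_argument('--max_redirect', type=int, default=10,
                            help='maximum redirects to follow per URL')
        parser.add_argument('--max_tasks', type=int, default=10,
                            help='number of concurrent fetch tasks')
        return parser.parse_args()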
