First, ensure that the config file exists at the following location (per your OS):
| OS | Path |
|---|---|
| Linux | $HOME/.config/Rustle/config.toml |
| macOS | $HOME/Library/Application Support/Rustle/config.toml |
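As an illustration of the table above, here is a minimal sketch of how the config path could be resolved using only the standard library. The function names `config_path_for` and `config_path` are hypothetical and not taken from Rustle's source; the sketch follows the table literally and does not consider overrides such as `XDG_CONFIG_HOME`.

```rust
use std::path::PathBuf;

/// Builds the config path from a home directory and an OS name.
/// `os` is expected to match `std::env::consts::OS` ("linux", "macos", ...).
/// Hypothetical helper: the name and signature are assumptions.
fn config_path_for(home: &str, os: &str) -> Option<PathBuf> {
    let base = PathBuf::from(home);
    let dir = match os {
        "linux" => base.join(".config"),
        "macos" => base.join("Library").join("Application Support"),
        // Platforms not listed in the table are left unresolved.
        _ => return None,
    };
    Some(dir.join("Rustle").join("config.toml"))
}

/// Resolves the path for the current user and platform, if $HOME is set.
fn config_path() -> Option<PathBuf> {
    let home = std::env::var("HOME").ok()?;
    config_path_for(&home, std::env::consts::OS)
}
```

Calling `config_path()` and checking whether the returned path exists is one way to confirm the file is where the program would look for it.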
Then run:
rustle
Example config.toml file:
origin_url = "https://example.com"
depth = 6
database_name = "crawler"
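To make the shape of this file concrete, the sketch below models the three keys as a struct and reads them with a naive line-based reader. This is an assumption-laden illustration, not Rustle's actual parsing code: the `Config` struct, the `parse_config` function, and the choice of `u32` for `depth` are all hypothetical, and the reader handles only flat `key = value` lines rather than full TOML. A real implementation would more likely use a TOML parsing crate.

```rust
/// Mirrors the three keys shown in the example config.toml.
/// Field types are assumptions; the source does not state them.
#[derive(Debug, PartialEq)]
struct Config {
    origin_url: String,
    depth: u32,
    database_name: String,
}

/// Naive reader for flat `key = value` pairs.
/// This is NOT a TOML parser: it handles only the simple shape above.
fn parse_config(text: &str) -> Result<Config, String> {
    let (mut origin_url, mut depth, mut database_name) = (None, None, None);
    for line in text.lines() {
        let line = line.trim();
        // Skip blank lines and comments.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split at the first '=' only, so URLs containing '=' stay intact.
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("expected `key = value`, got: {line}"))?;
        // Drop surrounding whitespace and double quotes from the value.
        let value = value.trim().trim_matches('"');
        match key.trim() {
            "origin_url" => origin_url = Some(value.to_string()),
            "depth" => {
                let parsed = value
                    .parse::<u32>()
                    .map_err(|e| format!("invalid depth: {e}"))?;
                depth = Some(parsed);
            }
            "database_name" => database_name = Some(value.to_string()),
            other => return Err(format!("unknown key: {other}")),
        }
    }
    // All three keys are treated as required in this sketch.
    Ok(Config {
        origin_url: origin_url.ok_or("missing origin_url")?,
        depth: depth.ok_or("missing depth")?,
        database_name: database_name.ok_or("missing database_name")?,
    })
}
```

Treating every key as required and rejecting unknown keys is a design choice of this sketch; it surfaces typos in the file early rather than silently falling back to defaults.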
To configure logging, this program uses the RUST_LOG environment variable, with the following options:
- error
- warn
- info
- debug
- trace
Example:
RUST_LOG=info rustle
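The five options are conventionally ordered from least to most verbose, so that choosing a level also enables every level before it in the list. The sketch below models that ordering with a plain enum. It is only an illustration of the convention, assuming Rustle follows the usual behaviour of Rust logging crates; the `Level` enum and both functions are hypothetical and do not come from Rustle or from any logging library.

```rust
/// The five options, declared from least to most verbose so that the
/// derived ordering matches the conventional one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Parses a RUST_LOG-style level name, ignoring case and whitespace.
fn parse_level(s: &str) -> Option<Level> {
    match s.trim().to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// A message is shown when its level is at or below the configured one.
fn is_enabled(configured: Level, message: Level) -> bool {
    message <= configured
}
```

Under this model, `RUST_LOG=info` would show error, warn, and info messages while hiding debug and trace output.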
- Abstract code & functionality into structs & other files
- Use SQLite to store information about websites, instead of downloading HTML
- Recursion fix, specify depth
- Config file parsing to specify origin URL & depth
- Parallel / distributed crawling
- Obey robots.txt