GitHub

Python webscrapper for the website AsuraScans.https://asuratoon.com

This only works on this website and will not work anywhere else without modifing the main data pulling methods.

Libraries: -json -requests -os -re -bs4 -PIL -io

The info for each book is stored in 3 classes, Page/Chapter/Manga. Each class has functions meant for debugging perposes, for sorting the info, and for managing if this specific info is downloaded or not(check).

The info from those classes are stored in a json file and saved in the "Data-Folder" that is later opened in the code to get the links so that the webscrapper knows where to go to look for the info to scrape.

The info that is saved for Manga: -title -url -check

Chapter: -title -url -check

Page: -number -url -check

The use of check is so that if it ever is false then the code will use the url to download the info that it needs be it the page, chapter, or book. Reason was that without this meathod it was really slow when checking to see if something is missing but now it use has to see in the json which is faster then loading it one by one in the scrape because of the lag in the response from the website.

The most used functions in the code are: -get_page_soup(url): returns an html element -get_soup(html) -convert(url) -load_manga(title)

That is it for now, there is still alot more to explain, this code is way more complex then I remeber it being when I wrote it but for now this should explain the basics of the info gathering storing and loading.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
SuperMangaDownloader.py		SuperMangaDownloader.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

GoldenDragonJD/webscrapper

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages