v1.4.1
Extraction:
- extraction bugs fixed (#263, #266), more robust HTML doctype parsing
- XML output improvements by @knit-bee (#273, #274)
- adjusted thresholds for link density in paragraphs
Metadata:
- improved title and sitename detection (#284)
- faster author, categories, domain name, and tags extraction
- fixes to author emoji regexes by @felipehertzer (#269)
Command-line interface:
- reviewed argument consistency and added deprecation warnings (#261)
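
For readers unfamiliar with the pattern, a deprecation warning on a renamed command-line argument can be sketched as follows. This is a generic illustration using only the Python standard library, not the project's actual code: the program name `example-cli` and the options `--input-file` and `--inputfile` are hypothetical and do not come from these release notes.

```python
import argparse
import warnings


def build_parser():
    """Build a parser with a current option and a deprecated alias (both hypothetical)."""
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument("--input-file", dest="input_file", help="path to the input file")
    # Hypothetical old spelling, still accepted but hidden from --help.
    parser.add_argument("--inputfile", dest="old_input_file", help=argparse.SUPPRESS)
    return parser


def parse_args(argv):
    """Parse argv and map the deprecated alias onto the current option, with a warning."""
    args = build_parser().parse_args(argv)
    if args.old_input_file is not None:
        warnings.warn(
            "--inputfile is deprecated, use --input-file instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # The current option takes precedence if both spellings are given.
        if args.input_file is None:
            args.input_file = args.old_input_file
    return args
```

Keeping the old spelling as a hidden alias lets existing scripts keep working while the warning points users to the new name.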
Setup:
- made download timeout configurable (#263)
- updated dependencies, now using faust-cchardet on Python 3.11
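
As a rough illustration of what a configurable download timeout can look like, the sketch below reads a value from an INI-style configuration with the standard library. The section name `download`, the key `timeout`, and the default of 30 seconds are assumptions made for this example; the release notes do not specify how the setting is named or stored.

```python
import configparser

# Illustrative fallback in seconds; not a value taken from the release notes.
DEFAULT_TIMEOUT = 30


def load_timeout(config_text):
    """Return the timeout from the hypothetical [download] section, or the default."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # The fallback covers both a missing section and a missing key.
    return parser.getint("download", "timeout", fallback=DEFAULT_TIMEOUT)


custom = load_timeout("[download]\ntimeout = 10\n")
default = load_timeout("")
```

The returned value could then be passed to whatever performs the request, for example as the `timeout` argument of `urllib.request.urlopen`.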
Full Changelog: v1.4.0...v1.4.1