    Repositories list

    • A client interface for Scrapinghub's API
      Python
      BSD 3-Clause "New" or "Revised" License
      63204242 Updated Feb 6, 2025
    • Extract price amount and currency symbol from a raw text string
      Python
      BSD 3-Clause "New" or "Revised" License
      50320179 Updated Feb 5, 2025
    • frontera

      Public
      A scalable frontier for web crawlers
      Python
      BSD 3-Clause "New" or "Revised" License
      2181.3k7919 Updated Feb 5, 2025
    • extruct

      Public
      Extract embedded metadata from HTML markup
      Python
      BSD 3-Clause "New" or "Revised" License
      1138783814 Updated Feb 5, 2025
    • andi

      Public
      Library for annotation-based dependency injection
      Python
      BSD 3-Clause "New" or "Revised" License
      52231 Updated Feb 5, 2025
    • Crawl Frontier HCF backend
      Python
      BSD 3-Clause "New" or "Revised" License
      5722 Updated Feb 5, 2025
    • Python parser for human-readable dates
      Python
      BSD 3-Clause "New" or "Revised" License
      4682.6k29150 Updated Feb 5, 2025
    • web-poet

      Public
      Web scraping Page Objects core library
      Python
      BSD 3-Clause "New" or "Revised" License
      15961613 Updated Jan 30, 2025
    • Page Object pattern for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      28117104 Updated Jan 29, 2025
    • Python
      BSD 3-Clause "New" or "Revised" License
      151332 Updated Jan 24, 2025
    • Software stack with latest Scrapy and updated deps
      Dockerfile
      BSD 3-Clause "New" or "Revised" License
      206321 Updated Jan 6, 2025
    • Scrapy entrypoint for Scrapinghub job runner
      Python
      BSD 3-Clause "New" or "Revised" License
      162581 Updated Jan 6, 2025
    • spidermon

      Public
      Scrapy extension for monitoring spider execution.
      Python
      BSD 3-Clause "New" or "Revised" License
      100536427 Updated Dec 10, 2024
    • A more flexible, feature-rich Frontera scheduler for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      53622 Updated Nov 29, 2024
    • Python Social Auth - Application - Django
      Python
      BSD 3-Clause "New" or "Revised" License
      383201 Updated Nov 18, 2024
    • Formasaurus tells you the type of an HTML form and its fields using machine learning
      HTML
      48710 Updated Nov 7, 2024
    • Parse numbers written in natural language
      Python
      BSD 3-Clause "New" or "Revised" License
      23109126 Updated Oct 23, 2024
    • A Python binding for CRFsuite
      Python
      MIT License
      222772453 Updated Oct 1, 2024
    • streamparse lets you run Python code against real-time streams of data. Integrates with Apache Storm.
      Python
      Apache License 2.0
      218201 Updated Sep 20, 2024
    • splash

      Public
      Lightweight, scriptable browser as a service with an HTTP API
      Python
      BSD 3-Clause "New" or "Revised" License
      5124.1k37726 Updated Aug 2, 2024
    • A Postgres-backed ContentsManager implementation for IPython
      Python
      Apache License 2.0
      85201 Updated Jul 18, 2024
    • shublang

      Public
      Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
      Python
      BSD 3-Clause "New" or "Revised" License
      815236 Updated Jul 9, 2024
    • An opinionated fork of the Drone CI system
      Go
      Other
      385005 Updated Jul 7, 2024
    • varanus

      Public
      A command line spider monitoring tool
      Python
      7822 Updated Jul 6, 2024
    • scrapyrt

      Public
      HTTP API for Scrapy spiders
      Python
      BSD 3-Clause "New" or "Revised" License
      161847246 Updated Jun 28, 2024
    • portia

      Public
      Visual scraping for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      1.4k9.3k11119 Updated Jun 26, 2024
    • scikit-learn inspired API for CRFsuite
      Python
      215200 Updated Jun 18, 2024
    • Python
      MIT License
      2403 Updated Jun 17, 2024
    • autologin

      Public
      A project that attempts to automatically log in to a website given a single seed
      Python
      Apache License 2.0
      431102 Updated Jun 17, 2024
    • Python wrapper for the Intercom API.
      Python
      Other
      146101 Updated Jun 17, 2024