
new Bots report #7979

Open
15 tasks done
matomoto opened this issue Jan 29, 2025 · 7 comments

Comments

@matomoto

matomoto commented Jan 29, 2025

New bots and similar:

  • Mozilla/5.0 (compatible; BacklinksExtendedBot)
  • WhatsApp/2
  • NetworkingExtension/8619.1.26.30.5 Network/4277.2.5 iOS/18.0
  • NetworkingExtension/8619.2.8.10.7 Network/4277.42.2 iOS/18.1
  • NetworkingExtension/8619.2.8.10.9 CFNetwork/1568.200.51 Darwin/24.1.0
  • NetworkingExtension/8619.2.8.10.9 Network/4277.42.2 iOS/18.1.1
  • NetworkingExtension/8620.1.16.10.11 Network/4277.60.255 iOS/18.2.1
  • NetworkingExtension/8620.2.4.10.7 Network/4277.82.1 iOS/18.3
  • WhatsApp/2.2503.5 W
  • WhatsApp/2.23.20.0
  • curl/8.3.0
  • Research JLU
  • Snap URL Preview Service; bot; snapchat; https://developers.snap.com/robots
  • WebexTeams
  • python-requests/2.32.3
liviuconcioiu added 4 commits to liviuconcioiu/device-detector that referenced this issue Jan 30, 2025
@liviuconcioiu
Collaborator

Research JLU is detected as a generic bot. I can't find any info about it.

sanchezzzhak pushed a commit that referenced this issue Jan 30, 2025
…ion for Semrush bots (#7980)

* Adds detection for BacklinksExtendedBot
* Adds detection for Webex Teams
* Adds detection for Telegram
* Adds detection for OpenVAS

ref #7979
@matomoto
Author

matomoto commented Jan 30, 2025

Research JLU → Justus-Liebig-Universität Gießen (ISP: Telefonica Germany GmbH & Co. OHG)


Preselection:

  • Opera%20GX/2503 CFNetwork/1496.0.7 Darwin/23.5.0
  • Opera/1 CFNetwork/1568.100.1 Darwin/24.0.0
  • Opera/0 CFNetwork/1568.200.51 Darwin/24.1.0
  • MobileSafari/8615.3.12.10.2 CFNetwork/1410.0.3 Darwin/22.6.0
  • MobileSafari/8619.2.8.10.9 CFNetwork/1568.200.51 Darwin/24.1.0
  • MobileSafari/8620.2.2 CFNetwork/3826.400.101 Darwin/24.3.0
  • MobileSafari/8618.3.11.10.5 CFNetwork/1498.700.2 Darwin/23.6.0
  • GoogleOther
  • Go-http-client/1.1
  • appdb/1.4.4 (com.4sh2812.32u1982378; build:2875; iOS 18.1.1) Alamofire/3.5.0
  • appdb/1.4.4 (com.4sh2812.32u1982378; build:2875; iOS 18.3.0) Alamofire/3.5.0
  • LightspeedSystemsCrawler Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)
  • LightspeedSystemsCrawler Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US
  • DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 34)
  • ALittle Client
  • DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 31)
  • DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 33)
  • 'DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot)'
  • DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot)
  • IDG/EU (http://spaziodati.eu/)
  • Mozilla/5.0 (compatible)

@MatomoForumNotifications

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/bots-und-tracking-infos/61969/9

sanchezzzhak pushed a commit that referenced this issue Jan 30, 2025
…tection for Mobile Safari and Safari (#7981)

* Improves version detection for Android and Chrome OS
* Improves version detection for iOS and macOS
* Improves detection for Safari and Mobile Safari

ref #7979
liviuconcioiu added 3 commits to liviuconcioiu/device-detector that referenced this issue Jan 30, 2025
@matomoto
Author

Have you ever considered a negation regex like (^Mozilla\/)?

Most non-bot user agents start with Mozilla/.

@liviuconcioiu
Collaborator

Have you ever considered a negation regex like (^Mozilla\/)?

Most non-bot user agents start with Mozilla/.

I know, but I'm afraid we can't do that.

@matomoto
Author

matomoto commented Feb 1, 2025

I know, but I'm afraid we can't do that.

OK. It wouldn't really hold anyway, because of the MobileSafari and Opera user agents, which don't start with Mozilla/.
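A minimal sketch of why the negation idea breaks down, using Python's `re` and three user agents quoted in this thread. The `naive_is_bot` helper is hypothetical and not part of device-detector; the "actually a bot" labels follow the thread (MobileSafari/Opera as real traffic, BacklinksExtendedBot as a bot).

```python
import re

# Hypothetical heuristic from the discussion: any UA that does NOT
# start with "Mozilla/" is treated as a bot.
NOT_MOZILLA = re.compile(r"^(?!Mozilla/)")


def naive_is_bot(user_agent: str) -> bool:
    """Return True when the user agent does not start with 'Mozilla/'."""
    return NOT_MOZILLA.match(user_agent) is not None


# Samples from this issue, labelled as the thread treats them.
samples = {
    "MobileSafari/8620.2.2 CFNetwork/3826.400.101 Darwin/24.3.0": False,
    "Opera/1 CFNetwork/1568.100.1 Darwin/24.0.0": False,
    "Mozilla/5.0 (compatible; BacklinksExtendedBot)": True,
}

for ua, actually_bot in samples.items():
    verdict = naive_is_bot(ua)
    status = "ok" if verdict == actually_bot else "WRONG"
    print(f"{status:5} naive_is_bot={verdict!s:5} {ua}")
```

All three samples come out wrong: the two app user agents are flagged as bots, and a bot that uses a Mozilla/ prefix slips through.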

  • There are many entries like this:
    regex: 'Amazonbot/[\d.]+'
    regex: 'Discordbot/([\d+.]+)'
    Is the /[\d.]+ or /([\d+.]+) part really necessary? Not really, and dropping it would save CPU.
    Example:
    Mozilla/5.0 (compatible; InternetMeasurement/1.0; +https://internet-measurement.com/)
    regex: 'InternetMeasurement/[\d.]+'
    The suffix is only there for the version, which in most cases isn't really needed.

  • Not matched by regex: 'DuckDuck(?:Go-Favicons-)?Bot'
    DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 33)
    DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 35)

  • Not matched by regex: 'IDG/IT'
    IDG/EU (http://spaziodati.eu/)

  • Aren't WhatsApp user agents bots?
    WhatsApp/2
    WhatsApp/2.2502.3 W
    WhatsApp/2.2503.5 W
    WhatsApp/2.23.20.0

  • Duplicate entries:
    - regex: 'SEOkicks-Robot'
      name: 'SEOkicks-Robot'
      category: 'Crawler'
      url: 'http://www.seokicks.de/robot.html'
      producer:
        name: 'SEOkicks'
        url: 'https://www.seokicks.de/'

    - regex: 'SEOkicks'
      name: 'SEOkicks'
      category: 'Crawler'
      url: 'https://www.seokicks.de/robot.html'

  • Virtual Machine
    Dalvik/2.1.0 (Linux; U; Android 9.0; ZTE BA520 Build/MRA58K)

  • Very strange User Agent:
    Mozilla/5.0 (Linux; Android 10; HD1900 Build/QKQ1.190716.003; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/75.0.3770.156 Mobile Safari/537.36 aweme_230400 JsSdk/1.0 NetType/WIFI AppName/aweme app_version/23.4.0 ByteLocale/zh-CN Region/CN AppSkin/white AppTheme/light BytedanceWebview/d8a21c6 WebView/075113004008
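The version-suffix and duplicate-entry points in the list above can be checked with Python's `re` module. This sketch uses the InternetMeasurement sample from the thread plus a bare `SEOkicks-Robot` string (a hypothetical fragment, not a captured user agent). It only shows what matches; it does not measure CPU cost, and Python's regex engine differs from the PCRE engine PHP uses, so timings would not transfer anyway.

```python
import re

# Sample taken from this issue.
UA = "Mozilla/5.0 (compatible; InternetMeasurement/1.0; +https://internet-measurement.com/)"

WITH_VERSION = re.compile(r"InternetMeasurement/[\d.]+")   # form used in the bots list
WITHOUT_VERSION = re.compile(r"InternetMeasurement")       # suffix dropped
CAPTURING = re.compile(r"InternetMeasurement/([\d.]+)")    # suffix used to read the version


def detects(pattern: re.Pattern, ua: str) -> bool:
    """True if the pattern is found anywhere in the user agent."""
    return pattern.search(ua) is not None


# Both forms recognise the bot.
print(detects(WITH_VERSION, UA), detects(WITHOUT_VERSION, UA))  # True True
# The suffix only matters if you want the version number.
print(CAPTURING.search(UA).group(1))  # 1.0

# Overlapping SEOkicks entries: anything matching 'SEOkicks-Robot'
# also matches 'SEOkicks', so one entry is enough for detection.
token = "SEOkicks-Robot"  # hypothetical minimal fragment, not a real UA
print(detects(re.compile("SEOkicks-Robot"), token),
      detects(re.compile("SEOkicks"), token))  # True True
```

One trade-off to keep in mind: without the `/` suffix, a pattern also matches any longer token that merely starts with the bot name, so the suffix buys some specificity as well as the version.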

New collection:

  • Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
  • got (https://github.com/sindresorhus/got)
  • Apache/2.4.34 (Ubuntu) OpenSSL/1.1.1 (internal dummy connection)
  • Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)
  • axios/1.7.2
  • python-requests/2.27.1
  • python-requests/2.22.0
  • Dalvik/2.1.0 (Linux; U; Android 9.0; ZTE BA520 Build/MRA58K)
  • hgfAlphaXCrawl/1.0 (+https://www.fim.uni-passau.de/data-science/forschung/open-search)
  • Go-http-client/2.0
  • Go-http-client/1.1
  • WebexTeams
  • curl/7.61.1
  • curl/8.3.0

@liviuconcioiu
Collaborator

 not matched with `regex: 'DuckDuck(?:Go-Favicons-)?Bot'`
  `DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 33)`
  `DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 35)`

These aren't bots, but DuckDuckGo Privacy Browser running on Android.
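A quick check of that regex against the DuckDuckGo user agents from this thread, using Python's `re`. `IGNORECASE` is an assumption made here; for these samples the result is the same with or without it. `is_duckduck_bot` is an illustrative name only.

```python
import re

# Regex quoted in this thread for DuckDuckGo's crawlers.
DUCKDUCK_BOT = re.compile(r"DuckDuck(?:Go-Favicons-)?Bot", re.IGNORECASE)


def is_duckduck_bot(user_agent: str) -> bool:
    """True if the crawler regex matches anywhere in the UA."""
    return DUCKDUCK_BOT.search(user_agent) is not None


samples = [
    "DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot)",  # crawler
    "DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 33)",  # Privacy Browser
    "DuckDuckGo/5 (com.duckduckgo.mobile.android; Android API 35)",  # Privacy Browser
]

for ua in samples:
    print(is_duckduck_bot(ua), ua)
```

Only the crawler matches; `DuckDuckGo/5` is followed by `/`, so neither the optional `Go-Favicons-` group nor `Bot` can match there.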

Aren't WhatsApp user agents bots?

WhatsApp/2
WhatsApp/2.2502.3 W
WhatsApp/2.2503.5 W
WhatsApp/2.23.20.0

See #5463

  [ ]  Virtual Machine
  `Dalvik/2.1.0 (Linux; U; Android 9.0; ZTE BA520 Build/MRA58K)`

We can't detect this as a VM, even if it is one: it has the user agent of a real device.

  [ ]  Very strange User Agent:
  `Mozilla/5.0 (Linux; Android 10; HD1900 Build/QKQ1.190716.003; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/75.0.3770.156 Mobile Safari/537.36  aweme_230400 JsSdk/1.0 NetType/WIFI  AppName/aweme app_version/23.4.0 ByteLocale/zh-CN Region/CN AppSkin/white AppTheme/light BytedanceWebview/d8a21c6 WebView/075113004008`

This is already detected as the Douyin app.
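For illustration only, the app-specific tokens appended after the WebView part of that user agent can be pulled out with a simple pattern. The regex and the `app_info` helper below are hypothetical and are not device-detector's actual Douyin rule; they just show which part of the string identifies the app.

```python
import re

# User agent quoted in this thread.
UA = ("Mozilla/5.0 (Linux; Android 10; HD1900 Build/QKQ1.190716.003; wv) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/75.0.3770.156 "
      "Mobile Safari/537.36  aweme_230400 JsSdk/1.0 NetType/WIFI  AppName/aweme "
      "app_version/23.4.0 ByteLocale/zh-CN Region/CN AppSkin/white AppTheme/light "
      "BytedanceWebview/d8a21c6 WebView/075113004008")

# Hypothetical pattern: app name and version appended by the in-app WebView.
APP = re.compile(r"AppName/(\w+) app_version/([\d.]+)")


def app_info(user_agent: str):
    """Return (app_name, app_version), or None if the tokens are absent."""
    m = APP.search(user_agent)
    return (m.group(1), m.group(2)) if m else None


print(app_info(UA))  # ('aweme', '23.4.0')
```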

New collection:

Can you check new user agents at http://devicedetector.net/ before posting them?

Most of these are libraries. Their traffic is usually bot-like, but we can't detect them as bots, since they are libraries used by thousands of people. I'm afraid you will need something custom, or you'll have to make the changes yourself in the code logic.
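A minimal sketch of the kind of custom check meant here, kept on the application side rather than in the detector. The token list is a hypothetical, hand-maintained selection taken from the "New collection" above, and `is_library_client` is an illustrative name, not part of any library.

```python
import re

# Hypothetical, user-maintained list of HTTP-library tokens seen in this issue.
# This is custom post-processing, not a device-detector feature.
LIBRARY_UA = re.compile(
    r"^(?:curl|python-requests|Go-http-client|axios|got)\b", re.IGNORECASE
)


def is_library_client(user_agent: str) -> bool:
    """True if the UA starts with one of the listed library tokens."""
    return LIBRARY_UA.match(user_agent) is not None


for ua in [
    "curl/8.3.0",
    "python-requests/2.27.1",
    "Go-http-client/2.0",
    "axios/1.7.2",
    "got (https://github.com/sindresorhus/got)",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)",
]:
    print(is_library_client(ua), ua)
```

You would combine this with the detector's own bot verdict, for example treating a request as automated if either check says so. Whether that is acceptable depends on your traffic, since legitimate integrations use these libraries too.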

liviuconcioiu added 4 commits to liviuconcioiu/device-detector that referenced this issue Feb 3, 2025
liviuconcioiu added a commit that referenced this issue Feb 4, 2025
* Adds detection for LightspeedSystemsCrawler

ref #7979

* Adds detection for appdb

ref #7979

* Adds detection for Research JLU

ref #7979

* Adds detection for AlphaXCrawl

ref #7979

* Remove duplicate SEOkicks regex

ref #7979

* Improves detection for IDG

ref #7979

* Adds detection for Apache

ref #7979

* Fix test

* Adds detection for Chatwork LinkPreview

* Adds detection for WPMU DEV

* Adds detection for vimeo.php
liviuconcioiu added a commit to liviuconcioiu/device-detector that referenced this issue Feb 6, 2025
sanchezzzhak pushed a commit that referenced this issue Feb 6, 2025
* Improves detection for generic bots
* Adds detection for PHP
* Improves detection for generic bots
* Improves detection for generic bots
* Adds detection for SnoopSecInspect
* Improves detection for generic bots
* Adds detection for ModatScanner
* Adds detection for researchcyber.net
* Adds detection for CrystalSemanticsBot
* Improves detection for generic bots
* Improves detection for PHP
* Adds detection for go-network
* Adds detection for najdu.s.holubem.eu
* Improves detection for Siteimprove

ref #7979
3 participants