
Fix URL Changes and Other Errors on Uploader (UP) Personal Homepages #3039

Open
wants to merge 5 commits into
base: develop

shengrihui

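Problem Description

1. URL Changes

Possibly because of a Bilibili update, the URLs of some pages have changed. On my computer, Chrome and Edge show different URLs for the same page, one in the old form and one in the new. The changes are listed in the table below: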

| Page | Before | After |
| --- | --- | --- |
| Collection list (formerly "TA 的合集和视频列表", i.e. "Their collections and video lists") | https://space.bilibili.com/22179951/channel/series | https://space.bilibili.com/22179951/lists |
| Collection with id=3152230 | https://space.bilibili.com/22179951/channel/collectiondetail?sid=3152230 | https://space.bilibili.com/22179951/lists/3152230?type=season |
| Video list with id=485635 | https://space.bilibili.com/22179951/channel/seriesdetail?sid=485635 | https://space.bilibili.com/22179951/lists/485635?type=series |
| All videos of the uploader | https://space.bilibili.com/22179951/video | https://space.bilibili.com/22179951/upload/video |

The new URLs cannot be matched by you-get, which fails with you-get: [Error] Unsupported URL pattern. Some of the old URLs also fail to match or raise errors.

Example errors:

(you-get) E:\you-get-download\2>you-get "https://space.bilibili.com/22179951/channel/series" -d
[DEBUG] get_content: https://space.bilibili.com/22179951/channel/series
[DEBUG] get_content: https://space.bilibili.com/22179951/channel/series
you-get: [Error] Unsupported URL pattern.
(you-get) E:\you-get-download\3>you-get -d https://space.bilibili.com/22179951/video
[DEBUG] get_content: https://space.bilibili.com/22179951/video
[DEBUG] get_content: https://space.bilibili.com/22179951/video
[DEBUG] get_content: https://api.bilibili.com/x/space/arc/search?mid=22179951&pn=1&ps=50&tid=0&keyword=&order=pubdate&jsonp=jsonp
you-get: version 0.4.1743, a tiny downloader that scrapes the web.
you-get: Namespace(version=False, help=False, info=False, url=False, json=False, no_merge=False, no_caption=False, postfix=False, prefix=None, force=False, skip_existing_file_size_check=False, format=None, output_filename=None, output_dir='.', player=None, cookies=None, timeout=600, debug=True, input_file=None, password=None, playlist=False, first=None, last=None, size=None, auto_rename=False, insecure=False, http_proxy=None, extractor_proxy=None, no_proxy=False, socks_proxy=None, stream=None, itag=None, m3u8=False, URL=['https://space.bilibili.com/22179951/video'])
Traceback (most recent call last):
  File "\\?\C:\Users\11200\anaconda3\envs\you-get\Scripts\you-get-script.py", line 33, in <module>
    sys.exit(load_entry_point('you-get', 'console_scripts', 'you-get')())
  File "e:\cs\you-get\src\you_get\__main__.py", line 92, in main
    main(**kwargs)
  File "e:\cs\you-get\src\you_get\common.py", line 1883, in main
    script_main(any_download, any_download_playlist, **kwargs)
  File "e:\cs\you-get\src\you_get\common.py", line 1772, in script_main
    download_main(
  File "e:\cs\you-get\src\you_get\common.py", line 1386, in download_main
    download(url, **kwargs)
  File "e:\cs\you-get\src\you_get\common.py", line 1874, in any_download
    m.download(url, **kwargs)
  File "e:\cs\you-get\src\you_get\extractor.py", line 48, in download_by_url
    self.prepare(**kwargs)
  File "e:\cs\you-get\src\you_get\extractors\bilibili.py", line 208, in prepare
    self.download_playlist_by_url(self.url, **kwargs)
  File "e:\cs\you-get\src\you_get\extractors\bilibili.py", line 823, in download_playlist_by_url
    pc = math.ceil(videos_info['data']['page']['count'] / videos_info['data']['page']['ps'])
KeyError: 'data'

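2. Errors After a Playlist Download Finishes

After a playlist download finishes, the download_by_url function still executes self.extract(**kwargs) and self.download(**kwargs), which causes two problems:

  1. Because self.streams_sorted is empty, the statement stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag'] (in the download function) raises IndexError: list index out of range.
  2. Unnecessary information is printed, as follows: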
site:                Bilibili
title:               None
streams:             # Available quality and codecs

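3. Redirected and Missing Videos

When downloading all of an uploader's videos (or a collection, etc.), some videos redirect to a course page, and some no longer exist but are still listed in the collection. Either case makes the program fail.

Solutions

1. URL Redirection and API Update

If an old URL is entered, it is redirected to the corresponding new URL. Some API calls are also updated.

2. Empty List Check

Add the check if not self.streams_sorted: return so that an empty self.streams_sorted no longer raises an error.

3. Exception Handling

Use try...except... as a temporary way to handle missing videos. Course downloads are marked as TODO.

Outstanding Issues

1. Download Speed

Downloads are slow, and I do not understand what the skip_existing_file_size_check parameter does. The code contains several statements like file_size == os.path.getsize(filepath) or skip_existing_file_size_check. It seems the file size is still checked when the parameter is set, so nothing is actually "skipped".

2. Download Result Feedback

I suggest listing which videos succeeded and which failed when the download finishes, so users can see the outcome.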


Problem Description

1. URL Changes

Possibly because of a Bilibili update, the URLs of some pages have changed. On my computer, Chrome and Edge show different URLs for the same page, one in the old form and one in the new. The specific changes are listed in the table above.

The new URLs cannot be matched by you-get, which fails with you-get: [Error] Unsupported URL pattern. Some of the old URLs also fail to match or raise errors.

Example error output is shown above.
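The KeyError: 'data' in the traceback comes from indexing videos_info['data'] when the API response has no data field. A minimal sketch of a defensive page-count computation follows. The helper name page_count and the shape of the error payload are assumptions for illustration, not code from this PR.

```python
import math


def page_count(videos_info):
    """Return the number of result pages, or 0 if the payload has no usable data."""
    data = videos_info.get('data')
    if not isinstance(data, dict):
        # This is the case from the traceback: the response has no 'data' key.
        return 0
    page = data.get('page') or {}
    count, ps = page.get('count', 0), page.get('ps', 0)
    if ps <= 0:
        # Avoid dividing by zero when the page size is missing.
        return 0
    return math.ceil(count / ps)


ok = {'data': {'page': {'count': 120, 'ps': 50}}}
bad = {'code': -403, 'message': 'access denied'}  # hypothetical error payload

print(page_count(ok))
print(page_count(bad))
```

Returning 0 lets the caller report "nothing to download" instead of crashing. Whether that is the right behavior for you-get is a design choice left open here.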

2. Errors After a Playlist Download Finishes

After a playlist download finishes, self.extract(**kwargs) and self.download(**kwargs) are still executed in the download_by_url function, which causes two problems:

  1. Because self.streams_sorted is empty, the statement stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag'] (in the download function) raises IndexError: list index out of range.
  2. Unnecessary information is printed.

3. Video Redirection and Missing Issues

When downloading all of an uploader's videos (or a collection, etc.), some videos redirect to a course page, and some no longer exist but are still shown in the list. Either case causes the program to fail.

Solutions

1. URL Redirection and API Update

If an old URL is entered, it is redirected to the corresponding new URL. Some API calls are also updated.
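The old-to-new mapping can be sketched as a small rewrite table built from the four rows of the URL table in this description. The function name normalize_space_url and the regular expressions are assumptions for illustration. The actual change in this PR may be structured differently.

```python
import re

_SPACE = r'https?://space\.bilibili\.com/(?P<mid>\d+)'

# (old pattern, new template) pairs, one per row of the URL table.
_REWRITES = [
    (re.compile(_SPACE + r'/channel/collectiondetail\?sid=(?P<sid>\d+)'),
     'https://space.bilibili.com/{mid}/lists/{sid}?type=season'),
    (re.compile(_SPACE + r'/channel/seriesdetail\?sid=(?P<sid>\d+)'),
     'https://space.bilibili.com/{mid}/lists/{sid}?type=series'),
    (re.compile(_SPACE + r'/channel/series/?$'),
     'https://space.bilibili.com/{mid}/lists'),
    (re.compile(_SPACE + r'/video/?$'),
     'https://space.bilibili.com/{mid}/upload/video'),
]


def normalize_space_url(url):
    """Rewrite an old-style space URL to its new form; return other URLs unchanged."""
    for pattern, template in _REWRITES:
        m = pattern.match(url)
        if m:
            return template.format(**m.groupdict())
    return url


print(normalize_space_url('https://space.bilibili.com/22179951/channel/collectiondetail?sid=3152230'))
print(normalize_space_url('https://space.bilibili.com/22179951/video'))
```

URLs already in the new form fall through unchanged, so the function can be applied to every input before pattern matching. Extra query parameters on the old detail URLs are dropped in this sketch.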

2. Empty List Check

Add the check if not self.streams_sorted: return to the download function so that an empty self.streams_sorted list no longer raises an error.
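A minimal sketch of the guard, using a hypothetical stand-in class (SketchExtractor is not a you-get class). Only the early return and the stream_id expression quoted in this description are taken from the source.

```python
class SketchExtractor:
    """Hypothetical stand-in for an extractor; only the guard is of interest."""

    def __init__(self, streams_sorted):
        self.streams_sorted = streams_sorted

    def download(self):
        # After a playlist run there are no streams to pick from, so return
        # early instead of indexing into an empty list.
        if not self.streams_sorted:
            return None
        first = self.streams_sorted[0]
        stream_id = first['id'] if 'id' in first else first['itag']
        return stream_id


print(SketchExtractor([]).download())
print(SketchExtractor([{'id': 'dash-flv'}]).download())
```

Without the guard, the first call would raise IndexError: list index out of range, as described in the problem section.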

3. Exception Handling

Use try...except... statements as a temporary way to handle redirected and missing videos. Course downloads are marked as TODO.
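The idea can be sketched as a loop that isolates each item, so one missing or redirected video does not abort the rest. The names download_all and fake_download and the placeholder video URLs are assumptions for illustration, not part of you-get.

```python
def download_all(urls, download_one):
    """Call download_one(url) for each URL and return (succeeded, failed)."""
    succeeded, failed = [], []
    for url in urls:
        try:
            download_one(url)
        except Exception as exc:
            # Broad on purpose: one bad item must not stop the remaining ones.
            failed.append((url, repr(exc)))
        else:
            succeeded.append(url)
    return succeeded, failed


def fake_download(url):
    # Hypothetical downloader that pretends one video has been deleted.
    if url.endswith('BV_missing'):
        raise KeyError('data')


succeeded, failed = download_all(
    ['https://www.bilibili.com/video/BV_ok',        # placeholder ID
     'https://www.bilibili.com/video/BV_missing'],  # placeholder ID
    fake_download,
)
print(succeeded)
print(failed)
```

Catching Exception is a stopgap, in line with the "temporary" wording above. Narrower exception types would be preferable once the failure modes are known.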

Outstanding Issues

1. Download Speed Issue

Downloads are slow, and I do not understand what the skip_existing_file_size_check parameter does. The code contains many statements like file_size == os.path.getsize(filepath) or skip_existing_file_size_check. It seems that the file size is still checked when this parameter is set, so nothing is actually "skipped".
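One reading of the quoted expression, taken on its own and not verified against the surrounding you-get code: the flag does not skip the os.path.getsize call, which sits on the left of or and therefore still runs. What it skips is the requirement that the sizes match. The sketch below mirrors the expression with the on-disk size passed in as a plain argument. The function name should_skip_existing is hypothetical.

```python
def should_skip_existing(expected_size, actual_size, skip_existing_file_size_check):
    """Mirror of the quoted condition, with the on-disk size passed in."""
    return expected_size == actual_size or skip_existing_file_size_check


# Sizes match: the existing file is accepted with or without the flag.
print(should_skip_existing(100, 100, False))
# Sizes differ, flag off: the existing file is not accepted.
print(should_skip_existing(100, 40, False))
# Sizes differ, flag on: the existing file is accepted anyway.
print(should_skip_existing(100, 40, True))
```

Under this reading the flag changes which existing files are treated as already downloaded. The size lookup itself is a local file-system call and is unlikely to explain slow downloads.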

2. Download Result Feedback Issue

It is recommended to list which videos were downloaded successfully and which failed at the end of the download so that users can understand the download status.
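A possible shape for such a report, as a sketch only. The function name format_summary, the output layout, and the sample titles are assumptions for illustration.

```python
def format_summary(succeeded, failed):
    """Build a plain-text report from successful titles and (title, reason) failures."""
    total = len(succeeded) + len(failed)
    lines = ['Downloaded {} of {} videos.'.format(len(succeeded), total)]
    for title in succeeded:
        lines.append('  [ok]     ' + title)
    for title, reason in failed:
        lines.append('  [failed] {} ({})'.format(title, reason))
    return '\n'.join(lines)


report = format_summary(
    ['Episode 1', 'Episode 2'],              # hypothetical titles
    [('Episode 3', 'video no longer exists')],
)
print(report)
```

Printing the report once at the end keeps it visible after the per-video progress output has scrolled past.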
