feat: release version 2.2
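1. Remove the sec_ch_ua_platform parameter
2. Remove the sec_ch_ua parameter
3. Optimize the request delay interval
4. Optimize concurrent downloading
5. Fix English translation errors
6. Add a concurrent download limit
7. Fix command line mode errors
8. Simplify the data request headers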

Closes #86
Closes #87
Closes #93
Closes #98
Closes #105
Closes #109
Closes #110
Closes #140
Closes #152
Closes #154
Closes #157
Closes #159
Closes #160
Closes #162
Closes #164
Closes #165
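
Item 6 of the commit message adds a concurrent download limit, but the code for it is not part of the hunks shown on this page. As a rough illustration only, a limit like this is commonly built on `asyncio.Semaphore`. Everything below (`MAX_WORKERS`, `fake_download`, `download_all`) is a hypothetical name chosen for the sketch and is not taken from XHS-Downloader.

```python
import asyncio

MAX_WORKERS = 4  # hypothetical limit; the project's real value is not shown in this diff


async def fake_download(url: str, semaphore: asyncio.Semaphore) -> str:
    """Stand-in for a real download; only `limit` of these run at once."""
    async with semaphore:  # waits here when the limit is already reached
        await asyncio.sleep(0.01)  # simulates network I/O
        return f"done: {url}"


async def download_all(urls: list, limit: int = MAX_WORKERS) -> list:
    """Start one task per URL, sharing a single semaphore between them."""
    semaphore = asyncio.Semaphore(limit)
    tasks = (fake_download(url, semaphore) for url in urls)
    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    demo_urls = [f"https://example.com/{i}" for i in range(10)]
    print(asyncio.run(download_all(demo_urls)))
```

Sharing one semaphore across all tasks keeps the cap global, so the number of in-flight downloads stays bounded no matter how many links are passed in.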
JoeanAmier committed Aug 30, 2024
1 parent 44d5c61 commit 0a52dc0
Showing 24 changed files with 272 additions and 240 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,6 +1,6 @@
FROM python:3.12.4-slim

LABEL name="XHS-Downloader" version="2.2 Beta" authors="JoeanAmier"
LABEL name="XHS-Downloader" version="2.2" authors="JoeanAmier"

COPY locale /locale
COPY source /source
55 changes: 29 additions & 26 deletions README.md
@@ -55,6 +55,7 @@
<h1>🔗 Supported Links</h1>
<ul>
<li><code>https://www.xiaohongshu.com/explore/WorksID</code></li>
<li><code>https://www.xiaohongshu.com/explore/WorksID?xsec_token=XXX</code></li>
<li><code>https://www.xiaohongshu.com/discovery/item/WorksID</code></li>
<li><code>https://xhslink.com/ShareCode</code></li>
<br/>
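@@ -64,7 +65,7 @@
<p>⭐ It is recommended to run the program in <a href="https://learn.microsoft.com/zh-cn/windows/terminal/install">Windows Terminal</a> (the default terminal on Windows 11) for the best display!</p>
<h1>🥣 Usage</h1>
<p>If you only need to download watermark-free work files, <b>Program Run</b> or <b>Docker Run</b> is recommended; if you have other needs, <b>Source Code Run</b> is recommended!</p>
<p>It is recommended to set the <code>cookie</code> parameter yourself; if it is not set, the program may not work properly!</p>
<p><del>It is recommended to set the <code>cookie</code> parameter yourself; if it is not set, the program may not work properly!</del></p>
<h2>🖱 Program Run</h2>
<p>Users on Mac OS or Windows 10 and above can download the program archive from <a href="https://github.com/JoeanAmier/XHS-Downloader/releases/latest">Releases</a>, extract it, open the program folder, and double-click <code>main</code> to run it.</p>
<p><strong>Note: on Mac OS the executable <code>main</code> may need to be launched from a terminal; due to device limitations, the Mac OS executable has not been tested and is not guaranteed to work!</strong></p>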
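@@ -93,7 +94,7 @@
</ol>
<h1>🛠 Command Line Mode</h1>
<p>The project supports a command line mode; to download only some of the images in an image-and-text work, use this mode to specify the sequence numbers of the images to download!</p>
<p>You can use the command line to <b>read cookies from the browser and write them to the configuration file</b>! Note that the browser must be closed before the data can be read!</p>
<p>You can use the command line to <b>read cookies from the browser and write them to the configuration file</b></p>
<p>Command example: <code>python .\main.py --browser_cookie Chrome --update_settings</code></p>
<p><code>bool</code> parameters can be set with <code>true</code>, <code>false</code>, <code>1</code>, <code>0</code>, <code>yes</code>, <code>no</code>, <code>on</code>, or <code>off</code> (case-insensitive).</p>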
<hr>
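@@ -180,10 +181,10 @@ async def example():
work_path = "D:\\" # Root path for saving work data/files; default: project root path
folder_name = "Download" # Name of the folder for storing work files (created automatically); default: Download
name_format = "作品标题 作品描述"
sec_ch_ua = "" # Request header Sec-Ch-Ua
sec_ch_ua_platform = "" # Request header Sec-Ch-Ua-Platform
# sec_ch_ua = "" # Request header Sec-Ch-Ua
# sec_ch_ua_platform = "" # Request header Sec-Ch-Ua-Platform
user_agent = "" # User-Agent
cookie = "" # Xiaohongshu web cookie; no login needed; required parameter; login status affects data collection
cookie = "" # Xiaohongshu web cookie; no login needed; optional parameter; login status affects data collection
proxy = None # Network proxy
timeout = 5 # Request timeout limit, in seconds; default: 10
chunk = 1024 * 1024 * 10 # Size of each data chunk fetched from the server when downloading files, in bytes
@@ -193,26 +194,27 @@ async def example():
folder_mode = False # Whether to store each work's files in a separate folder
# async with XHS() as xhs:
# pass # Use default parameters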
async with XHS(work_path=work_path,
folder_name=folder_name,
name_format=name_format,
sec_ch_ua=sec_ch_ua,
sec_ch_ua_platform=sec_ch_ua_platform,
user_agent=user_agent,
cookie=cookie,
proxy=proxy,
timeout=timeout,
chunk=chunk,
max_retry=max_retry,
record_data=record_data,
image_format=image_format,
folder_mode=folder_mode,
) as xhs: # Use custom parameters
async with XHS(
work_path=work_path,
folder_name=folder_name,
name_format=name_format,
# sec_ch_ua=sec_ch_ua,
# sec_ch_ua_platform=sec_ch_ua_platform,
user_agent=user_agent,
cookie=cookie,
proxy=proxy,
timeout=timeout,
chunk=chunk,
max_retry=max_retry,
record_data=record_data,
image_format=image_format,
folder_mode=folder_mode,
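) as xhs: # Use custom parameters
download = True # Whether to download work files; default: False
# Returns detailed work information, including download addresses
# Returns an empty dict if data retrieval fails
print(await xhs.extract(error_link, download, ))
print(await xhs.extract(demo_link, download, ))
print(await xhs.extract(demo_link, download, index=[1, 2]))
# Multiple work links can be passed in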
print(await xhs.extract(multiple_links, download, ))
</pre>
@@ -254,13 +256,13 @@ async def example():
<td align="center"><code>发布时间 作者昵称 作品标题</code></td>
</tr>
<tr>
<td align="center">sec_ch_ua</td>
<td align="center"><del>sec_ch_ua</del>(Deprecated)</td>
<td align="center">str</td>
<td align="center">Browser request header Sec-Ch-Ua</td>
<td align="center">Built-in Chrome Sec-Ch-Ua</td>
</tr>
<tr>
<td align="center">sec_ch_ua_platform</td>
<td align="center"><del>sec_ch_ua_platform</del>(Deprecated)</td>
<td align="center">str</td>
<td align="center">Browser request header Sec-Ch-Ua-Platform</td>
<td align="center">Built-in Chrome Sec-Ch-Ua-Platform</td>
@@ -274,12 +276,12 @@ async def example():
<tr>
<td align="center">cookie</td>
<td align="center">str</td>
<td align="center">Xiaohongshu web cookie, <b>no login required</b></td>
<td align="center">Xiaohongshu web cookie, <b>no login required; optional parameter!</b></td>
<td align="center">None</td>
</tr>
<tr>
<td align="center">proxy</td>
<td align="center">str|dict</td>
<td align="center">str | dict</td>
<td align="center">Set program proxy</td>
<td align="center">null</td>
</tr>
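@@ -351,9 +353,10 @@ async def example():
</tr>
</tbody>
</table>
<p><b>Additional notes: example of how to obtain the <code>sec_ch_ua</code>, <code>sec_ch_ua_platform</code>, and <code>user_agent</code> parameters; they only need to be set manually when the program fails to fetch data!</b></p>
<p><b>Additional notes: example of how to obtain the <code>user_agent</code> parameter; setting it according to your actual browser information is strongly recommended!</b></p>
<img src="static/screenshot/请求头示例图.png" alt="">
<h1>🌐 Cookie</h1>
<p>Starting from version <code>2.2</code>, no extra cookie handling is needed as long as the project works normally!</p>
<ol>
<li>Open a browser (optionally in incognito mode) and visit <code>https://www.xiaohongshu.com/explore</code></li>
<li>Log in to your Xiaohongshu account (can be skipped)</li>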
90 changes: 47 additions & 43 deletions README_EN.md
@@ -18,30 +18,31 @@
<p>⭐ Because the author has limited time, the English documentation is not always updated promptly and may be outdated. Parts of it are machine-translated and may be inaccurate, so please refer to the Chinese documentation when in doubt. Contributions to the translation are warmly welcome.</p>
<h1>📑 Project Features</h1>
<ul><b>Program Features</b>
<li>✅ Collect Xiaohongshu content information</li>
<li>✅ Extract Xiaohongshu content download addresses</li>
<li>✅ Download Xiaohongshu watermark-free content files</li>
<li>✅ Collect Xiaohongshu works information</li>
<li>✅ Extract Xiaohongshu works download addresses</li>
<li>✅ Download Xiaohongshu watermark-free works files</li>
<li>✅ Download Xiaohongshu livePhoto files (non-watermark-free)</li>
<li>✅ Automatically skip already downloaded content files</li>
<li>✅ Content file integrity handling mechanism</li>
<li>✅ Customizable text and image content file download format</li>
<li>✅ Persistently store content information to files</li>
<li>✅ Store content files to a separate folder</li>
<li>✅ Background clipboard monitoring for content download</li>
<li>✅ Record downloaded content IDs</li>
<li>✅ Support command line for downloading content files</li>
<li>✅ Automatically skip already downloaded works files</li>
<li>✅ Works file integrity handling mechanism</li>
<li>✅ Customizable image works file download format</li>
<li>✅ Persistently store works information to files</li>
<li>✅ Store works files to a separate folder</li>
<li>✅ Background clipboard monitoring for works download</li>
<li>✅ Record downloaded works IDs</li>
<li>✅ Support command line for downloading works files</li>
<li>✅ Read cookies from browser</li>
<li>✅ Customizable file name format</li>
<li>✅ Support API call functionality</li>
<li>✅ Support file breakpoint resume download</li>
</ul>
<ul><b>Script Features</b>
<li>✅ Download Xiaohongshu watermark-free content files</li>
<li>✅ Extract discovery page content links</li>
<li>✅ Extract account-published content links</li>
<li>✅ Extract account-favorited content links</li>
<li>✅ Extract account-liked content links</li>
<li>✅ Extract search result content links</li>
<li>✅ Download Xiaohongshu watermark-free works files</li>
<li>✅ Extract discovery page works links</li>
<li>✅ Extract account-published works links</li>
<li>✅ Extract account-favorited works links</li>
<li>✅ Extract account-liked works links</li>
<li>✅ Extract account-board works links</li>
<li>✅ Extract search result works links</li>
<li>✅ Extract search result user links</li>
</ul>
<p>⭐ The development plan and progress of XHS-Downloader can be found at <a href="https://github.com/users/JoeanAmier/projects/5">Projects</a></p>
@@ -55,6 +56,7 @@
<h1>🔗 Supported Links</h1>
<ul>
<li><code>https://www.xiaohongshu.com/explore/WorksID</code></li>
<li><code>https://www.xiaohongshu.com/explore/WorksID?xsec_token=XXX</code></li>
<li><code>https://www.xiaohongshu.com/discovery/item/WorksID</code></li>
<li><code>https://xhslink.com/ShareCode</code></li>
<br/>
@@ -64,7 +66,7 @@
<p>⭐ It is recommended to use the <a href="https://learn.microsoft.com/en-us/windows/terminal/install">Windows Terminal</a> (default terminal for Windows 11) to run the program for the best display effect!</p>
<h1>🥣 Usage</h1>
<p>If you only need to download watermark-free content files, it is recommended to choose <b>Program Run</b>; if you have other needs, it is recommended to choose <b>Source Code Run</b>!</p>
<p>It is recommended to set the <code>cookie</code> parameter manually; if this parameter is not set, the program functions may not work properly!</p>
<p><del>It is recommended to set the <code>cookie</code> parameter manually; if this parameter is not set, the program functions may not work properly!</del></p>
<h2>🖱 Program Run</h2>
<p>Mac OS, Windows 10 and above users can go to <a href="https://github.com/JoeanAmier/XHS-Downloader/releases/latest">Releases</a> to download the program package, unzip it, open the program folder, and double-click to run <code>main</code> to use.</p>
<p><strong>Note: The executable file <code>main</code> for Mac OS may need to be launched from the terminal command line; Due to device limitations, the Mac OS executable file has not been tested and its availability cannot be guaranteed!</strong></p>
@@ -93,7 +95,7 @@
</ol>
<h1>🛠 Command Line Mode</h1>
<p>The project supports command line mode. If you want to download specific images from a text and image work, you can use this mode to set the image sequence number you want to download!</p>
<p>You can use the command line to <b>read cookies from the browser and write to the configuration file</b>! Note that you need to close the browser to read the data!</p>
<p>You can use the command line to <b>read cookies from the browser and write to the configuration file!</b></p>
<p>Command example: <code>python .\main.py --browser_cookie Chrome --update_settings</code></p>
<p>The <code>bool</code> type parameters support setting with <code>true</code>, <code>false</code>, <code>1</code>, <code>0</code>, <code>yes</code>, <code>no</code>, <code>on</code> or <code>off</code> (case insensitive).</p>
<hr>
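
The `bool` parsing rule described above can be sketched as follows. This is a minimal illustration of accepting the eight listed spellings case-insensitively; `parse_bool`, `TRUE_VALUES`, and `FALSE_VALUES` are hypothetical names, and raising `ValueError` on unknown input is an assumption, since the project's own handling of invalid values is not shown in this diff.

```python
TRUE_VALUES = {"true", "1", "yes", "on"}
FALSE_VALUES = {"false", "0", "no", "off"}


def parse_bool(value: str) -> bool:
    """Map a command line string to a bool, ignoring case and outer whitespace."""
    text = value.strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    # Assumption: unknown spellings are rejected rather than silently defaulted
    raise ValueError(f"not a recognised bool value: {value!r}")


if __name__ == "__main__":
    for raw in ("true", "OFF", "Yes", "0"):
        print(raw, parse_bool(raw))
```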
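@@ -183,10 +185,10 @@ async def example():
work_path = "D:\\" # Root path for saving work data/files; default: project root path
folder_name = "Download" # Name of the folder for storing work files (created automatically); default: Download
name_format = "作品标题 作品描述"
sec_ch_ua = "" # Request header Sec-Ch-Ua
sec_ch_ua_platform = "" # Request header Sec-Ch-Ua-Platform
# sec_ch_ua = "" # Request header Sec-Ch-Ua
# sec_ch_ua_platform = "" # Request header Sec-Ch-Ua-Platform
user_agent = "" # User-Agent
cookie = "" # Xiaohongshu web cookie; no login needed; required parameter; login status affects data collection
cookie = "" # Xiaohongshu web cookie; no login needed; optional parameter; login status affects data collection
proxy = None # Network proxy
timeout = 5 # Request timeout limit, in seconds; default: 10
chunk = 1024 * 1024 * 10 # Size of each data chunk fetched from the server when downloading files, in bytes
@@ -196,26 +198,27 @@ async def example():
folder_mode = False # Whether to store each work's files in a separate folder
# async with XHS() as xhs:
# pass # Use default parameters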
async with XHS(work_path=work_path,
folder_name=folder_name,
name_format=name_format,
sec_ch_ua=sec_ch_ua,
sec_ch_ua_platform=sec_ch_ua_platform,
user_agent=user_agent,
cookie=cookie,
proxy=proxy,
timeout=timeout,
chunk=chunk,
max_retry=max_retry,
record_data=record_data,
image_format=image_format,
folder_mode=folder_mode,
) as xhs: # Use custom parameters
async with XHS(
work_path=work_path,
folder_name=folder_name,
name_format=name_format,
# sec_ch_ua=sec_ch_ua,
# sec_ch_ua_platform=sec_ch_ua_platform,
user_agent=user_agent,
cookie=cookie,
proxy=proxy,
timeout=timeout,
chunk=chunk,
max_retry=max_retry,
record_data=record_data,
image_format=image_format,
folder_mode=folder_mode,
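) as xhs: # Use custom parameters
download = True # Whether to download work files; default: False
# Returns detailed work information, including download addresses
# Returns an empty dict if data retrieval fails
print(await xhs.extract(error_link, download, ))
print(await xhs.extract(demo_link, download, ))
print(await xhs.extract(demo_link, download, index=[1, 2]))
# Multiple work links can be passed in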
print(await xhs.extract(multiple_links, download, ))
</pre>
@@ -257,13 +260,13 @@ async def example():
<td align="center"><code>publish_time author_nickname title</code></td>
</tr>
<tr>
<td align="center">sec_ch_ua</td>
<td align="center"><del>sec_ch_ua</del>(Deprecated)</td>
<td align="center">str</td>
<td align="center">Browser request header Sec-Ch-Ua</td>
<td align="center">Built-in Chrome Sec-Ch-Ua</td>
</tr>
<tr>
<td align="center">sec_ch_ua_platform</td>
<td align="center"><del>sec_ch_ua_platform</del>(Deprecated)</td>
<td align="center">str</td>
<td align="center">Browser request header Sec-Ch-Ua-Platform</td>
<td align="center">Built-in Chrome Sec-Ch-Ua-Platform</td>
@@ -277,12 +280,12 @@ async def example():
<tr>
<td align="center">cookie</td>
<td align="center">str</td>
<td align="center">Xiaohongshu web version cookie, <b>login not required</b></td>
<td align="center">Xiaohongshu web version cookie, <b>no login required; optional parameter!</b></td>
<td align="center">None</td>
</tr>
<tr>
<td align="center">proxy</td>
<td align="center">str|dict</td>
<td align="center">str | dict</td>
<td align="center">Set program proxy</td>
<td align="center">null</td>
</tr>
@@ -354,9 +357,10 @@ async def example():
</tr>
</tbody>
</table>
<p><b>Additional Notes: The parameters <code>sec_ch_ua</code>, <code>sec_ch_ua_platform</code>, and <code>user_agent</code> examples are provided for reference, and need to be set manually only if the program fails to fetch data!</b></p>
<p><b>Additional Notes: an example of how to obtain the <code>user_agent</code> parameter is provided for reference; setting it according to your actual browser information is strongly recommended!</b></p>
<img src="static/screenshot/请求头示例图.png" alt="">
<h1>🌐 Cookie</h1>
<p>Starting from version <code>2.2</code>, if there are no abnormalities in project functionality, there is no need to handle cookies separately!</p>
<ol>
<li>Open the browser (optional: start in incognito mode) and visit <code>https://www.xiaohongshu.com/explore</code></li>
<li>Log in to your Xiaohongshu account (can be skipped)</li>
39 changes: 20 additions & 19 deletions main.py
@@ -19,10 +19,10 @@ async def example():
work_path = "D:\\" # Root path for saving work data/files; default: project root path
folder_name = "Download" # Name of the folder for storing work files (created automatically); default: Download
name_format = "作品标题 作品描述"
sec_ch_ua = "" # Request header Sec-Ch-Ua
sec_ch_ua_platform = "" # Request header Sec-Ch-Ua-Platform
# sec_ch_ua = "" # Request header Sec-Ch-Ua
# sec_ch_ua_platform = "" # Request header Sec-Ch-Ua-Platform
user_agent = "" # User-Agent
cookie = "" # Xiaohongshu web cookie; no login needed; required parameter; login status affects data collection
cookie = "" # Xiaohongshu web cookie; no login needed; optional parameter; login status affects data collection
proxy = None # Network proxy
timeout = 5 # Request timeout limit, in seconds; default: 10
chunk = 1024 * 1024 * 10 # Size of each data chunk fetched from the server when downloading files, in bytes
@@ -32,26 +32,27 @@ async def example():
folder_mode = False # Whether to store each work's files in a separate folder
# async with XHS() as xhs:
# pass # Use default parameters
async with XHS(work_path=work_path,
folder_name=folder_name,
name_format=name_format,
sec_ch_ua=sec_ch_ua,
sec_ch_ua_platform=sec_ch_ua_platform,
user_agent=user_agent,
cookie=cookie,
proxy=proxy,
timeout=timeout,
chunk=chunk,
max_retry=max_retry,
record_data=record_data,
image_format=image_format,
folder_mode=folder_mode,
) as xhs: # Use custom parameters
async with XHS(
work_path=work_path,
folder_name=folder_name,
name_format=name_format,
# sec_ch_ua=sec_ch_ua,
# sec_ch_ua_platform=sec_ch_ua_platform,
user_agent=user_agent,
cookie=cookie,
proxy=proxy,
timeout=timeout,
chunk=chunk,
max_retry=max_retry,
record_data=record_data,
image_format=image_format,
folder_mode=folder_mode,
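) as xhs: # Use custom parameters
download = True # Whether to download work files; default: False
# Returns detailed work information, including download addresses
# Returns an empty dict if data retrieval fails
print(await xhs.extract(error_link, download, ))
print(await xhs.extract(demo_link, download, ))
print(await xhs.extract(demo_link, download, index=[1, 2]))
# Multiple work links can be passed in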
print(await xhs.extract(multiple_links, download, ))
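
The new `index=[1, 2]` call restricts a download to selected images of a work. The selection logic itself is not part of this excerpt, so the following is only a rough sketch of the idea. It assumes the sequence numbers are 1-based and that out-of-range numbers are skipped; both are assumptions, and `select_by_index` is a hypothetical helper rather than a function from XHS-Downloader.

```python
from typing import List, Optional


def select_by_index(items: List[str], index: Optional[List[int]] = None) -> List[str]:
    """Return the items picked by 1-based sequence numbers.

    With no index (None or empty) every item is returned.
    Numbers outside 1..len(items) are skipped (an assumption of this sketch).
    """
    if not index:
        return list(items)
    return [items[i - 1] for i in index if 1 <= i <= len(items)]


if __name__ == "__main__":
    image_urls = ["img_a", "img_b", "img_c"]  # placeholder data
    print(select_by_index(image_urls, [1, 2]))
```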
