Merge pull request #1020 from DIYgod/master
[pull] master from diygod:master
pull[bot] authored Aug 4, 2023
2 parents c331eae + a198d61 commit 6d7ab03
Showing 9 changed files with 311 additions and 3 deletions.
74 changes: 74 additions & 0 deletions docs/en/other.md
@@ -316,6 +316,80 @@ please refer to the [Notion API documentation](https://developers.notion.com/ref

<RouteEn author="sbilly" example="/sans/summit_archive" path="/sans/summit_archive" />

## Transformation

Pass a URL and transformation rules to convert HTML or JSON into an RSS feed.

### HTML

Specify options in the `routeParams` parameter, in query string format, to extract data from HTML.

| Key             | Meaning                                                                  | Accepted Values | Default                                                      |
| --------------- | ------------------------------------------------------------------------ | --------------- | ------------------------------------------------------------ |
| `title`         | The title of the RSS feed                                                | `string`        | Text of the page's `<title>`                                 |
| `item`          | CSS selector for the HTML elements used as `item`                        | `string`        | `html`                                                       |
| `itemTitle`     | CSS selector, relative to `item`, for the element used as `title`        | `string`        | `item` element                                               |
| `itemTitleAttr` | Attribute of the `title` element to use as the title                     | `string`        | Element text                                                 |
| `itemLink`      | CSS selector, relative to `item`, for the element used as `link`         | `string`        | `item` element                                               |
| `itemLinkAttr`  | Attribute of the `link` element to use as the link                       | `string`        | `href` of the element, or of the first `<a>` inside it       |
| `itemDesc`      | CSS selector, relative to `item`, for the element used as `description`  | `string`        | `item` element                                               |
| `itemDescAttr`  | Attribute of the `description` element to use as the description         | `string`        | Element HTML                                                 |

<RouteEn author="ttttmr" example="/rsshub/transform/html/https%3A%2F%2Fwechat2rss.xlab.app%2Fposts%2Flist%2F/item=div%5Bclass%3D%27post%2Dcontent%27%5D%20p%20a" path="/rsshub/transform/html/:url/:routeParams" :paramsDesc="['URL of the page, encoded with `encodeURIComponent`', 'Transformation rules, URL-encoded']" selfhost="1">

Parameters in the example above:

| Parameter | Value |
| ------------ | ----------------------------------------- |
| `url` | `https://wechat2rss.xlab.app/posts/list/` |
| `routeParams`| `item=div[class='post-content'] p a` |

The `routeParams` value parses as:

| Parameter | Value |
| --------- | ------------------------------- |
| `item` | `div[class='post-content'] p a` |

</RouteEn>
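
The route above can be assembled programmatically. The following is a minimal sketch, not part of RSSHub: `buildTransformPath` is a hypothetical helper that encodes the target URL with `encodeURIComponent` and encodes each rule value separately, leaving `=` and `&` as separators the way the documented example does.

```javascript
// Hypothetical helper (an assumption for illustration, not an RSSHub API):
// builds a /rsshub/transform/... path from a target URL and a rules object.
function buildTransformPath(kind, url, rules) {
    // Encode keys and values separately so "=" and "&" stay as separators.
    const routeParams = Object.entries(rules)
        .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
        .join('&');
    // The target URL is encoded as a whole so its "/" does not split the path.
    return `/rsshub/transform/${kind}/${encodeURIComponent(url)}/${routeParams}`;
}

const htmlPath = buildTransformPath('html', 'https://wechat2rss.xlab.app/posts/list/', {
    item: "div[class='post-content'] p a",
});
console.log(htmlPath);
```

`encodeURIComponent` leaves `'` and `-` unescaped, so the output is not byte-identical to the documented example, which writes them as `%27` and `%2D`; both forms decode to the same selector.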

### JSON

Specify options in the `routeParams` parameter, in query string format, to extract data from JSON.

| Key         | Meaning                                                   | Accepted Values | Default                                              |
| ----------- | --------------------------------------------------------- | --------------- | ---------------------------------------------------- |
| `title`     | The title of the RSS feed                                 | `string`        | `<title>` of the home page of the URL's domain       |
| `item`      | JSON Path to the array used as `item` elements            | `string`        | Entire JSON response                                 |
| `itemTitle` | JSON Path, relative to `item`, for the `title`            | `string`        | None                                                 |
| `itemLink`  | JSON Path, relative to `item`, for the `link`             | `string`        | None                                                 |
| `itemDesc`  | JSON Path, relative to `item`, for the `description`      | `string`        | None                                                 |

::: tip Note

JSON Path only supports dotted paths such as `a.b.c`. To access an array element, such as `a[0].b`, write it as `a.0.b`.

:::
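
To illustrate the dotted-path rule in the note, here is a small sketch of such a lookup. `getByPath` is a hypothetical name, and returning `undefined` for a missing key is an assumption added for this example rather than documented behavior.

```javascript
// Illustrative dotted-path lookup; array indices are written as plain keys ("a.0.b").
function getByPath(obj, path) {
    if (typeof path !== 'string' || path === '') {
        return obj; // no path given: return the whole value
    }
    // Walk one key at a time; stop descending once a level is missing.
    return path.split('.').reduce((current, key) => (current == null ? undefined : current[key]), obj);
}

const data = { a: [{ b: 'first' }, { b: 'second' }] };
console.log(getByPath(data, 'a.0.b')); // first
console.log(getByPath(data, 'a.1.b')); // second
console.log(getByPath(data, 'a.5.b')); // undefined
```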

<RouteEn author="ttttmr" example="/rsshub/transform/json/https%3A%2F%2Fapi.github.com%2Frepos%2Fginuerzh%2Fgost%2Freleases/title=Gost%20releases&itemTitle=tag_name&itemLink=html_url&itemDesc=body" path="/rsshub/transform/json/:url/:routeParams" :paramsDesc="['URL of the JSON endpoint, encoded with `encodeURIComponent`', 'Transformation rules, URL-encoded']" selfhost="1">

Parameters in the example above:

| Parameter | Value |
| ------------- | ----------------------------------------------- |
| `url` | `https://api.github.com/repos/ginuerzh/gost/releases` |
| `routeParams` | `title=Gost releases&itemTitle=tag_name&itemLink=html_url&itemDesc=body` |

The `routeParams` value parses as:

| Parameter | Value |
| ------------ | ---------------- |
| `title` | `Gost releases` |
| `itemTitle` | `tag_name` |
| `itemLink` | `html_url` |
| `itemDesc` | `body` |

</RouteEn>
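
The table above can be reproduced with the standard `URLSearchParams` class. This is a sketch under the assumption that the path segment has already been percent-decoded once before it is parsed; the variable names are illustrative.

```javascript
// The encoded routeParams segment from the example route.
const encoded = 'title=Gost%20releases&itemTitle=tag_name&itemLink=html_url&itemDesc=body';

// Assumption: the router decodes the path segment once before the handler sees it.
const decoded = decodeURIComponent(encoded);

// Parse the query-string-style rules into key/value pairs.
const routeParams = new URLSearchParams(decoded);
const rules = Object.fromEntries(routeParams);

console.log(rules.title); // Gost releases
console.log(rules.itemTitle); // tag_name
console.log(routeParams.get('item')); // null, so the documented default applies
```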

## Trending Search Keyword Aggregator

### Aggregated Keyword Tracker
74 changes: 74 additions & 0 deletions docs/other.md
@@ -1121,6 +1121,80 @@ When type is all, the category parameter does not support cost or free

<Route author="Fatpandac" example="/ems/apple/EZ319397281CN" path="/ems/apple/:id" :paramsDesc="['Apple mail tracking number']"/>

## Transformation

Pass a URL and transformation rules to convert HTML or JSON into an RSS feed.

### HTML

`routeParams` 参数中以 query string 格式指定选项,可以控制提取数据

|| 含义 | 接受的值 | 默认值 |
| --------------- | --------------------------------------------------------------- | -------- | ------------------------ |
| `title` | 指定 RSS 的标题 | `string` | 从当前网页中取 `<title>` |
| `item` | 通过 CSS 选择器查找 HTML 元素作为 `item` 元素 | `string` | html |
| `itemTitle` |`item` 中通过 CSS 选择器查找 HTML 元素作为 `title` 元素 | `string` | `item` 元素 |
| `itemTitleAttr` | 获取 `title` 元素属性作为标题 | `string` | 元素 text |
| `itemLink` |`item` 中通过 CSS 选择器查找 HTML 元素作为 `link` 元素 | `string` | `item` 元素 |
| `itemLinkAttr` | 获取 `link` 元素属性作为链接 | `string` | `href` |
| `itemDesc` |`item` 中通过 CSS 选择器查找 HTML 元素作为 `descrption` 元素 | `string` | `item` 元素 |
| `itemDescAttr` | 获取 `descrption` 元素属性作为描述 | `string` | 元素 html |

<Route author="ttttmr" example="/rsshub/transform/html/https%3A%2F%2Fwechat2rss.xlab.app%2Fposts%2Flist%2F/item=div%5Bclass%3D%27post%2Dcontent%27%5D%20p%20a" path="/rsshub/transform/html/:url/:routeParams" :paramsDesc="['URL address, must be URL-encoded', 'Transformation rules, must be URL-encoded']" selfhost="1">

The parameters in the example above parse as follows:

| Parameter      | Value                                     |
| -------------- | ----------------------------------------- |
| `:url`         | `https://wechat2rss.xlab.app/posts/list/` |
| `:routeParams` | `item=div[class='post-content'] p a`      |

The `routeParams` parameter parses as follows:

| Parameter | Value                           |
| --------- | ------------------------------- |
| `item`    | `div[class='post-content'] p a` |

</Route>

### JSON

`routeParams` 参数中以 query string 格式指定选项,可以控制提取数据

|| 含义 | 接受的值 | 默认值 |
| ----------- | --------------------------------------- | -------- | ------------------------------------ |
| `title` | 指定 RSS 的标题 | `string` | 从当前域名的根路径网页中取 `<title>` |
| `item` | 通过 JSON Path 查找作为 `item` 元素 | `string` | 整个响应 JSON |
| `itemTitle` |`item` 中通过 JSON Path 查找作为标题 | `string` ||
| `itemLink` |`item` 中通过 JSON Path 查找作为链接 | `string` ||
| `itemDesc` |`item` 中通过 JSON Path 查找作为描述 | `string` ||

::: tip Note

JSON Path currently supports only forms like `a.b.c`. To read from an array, for example `a[0].b`, write it as `a.0.b`.

:::

<Route author="ttttmr" example="/rsshub/transform/json/https%3A%2F%2Fapi.github.com%2Frepos%2Fginuerzh%2Fgost%2Freleases/title=Gost%20releases&itemTitle=tag_name&itemLink=html_url&itemDesc=body" path="/rsshub/transform/json/:url/:routeParams" :paramsDesc="['URL address, must be URL-encoded', 'Transformation rules, must be URL-encoded']" selfhost="1">

The parameters in the example above parse as follows:

| Parameter      | Value                                                                    |
| -------------- | ------------------------------------------------------------------------ |
| `:url`         | `https://api.github.com/repos/ginuerzh/gost/releases`                    |
| `:routeParams` | `title=Gost releases&itemTitle=tag_name&itemLink=html_url&itemDesc=body` |

The `routeParams` parameter parses as follows:

| Parameter   | Value           |
| ----------- | --------------- |
| `title`     | `Gost releases` |
| `itemTitle` | `tag_name`      |
| `itemLink`  | `html_url`      |
| `itemDesc`  | `body`          |

</Route>

## 自如

### Listings
2 changes: 1 addition & 1 deletion lib/maintainer.js
@@ -5,7 +5,7 @@ const { join } = require('path');
 // Presence Check
 for (const dir of fs.readdirSync(dirname)) {
     const dirPath = join(dirname, dir);
-    if (!fs.existsSync(join(dirPath, 'maintainer.js'))) {
+    if (fs.existsSync(join(dirPath, 'router.js')) && !fs.existsSync(join(dirPath, 'maintainer.js'))) {
         throw Error(`No maintainer.js in "${dirPath}".`);
     }
 }
13 changes: 13 additions & 0 deletions lib/v2/altervista/radar.js
@@ -0,0 +1,13 @@
module.exports = {
    'altervista.org': {
        _name: 'Altervista',
        hyp3rlinx: [
            {
                title: 'hyp3rlinx blog',
                docs: 'https://docs.rsshub.app/',
                source: ['/'],
                target: '/rsshub/transform/html/http%3A%2F%2Fhyp3rlinx.altervista.org%2F/item=table[border=%221%22]%20tr%20td%20a',
            },
        ],
    },
};
2 changes: 2 additions & 0 deletions lib/v2/rsshub/maintainer.js
@@ -1,4 +1,6 @@
module.exports = {
    '/routes/:lang?': ['DIYgod'],
    '/rsshub/sponsors': ['DIYgod'],
    '/transform/html/:url/:routeParams': ['ttttmr'],
    '/transform/json/:url/:routeParams': ['ttttmr'],
};
4 changes: 2 additions & 2 deletions lib/v2/rsshub/router.js
@@ -1,6 +1,6 @@
module.exports = (router) => {
    router.get('/rss', require('./routes')); // deprecated

    router.get('/routes/:lang?', require('./routes'));
    router.get('/sponsors', require('./sponsors'));
    router.get('/transform/html/:url/:routeParams', require('./transform/html'));
    router.get('/transform/json/:url/:routeParams', require('./transform/json'));
};
75 changes: 75 additions & 0 deletions lib/v2/rsshub/transform/html.js
@@ -0,0 +1,75 @@
const got = require('@/utils/got');
const cheerio = require('cheerio');
const config = require('@/config').value;

module.exports = async (ctx) => {
    if (!config.feature.allow_user_supply_unsafe_domain) {
        ctx.throw(403, `This RSS is disabled unless 'ALLOW_USER_SUPPLY_UNSAFE_DOMAIN' is set to 'true'.`);
    }
    const { url } = ctx.params;
    const response = await got({
        method: 'get',
        url,
    });

    const routeParams = new URLSearchParams(ctx.params.routeParams);
    const $ = cheerio.load(response.data);
    const rssTitle = routeParams.get('title') ? routeParams.get('title') : $('title').text();
    const item = routeParams.get('item') ? routeParams.get('item') : 'html';
    const items = $(item)
        .toArray()
        .map((item) => {
            try {
                item = $(item);

                let title;
                const titleEle = routeParams.get('itemTitle') ? item.find(routeParams.get('itemTitle')) : item;
                if (routeParams.get('itemTitleAttr')) {
                    title = titleEle.attr(routeParams.get('itemTitleAttr'));
                } else {
                    title = titleEle.text();
                }

                let link;
                const linkEle = routeParams.get('itemLink') ? item.find(routeParams.get('itemLink')) : item;
                if (routeParams.get('itemLinkAttr')) {
                    link = linkEle.attr(routeParams.get('itemLinkAttr'));
                } else {
                    if (linkEle.is('a')) {
                        link = linkEle.attr('href');
                    } else {
                        link = linkEle.find('a').attr('href');
                    }
                }
                // Complete relative links into absolute links
                link = link.trim();
                if (link && !link.startsWith('http')) {
                    link = `${new URL(url).origin}${link}`;
                }

                let desc;
                const descEle = routeParams.get('itemDesc') ? item.find(routeParams.get('itemDesc')) : item;
                if (routeParams.get('itemDescAttr')) {
                    desc = descEle.attr(routeParams.get('itemDescAttr'));
                } else {
                    desc = descEle.html();
                }

                return {
                    title,
                    link,
                    description: desc,
                };
            } catch (e) {
                return null;
            }
        })
        .filter(Boolean);

    ctx.state.data = {
        title: rssTitle,
        link: url,
        description: `Proxy ${url}`,
        item: items,
    };
};
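
The link-completion step in this file prefixes the page origin to any link that does not start with `http`. As a possible alternative, and only as a sketch, the standard WHATWG `URL` constructor can resolve a link against the page URL, which also handles paths without a leading slash and protocol-relative links. `toAbsoluteLink` is a hypothetical helper and is not part of this commit.

```javascript
// Hypothetical helper (not in the commit): resolve a possibly relative link
// against the URL of the page it was found on.
function toAbsoluteLink(link, pageUrl) {
    if (typeof link !== 'string' || link.trim() === '') {
        return undefined; // nothing to resolve
    }
    // new URL(relative, base) follows the same resolution rules as a browser.
    return new URL(link.trim(), pageUrl).href;
}

const page = 'https://example.com/list/';
console.log(toAbsoluteLink('/posts/1', page)); // https://example.com/posts/1
console.log(toAbsoluteLink('2.html', page)); // https://example.com/list/2.html
console.log(toAbsoluteLink('//cdn.example.com/a', page)); // https://cdn.example.com/a
console.log(toAbsoluteLink('https://other.example/x', page)); // https://other.example/x
```

With plain origin concatenation, a link such as `2.html` would become `https://example.com2.html`, which is why resolving against the full page URL may be preferable.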
57 changes: 57 additions & 0 deletions lib/v2/rsshub/transform/json.js
@@ -0,0 +1,57 @@
const got = require('@/utils/got');
const cheerio = require('cheerio');
const config = require('@/config').value;

function jsonGet(obj, attr) {
    if (typeof attr !== 'string') {
        return obj;
    }
    // a.b.c
    // a.b[0].c => a.b.0.c
    attr.split('.').forEach((key) => {
        obj = obj[key];
    });
    return obj;
}

module.exports = async (ctx) => {
    if (!config.feature.allow_user_supply_unsafe_domain) {
        ctx.throw(403, `This RSS is disabled unless 'ALLOW_USER_SUPPLY_UNSAFE_DOMAIN' is set to 'true'.`);
    }
    const { url } = ctx.params;
    const response = await got({
        method: 'get',
        url,
    });

    const routeParams = new URLSearchParams(ctx.params.routeParams);
    let rssTitle = routeParams.get('title');
    if (!rssTitle) {
        const resp = await got({
            method: 'get',
            url: new URL(url).origin,
        });
        const $ = cheerio.load(resp.data);
        rssTitle = $('title').text();
    }

    const items = jsonGet(response.data, routeParams.get('item')).map((item) => {
        let link = jsonGet(item, routeParams.get('itemLink')).trim();
        // Complete relative links into absolute links
        if (link && !link.startsWith('http')) {
            link = `${new URL(url).origin}${link}`;
        }
        return {
            title: jsonGet(item, routeParams.get('itemTitle')),
            link,
            description: routeParams.get('itemDesc') ? jsonGet(item, routeParams.get('itemDesc')) : '',
        };
    });

    ctx.state.data = {
        title: rssTitle,
        link: url,
        description: `Proxy ${url}`,
        item: items,
    };
};
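
In the handler above, `jsonGet` returns the whole object when a path is not a string, so omitting `itemLink` means `.trim()` is called on an object. The following is a hedged sketch of a guarded mapping step, not the committed implementation: `toFeedItems` is a hypothetical function, the sample data is invented for illustration, and `jsonGet` is copied from the diff.

```javascript
// Same dot-path walk as jsonGet in the diff above.
function jsonGet(obj, attr) {
    if (typeof attr !== 'string') {
        return obj;
    }
    attr.split('.').forEach((key) => {
        obj = obj[key];
    });
    return obj;
}

// Hypothetical guarded variant of the item mapping (an assumption, not RSSHub code).
function toFeedItems(data, routeParams, baseUrl) {
    const list = jsonGet(data, routeParams.get('item'));
    if (!Array.isArray(list)) {
        throw new TypeError('`item` must resolve to an array');
    }
    return list.map((entry) => {
        // Only look up fields whose rule was actually supplied.
        const rawLink = routeParams.get('itemLink') ? jsonGet(entry, routeParams.get('itemLink')) : undefined;
        const hasLink = typeof rawLink === 'string' && rawLink.trim() !== '';
        return {
            title: routeParams.get('itemTitle') ? jsonGet(entry, routeParams.get('itemTitle')) : undefined,
            link: hasLink ? new URL(rawLink.trim(), baseUrl).href : undefined,
            description: routeParams.get('itemDesc') ? jsonGet(entry, routeParams.get('itemDesc')) : '',
        };
    });
}

// Invented sample data, shaped like the fields used in the docs example.
const sample = [{ tag_name: 'v1.0.0', html_url: '/releases/v1.0.0', body: 'First release' }];
const base = 'https://example.com/api/releases';

const full = toFeedItems(sample, new URLSearchParams('itemTitle=tag_name&itemLink=html_url&itemDesc=body'), base);
console.log(full[0].link); // https://example.com/releases/v1.0.0

const noLink = toFeedItems(sample, new URLSearchParams('itemTitle=tag_name'), base);
console.log(noLink[0].link); // undefined
```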
13 changes: 13 additions & 0 deletions lib/v2/sec/radar.js
@@ -0,0 +1,13 @@
module.exports = {
    'sec.today': {
        _name: '每日安全',
        '.': [
            {
                title: '动态',
                docs: 'https://docs.rsshub.app/',
                source: ['/pulses', '/'],
                target: '/rsshub/transform/html/https%3A%2F%2Fsec.today%2Fpulses%2F/item=div[class="card-body"]',
            },
        ],
    },
};

1 comment on commit 6d7ab03


@vercel vercel bot commented on 6d7ab03 Aug 4, 2023
