topic/create script for collecting threats data #517

TsurutaYoshiki · 2024-12-09T08:51:00Z

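Purpose of this PR

Added a script that returns the vulnerabilities and package information tied to a specific service as JSON.

Background, intent, and decisions

Output format: JSON
Output contents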

{
    "pteam_id": "",
    "pteam_name": "",
    "service_id": "",
    "service_name": "",
    "service_description": "",
    "threats": [
        {
            "threat_id": "",
            "topic_id": "",
            "cve_id": [
                "CVE-xxxxx",
                "CVE-xxxxx"
            ],
            "tag_id": "",
            "tag_version": "",
            "tag_name": "axios:npm:npm"
        },
        {
            "threat_id": "",
            "topic_id": "",
            "cve_id": [
                "CVE-xxxxx",
                "CVE-xxxxx"
            ],
            "tag_id": "",
            "tag_version": "",
            "tag_name": "axios:npm:"
        }
    ]
}

The script takes API_BASE_URL, token, pteam_id, and service_id as run-time arguments.
If cve_id contains no "CVE-xxxxx" entry, the whole threat is removed.
The output file name is collect_threats_data + service_id.
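
A minimal sketch of how the four inputs and the output file name described above could be handled. The argument names, the underscore separator, and the .json extension are assumptions for illustration; the PR description only states which values are required and that the file name combines collect_threats_data with the service_id.

```python
import argparse


def parse_args(argv=None):
    """Parse the four inputs the PR description says the script requires."""
    parser = argparse.ArgumentParser(description="Collect threats data for one service")
    # Hypothetical positional argument names; the real script may differ.
    parser.add_argument("api_base_url")
    parser.add_argument("token")
    parser.add_argument("pteam_id")
    parser.add_argument("service_id")
    return parser.parse_args(argv)


def output_file_name(service_id: str) -> str:
    # Assumed separator and extension, not taken from the PR.
    return f"collect_threats_data_{service_id}.json"
```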

dejima-shikou · 2024-12-09T23:55:59Z

scripts/collect_threats_data.py

+                service_id=service["service_id"],
+                service_name=service["service_name"],
+                service_description=service["description"],
+            )


Once the service_id is found, there is no need to look at the other services, so it would be better to break.
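
The early exit suggested here might look like the following sketch. find_service and the shape of the services list are hypothetical; only the service_id key comes from the diff above.

```python
from typing import Optional


def find_service(services: list, service_id: str) -> Optional[dict]:
    """Return the first service whose service_id matches, or None."""
    found = None
    for service in services:
        if service["service_id"] == service_id:
            found = service
            break  # the match is found, so the remaining services are skipped
    return found
```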

dejima-shikou · 2024-12-10T00:06:29Z

scripts/collect_threats_data.py

+            tag_id=dependency["tag_id"], tag_version=dependency["version"], tag_name=tag["tag_name"]
+        )
+        del threat["dependency_id"]
+


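I don't recommend behavior where the value of the threats argument changes.
Creating and returning a new list is easier to understand.
Reasons:
- Code in which a single variable's state differs from place to place is hard to read, because you have to track down every spot where the state changes to understand it. If the value never changes after it is defined, reading the definition tells you what the variable holds.
- A getXXX function usually returns a value and does not update any other state. Following that convention and naming functions appropriately lets a reader understand the code just by looking at the call site.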
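
As an illustration of the point above, here is a sketch of a function that builds new dicts instead of updating and del-ing keys on its argument. The function name and the dependencies lookup table are hypothetical; the tag_id, version, and dependency_id keys are taken from the diff.

```python
def add_tag_data(threats: list, dependencies: dict) -> list:
    """Return new threat dicts with tag fields added; the inputs are left untouched."""
    result = []
    for threat in threats:
        dependency = dependencies[threat["dependency_id"]]
        # Copy every key except dependency_id rather than calling del on the input.
        enriched = {key: value for key, value in threat.items() if key != "dependency_id"}
        enriched["tag_id"] = dependency["tag_id"]
        enriched["tag_version"] = dependency["version"]
        result.append(enriched)
    return result
```

Because the caller's list is never modified, the variable passed in means the same thing before and after the call.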

dejima-shikou · 2024-12-10T00:08:03Z

scripts/collect_threats_data.py

+    _threats = get_misp_tag(tc_client, threats)
+    output_data.update(threats=_threats)
+    output_json_file(output_data, args.pteam_id, args.service_id)
+


The names threats and _threats don't tell you how the two differ.
Even if they get a bit longer, names that make the difference clear are better,
for example threats_with_tag_data or threats_with_topic_data.

dejima-shikou · 2024-12-10T00:14:27Z

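Please design the behavior on errors.
- get_pteam
  → If not found, print an error message and exit abnormally.
    * Same if the service is not found.
- get_threats
  → If not found, print an error message and exit abnormally,
    or
    treat it as zero results with no special error handling and return [].
- get_dependency
  → Print an error message, but continue processing the other threats.
...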
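
One way the two policies listed above (stop on a fatal lookup failure, log and continue on a per-threat failure) could be sketched. The APIError class here is a local stand-in, and require and collect_dependencies are hypothetical helpers, not functions from the script.

```python
import sys


class APIError(Exception):
    """Local stand-in for the client's error type."""


def require(fetch, what: str):
    """Fatal policy: print a message and exit abnormally when the lookup fails."""
    try:
        return fetch()
    except APIError as error:
        sys.exit(f"Failed to get {what}: {error}")


def collect_dependencies(threats: list, get_dependency) -> dict:
    """Non-fatal policy: report the failure and keep processing the other threats."""
    dependencies = {}
    for threat in threats:
        try:
            dependencies[threat["threat_id"]] = get_dependency(threat["dependency_id"])
        except APIError as error:
            print(f"Skipping threat {threat['threat_id']}: {error}", file=sys.stderr)
    return dependencies
```

sys.exit with a string argument prints the message to stderr and exits with status 1, which matches "print an error message and exit abnormally".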

dejima-shikou · 2024-12-10T00:20:42Z

scripts/collect_threats_data.py

+    return _threats
+
+
+def output_json_file(threats: dict, pteam_id: str, service_id: str):


The pteam_id argument is unused and should be removed.

mshim03

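The function design is inconsistent, which makes the intent hard to follow. Some functions receive output_data and modify it, while others return data and leave main to modify output_data, so the behavior is all over the place.
I think the responsibility for building the final data should sit in main.

Also, the contents of the threats dict keep changing, so it gets confusing unless you read into each function. Rewriting a dict inside a function is a source of unintended behavior and is best avoided as much as possible.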
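
A rough sketch of the structure suggested here: helpers only return new data, and a single caller (standing in for main) assembles the final dict. with_field, build_output_data, and the lookup tables keyed by threat_id are all hypothetical names chosen for illustration.

```python
def with_field(threats: list, key: str, values_by_threat_id: dict) -> list:
    """Return copies of the threats with one extra field; the input is not changed."""
    return [{**threat, key: values_by_threat_id[threat["threat_id"]]} for threat in threats]


def build_output_data(service_info: dict, threats: list, tag_names: dict, cve_ids: dict) -> dict:
    """Assemble the final structure in one place, as main would."""
    # Each step yields a new list whose name says what was added.
    threats_with_tag_data = with_field(threats, "tag_name", tag_names)
    threats_with_cve_data = with_field(threats_with_tag_data, "cve_id", cve_ids)
    return {**service_info, "threats": threats_with_cve_data}
```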

mshim03 · 2024-12-10T01:19:34Z

scripts/collect_threats_data.py

+def get_misp_tag(tc_client: ThreatconnectomeClient, threats: list):
+    """
+    Remove threat that did not have cve_id in missp_tag.
+    """
+    _threats: list = []
+    for threat in threats:
+        topic = tc_client.get_topic(threat["topic_id"])
+        cve_id: list = []
+        for misp_tag in topic["misp_tags"]:
+            if misp_tag["tag_name"] and misp_tag["tag_name"].startswith("CVE"):
+                cve_id.append(misp_tag["tag_name"])
+
+        if len(cve_id) == 0:
+            continue
+        else:
+            threat.update(cve_id=cve_id)
+            _threats.append(threat)
+
+    return _threats


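This function is responsible both for fetching misp_tag and for attaching the cve_id value to the data. The function name doesn't describe its actual behavior. I think the responsibilities should be split.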
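
A sketch of how the two responsibilities might be separated: one function extracts CVE IDs from a topic's misp_tags, and another attaches them to threats and drops threats that have none. The function names and the misp_tags_by_topic_id mapping (assumed to be fetched elsewhere) are hypothetical; the startswith("CVE") filter mirrors the diff above.

```python
def extract_cve_ids(misp_tags: list) -> list:
    """Return the tag names that look like CVE IDs."""
    return [
        tag["tag_name"]
        for tag in misp_tags
        if tag.get("tag_name") and tag["tag_name"].startswith("CVE")
    ]


def add_cve_data(threats: list, misp_tags_by_topic_id: dict) -> list:
    """Return new threat dicts with cve_id attached, omitting threats without any CVE."""
    result = []
    for threat in threats:
        cve_ids = extract_cve_ids(misp_tags_by_topic_id.get(threat["topic_id"], []))
        if cve_ids:
            result.append({**threat, "cve_id": cve_ids})
    return result
```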

TsurutaYoshiki · 2024-12-10T07:00:07Z

@mshim03
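I've revised the script.
The changes are as follows.

Overall changes

output_data is now added to and modified inside the main function.
Instead of repeatedly modifying the threats dict data, each function now returns a new variable.
Renamed functions: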
- get_tags_data → add_tag_data_to_threat
- get_misp_data → add_cve_data_to_threat

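Targeted changes

Added a break inside get_pteam_and_service_data().
Stopped using del to remove entries from the for-loop argument while it is being processed.
Removed the pteam_id argument from output_json_file().
Added error output for when pteam_id or service_id is wrong.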

Omitted from the PR description above

Added an explanation of the command needed to run the script to the README.

dejima-shikou · 2024-12-10T07:12:54Z

scripts/collect_threats_data.py

+        try:
+            response = self._retry_call(requests.get, f"{self.api_url}/threats", params=params)
+        except APIError as error:
+            sys.exit("Is the service_id correct?\n" + str(error))


The service_id may be correct while the service simply has no threats,
so I think the message here should say that no threat tied to the service_id could be retrieved.

TsurutaYoshiki · 2024-12-10T08:21:20Z

Changed the error message:
Is the service_id correct?
↓
There is no threat tied to service_id

mshim03

LGTM

topic/create script for collecting threats data

f535b85

TsurutaYoshiki marked this pull request as ready for review December 9, 2024 08:53

dejima-shikou reviewed Dec 9, 2024

dejima-shikou reviewed Dec 10, 2024

mshim03 requested changes Dec 10, 2024

fix script

2c8421c

dejima-shikou reviewed Dec 10, 2024

fix error message

bd160f8

mshim03 approved these changes Dec 10, 2024

mshim03 merged commit 35f0604 into main Dec 10, 2024
4 checks passed

mshim03 deleted the topic/create-script-for-collecting-threats-data branch December 10, 2024 09:59
