
Teacher, how do I handle the garbled pred output in metric.py? #4201

Closed
1 task done
demouo opened this issue Jun 11, 2024 · 0 comments
Labels
wontfix This will not be worked on

Comments

demouo commented Jun 11, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

SFT with LoRA runs normally.

Reproduction

Run SFT with LoRA as usual.

Expected behavior

The fine-tuned model is Qwen2-7B; the same problem also appears with ChatGLM3-6B.
Because I wanted to see some custom metrics during evaluation, I modified metric.py. The change is as follows:
In the ComputeMetrics class of metric.py, I added the following under __call__:

print("preds shape", preds.shape)
print("labels shape", labels.shape)

Output:

preds shape (10, 440, 65024)
labels shape (10, 440)

The dimensions do not match: the last dimension of preds is the vocabulary size, so preds are raw logits. I therefore took the argmax over the last dimension (i.e., picked the index of the highest-probability word):

# Convert the probability distribution to class indices
if preds.ndim == 3:
    preds = np.argmax(preds, axis=-1)
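
A plausible cause, noted here as an assumption rather than a confirmed diagnosis: with teacher forcing, a causal LM's logits at position i score the token at position i + 1, so a plain argmax is shifted one position relative to the labels (and it also covers the prompt tokens that labels mask with IGNORE_INDEX). A minimal re-alignment sketch, assuming preds now holds argmax token ids with the same (batch, seq_len) shape as labels:

# Sketch only: re-align teacher-forced predictions with the labels before decoding.
# The prediction at position i corresponds to the label at position i + 1.
if preds.ndim == 2 and preds.shape == labels.shape:
    preds = preds[:, :-1]
    labels = labels[:, 1:]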

With that change the code runs, but the decoded preds contain garbled text, while the decoded labels are fine, which suggests the tokenizer is not the problem.
The test code:

for pred, label in zip(decoded_preds, decoded_labels):

    print("pred before jieba,", pred)
    print("label before jieba,", label)

    hypothesis = list(jieba.cut(pred))
    reference = list(jieba.cut(label))

    if len(" ".join(hypothesis).split()) == 0 or len(" ".join(reference).split()) == 0:
        result = {"rouge-1": {"f": 0.0}, "rouge-2": {"f": 0.0}, "rouge-l": {"f": 0.0}}
    else:
        rouge = Rouge()
        scores = rouge.get_scores(" ".join(hypothesis), " ".join(reference))
        result = scores[0]

    # ...

Output:

pred before jieba, 0: are a helpful assistant.
n:I are given helpfulless. and I are talking a conversation with a user. The seeker isates the conversation by and they are to respond with the seeker based The seeker should should as ['m tostrategy} to to { following. and the response is {reply}.Available is  few of 1 conversations: and each strategy are their descriptions are as follows:
1-focusedWhat the clarification or to the topic or better the seekereeeker clarify their problem they they are.
 
-ended questions are preferred for as::bing:00igh:  questions are be used to confirm specific information. 
 Validationate: Paraphrasing: Rest technique rest clear direct versionphrasing of the help-seeker's words. can help them clarify the situation from clearly. Strategy Validation: Feelings: Ackiculating the validate the help-seeker’s feelings, This Validation-disclosure: Shareulge personal experiences or you have had, that that you have with the help-seeker. help empathy understanding. 
 Validationirmation: Validationassurance: Ackirm the help-seer’s feelings and abilities, and abilities. provide reassurance that hope. 
 S Information: Offer specific and how to deal the improve avoid careful not avoid bestep and provide the what to do. 
 Problemal: Provide information that help help-seeker, such example, a, statistics, or, or, etc examples asking their. 
 Problem: Any ofasantries, small small strategies strategies that are not fit into the above categories.Question used with you, I's important of the strategies too I
user
I used Aff strategy the conversation, and my reply is I do you fears feel about this situation how reply

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
label before jieba, I used Question for the conversation, and my reply is How do your children feel about this? And his?

The full metric.py code is as follows:

from dataclasses import dataclass
from typing import TYPE_CHECKING, Dict, Sequence, Tuple, Union

import numpy as np

from ...extras.constants import IGNORE_INDEX
from ...extras.packages import is_jieba_available, is_nltk_available, is_rouge_available

if TYPE_CHECKING:
    from transformers.tokenization_utils import PreTrainedTokenizer

if is_jieba_available():
    import jieba  # type: ignore

if is_nltk_available():
    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

if is_rouge_available():
    from rouge_chinese import Rouge


@dataclass
class ComputeMetrics:
    r"""
    Wraps the tokenizer into metric functions, used in Seq2SeqPeftTrainer.
    """

    tokenizer: "PreTrainedTokenizer"

    def __call__(self, eval_preds: Sequence[Union[np.ndarray, Tuple[np.ndarray]]]) -> Dict[str, float]:
        r"""
        Uses the model predictions to compute metrics.
        """
        preds, labels = eval_preds
        score_dict = {"rouge-1": [], "rouge-2": [], "rouge-l": [], "bleu-4": [], "acc": []}
        
        # Convert the probability distribution to class indices
        if preds.ndim == 3:
            preds = np.argmax(preds, axis=-1)
            
        preds = np.where(preds != IGNORE_INDEX, preds, self.tokenizer.pad_token_id)
        labels = np.where(labels != IGNORE_INDEX, labels, self.tokenizer.pad_token_id)
        
        # Print shapes
        print("preds shape", preds.shape)
        print("labels shape", labels.shape)

        # Print a sample of predicted and label token ids
        print("Predicted tokens:", preds[0][:10])
        print("Label tokens:", labels[0][:10])

        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

        for pred, label in zip(decoded_preds, decoded_labels):
            
            print("没jieba之前pred,", pred)
            print("没jieba之前label,", label)
            
            hypothesis = list(jieba.cut(pred))
            reference = list(jieba.cut(label))
            
            if len(" ".join(hypothesis).split()) == 0 or len(" ".join(reference).split()) == 0:
                result = {"rouge-1": {"f": 0.0}, "rouge-2": {"f": 0.0}, "rouge-l": {"f": 0.0}}
            else:
                rouge = Rouge()
                scores = rouge.get_scores(" ".join(hypothesis), " ".join(reference))
                result = scores[0]

            for k, v in result.items():
                score_dict[k].append(round(v["f"] * 100, 4))

            bleu_score = sentence_bleu([list(label)], list(pred), smoothing_function=SmoothingFunction().method3)
            score_dict["bleu-4"].append(round(bleu_score * 100, 4))
        
        print(score_dict)
        print()
        ans = {k: float(np.mean(v)) for k, v in score_dict.items()}
        print(ans)
        
        return ans
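
Two directions I am considering, as sketches rather than confirmed fixes: (1) pass a preprocess_logits_for_metrics function to the transformers Trainer so the (batch, seq_len, vocab) logits are reduced to token ids per batch before being gathered, or (2) enable predict_with_generate so that preds are generated token ids, which is what this ComputeMetrics expects to decode. A minimal sketch of option (1); the function body is illustrative:

import torch

def preprocess_logits_for_metrics(logits, labels):
    # Reduce (batch, seq_len, vocab) logits to (batch, seq_len) token ids
    # per batch, before the Trainer accumulates them for compute_metrics.
    if isinstance(logits, tuple):  # some models return extra outputs
        logits = logits[0]
    return logits.argmax(dim=-1)

# trainer = Seq2SeqTrainer(..., preprocess_logits_for_metrics=preprocess_logits_for_metrics)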

Others

No response

github-actions bot added the pending (This problem is yet to be addressed) label on Jun 11, 2024
hiyouga added the wontfix (This will not be worked on) label and removed the pending label on Jun 14, 2024
hiyouga closed this as not planned (won't fix, can't repro, duplicate, stale) on Jun 14, 2024