[xdoctest] reformat example code with google style in No. 261-263 #57703
Conversation
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Please first fix the unmodified examples in fleet, then re-run CI and check the results ~
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
fleet.init(strategy=strategy)
>>> import paddle
The example section needs to be placed under Examples:. In addition, the code-example1 block above also needs to be updated ~
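For reference, a minimal sketch of the requested layout (the surrounding docstring wrapper is illustrative, not the actual fleet source):

def init(role_maker=None, is_collective=False, strategy=None):
    """
    Initialize the distributed environment.

    Examples:
        .. code-block:: python

            >>> import paddle.distributed.fleet as fleet
            >>> strategy = fleet.DistributedStrategy()
            >>> fleet.init(strategy=strategy)
    """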
done
Please add an Examples: line before the .. code-block:: python ~ You can refer to the code further down ~
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
fleet.init(strategy=strategy)
>>> import paddle
Please add an Examples: line before the .. code-block:: python ~ You can refer to the code further down ~
>>> if pre_layer_norm:
...     out = layer_norm1(x)
>>> else:
...     out = x
>>> out = linear2(dropout1(activation(linear1(src))))
>>> if add_residual:
...     out = residual + dropout2(out)
>>> else:
...     out = dropout2(out)
Please change the .. code-block:: python here to .. code-block:: text ~
Also, since else belongs to a compound statement, use ... instead of >>> for it.
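Concretely, the snippet above would put the whole compound statement on continuation prompts, something like:

>>> if pre_layer_norm:
...     out = layer_norm1(x)
... else:
...     out = x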
>>> # [2, 4, 128]
>>> print(output.shape)
>>> print(output.shape)
[2, 4, 128]
The original comment style needs to be changed into actual checked output ~
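As a self-contained illustration of the difference (the tensor here is a hypothetical stand-in, not the layer output from the PR): xdoctest runs the statement and compares stdout against the following line, whereas a trailing comment is never checked.

>>> import paddle
>>> output = paddle.zeros([2, 4, 128])  # hypothetical stand-in tensor
>>> print(output.shape)                 # the line below is verified against actual stdout
[2, 4, 128]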
>>> residual = x
>>> if pre_layer_norm:
...     out = layer_norm(x)
>>> else:
...     out = x
>>> # compute q, k, v
>>> out = matmul(out, qkv_weight) + qkv_bias
>>> out = transpose(out, perm=[2, 0, 3, 1, 4])
>>> # extract q, k and v from out
>>> q = out[0:1,::] * (head_dim ** -0.5)
>>> k = out[1:2,::]
>>> v = out[2:3,::]
>>> out = matmul(q, k, transpose_y=True)
>>> out = out + attn_mask
>>> out = softmax(out)
>>> out = dropout(out)
>>> out = matmul(out, v)
>>> # combine heads
>>> out = transpose(out, perm=[0, 2, 1, 3])
>>> # project to output
>>> out = linear(out)
>>> if add_residual:
...     out = residual + dropout(out)
>>> else:
...     out = dropout(out)
>>> if not pre_layer_norm:
...     out = layer_norm(out)
Likewise, please fix the .. code-block:: python and the else prompts ~
>>> # [2, 4, 128]
>>> print(output.shape)
Please fix the output here as well ~
>>> if pre_layer_norm:
...     out = layer_norm(x)
...     out = qkv_linear(out) + qkv_bias
>>> else:
...     out = qkv_linear(x) + qkv_bias
>>> out = transpose(out, perm=[2, 0, 3, 1, 4])
>>> # extract q, k and v from out.
>>> q = out[0:1, ::]
>>> k = out[1:2, ::]
>>> v = out[2:3, ::]
>>> out = q * k^t
>>> out = attn_mask + out
>>> out = softmax(out)
>>> out = dropout(out)
>>> out = out * v
>>> out = transpose(out, perm=[0, 2, 1, 3])
>>> out = linear(out)
>>> if pre_layer_norm:
...     out = x + dropout(out + bias)
>>> else:
...     out = layer_norm(x + dropout(out + bias))

>>> residual = out;
>>> if pre_layer_norm:
...     out = ffn_layer_norm(out)
>>> out = ffn1_linear(out)
>>> out = dropout(activation(out + ffn1_bias))
>>> out = ffn2_linear(out)
>>> out = residual + dropout(out + ffn2_bias)
>>> if not pre_layer_norm:
...     out = ffn_layer_norm(out)
Likewise, please fix the .. code-block:: python and the else prompts ~
>>> # [2, 4, 128]
>>> print(output.shape)
Fix the output here too ~
@@ -290,7 +291,7 @@ def fused_bias_dropout_residual_layer_norm(

     .. code-block:: python

-        y = layer_norm(residual + dropout(bias + x))
+        >>> y = layer_norm(residual + dropout(bias + x))
Please change the .. code-block:: python above to .. code-block:: text ~
This part is the algorithm description, so it doesn't need to be treated as a runnable example ~
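In other words, the note would stay as plain text that Sphinx renders but xdoctest ignores, roughly:

.. code-block:: text

    y = layer_norm(residual + dropout(bias + x))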
>>> import paddle
>>> import paddle.distributed.fleet as fleet
>>> fleet.init(is_collective=True)
>>> strategy = fleet.DistributedStrategy()
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.001)
>>> optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)
>>> import paddle
>>> import paddle.distributed.fleet as fleet
>>> fleet.init(is_collective=True)
>>> strategy = fleet.DistributedStrategy()
>>> linear = paddle.nn.Linear(10, 10)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.001, parameters=linear.parameters())
>>> optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)
Per the error message, the optimizer needs to be given a parameter list, as above ~ Or use static mode ~
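For the static-mode route, a minimal sketch, assuming static graph mode accepts an optimizer built without an explicit parameter list (parameters are resolved from the program when minimize is called):

>>> import paddle
>>> import paddle.distributed.fleet as fleet
>>> paddle.enable_static()  # switch to static graph mode
>>> fleet.init(is_collective=True)
>>> strategy = fleet.DistributedStrategy()
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.001)
>>> optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)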
>>> print(y_train)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[2., 0., 6.],
        [0., 0., 0.]])

>>> m.eval() # switch the model to test phase
>>> y_test = m(x)
>>> print(y_test)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[1., 2., 3.],
        [4., 5., 6.]])
For dropout, try adding a seed ~ If a seed still can't pin down the output, wrap this part's output with a skip directive ~
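A sketch of both options (the seed value is hypothetical; # doctest: +SKIP is the directive xdoctest honors for unchecked statements):

>>> import paddle
>>> paddle.seed(2023)  # hypothetical seed; pins the RNG so dropout output is reproducible
>>> x = paddle.to_tensor([[1., 2., 3.], [4., 5., 6.]])
>>> m = paddle.nn.Dropout(p=0.5)
>>> y_train = m(x)
>>> print(y_train)  # doctest: +SKIP  (fallback if seeding alone doesn't stabilize the output)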
Sorry to inform you that 361fa9d's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Please revise according to the review comments, and merge the latest develop branch.
@KongAKun Remember to update the PR ~
Closed because the following PR has been merged:
PR types
Others

PR changes
Others

Description
#55629