[WIP] pd: update paddle doc #4489

Open · wants to merge 9 commits into base: `devel`
Conversation

@HydrogenSulfate (Contributor) commented Dec 23, 2024

Add Paddle icon SVG and update related docs

Summary by CodeRabbit

  • New Features

    • Added Paddle as a supported deep learning backend for DeePMD-kit
    • Expanded documentation to include Paddle installation, training, and model support
    • Updated documentation for various model descriptors and training methods to include Paddle
  • Documentation

    • Added Paddle-specific installation instructions for different environments
    • Updated README and backend documentation to reflect Paddle support
    • Expanded training, fine-tuning, and parallel training guides to include Paddle

@HydrogenSulfate changed the title from "pd: update paddle doc" to "[WIP] pd: update paddle doc" on Dec 23, 2024
@coderabbitai bot (Contributor) commented Dec 23, 2024

📝 Walkthrough

This pull request adds comprehensive documentation support for the Paddle deep learning framework in the DeePMD-kit project. The changes span multiple documentation files, including README.md, backend documentation, installation guides, model-specific documentation, and training instructions. The updates consistently integrate Paddle as a new supported backend, adding icons, installation commands, training procedures, and framework-specific details across various sections of the project documentation.

Changes

| File | Change Summary |
| --- | --- |
| README.md | Updated "Highlighted features" to include Paddle as a supported deep learning framework |
| doc/backend.md | Added Paddle backend with file extensions `.json`, `.pdiparams`, and `.pd` |
| doc/conf.py | Added Paddle icon substitution |
| doc/freeze/freeze.md | Added Paddle freezing instructions |
| doc/install/easy-install.md | Added Paddle installation commands for CUDA and CPU |
| doc/install/install-from-source.md | Added Paddle-specific environment and CMake variables |
| doc/model/*.md | Updated various model documentation to include Paddle backend support |
| doc/train/*.md | Added Paddle implementation details for training, fine-tuning, parallel training, and TensorBoard |
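The backend-to-extension mapping described for doc/backend.md can be sketched as a small lookup. This is a hypothetical illustration: the Paddle extensions (`.json`, `.pdiparams`, `.pd`) come from this PR, while the TensorFlow and PyTorch entries are assumptions based on DeePMD-kit's usual conventions, not verified here.

```python
# Hypothetical sketch of the extension table in doc/backend.md.
# Paddle entries are from this PR; the TensorFlow/PyTorch entries
# are assumptions and may not match the actual documentation.
BACKEND_EXTENSIONS = {
    "tensorflow": [".pb"],
    "pytorch": [".pth", ".pt"],
    "paddle": [".json", ".pdiparams", ".pd"],
}

def guess_backend(model_path: str):
    """Return the backend whose extension matches model_path, else None."""
    for backend, exts in BACKEND_EXTENSIONS.items():
        if any(model_path.endswith(ext) for ext in exts):
            return backend
    return None

print(guess_backend("model.pdiparams"))  # paddle
print(guess_backend("frozen.pb"))        # tensorflow
```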

Suggested labels

python, documentation, backend-support

Suggested reviewers

  • njzjz
  • iProzd
  • wanghan-iapcm

@coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (13)
doc/freeze/freeze.md (1)

38-44: Minor grammar improvement.

Consider changing “The output model is called model.json and model.pdiparams.” to indicate that there are two distinct files:

- The output model is called `model.json` and `model.pdiparams`.
+ The output files are called `model.json` and `model.pdiparams`.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~44-~44: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ... where the model is trained. The output model is called model.json and `model.pdipa...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)

🪛 Markdownlint (0.37.0)

41-41: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

doc/env.md (2)

59-64: Consider adding usage examples for PD_JIT

The documentation clearly explains the purpose and options but would benefit from practical examples showing how to enable JIT in different scenarios and its impact on performance.
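As a concrete illustration of the kind of example the docs could add, here is a hedged sketch of how a 0/1 environment flag such as `PD_JIT` is typically read. The parsing logic below is hypothetical, not DeePMD-kit's actual implementation.

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean 0/1 flag
    (hypothetical helper; accepted spellings are an assumption)."""
    val = os.environ.get(name)
    if val is None:
        return default
    return val.strip().lower() in ("1", "true", "yes", "on")

# Usage in the docs could then show, e.g.: PD_JIT=1 dp train input.json
os.environ["PD_JIT"] = "1"
print(env_flag("PD_JIT"))  # True
```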


66-72: Add examples and clarify CINN compatibility

While the documentation mentions the CINN requirement, it would be helpful to:

  1. Add examples showing how to check CINN support and enable it
  2. Provide guidance on when to use CINN (e.g., specific operations or models that benefit from it)
  3. Document any known limitations or compatibility issues
doc/backend.md (2)

38-46: Enhance Paddle backend documentation

The Paddle backend section would benefit from additional information:

  1. Platform support (CPU/GPU/ROCm compatibility)
  2. Performance characteristics compared to other backends
  3. Migration guide for users switching from TensorFlow or PyTorch
  4. Known limitations or unsupported features

69-69: Add examples of backend switching commands

Consider adding concrete examples of the dp commands for each backend to make it clearer for users:

```bash
# Train with TensorFlow backend
dp train --tf input.json

# Train with PyTorch backend
dp train --pt input.json

# Train with Paddle backend
dp train --pd input.json
```
doc/train/training.md (1)

29-35: LGTM! Consider adding command output example.

The Paddle tab item is well-structured and consistent with other backend documentation. Consider adding an example of the command output to match the documentation style of other backends.

🧰 Tools
🪛 Markdownlint (0.37.0)

32-32: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

doc/train/finetuning.md (2)

255-255: Fix typo in documentation

"refering" should be "referring"

-One can check the available model branches in multi-task pre-trained model by refering to the documentation
+One can check the available model branches in multi-task pre-trained model by referring to the documentation
🧰 Tools
🪛 LanguageTool

[misspelling] ~255-~255: This word is normally spelled as one.
Context: ...n check the available model branches in multi-task pre-trained model by refering to the do...

(EN_COMPOUNDS_MULTI_TASK)


208-264: Standardize terminology usage

The document inconsistently uses both hyphenated and non-hyphenated versions of terms:

  • "pre-trained" vs "pretrained"
  • "multi-task" vs "multitask"

Please standardize the usage throughout the document.

🧰 Tools
🪛 Markdownlint (0.37.0)

214-214: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


221-221: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


228-228: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


251-251: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


258-258: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

🪛 LanguageTool

[uncategorized] ~210-~210: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...ilizes a single-task pre-trained model (pretrained.pd) and modifies the energy bias withi...

(EN_WORD_COHERENCY)


[misspelling] ~241-~241: This word is normally spelled as one.
Context: ...: {} } ``` #### Fine-tuning from a multi-task pre-trained model Additionally, within...

(EN_COMPOUNDS_MULTI_TASK)


[misspelling] ~243-~243: This word is normally spelled as one.
Context: ...bility offered by the framework and the multi-task training process proposed in DPA2 [pape...

(EN_COMPOUNDS_MULTI_TASK)


[typographical] ~247-~247: It appears that a comma is missing.
Context: ... the paper above. For fine-tuning using this multitask pre-trained model (`multitask...

(BY_FOR_IN_COMMA)


[grammar] ~247-~247: The verb ‘multitask’ is plural. Did you mean: “multitasks”? Did you use a verb instead of a noun?
Context: ...aper above. For fine-tuning using this multitask pre-trained model (`multitask_pretraine...

(PLURAL_VERB_AFTER_THIS)


[uncategorized] ~247-~247: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...using this multitask pre-trained model (multitask_pretrained.pd), one can select a specific branch ...

(EN_WORD_COHERENCY)


[uncategorized] ~248-~248: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...ch (e.g., CHOOSEN_BRANCH) included in multitask_pretrained.pd for fine-tuning with the following ...

(EN_WORD_COHERENCY)


[misspelling] ~255-~255: This word is normally spelled as one.
Context: ...n check the available model branches in multi-task pre-trained model by refering to the do...

(EN_COMPOUNDS_MULTI_TASK)

doc/train/parallel-training.md (3)

202-202: Fix punctuation and grammar

There are two issues in this line:

  1. Double period at the end
  2. Missing article before "sequential sampler"
-> **Note** The underlying dataloader will use a distributed sampler to ensure that each GPU receives batches with different content in parallel mode, which will use sequential sampler in serial mode. In the TensorFlow version, Horovod shuffles the dataset using different random seeds for the same purpose..
+> **Note** The underlying dataloader will use a distributed sampler to ensure that each GPU receives batches with different content in parallel mode, which will use a sequential sampler in serial mode. In the TensorFlow version, Horovod shuffles the dataset using different random seeds for the same purpose.
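The behaviour the corrected note describes can be sketched as a tiny selection rule. The class names echo Paddle's `DistributedBatchSampler`/`SequenceSampler` naming, but the function itself is a hypothetical illustration, not the actual dataloader code:

```python
def choose_sampler(world_size: int) -> str:
    """Pick a sampler name: distributed when running on multiple ranks,
    sequential when running serially (mirrors the note above)."""
    return "DistributedBatchSampler" if world_size > 1 else "SequenceSampler"

print(choose_sampler(4))  # DistributedBatchSampler
print(choose_sampler(1))  # SequenceSampler
```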
🧰 Tools
🪛 LanguageTool

[uncategorized] ~202-~202: Possible missing article found.
Context: ...ontent in parallel mode, which will use sequential sampler in serial mode. In the TensorFl...

(AI_HYDRA_LEO_MISSING_A)


[typographical] ~202-~202: Two consecutive dots
Context: ...ferent random seeds for the same purpose.. ```mermaid flowchart LR subgraph s...

(DOUBLE_PUNCTUATION)


193-193: Consider using English documentation URL

The Paddle DDP documentation link is in Chinese. Consider using the English version of the documentation if available.


278-279: Enhance environment variable documentation

The note about NUM_WORKERS could be more specific about recommended ranges or provide examples of appropriate values.

Consider adding more specific guidance:

-If `NUM_WORKERS` is too large, it may cause the program to be terminated by the system;
-if it is too small, it may slow down data reading. You can try adjusting it to an appropriate size.
+If `NUM_WORKERS` is too large (e.g., > 16), it may cause the program to be terminated by the system;
+if it is too small (e.g., < 4), it may slow down data reading. We recommend starting with NUM_WORKERS=8
+and adjusting based on your system's capabilities.
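To make the suggested guidance concrete, here is a hedged sketch of a worker-count heuristic. The bounds echo the suggestion above; the function is purely illustrative and not DeePMD-kit code:

```python
import os

def pick_num_workers(requested: int = 8, hard_cap: int = 16) -> int:
    """Clamp a requested dataloader worker count: never more than
    hard_cap (very large values risk the OS killing the job) and
    never more than the available CPUs (hypothetical heuristic)."""
    cpus = os.cpu_count() or 1
    return max(1, min(requested, hard_cap, cpus))

print(pick_num_workers(64))  # capped at min(hard_cap, CPU count)
```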
doc/install/install-from-source.md (2)

365-366: Fix grammar and improve clarity.

The sentence has grammatical issues and can be clearer.

-If you want to use C++ interface of Paddle, you need to compile the Paddle inference library(C++ interface) manually from the [linux-compile-by-make](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/compile/linux-compile-by-make.html), then use the `.so` and `.a` files in `Paddle/build/paddle_inference_install_dir/`.
+To use the C++ interface of Paddle, you need to manually compile the Paddle inference library following the [Linux compilation guide](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/compile/linux-compile-by-make.html). After compilation, use the `.so` and `.a` files from `Paddle/build/paddle_inference_install_dir/`.
🧰 Tools
🪛 LanguageTool

[style] ~365-~365: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...-item} Paddle {{ paddle_icon }} If you want to use C++ interface of Paddle, you need t...

(REP_WANT_TO_VB)


441-442: Fix grammar in the assumption statement.

The sentence has a grammatical error with the verb "get".

-I assume you have get the Paddle inference library(C++ interface) to `$PADDLE_INFERENCE_DIR`, then execute CMake
+I assume you have obtained the Paddle inference library (C++ interface) in `$PADDLE_INFERENCE_DIR`. Execute CMake with:
🧰 Tools
🪛 LanguageTool

[grammar] ~441-~441: The verb form ‘get’ does not seem to be suitable in this context.
Context: ...le {{ paddle_icon }} I assume you have get the Paddle inference library(C++ interf...

(HAVE_VB)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cfe17a3 and 563a7bd.

⛔ Files ignored due to path filters (1)
  • doc/_static/paddle.svg is excluded by !**/*.svg
📒 Files selected for processing (16)
  • README.md (1 hunks)
  • doc/backend.md (3 hunks)
  • doc/conf.py (1 hunks)
  • doc/env.md (1 hunks)
  • doc/freeze/freeze.md (1 hunks)
  • doc/install/easy-install.md (1 hunks)
  • doc/install/install-from-source.md (10 hunks)
  • doc/model/dpa2.md (1 hunks)
  • doc/model/sel.md (1 hunks)
  • doc/model/train-energy.md (1 hunks)
  • doc/model/train-se-atten.md (1 hunks)
  • doc/model/train-se-e2-a.md (1 hunks)
  • doc/train/finetuning.md (2 hunks)
  • doc/train/parallel-training.md (2 hunks)
  • doc/train/tensorboard.md (1 hunks)
  • doc/train/training.md (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • doc/model/train-se-e2-a.md
  • doc/model/train-energy.md
🧰 Additional context used
🪛 Markdownlint (0.37.0)
doc/train/training.md

32-32: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

doc/freeze/freeze.md

41-41: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


50-50: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

doc/train/finetuning.md

214-214: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


221-221: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


228-228: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


251-251: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


258-258: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

🪛 LanguageTool
doc/freeze/freeze.md

[uncategorized] ~44-~44: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ... where the model is trained. The output model is called model.json and `model.pdipa...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)


[misspelling] ~46-~46: This word is normally spelled as one.
Context: ...model.jsonandmodel.pdiparams`. In [multi-task mode](../train/multi-task-training-pd.m...

(EN_COMPOUNDS_MULTI_TASK)

doc/train/parallel-training.md

[uncategorized] ~194-~194: Possible missing comma found.
Context: ...arallel (distributed) mode or in serial mode depending on your execution command. #...

(AI_HYDRA_LEO_MISSING_COMMA)


[misspelling] ~200-~200: This word is normally spelled as one.
Context: ...fines the number of threads involved in multi-threaded loading, which can be modified by setti...

(EN_COMPOUNDS_MULTI_THREADED)


[uncategorized] ~202-~202: Possible missing article found.
Context: ...ontent in parallel mode, which will use sequential sampler in serial mode. In the TensorFl...

(AI_HYDRA_LEO_MISSING_A)


[typographical] ~202-~202: Two consecutive dots
Context: ...ferent random seeds for the same purpose.. ```mermaid flowchart LR subgraph s...

(DOUBLE_PUNCTUATION)

doc/install/install-from-source.md

[style] ~365-~365: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...-item} Paddle {{ paddle_icon }} If you want to use C++ interface of Paddle, you need t...

(REP_WANT_TO_VB)


[grammar] ~441-~441: The verb form ‘get’ does not seem to be suitable in this context.
Context: ...le {{ paddle_icon }} I assume you have get the Paddle inference library(C++ interf...

(HAVE_VB)

doc/train/finetuning.md

[misspelling] ~202-~202: Did you mean “and”?
Context: ... In Paddle version, we have introduced an updated, more adaptable approach to fin...

(AN_AND)


[uncategorized] ~210-~210: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...ilizes a single-task pre-trained model (pretrained.pd) and modifies the energy bias withi...

(EN_WORD_COHERENCY)


[misspelling] ~241-~241: This word is normally spelled as one.
Context: ...: {} } ``` #### Fine-tuning from a multi-task pre-trained model Additionally, within...

(EN_COMPOUNDS_MULTI_TASK)


[misspelling] ~243-~243: This word is normally spelled as one.
Context: ...bility offered by the framework and the multi-task training process proposed in DPA2 [pape...

(EN_COMPOUNDS_MULTI_TASK)


[typographical] ~247-~247: It appears that a comma is missing.
Context: ... the paper above. For fine-tuning using this multitask pre-trained model (`multitask...

(BY_FOR_IN_COMMA)


[grammar] ~247-~247: The verb ‘multitask’ is plural. Did you mean: “multitasks”? Did you use a verb instead of a noun?
Context: ...aper above. For fine-tuning using this multitask pre-trained model (`multitask_pretraine...

(PLURAL_VERB_AFTER_THIS)


[uncategorized] ~247-~247: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...using this multitask pre-trained model (multitask_pretrained.pd), one can select a specific branch ...

(EN_WORD_COHERENCY)


[uncategorized] ~248-~248: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...ch (e.g., CHOOSEN_BRANCH) included in multitask_pretrained.pd for fine-tuning with the following ...

(EN_WORD_COHERENCY)


[misspelling] ~255-~255: This word is normally spelled as one.
Context: ...n check the available model branches in multi-task pre-trained model by refering to the do...

(EN_COMPOUNDS_MULTI_TASK)

🔇 Additional comments (10)
doc/conf.py (1)

170-170: Ensure the icon file is present.

The file reference "_static/paddle.svg" is newly introduced. Confirm that it exists and is properly placed under the "_static/" directory; otherwise, documentation builds may fail or show a broken icon link.

Use this script to verify the presence of "paddle.svg" in the repository:

✅ Verification successful

Icon file is present at the expected location

The file paddle.svg exists at doc/_static/paddle.svg, which is the correct location referenced in the configuration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search the entire repo for paddle.svg to ensure it exists

fd paddle.svg

Length of output: 38

doc/model/sel.md (1)

35-39: LGTM!

The addition of Paddle is consistent with the existing pattern for other backends. This is well-aligned with the broader documentation updates.

doc/freeze/freeze.md (1)

46-54: LGTM!

The new multi-task instructions for Paddle are clear and consistent with the structure used for other backends.

doc/model/dpa2.md (1)

1-4: No issues found.

Adding the Paddle icon and listing it among supported backends is consistent with the broader changes.

doc/train/tensorboard.md (1)

1-4: Consider adding Paddle-specific TensorBoard instructions

While the Paddle backend has been added to the supported backends list, the document doesn't mention any Paddle-specific considerations or setup requirements for TensorBoard integration. Please clarify:

  1. Are there any special steps needed to enable TensorBoard with Paddle?
  2. Are all TensorBoard features (metrics tracking, model graph, etc.) fully supported with Paddle?
README.md (1)

22-22: LGTM! Documentation update is accurate and well-integrated.

The addition of Paddle to the list of supported backends is consistent with the PR objectives and maintains the documentation style.

doc/model/train-se-atten.md (1)

1-1: LGTM: Paddle backend support properly documented

The addition of Paddle icons in the header and supported backends note accurately reflects the new backend support.

Also applies to: 4-4

doc/train/finetuning.md (1)

200-264: LGTM: Paddle implementation properly documented

The Paddle implementation section accurately mirrors the PyTorch implementation, providing clear instructions for both single-task and multi-task fine-tuning scenarios.

doc/train/parallel-training.md (1)

191-281: LGTM: Paddle parallel training implementation well documented

The Paddle implementation section effectively documents the DDP setup, providing clear instructions for both single-node and multi-node training scenarios. The dataloader implementation and usage instructions are comprehensive.

doc/install/install-from-source.md (1)

212-218: LGTM! Well-documented environment and CMake variables.

The documentation for environment variables and CMake variables is clear, complete, and follows the same format as other backends.

Also applies to: 233-239, 480-487, 496-503

Comment on lines 203 to 206
```bash
pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
pip install deepmd-kit-cu11
```

⚠️ Potential issue

Verify package repository URL.

The CUDA 11.8 installation uses the CUDA 12.3 package repository URL. This should be updated to use the correct CUDA version.

-pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
+pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

Comment on lines 212 to 215
```bash
pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
pip install deepmd-kit
```

⚠️ Potential issue

Fix CPU installation command.

The CPU installation command incorrectly uses the GPU package. For CPU-only installations, use the CPU-specific package.

-pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+pip install paddlepaddle==3.0.0b2 -i https://www.paddlepaddle.org.cn/whl/linux/cpu/stable.html

Comment on lines 189 to 221
::::::{tab-item} Paddle {{ paddle_icon }}

:::::{tab-set}

::::{tab-item} CUDA 12.3

```bash
pip install deepmd-kit[paddle]
```

::::

::::{tab-item} CUDA 11.8

```bash
pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
pip install deepmd-kit-cu11
```

::::

::::{tab-item} CPU

```bash
pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
pip install deepmd-kit
```

::::

:::::

::::::

💡 Codebase verification

Warning needed: PaddlePaddle 3.0.0b2 is still in beta

The PyPI version history confirms that PaddlePaddle 3.0.0b2 is the latest version and is still in beta status. Since the installation instructions rely on this beta version, we should add explicit warnings about:

  • Using beta software in production environments
  • Potential API instability and breaking changes
  • The lack of a stable 3.0.0 release

Specific locations needing attention:

  • doc/install/easy-install.md: Add a warning note above the Paddle installation instructions about beta status and stability implications
🔗 Analysis chain

Consider stability implications of beta dependency.

The installation instructions specify PaddlePaddle version 3.0.0b2, which is a beta release. Consider:

  1. Adding a warning about potential stability issues
  2. Documenting any known limitations
  3. Specifying when a stable release is expected
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are newer versions available
curl -s https://pypi.org/pypi/paddlepaddle/json | jq -r '.releases | keys[]' | sort -V | tail -n 5

Length of output: 136

Comment on lines +369 to +371
[Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz](https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz)

[Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz](https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz)

💡 Codebase verification

Add last modified dates to pre-compiled library links

The links are actively maintained with recent updates (December 21, 2024). Adding the last modified dates would help users verify the freshness of these pre-compiled libraries:

  • Update link text for CUDA 11.8: [Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz (Last updated: Dec 21, 2024)](https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz)
  • Update link text for CUDA 12.3: [Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz (Last updated: Dec 21, 2024)](https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz)
🔗 Analysis chain

Consider adding version information for pre-compiled libraries.

The links to pre-compiled libraries should include version information or last updated date to help users make informed decisions.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the links are accessible and get last modified date
for url in \
  "https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz" \
  "https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz"; do
  echo "Checking $url"
  curl -sI "$url" | grep "Last-Modified"
done

Length of output: 1552

Comment on lines +101 to +106
# cu123
pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/
# cu118
pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu118/
# cpu
pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/

💡 Codebase verification

Use stable version installation commands instead of nightly builds

The repository's easy-install.md already contains stable version installation commands for PaddlePaddle. Replace the nightly build commands with stable version commands:

# cu123
pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
# cu118
pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# cpu
pip install paddlepaddle==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

This aligns with the version requirement (3.0 or above) mentioned in backend.md and provides more stable installations compared to nightly builds.

🔗 Analysis chain

Consider providing stable release installation commands.

The current installation commands use nightly builds (--pre flag), which might be unstable. Consider adding commands for installing stable releases or adding a note about potential instability of nightly builds.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if stable versions are available
curl -s https://www.paddlepaddle.org.cn/documentation/docs/en/install/pip/linux-pip_en.html | grep -A 5 "pip install paddlepaddle"

Length of output: 132


Script:

#!/bin/bash
# Let's check if there are any mentions of stable PaddlePaddle versions in the repository
rg -i "paddlepaddle.*stable" --type md

# Also check for any version constraints or recommendations
rg -i "paddlepaddle.*version" --type md

# Check if there are any other installation instructions for PaddlePaddle
fd -e md -x grep -l "paddlepaddle" {}

Length of output: 612


Script:

#!/bin/bash
# Let's check the stable version installation commands from easy-install.md
cat doc/install/easy-install.md | grep -A 5 "paddlepaddle.*stable"

# Also check the version requirement mentioned in backend.md
cat doc/backend.md | grep -A 2 -B 2 "Paddle.*version"

Length of output: 989

@github-actions github-actions bot added the Docs label Dec 23, 2024

codecov bot commented Dec 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.55%. Comparing base (3cecca4) to head (7064b3a).
Report is 12 commits behind head on devel.

Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #4489      +/-   ##
==========================================
- Coverage   84.59%   84.55%   -0.04%     
==========================================
  Files         675      677       +2     
  Lines       63575    63902     +327     
  Branches     3486     3487       +1     
==========================================
+ Hits        53779    54034     +255     
- Misses       8671     8742      +71     
- Partials     1125     1126       +1     
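As a quick sanity check, the percentages in the diff follow from hits ÷ lines. Whether Codecov truncates or rounds to two decimals is an assumption here, but truncation reproduces the reported figures:

```python
def coverage_pct(hits: int, lines: int) -> float:
    # Truncate to two decimal places; truncation (rather than rounding)
    # is an assumption, but it reproduces the figures reported above.
    return int(hits / lines * 10000) / 100

base = coverage_pct(53779, 63575)  # devel: 84.59
head = coverage_pct(54034, 63902)  # this PR: 84.55, i.e. -0.04
```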

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Currently, parallel training in the Paddle backend is implemented with Paddle Distributed Data Parallel ([DDP](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/06_distributed_training/cluster_quick_start_collective_cn.html)).
DeePMD-kit will decide whether to launch the training in parallel (distributed) mode or in serial mode depending on your execution command.
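That launch-mode decision could be sketched as follows. This is an illustrative guess at the mechanism, not DeePMD-kit's actual code, and relies on the assumption that Paddle's distributed launcher (`python -m paddle.distributed.launch`) exports the `PADDLE_TRAINERS_NUM` environment variable:

```python
import os

def launched_with_ddp() -> bool:
    # Paddle's distributed launcher exports per-trainer environment
    # variables; more than one trainer implies distributed (DDP) mode,
    # otherwise training proceeds in serial mode.
    return int(os.environ.get("PADDLE_TRAINERS_NUM", "1")) > 1
```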

### Dataloader and Dataset
Member

Is this section copied from PyTorch? I suggest only maintaining one for the same documentation

Contributor Author

Is this section copied from PyTorch? I suggest only maintaining one for the same documentation

You are right, this is duplicated and removed now.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
doc/install/install-from-source.md (1)

101-106: 🛠️ Refactor suggestion

Use stable version installation commands

Replace the nightly build commands with stable version commands for better stability:

-# cu123
-pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/
-# cu118
-pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu118/
-# cpu
-pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/
+# cu123
+pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
+# cu118
+pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+# cpu
+pip install paddlepaddle==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
🧹 Nitpick comments (4)
doc/train/finetuning.md (3)

213-215: Format commands according to documentation standards.

Remove the $ prefix from commands to align with markdown best practices.

Apply this format to all commands:

-$ dp --pd train input.json --finetune pretrained.pd
+dp --pd train input.json --finetune pretrained.pd

Also applies to: 220-222, 227-229, 250-252, 257-259

🧰 Tools
🪛 Markdownlint (0.37.0)

214-214: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


247-252: Fix typo in branch name example.

Correct the spelling of "CHOOSEN_BRANCH" to "CHOSEN_BRANCH" in the example.

-one can select a specific branch (e.g., `CHOOSEN_BRANCH`) included in `multitask_pretrained.pd` for fine-tuning with the following command:
+one can select a specific branch (e.g., `CHOSEN_BRANCH`) included in `multitask_pretrained.pd` for fine-tuning with the following command:

-$ dp --pd train input.json --finetune multitask_pretrained.pd --model-branch CHOOSEN_BRANCH
+dp --pd train input.json --finetune multitask_pretrained.pd --model-branch CHOSEN_BRANCH
🧰 Tools
🪛 Markdownlint (0.37.0)

251-251: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

🪛 LanguageTool

[typographical] ~247-~247: It appears that a comma is missing.
Context: ... the paper above. For fine-tuning using this multitask pre-trained model (`multitask...

(BY_FOR_IN_COMMA)


[grammar] ~247-~247: The verb ‘multitask’ is plural. Did you mean: “multitasks”? Did you use a verb instead of a noun?
Context: ...aper above. For fine-tuning using this multitask pre-trained model (`multitask_pretraine...

(PLURAL_VERB_AFTER_THIS)


[uncategorized] ~247-~247: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...using this multitask pre-trained model (multitask_pretrained.pd), one can select a specific branch ...

(EN_WORD_COHERENCY)


[uncategorized] ~248-~248: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...ch (e.g., CHOOSEN_BRANCH) included in multitask_pretrained.pd for fine-tuning with the following ...

(EN_WORD_COHERENCY)


254-256: Fix typo in note section.

Correct the spelling of "refering" to "referring".

-One can check the available model branches in multi-task pre-trained model by refering to the documentation of the pre-trained model or by using the following command:
+One can check the available model branches in multi-task pre-trained model by referring to the documentation of the pre-trained model or by using the following command:
🧰 Tools
🪛 LanguageTool

[misspelling] ~255-~255: This word is normally spelled as one.
Context: ...n check the available model branches in multi-task pre-trained model by refering to the do...

(EN_COMPOUNDS_MULTI_TASK)

doc/install/install-from-source.md (1)

437-437: Fix grammatical error

The phrase "have get" is grammatically incorrect.

-I assume you have get the Paddle inference library(C++ interface)
+I assume you have obtained the Paddle inference library (C++ interface)
🧰 Tools
🪛 LanguageTool

[grammar] ~437-~437: The verb form ‘get’ does not seem to be suitable in this context.
Context: ...le {{ paddle_icon }} I assume you have get the Paddle inference library(C++ interf...

(HAVE_VB)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 563a7bd and 7064b3a.

⛔ Files ignored due to path filters (1)
  • doc/_static/paddle.svg is excluded by !**/*.svg
📒 Files selected for processing (6)
  • README.md (1 hunks)
  • doc/install/easy-install.md (1 hunks)
  • doc/install/install-from-source.md (9 hunks)
  • doc/model/dpa2.md (1 hunks)
  • doc/train/finetuning.md (2 hunks)
  • doc/train/parallel-training.md (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • README.md
  • doc/model/dpa2.md
  • doc/install/easy-install.md
🧰 Additional context used
🪛 LanguageTool
doc/install/install-from-source.md

[style] ~359-~359: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...-item} Paddle {{ paddle_icon }} If you want to use C++ interface of Paddle, you need t...

(REP_WANT_TO_VB)


[uncategorized] ~361-~361: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...rence_install_dir/`. We also provide a weekly-build Paddle C++ library for Linux x86_64 wit...

(HYPHENATED_LY_ADVERB_ADJECTIVE)


[grammar] ~437-~437: The verb form ‘get’ does not seem to be suitable in this context.
Context: ...le {{ paddle_icon }} I assume you have get the Paddle inference library(C++ interf...

(HAVE_VB)

doc/train/finetuning.md

[misspelling] ~202-~202: Did you mean “and”?
Context: ... In Paddle version, we have introduced an updated, more adaptable approach to fin...

(AN_AND)


[uncategorized] ~210-~210: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...ilizes a single-task pre-trained model (pretrained.pd) and modifies the energy bias withi...

(EN_WORD_COHERENCY)


[misspelling] ~241-~241: This word is normally spelled as one.
Context: ...: {} } ``` #### Fine-tuning from a multi-task pre-trained model Additionally, within...

(EN_COMPOUNDS_MULTI_TASK)


[misspelling] ~243-~243: This word is normally spelled as one.
Context: ...bility offered by the framework and the multi-task training process proposed in DPA2 [pape...

(EN_COMPOUNDS_MULTI_TASK)


[typographical] ~247-~247: It appears that a comma is missing.
Context: ... the paper above. For fine-tuning using this multitask pre-trained model (`multitask...

(BY_FOR_IN_COMMA)


[grammar] ~247-~247: The verb ‘multitask’ is plural. Did you mean: “multitasks”? Did you use a verb instead of a noun?
Context: ...aper above. For fine-tuning using this multitask pre-trained model (`multitask_pretraine...

(PLURAL_VERB_AFTER_THIS)


[uncategorized] ~247-~247: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...using this multitask pre-trained model (multitask_pretrained.pd), one can select a specific branch ...

(EN_WORD_COHERENCY)


[uncategorized] ~248-~248: Do not mix variants of the same word (‘pretrain’ and ‘pre-train’) within a single text.
Context: ...ch (e.g., CHOOSEN_BRANCH) included in multitask_pretrained.pd for fine-tuning with the following ...

(EN_WORD_COHERENCY)


[misspelling] ~255-~255: This word is normally spelled as one.
Context: ...n check the available model branches in multi-task pre-trained model by refering to the do...

(EN_COMPOUNDS_MULTI_TASK)

🪛 Markdownlint (0.37.0)
doc/train/finetuning.md

214-214: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


221-221: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


228-228: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


251-251: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)


258-258: null
Dollar signs used before commands without showing output

(MD014, commands-show-output)

⏰ Context from checks skipped due to timeout of 90000ms (19)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Analyze (python)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test C++ (true)
🔇 Additional comments (5)
doc/train/finetuning.md (3)

1-4: LGTM! Header and note section updated correctly.

The changes consistently integrate Paddle as a supported backend alongside TensorFlow and PyTorch.


200-203: LGTM! Clear and consistent introduction.

The introduction aligns well with the existing PyTorch implementation structure, maintaining documentation consistency.

🧰 Tools
🪛 LanguageTool

[misspelling] ~202-~202: Did you mean “and”?
Context: ... In Paddle version, we have introduced an updated, more adaptable approach to fin...

(AN_AND)


264-264: Add multi-task fine-tuning section for Paddle.

The documentation is missing the multi-task fine-tuning section for Paddle, which is present in the PyTorch implementation. Please add this section to maintain consistency across backends and provide complete documentation for Paddle users.

Would you like me to help generate the missing multi-task fine-tuning section based on the PyTorch implementation?

doc/train/parallel-training.md (1)

191-225: Documentation for Paddle parallel training looks good!

The implementation details, commands, and environment variables are well documented. The examples cover both single-node and multi-node scenarios effectively.

doc/install/install-from-source.md (1)

363-365: Add last modified dates to pre-compiled library links

The links are actively maintained with recent updates. Adding the last modified dates would help users verify the freshness of these pre-compiled libraries:

-[Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz](https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz)
+[Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz (Last updated: Dec 21, 2024)](https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz)

-[Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz](https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz)
+[Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz (Last updated: Dec 21, 2024)](https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz)
