Showing 19 changed files with 395 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line, and also | ||
# from the environment for the first two. | ||
SPHINXOPTS ?= | ||
SPHINXBUILD ?= sphinx-build | ||
SOURCEDIR = source | ||
BUILDDIR = build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
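The catch-all rule above forwards any unknown target to Sphinx's "make mode". A minimal Python sketch of the command line it ends up running, assuming the default variable values from this Makefile; the helper name `sphinx_make_command` is hypothetical and not part of the commit:

```python
import shlex


def sphinx_make_command(target, sphinxbuild="sphinx-build",
                        sourcedir="source", builddir="build",
                        sphinxopts="", extra=""):
    """Return the argv that `make <target>` would hand to sphinx-build."""
    argv = [sphinxbuild, "-M", target, sourcedir, builddir]
    # $(SPHINXOPTS) and $(O) are unquoted in the Makefile, so the shell
    # splits them into separate words; shlex.split mimics that splitting.
    argv += shlex.split(sphinxopts) + shlex.split(extra)
    return argv
```

For example, `sphinx_make_command("html", extra="-W --keep-going")` mirrors running `make html O="-W --keep-going"`.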
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
EduNLP documentation and tutorial folder | ||
======================================== | ||
|
||
Requirements | ||
------------ | ||
See the requirements `docs_deps` in `setup.py`: | ||
```sh | ||
pip install -e ".[doc]" | ||
``` | ||
|
||
|
||
Build documents | ||
--------------- | ||
First, clean up existing files: | ||
``` | ||
make clean | ||
``` | ||
|
||
Then build: | ||
``` | ||
make html | ||
``` | ||
|
||
Render locally | ||
-------------- | ||
``` | ||
cd build/html | ||
python3 -m http.server 8000 | ||
``` |
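The "Render locally" step can also be done from Python rather than the command line. The following is a rough sketch, assuming Python 3.7 or later (for the `directory` argument and `ThreadingHTTPServer`); the function name `make_docs_server` is hypothetical. Unlike the command above, it binds to the loopback interface only:

```python
import functools
import http.server


def make_docs_server(directory="build/html", port=8000):
    """Create (but do not start) a static file server for the built HTML."""
    # Bind the handler to the output directory so no `cd` is needed.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted):
#     with make_docs_server() as server:
#         server.serve_forever()
```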
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
@ECHO OFF | ||
|
||
pushd %~dp0 | ||
|
||
REM Command file for Sphinx documentation | ||
|
||
if "%SPHINXBUILD%" == "" ( | ||
set SPHINXBUILD=sphinx-build | ||
) | ||
set SOURCEDIR=source | ||
set BUILDDIR=build | ||
|
||
if "%1" == "" goto help | ||
|
||
%SPHINXBUILD% >NUL 2>NUL | ||
if errorlevel 9009 ( | ||
echo. | ||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx | ||
echo.installed, then set the SPHINXBUILD environment variable to point | ||
echo.to the full path of the 'sphinx-build' executable. Alternatively you | ||
echo.may add the Sphinx directory to PATH. | ||
echo. | ||
echo.If you don't have Sphinx installed, grab it from | ||
echo.https://www.sphinx-doc.org/ | ||
exit /b 1 | ||
) | ||
|
||
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% | ||
goto end | ||
|
||
:help | ||
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% | ||
|
||
:end | ||
popd |
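The batch file checks that `sphinx-build` can be found before invoking it. A comparable check in Python might look like the sketch below; `find_sphinx_build` is a hypothetical helper, and it looks the executable up on `PATH` instead of running it and inspecting the exit code as the batch file does:

```python
import os
import shutil


def find_sphinx_build(environ=None):
    """Return the resolved path of the Sphinx executable, or None if missing."""
    environ = os.environ if environ is None else environ
    # Like make.bat, fall back to "sphinx-build" when SPHINXBUILD is unset
    # or empty.
    candidate = environ.get("SPHINXBUILD") or "sphinx-build"
    return shutil.which(candidate)
```

A caller could print the same installation hint as the batch file when the function returns `None`.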
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
EduNLP.Formula | ||
======================= | ||
|
||
.. automodule:: EduNLP.Formula.ast | ||
:members: | ||
:imported-members: |
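The `:members:` and `:imported-members:` options used in these API pages control which module attributes autodoc considers. The sketch below is only a rough approximation of that selection, written with the standard `inspect` module; it ignores docstring presence, `__all__`, and other rules that autodoc applies, and both function names are hypothetical:

```python
import collections
import inspect
import types


def candidate_members(module, imported_members=False):
    """Approximate the public names an automodule directive would look at."""
    names = []
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue  # private names are skipped by default
        # Objects defined elsewhere report a different __module__.
        defined_here = getattr(obj, "__module__", None) == module.__name__
        if defined_here or imported_members:
            names.append(name)
    return names


def make_demo_module():
    """Build a tiny in-memory module with one local and one imported name."""
    demo = types.ModuleType("demo")

    def tokenize(text):
        """Split text on whitespace."""
        return text.split()

    tokenize.__module__ = "demo"
    demo.tokenize = tokenize
    demo.OrderedDict = collections.OrderedDict  # stands in for an import
    return demo
```

With the demo module, only `tokenize` is selected by default, while enabling `imported_members` also picks up `OrderedDict`.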
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
EduNLP.I2V | ||
============ | ||
|
||
.. automodule:: EduNLP.I2V.i2v | ||
:members: | ||
:imported-members: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
EduNLP | ||
====== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
EduNLP.SIF | ||
============== | ||
|
||
SIF | ||
---------- | ||
.. automodule:: EduNLP.SIF.sif | ||
:members: | ||
:imported-members: | ||
|
||
|
||
Segment | ||
---------- | ||
.. automodule:: EduNLP.SIF.segment | ||
:members: | ||
:imported-members: | ||
|
||
|
||
Parser | ||
-------- | ||
.. automodule:: EduNLP.SIF.parser | ||
:members: | ||
:imported-members: | ||
|
||
|
||
Tokenization | ||
--------------- | ||
|
||
tokenize | ||
^^^^^^^^^^ | ||
.. automodule:: EduNLP.SIF.tokenization.tokenization | ||
:members: | ||
:imported-members: | ||
|
||
|
||
formula | ||
^^^^^^^^^ | ||
.. automodule:: EduNLP.SIF.tokenization.formula | ||
:members: | ||
:imported-members: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
# Configuration file for the Sphinx documentation builder. | ||
# | ||
# This file only contains a selection of the most common options. For a full | ||
# list see the documentation: | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html | ||
|
||
# -- Path setup -------------------------------------------------------------- | ||
|
||
# If extensions (or modules to document with autodoc) are in another directory, | ||
# add these directories to sys.path here. If the directory is relative to the | ||
# documentation root, use os.path.abspath to make it absolute, like shown here. | ||
# | ||
import os | ||
import sys | ||
sys.path.insert(0, os.path.abspath('../')) | ||
|
||
import sphinx_rtd_theme | ||
|
||
# -- Project information ----------------------------------------------------- | ||
|
||
project = 'EduNLP' | ||
copyright = '2021, bigdata-ustc' | ||
author = 'bigdata-ustc' | ||
|
||
|
||
# -- General configuration --------------------------------------------------- | ||
|
||
# Add any Sphinx extension module names here, as strings. They can be | ||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom | ||
# ones. | ||
extensions = [ | ||
'sphinx.ext.autodoc', | ||
'sphinx.ext.autosummary', | ||
'sphinx.ext.intersphinx', | ||
'sphinx.ext.viewcode', | ||
'sphinx.ext.napoleon', | ||
'sphinx.ext.mathjax', | ||
'sphinx_toggleprompt', | ||
] | ||
|
||
# Add any paths that contain templates here, relative to this directory. | ||
templates_path = ['_templates'] | ||
|
||
# The language for content autogenerated by Sphinx. Refer to documentation | ||
# for a list of supported languages. | ||
# | ||
# This is also used if you do content translation via gettext catalogs. | ||
# Usually you set "language" from the command line for these cases. | ||
language = 'en' | ||
|
||
# List of patterns, relative to source directory, that match files and | ||
# directories to ignore when looking for source files. | ||
# This pattern also affects html_static_path and html_extra_path. | ||
exclude_patterns = [] | ||
|
||
|
||
# -- Options for HTML output ------------------------------------------------- | ||
|
||
# The theme to use for HTML and HTML Help pages. See the documentation for | ||
# a list of builtin themes. | ||
# | ||
html_theme = "sphinx_rtd_theme" | ||
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] | ||
|
||
# Add any paths that contain custom static files (such as style sheets) here, | ||
# relative to this directory. They are copied after the builtin static files, | ||
# so a file named "default.css" will overwrite the builtin "default.css". | ||
html_static_path = ['_static'] |
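Sphinx executes `conf.py` with the current directory set to the configuration directory, so the relative path passed to `os.path.abspath` in the path setup resolves against that directory. The sketch below shows the resolution under an assumed layout in which this file lives at `docs/source/conf.py`; the layout, the `/repo` prefix, and the helper name are assumptions for illustration only:

```python
import posixpath


def resolve_from_confdir(confdir, relative="../"):
    """Resolve a conf.py-relative path the way abspath would from confdir."""
    # posixpath keeps the example deterministic across platforms.
    return posixpath.normpath(posixpath.join(confdir, relative))
```

Under that assumed layout, `'../'` resolves to the `docs` folder and `'../..'` to the repository root. If the package sits at the repository root, autodoc would then find it through the editable install described in the README instead of through this `sys.path` entry.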
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
.. EduNLP documentation master file, created by | ||
sphinx-quickstart on Sat Aug 7 19:55:39 2021. | ||
You can adapt this file completely to your liking, but it should at least | ||
contain the root `toctree` directive. | ||
=============================================== | ||
Welcome to EduNLP's Tutorials and Documentation | ||
=============================================== | ||
.. Logo | ||
.. image:: _static/EduNLP.png | ||
:width: 200px | ||
:align: center | ||
|
||
.. Badges | ||
.. image:: https://img.shields.io/pypi/v/EduNLP.svg | ||
:target: https://pypi.python.org/pypi/EduNLP | ||
|
||
.. image:: https://github.com/bigdata-ustc/EduNLP/actions/workflows/python-test.yml/badge.svg?branch=master | ||
:target: https://github.com/bigdata-ustc/EduNLP/actions/workflows/python-test.yml | ||
|
||
.. todo: add all badges in EduNLP/README.md | ||
`EduNLP <https://github.com/bigdata-ustc/EduNLP>`_ is a Python library for advanced natural language processing and one of the projects in the `EduX <https://github.com/bigdata-ustc/EduX>`_ plan of BDAA. | ||
It's built on the very latest research, and was designed from day one to be used in real educational products. | ||
|
||
EduNLP now comes with pretrained pipelines and currently supports segmentation, tokenization, and vectorization. It supports various kinds of preprocessing for NLP in educational scenarios, such as formula parsing and multi-modal segmentation. | ||
|
||
EduNLP is commercial open-source software, released under the `Apache-2.0 license <https://github.com/bigdata-ustc/EduNLP/blob/master/LICENSE>`_. | ||
|
||
Install | ||
--------- | ||
EduNLP requires Python 3.6, 3.7, 3.8, or 3.9 and uses PyTorch as its backend tensor library. | ||
|
||
We recommend installing EduNLP with ``pip``: | ||
|
||
:: | ||
|
||
pip install EduNLP | ||
|
||
But you can also install from source: | ||
|
||
:: | ||
|
||
git clone https://github.com/bigdata-ustc/EduNLP.git | ||
cd EduNLP | ||
pip install . | ||
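Before installing, it can help to confirm that the interpreter falls within the supported range stated above. A minimal sketch, with a hypothetical helper name that is not part of EduNLP:

```python
import sys

# Minor versions of Python 3 listed as supported in the Install section.
SUPPORTED_MINORS = (6, 7, 8, 9)


def is_supported_python(version_info=None):
    """Return True when the given (or running) interpreter is 3.6 to 3.9."""
    version_info = sys.version_info if version_info is None else version_info
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS
```

Calling `is_supported_python()` with no arguments checks the interpreter that is currently running.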
|
||
|
||
|
||
Getting Started | ||
------------------ | ||
For absolute beginners, start with the :doc:`Tutorial to EduNLP <tutorial/en/index>` (:doc:`Chinese version <tutorial/zh/index>`). | ||
It covers the basic concepts of EduNLP and | ||
provides a step-by-step guide to training, loading, and using the language models. | ||
|
||
|
||
Contribution | ||
-------------- | ||
EduNLP is free software; you can redistribute it and/or modify it under the terms of the Apache License 2.0. | ||
We welcome contributions. Join us on GitHub and check out our `contribution guidelines <https://github.com/bigdata-ustc/EduNLP/blob/master/CONTRIBUTE.md>`_ `(Chinese version) <https://github.com/bigdata-ustc/EduNLP/blob/master/CONTRIBUTE_CH.md>`_. | ||
|
||
.. toctree:: | ||
:caption: Introduction | ||
:hidden: | ||
|
||
self | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:caption: Tutorial | ||
:hidden: | ||
:glob: | ||
|
||
tutorial/en/index | ||
tutorial/en/sif | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:caption: User Guide | ||
:hidden: | ||
|
||
tutorial/zh/index | ||
tutorial/zh/sif | ||
tutorial/zh/seg | ||
tutorial/zh/parse | ||
tutorial/zh/tokenize | ||
tutorial/zh/vectorization | ||
tutorial/zh/pretrain | ||
|
||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
:caption: API Reference | ||
:hidden: | ||
:glob: | ||
|
||
api/index | ||
api/i2v | ||
api/sif | ||
api/formula |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Get Started | ||
=========== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Standard Item Format | ||
==================== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
Getting Started | ||
=============== | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:titlesonly: | ||
|
||
sif | ||
seg | ||
parse | ||
tokenize | ||
vectorization |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
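Syntax Parsing | ||
============== | ||
|
||
In educational resources, both text and formulas have inherent syntactic structure, whether implicit or explicit, and extracting this structure greatly benefits subsequent processing: | ||
|
||
* Text syntax parsing | ||
* Formula syntax parsing | ||
|
||
Formula syntax parsing | ||
---------------------- |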
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
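Pre-training | ||
============ | ||
|
||
In natural language processing, pre-trained language models have become a very important foundational technology. | ||
This chapter introduces the pre-training tools in EduNLP: | ||
|
||
* How to train a pre-trained model from scratch on a corpus | ||
* How to load a pre-trained model | ||
* Publicly available pre-trained models | ||
|
||
|
||
Training a model | ||
---------------- | ||
|
||
Loading a model | ||
--------------- | ||
|
||
Overview of public models | ||
------------------------- |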
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
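Component Segmentation | ||
====================== | ||
|
||
Educational resources are multi-modal data, containing data structures such as text, images, and formulas; | ||
semantically, they may also contain different parts, such as the question stem and the options. We therefore first need to identify and separate the different components of an educational resource: | ||
|
||
* Semantic component segmentation | ||
* Structural component segmentation | ||
|
||
Semantic component segmentation | ||
------------------------------- | ||
|
||
Structural component segmentation | ||
--------------------------------- | ||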
|
||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Standard Item Format | ||
==================== |