Merge pull request #9 from NanoNLP/development
Development
CoreySutphin authored Jan 3, 2019
2 parents 14db004 + 1ccda7e commit d76bb44
Showing 144 changed files with 2,287 additions and 52,823 deletions.
4 changes: 4 additions & 0 deletions .coveragerc
@@ -0,0 +1,4 @@
[run]
omit =
    */tests/*
    */__init__.py
6 changes: 5 additions & 1 deletion .gitignore
@@ -64,7 +64,8 @@ instance/
.scrapy

# Sphinx documentation
docs/_build/
docs/build/
docs/source/medacy*

# PyBuilder
target/
@@ -165,6 +166,9 @@ fabric.properties
# Editor-based Rest Client
.idea/httpRequests

# VSCode Settings
.vscode

*.pyc
*.egg-info
*.eggs/
90 changes: 23 additions & 67 deletions README.md
@@ -12,95 +12,51 @@ MedaCy is a text processing and learning framework built over [spaCy](https://sp
- Customizable pipelines with detailed development instructions and documentation.
- Allows the designing of replicable NLP systems for reproducing results and encouraging the distribution of models whilst still allowing for privacy.
- Active community development spearheaded and maintained by [NLP@VCU](https://nlp.cs.vcu.edu/).
- Detailed [API](https://medacy.readthedocs.io/en/latest/)

## :thought_balloon: Where to ask questions

MedaCy is actively maintained by [@AndriyMulyar](https://github.com/AndriyMulyar)
and [@CoreySutphin](https://github.com/CoreySutphin). The best way to
receive immediate responses to any questions is to raise an issue. See how to formulate a good issue or feature request in the [Contribution Guide](/CONTRIBUTING.md).
receive immediate responses to any questions is to raise an issue. Make sure to first consult the [API](https://medacy.readthedocs.io/en/latest/). See how to formulate a good issue or feature request in the [Contribution Guide](CONTRIBUTING.md).

## :computer: Installation Instructions
MedaCy can be installed for general use or for pipeline development / research purposes.

| Application | Run |
| ----------- |:-------------:|
| Prediction and Model Training (stable) | `pip install git+https://github.com/NanoNLP/medaCy.git` |
| Prediction and Model Training (latest) | `pip install git+https://github.com/NanoNLP/medaCy.git@development` |
| Prediction and Model Training (stable) | `pip install git+https://github.com/NanoNLP/medaCy.git --process-dependency-links` |
| Prediction and Model Training (latest) | `pip install git+https://github.com/NanoNLP/medaCy.git@development --process-dependency-links` |
| Pipeline Development and Contribution | [See Contribution Instructions](/CONTRIBUTING.md) |


**Note:** Make sure you have at least spaCy's small model installed.

`pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz`


# :books: User Guide
Using medaCy is simple and [detailed examples](/examples) are provided:
1. Select a pipeline or build your own.
2. Load training data (raw text and annotations)
3. Instantiate a Model with your chosen pipeline, train on your annotated data, and retrieve a model for prediction!

Training and using a Named Entity Recognition model for Clinical Text using medaCy:
# :books: Power of medaCy
After installing medaCy and [medaCy's clinical model](examples/models/clinical_notes_model.md), simply run:

```python
from medacy.model import Model
from medacy.pipelines import ClinicalPipeline
from medacy.tools import DataLoader
from medacy.pipeline_components import MetaMap
import logging, sys

# See what medaCy is doing at any part of the learning or prediction process
logging.basicConfig(stream=sys.stdout,level=logging.INFO) #set level=logging.DEBUG for more information

# Load in and organize training and testing files
train_loader = DataLoader("/training/directory")
test_loader = DataLoader("/evaluation/directory")

# MetaMap is required for powerful ClinicalPipeline performance, configure to your MetaMap path
metamap = MetaMap(metamap_path="/home/share/programs/metamap/2016/public_mm/bin/metamap")

# Optionally pre-MetaMap data to speed up performance
train_loader.metamap(metamap)
test_loader.metamap(metamap)

# Choose which pipeline to use and what entities to classify
pipeline = ClinicalPipeline(metamap, entities=['Drug', 'Form', 'Route', 'ADE', 'Reason', 'Frequency', 'Duration', 'Dosage', 'Strength'])

# Initialize a Model with the pipeline it will use to preprocess the data
# The algorithm used for prediction is specified in the pipeline - ClinicalPipeline uses CRF(Conditional Random Field)
model = Model(pipeline)

# Run training docs through pipeline and fit the model
model.fit(train_loader)

# Perform 10-fold stratified cross-validation on the data used to fit the model
# Can also pass in a DataLoader instance to instead cross validate on new data
model.cross_validate(num_folds=10)

# Predictions appear in a /predictions subdirectory of your test data
model.predict(test_loader)

model = Model.load_external('medacy_model_clinical_notes')
annotation = model.predict("The patient was prescribed 1 capsule of Advil for 5 days.")
print(annotation)
```

One can also dump/load fitted models into a specified directory.
```python
model.fit(train_loader)
model.dump('/path/to/dump/to') # Trained model is now stored at specified directory
model.load('/path/to/dump/to') # Trained model is loaded back into medaCy
```

and receive instant predictions:
```python
{'entities': {'T3': ('Drug', 40, 45, 'Advil'), 'T1': ('Dosage', 27, 28, '1'), 'T2': ('Form', 29, 36, 'capsule'), 'T4': ('Duration', 46, 56, 'for 5 days')}, 'relations': []}
```
To explore medaCy's other models or train your own, visit the [examples section](examples).
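
The prediction output above maps annotation IDs to `(label, start, end, text)` tuples. As a minimal sketch, assuming the printed value behaves like a plain Python dict (the README does not state this) and using a hypothetical helper named `entities_in_order`, the entities can be listed in document order and checked against the character offsets of the input sentence:

```python
# Input sentence and output copied from the README example; no medaCy calls are made here.
text = "The patient was prescribed 1 capsule of Advil for 5 days."
annotation = {
    'entities': {
        'T3': ('Drug', 40, 45, 'Advil'),
        'T1': ('Dosage', 27, 28, '1'),
        'T2': ('Form', 29, 36, 'capsule'),
        'T4': ('Duration', 46, 56, 'for 5 days'),
    },
    'relations': [],
}


def entities_in_order(annotation):
    """Hypothetical helper: return (label, start, end, text) tuples sorted by start offset."""
    return sorted(annotation['entities'].values(), key=lambda entity: entity[1])


for label, start, end, surface in entities_in_order(annotation):
    # Each span's offsets should slice the original sentence back to the entity text.
    assert text[start:end] == surface
    print(f"{label:<8} [{start}:{end}] {surface}")
```

Sorting by start offset is only one reasonable choice; the IDs (`T1`, `T2`, ...) could be used instead if annotation order matters.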

Reference
=========

> @ARTICLE {,
> author = "Andriy Mulyar, Natassja Lewinski and Bridget McInnes",
> title = "TAC SRIE 2018: Extracting Systematic Review Information with MedaCy",
> journal = "National Institute of Standards and Technology (NIST) 2018 Systematic Review Information Extraction (SRIE) Text Analysis Conference",
> year = "2018",
> month = "nov"
> }
```
@ARTICLE {
author = "Andriy Mulyar, Natassja Lewinski and Bridget McInnes",
title = "TAC SRIE 2018: Extracting Systematic Review Information with MedaCy",
journal = "National Institute of Standards and Technology (NIST) 2018 Systematic Review Information Extraction (SRIE) Text Analysis Conference",
year = "2018",
month = "nov"
}
```

License
=======
@@ -109,7 +65,7 @@ This package is licensed under the GNU General Public License

Authors
=======
Andriy Mulyar, Corey Sutphin, Bobby Best, Steele Farnsworth, and Bridget McInnes
Andriy Mulyar, Corey Sutphin, Bobby Best, Steele Farnsworth, and Bridget T McInnes

Acknowledgments
===============
5 changes: 3 additions & 2 deletions docs/Makefile
@@ -3,7 +3,7 @@

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SPHINXBUILD = python -m sphinx
SOURCEDIR = source
BUILDDIR = build

@@ -16,4 +16,5 @@ help:
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
	sphinx-apidoc -o source/ ../medacy/
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
18 changes: 12 additions & 6 deletions docs/source/conf.py
@@ -12,9 +12,9 @@
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import os
import sys
sys.path.insert(0, os.path.abspath('../../medacy'))


# -- Project information -----------------------------------------------------
@@ -26,7 +26,7 @@
# The short X.Y version
version = ''
# The full version, including alpha/beta/rc tags
release = '0.2'
release = '0.6'


# -- General configuration ---------------------------------------------------
@@ -42,11 +42,17 @@
    'sphinx.ext.autodoc',
    'sphinx.ext.doctest',
    'sphinx.ext.coverage',
    'sphinx.ext.napoleon',
    'sphinx.ext.mathjax',
    'sphinx.ext.ifconfig',
    'sphinx.ext.viewcode',
]

autodoc_default_options = {
    'members': None,
    'private-members': None
}

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

@@ -80,7 +86,7 @@
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'
html_theme = 'classic'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
@@ -179,4 +185,4 @@
epub_exclude_files = ['search.html']


# -- Extension configuration -------------------------------------------------
# -- Extension configuration -------------------------------------------------
38 changes: 38 additions & 0 deletions docs/source/medacy.model.rst
@@ -0,0 +1,38 @@
medacy.model package
====================

Submodules
----------

medacy.model.feature\_extractor module
--------------------------------------

.. automodule:: medacy.model.feature_extractor
    :members:
    :undoc-members:
    :show-inheritance:

medacy.model.model module
-------------------------

.. automodule:: medacy.model.model
    :members:
    :undoc-members:
    :show-inheritance:

medacy.model.stratified\_k\_fold module
---------------------------------------

.. automodule:: medacy.model.stratified_k_fold
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------

.. automodule:: medacy.model
    :members:
    :undoc-members:
    :show-inheritance:
22 changes: 22 additions & 0 deletions docs/source/medacy.pipeline_components.annotation.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
medacy.pipeline\_components.annotation package
==============================================

Submodules
----------

medacy.pipeline\_components.annotation.gold\_annotator\_component module
------------------------------------------------------------------------

.. automodule:: medacy.pipeline_components.annotation.gold_annotator_component
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------

.. automodule:: medacy.pipeline_components.annotation
    :members:
    :undoc-members:
    :show-inheritance:
22 changes: 22 additions & 0 deletions docs/source/medacy.pipeline_components.base.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
medacy.pipeline\_components.base package
========================================

Submodules
----------

medacy.pipeline\_components.base.base\_component module
-------------------------------------------------------

.. automodule:: medacy.pipeline_components.base.base_component
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------

.. automodule:: medacy.pipeline_components.base
    :members:
    :undoc-members:
    :show-inheritance:
30 changes: 30 additions & 0 deletions docs/source/medacy.pipeline_components.metamap.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
medacy.pipeline\_components.metamap package
===========================================

Submodules
----------

medacy.pipeline\_components.metamap.metamap module
--------------------------------------------------

.. automodule:: medacy.pipeline_components.metamap.metamap
    :members:
    :undoc-members:
    :show-inheritance:

medacy.pipeline\_components.metamap.metamap\_component module
-------------------------------------------------------------

.. automodule:: medacy.pipeline_components.metamap.metamap_component
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------

.. automodule:: medacy.pipeline_components.metamap
    :members:
    :undoc-members:
    :show-inheritance:
21 changes: 21 additions & 0 deletions docs/source/medacy.pipeline_components.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
medacy.pipeline\_components package
===================================

Subpackages
-----------

.. toctree::

    medacy.pipeline_components.annotation
    medacy.pipeline_components.base
    medacy.pipeline_components.metamap
    medacy.pipeline_components.tokenization
    medacy.pipeline_components.units

Module contents
---------------

.. automodule:: medacy.pipeline_components
    :members:
    :undoc-members:
    :show-inheritance: