A unified Python package that standardizes existing implementations of similarity measures to facilitate comparisons across studies.
Paper: A Framework for Standardizing Similarity Measures in a Rapidly Evolving Field
Install via pip:
pip install git+https://github.com/nacloos/similarity-repository.git
For a faster installation using uv in a virtual environment (add --system to install outside of a virtual environment):
pip install uv
uv pip install git+https://github.com/nacloos/similarity-repository.git
Alternatively, clone and install locally:
git clone https://github.com/nacloos/similarity-repository.git
cd similarity-repository
pip install -e .
import numpy as np
import similarity
# generate two datasets
X, Y = np.random.randn(100, 30), np.random.randn(100, 30)
# measure their similarity
measure = similarity.make("measure/netrep/procrustes-distance=angular")
score = measure(X, Y)
Each similarity measure has a unique identifier composed of three parts:
- the object type (i.e. measure)
- the repository name
- the measure name

See similarity/types/__init__.py for a complete list of implemented measures.
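For example, the identifier used in the example above decomposes as follows (a plain string illustration, not part of the package API):

# split a measure identifier into its three parts
identifier = "measure/netrep/procrustes-distance=angular"
object_type, repo_name, measure_name = identifier.split("/")
print(object_type)   # measure
print(repo_name)     # netrep
print(measure_name)  # procrustes-distance=angular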
All measures follow this interface:
- Inputs: X, Y - numpy arrays of shape (n_samples, n_features)
- Output: score - a float value
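For instance, a quick sanity check of this interface (a minimal sketch; depending on the underlying implementation, the returned value may be a NumPy scalar, so it is cast to a built-in float here):

import numpy as np
import similarity

# two datasets with the same (n_samples, n_features) shape
X, Y = np.random.randn(100, 30), np.random.randn(100, 30)
measure = similarity.make("measure/netrep/procrustes-distance=angular")
score = float(measure(X, Y))  # a single scalar score per pair of datasets
print(score)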
Select all measures from a specific repository:
measures = similarity.make("measure/netrep/*")
for name, measure in measures.items():
    score = measure(X, Y)
    print(f"{name}: {score}")
Select all implementations of a specific measure across repositories:
measures = similarity.make("measure/*/procrustes-distance=angular")
for name, measure in measures.items():
    score = measure(X, Y)
    print(f"{name}: {score}")
Register your own measure:
# register the function with a unique id
def my_measure(x, y):
    # cosine similarity between the flattened matrices
    return x.reshape(-1) @ y.reshape(-1) / (np.linalg.norm(x) * np.linalg.norm(y))

similarity.register("my_repo/my_measure", my_measure)
# use it like any other measure
measure = similarity.make("my_repo/my_measure")
score = measure(X, Y)
The repository is organized as follows:
- similarity/registry: all the registered GitHub repositories
- similarity/standardization.py: mapping to standardize names and transformations to leverage relations between measures
- similarity/papers.py: information about the papers for each GitHub repository in the registry
- similarity/types/__init__.py: list of all the registered identifiers
If your implementation of similarity measures is missing, please contribute!
Follow these steps to register your own similarity measures:
- Fork the repository.
- Create a new folder in similarity/registry/ for your repository and an __init__.py file inside it.
- Register your measures using similarity.register. The easiest way is to copy your code with the similarity measures into the created folder and import them in your __init__.py file (see the sketch after this list).
- Use the naming convention {repo_name}/{measure_name} (you can use any measure name under your own namespace).
- Add your folder to the imports in similarity/registry/__init__.py.
- Add your paper to similarity/papers.py.
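As a concrete illustration, here is a minimal sketch of what the new folder's __init__.py might contain, assuming a hypothetical folder similarity/registry/my_repo/ and a hypothetical cosine measure (these names are placeholders, not existing files in the repository):

# similarity/registry/my_repo/__init__.py (hypothetical example)
import numpy as np

import similarity

def cosine_similarity(x, y):
    # cosine similarity between the flattened matrices
    return x.reshape(-1) @ y.reshape(-1) / (np.linalg.norm(x) * np.linalg.norm(y))

# register under the {repo_name}/{measure_name} naming convention
similarity.register("my_repo/cosine-similarity", cosine_similarity)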
You can then check that your measures have been registered correctly:
import numpy as np
import similarity

X, Y = np.random.randn(50, 30), np.random.randn(50, 30)
measure = similarity.make("{repo_name}/{measure_name}")
score = measure(X, Y)
If you want to map your measures to standardized names, see similarity/standardization.py. Standardized measures are under the measure/ namespace and have the form measure/{repo_name}/{standardized_measure_name}. If your measure already exists in another repository, you can use the same standardized name; in this case, make sure your implementation is consistent with the existing ones. If your measure is new, you can propose a new standardized name.
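For example, the hypothetical cosine measure from the sketch above could also be exposed under the measure/ namespace so it can be compared with other implementations (a hedged sketch; "cosine" is a placeholder, check similarity/standardization.py for the actual standardized names):

import numpy as np
import similarity

def cosine_similarity(x, y):
    # hypothetical measure: cosine similarity between the flattened matrices
    return x.reshape(-1) @ y.reshape(-1) / (np.linalg.norm(x) * np.linalg.norm(y))

# register under a standardized name in the measure/ namespace (placeholder name)
similarity.register("measure/my_repo/cosine", cosine_similarity)

# select all implementations of that standardized measure across repositories
X, Y = np.random.randn(50, 30), np.random.randn(50, 30)
measures = similarity.make("measure/*/cosine")
for name, measure in measures.items():
    print(f"{name}: {measure(X, Y)}")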
Submit a pull request for your changes to be reviewed and merged.
For additional questions about how to contribute, please contact [email protected].
@inproceedings{
cloos2024framework,
title={A Framework for Standardizing Similarity Measures in a Rapidly Evolving Field},
author={Nathan Cloos and Guangyu Robert Yang and Christopher J Cueva},
booktitle={UniReps: 2nd Edition of the Workshop on Unifying Representations in Neural Models},
year={2024},
url={https://openreview.net/forum?id=vyRAYoxUuA}
}
For questions or feedback, please contact:
- Nathan Cloos ([email protected])
- Christopher Cueva ([email protected])