0.4.0 (#163)
* RBX file format (#158)

* Initial commit

* Fix coding style

* Shorten revision number and header checksum

* Filesystem test uses RBX format

* Recurse into object tree

* Check for header/body class match

* Fix coding style

* Variable compression levels

* Remove length header

* Outline of specification

* Convert checksums to HMACs

* Spiff up a little bit

* Tidy up

* Use Native base serializer under the hood

* Better error messages

* Base serializer now injectable

* Introduce encrypted variant

* Appease Stan

* Add beef to RBXP payload hash

* Rename RBX portable to RBX standard

* Use password digests by default

* Appease Stan

* More appeasement

* Unrestricted digest length

* Benchmark serializers

* Change default base serializers

* Switch payload HMAC to sha256

* Fix hmac

* Fix mkdocs nav

* RBX use checksums instead of HMACs

* No default password

* Tidy up

* Remove PHP 8.0 from CI due to 3rd party incompatibility

* Move RBXE to Extras package

* Appease Stan

* Dynamic column width in console output (#149)

* Added dynamic column size in console
Refactoring Console.php

* Cast columnSize to int

* Replace array_reduce with foreach

* Mark tests as skipped

* GitHub CLA

* GitHub CLA check

* GitHub CLA check with other username

* Embed library version in RBX format

* Appease Stan

* Added custom class revision mismatch exception

* Add RBX stuff to the user guide

* Deprecate Igbinary serializer

* New Transformer: Boolean Converter (#159)

* add a boolean converter which converts true to 1 and false to 0.

* updating the BooleanConverter to accept a customizable true/false value. Also updated docs to include the BooleanConverter

* fix up the PHPdoc. Failed static analysis.

* working through static analysis failures.

* using single quotes

* improvements per Andrew's comments on the PR

* Add windows latest to CI build environments

* Add fileinfo to required CI extensions

Co-authored-by: Andrew DalPino <[email protected]>

* Appease Stan

* Remove debug.log that should have been ignored by Git

* Appease Stan

* Update changelog

* Tighten up RBX format

* Initial commit (#162)

* Deprecate explainedVar() and noiseVar() methods on PCA and LDA

* Add missing extension specification and exception

* Rename Autotrack Revisions

* No need to sort singular values

* Implement transformer conduits

* Add return transformers method to Conduit

* Clean up

* Revert conduits

* Single-threaded by default

* Bump Tensor version requirement

* Polish up for release

Co-authored-by: Vladimir Stepanov <[email protected]>
Co-authored-by: Zachary Vander Velden <[email protected]>
3 people authored Mar 7, 2021
1 parent 61de61f commit 4320e3f
Showing 128 changed files with 1,604 additions and 362 deletions.
2 changes: 1 addition & 1 deletion .php_cs.dist
@@ -122,4 +122,4 @@ return Config::create()->setRules([
     'trim_array_spaces' => true,
     'unary_operator_spaces' => true,
     'whitespace_after_comma_in_array' => true,
-])->setFinder($finder);
+])->setFinder($finder);
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,13 @@
+- 0.4.0
+    - Added Truncated SVD transformer
+    - Added Rubix Object File (RBX) format serializer
+    - Added class revision() method to the Persistable interface
+    - Added custom class revision mismatch exception
+    - Add Boolean Converter transformer
+    - Deprecated Igbinary serializer and move to Extras package
+    - Deprecate explainedVar() and noiseVar() methods on PCA and LDA
+    - Added missing extension specification and exception
+
 - 0.3.2
     - Fix t-SNE momentum gain bus error when using Tensor extension
     - Optimize t-SNE matrix instantiation
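
The commit message describes the Boolean Converter listed in the changelog above as a transformer that "converts true to 1 and false to 0" and accepts "a customizable true/false value". The library class itself is not shown in this diff, so the following is only a minimal plain-PHP sketch of that behavior under those two statements. The function name `convertBooleans` and its parameter names are hypothetical and are not part of the Rubix ML API.

```php
<?php

/**
 * Hypothetical helper, NOT the library's BooleanConverter class.
 * Returns a copy of the samples in which every boolean value is replaced
 * by a stand-in value (1 and 0 unless other values are given).
 *
 * @param array<int, array<int, mixed>> $samples
 * @param int|string $trueValue
 * @param int|string $falseValue
 * @return array<int, array<int, mixed>>
 */
function convertBooleans(array $samples, $trueValue = 1, $falseValue = 0) : array
{
    foreach ($samples as &$sample) {
        foreach ($sample as &$value) {
            // Only genuine booleans are touched; strings and numbers pass through.
            if (is_bool($value)) {
                $value = $value ? $trueValue : $falseValue;
            }
        }

        unset($value);
    }

    unset($sample);

    return $samples;
}

$samples = [
    [true, 'nice', 3.5],
    [false, 'mean', 1.0],
];

// Default replacement values.
print_r(convertBooleans($samples));

// Custom replacement values.
print_r(convertBooleans($samples, 'yes', 'no'));
```

Because the array is passed by value, the caller's `$samples` is left untouched and only the returned copy carries the replacements.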
5 changes: 2 additions & 3 deletions README.md
@@ -24,11 +24,10 @@ $ composer require rubix/ml
 #### Optional

 - [Extras Package](https://github.com/RubixML/Extras) for experimental features
-- [SVM extension](https://php.net/manual/en/book.svm.php) for Support Vector Machine engine (libsvm)
-- [Mbstring extension](https://www.php.net/manual/en/book.mbstring.php) for fast multibyte string manipulation
 - [GD extension](https://php.net/manual/en/book.image.php) for image manipulation
+- [Mbstring extension](https://www.php.net/manual/en/book.mbstring.php) for fast multibyte string manipulation
+- [SVM extension](https://php.net/manual/en/book.svm.php) for Support Vector Machine engine (libsvm)
 - [Redis extension](https://github.com/phpredis/phpredis) for persisting to a Redis DB
-- [Igbinary extension](https://github.com/igbinary/igbinary) for binary serialization of persistables

 ## Documentation
 Read the latest docs [here](https://docs.rubixml.com).
59 changes: 59 additions & 0 deletions benchmarks/Persisters/Serializers/GzipBench.php
@@ -0,0 +1,59 @@
<?php

namespace Rubix\ML\Benchmarks\Persisters\Serializers;

use Rubix\ML\Datasets\Generators\Blob;
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Datasets\Generators\Agglomerate;
use Rubix\ML\Persisters\Serializers\Gzip;

/**
 * @Groups({"Serializers"})
 * @BeforeMethods({"setUp"})
 */
class GzipBench
{
    protected const TRAINING_SIZE = 2500;

    /**
     * @var \Rubix\ML\Persisters\Serializers\Gzip
     */
    protected $serializer;

    /**
     * @var \Rubix\ML\Persistable
     */
    protected $persistable;

    public function setUp() : void
    {
        $generator = new Agglomerate([
            'Iris-setosa' => new Blob([5.0, 3.42, 1.46, 0.24], [0.35, 0.38, 0.17, 0.1]),
            'Iris-versicolor' => new Blob([5.94, 2.77, 4.26, 1.33], [0.51, 0.31, 0.47, 0.2]),
            'Iris-virginica' => new Blob([6.59, 2.97, 5.55, 2.03], [0.63, 0.32, 0.55, 0.27]),
        ]);

        $training = $generator->generate(self::TRAINING_SIZE);

        $estimator = new KNearestNeighbors(5, true);

        $estimator->train($training);

        $this->persistable = $estimator;

        $this->serializer = new Gzip();
    }

    /**
     * @Subject
     * @revs(10)
     * @Iterations(5)
     * @OutputTimeUnit("milliseconds", precision=3)
     */
    public function serializeUnserialize() : void
    {
        $encoding = $this->serializer->serialize($this->persistable);

        $persistable = $this->serializer->unserialize($encoding);
    }
}
59 changes: 59 additions & 0 deletions benchmarks/Persisters/Serializers/NativeBench.php
@@ -0,0 +1,59 @@
<?php

namespace Rubix\ML\Benchmarks\Persisters\Serializers;

use Rubix\ML\Datasets\Generators\Blob;
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Datasets\Generators\Agglomerate;
use Rubix\ML\Persisters\Serializers\Native;

/**
 * @Groups({"Serializers"})
 * @BeforeMethods({"setUp"})
 */
class NativeBench
{
    protected const TRAINING_SIZE = 2500;

    /**
     * @var \Rubix\ML\Persisters\Serializers\Native
     */
    protected $serializer;

    /**
     * @var \Rubix\ML\Persistable
     */
    protected $persistable;

    public function setUp() : void
    {
        $generator = new Agglomerate([
            'Iris-setosa' => new Blob([5.0, 3.42, 1.46, 0.24], [0.35, 0.38, 0.17, 0.1]),
            'Iris-versicolor' => new Blob([5.94, 2.77, 4.26, 1.33], [0.51, 0.31, 0.47, 0.2]),
            'Iris-virginica' => new Blob([6.59, 2.97, 5.55, 2.03], [0.63, 0.32, 0.55, 0.27]),
        ]);

        $training = $generator->generate(self::TRAINING_SIZE);

        $estimator = new KNearestNeighbors(5, true);

        $estimator->train($training);

        $this->persistable = $estimator;

        $this->serializer = new Native();
    }

    /**
     * @Subject
     * @revs(10)
     * @Iterations(5)
     * @OutputTimeUnit("milliseconds", precision=3)
     */
    public function serializeUnserialize() : void
    {
        $encoding = $this->serializer->serialize($this->persistable);

        $persistable = $this->serializer->unserialize($encoding);
    }
}
59 changes: 59 additions & 0 deletions benchmarks/Persisters/Serializers/RBXBench.php
@@ -0,0 +1,59 @@
<?php

namespace Rubix\ML\Benchmarks\Persisters\Serializers;

use Rubix\ML\Datasets\Generators\Blob;
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Datasets\Generators\Agglomerate;
use Rubix\ML\Persisters\Serializers\RBX;

/**
 * @Groups({"Serializers"})
 * @BeforeMethods({"setUp"})
 */
class RBXBench
{
    protected const TRAINING_SIZE = 2500;

    /**
     * @var \Rubix\ML\Persisters\Serializers\RBX
     */
    protected $serializer;

    /**
     * @var \Rubix\ML\Persistable
     */
    protected $persistable;

    public function setUp() : void
    {
        $generator = new Agglomerate([
            'Iris-setosa' => new Blob([5.0, 3.42, 1.46, 0.24], [0.35, 0.38, 0.17, 0.1]),
            'Iris-versicolor' => new Blob([5.94, 2.77, 4.26, 1.33], [0.51, 0.31, 0.47, 0.2]),
            'Iris-virginica' => new Blob([6.59, 2.97, 5.55, 2.03], [0.63, 0.32, 0.55, 0.27]),
        ]);

        $training = $generator->generate(self::TRAINING_SIZE);

        $estimator = new KNearestNeighbors(5, true);

        $estimator->train($training);

        $this->persistable = $estimator;

        $this->serializer = new RBX();
    }

    /**
     * @Subject
     * @revs(10)
     * @Iterations(5)
     * @OutputTimeUnit("milliseconds", precision=3)
     */
    public function serializeUnserialize() : void
    {
        $encoding = $this->serializer->serialize($this->persistable);

        $persistable = $this->serializer->unserialize($encoding);
    }
}
46 changes: 46 additions & 0 deletions benchmarks/Transformers/NumericStringConverterBench.php
@@ -0,0 +1,46 @@
<?php

namespace Rubix\ML\Benchmarks\Transformers;

use Rubix\ML\Datasets\Generators\Blob;
use Rubix\ML\Transformers\NumericStringConverter;

/**
 * @Groups({"Transformers"})
 * @BeforeMethods({"setUp"})
 */
class NumericStringConverterBench
{
    protected const DATASET_SIZE = 100000;

    /**
     * @var \Rubix\ML\Datasets\Dataset
     */
    public $dataset;

    /**
     * @var \Rubix\ML\Transformers\NumericStringConverter
     */
    protected $transformer;

    public function setUp() : void
    {
        $generator = new Blob([0.0, 0.0, 0.0, 0.0]);

        $this->dataset = $generator->generate(self::DATASET_SIZE)
            ->transformColumn(1, 'strval')
            ->transformColumn(3, 'strval');

        $this->transformer = new NumericStringConverter();
    }

    /**
     * @Subject
     * @Iterations(3)
     * @OutputTimeUnit("milliseconds", precision=3)
     */
    public function apply() : void
    {
        $this->dataset->apply($this->transformer);
    }
}
49 changes: 49 additions & 0 deletions benchmarks/Transformers/TruncatedSVDBench.php
@@ -0,0 +1,49 @@
<?php

namespace Rubix\ML\Benchmarks\Transformers;

use Rubix\ML\Datasets\Generators\Blob;
use Rubix\ML\Datasets\Generators\Agglomerate;
use Rubix\ML\Transformers\TruncatedSVD;

/**
 * @Groups({"Transformers"})
 * @BeforeMethods({"setUp"})
 */
class TruncatedSVDBench
{
    protected const DATASET_SIZE = 10000;

    /**
     * @var \Rubix\ML\Datasets\Labeled
     */
    public $dataset;

    /**
     * @var \Rubix\ML\Transformers\TruncatedSVD
     */
    protected $transformer;

    public function setUp() : void
    {
        $generator = new Agglomerate([
            'Iris-setosa' => new Blob([5.0, 3.42, 1.46, 0.24], [0.35, 0.38, 0.17, 0.1]),
            'Iris-versicolor' => new Blob([5.94, 2.77, 4.26, 1.33], [0.51, 0.31, 0.47, 0.2]),
            'Iris-virginica' => new Blob([6.59, 2.97, 5.55, 2.03], [0.63, 0.32, 0.55, 0.27]),
        ]);

        $this->dataset = $generator->generate(self::DATASET_SIZE);

        $this->transformer = new TruncatedSVD(1);
    }

    /**
     * @Subject
     * @Iterations(3)
     * @OutputTimeUnit("milliseconds", precision=3)
     */
    public function apply() : void
    {
        $this->dataset->apply($this->transformer);
    }
}
6 changes: 2 additions & 4 deletions composer.json
@@ -36,7 +36,7 @@
         "ext-json": "*",
         "amphp/parallel": "^1.3",
         "psr/log": "^1.1",
-        "rubix/tensor": "^2.0",
+        "rubix/tensor": "^2.2",
         "symfony/polyfill-mbstring": "^1.0",
         "symfony/polyfill-php73": "^1.20",
         "symfony/polyfill-php80": "^1.17",
@@ -45,7 +45,7 @@
     "require-dev": {
         "friendsofphp/php-cs-fixer": "2.18.*",
         "league/flysystem-memory": "^2.0",
-        "phpbench/phpbench": "1.0.0-alpha4",
+        "phpbench/phpbench": "1.0.0-alpha6",
         "phpstan/extension-installer": "^1.0",
         "phpstan/phpstan": "0.12.*",
         "phpstan/phpstan-phpunit": "0.12.*",
@@ -96,8 +96,6 @@
         "sort-packages": true,
         "process-timeout": 3000
     },
-    "minimum-stability": "dev",
-    "prefer-stable": true,
     "funding": [
         {
             "type": "github",
6 changes: 3 additions & 3 deletions docs/hyper-parameter-tuning.md
@@ -2,22 +2,22 @@
 Hyper-parameter tuning is an experimental process that incorporates [cross-validation](cross-validation.md) to guide hyper-parameter selection. When choosing an estimator for your project it often helps to fine-tune its hyper-parameters in order to get the best accuracy and performance from the model.

 ## Manual Tuning
-In a manual scenario, a user will train an estimator with one set of hyper-parameters, obtain a validation score, and then use that as a baseline to make future adjustments. The goal at each iteration is to determine whether the adjustments improve accuracy or cause it to decrease. We can consider a model to be *fully* tuned when adjustments to the hyper-parameters can no longer make improvements to the validation score. In the example below, we'll tune the *radius* parameter of [Radius Neighbors Regressor](regressors/radius-neighbors-regressor.md) by iterating over the following block of code with a different setting each time. At first, we can start by choosing radius from a set of values and then honing in on the best value once we have obtained the settings with the highest [SMAPE](cross-validation/metrics/smape.md) score.
+When actively tuning a model, we will train an estimator with one set of hyper-parameters, obtain a validation score, and then use that as a baseline to make future adjustments. The goal at each iteration is to determine whether the adjustments improve accuracy or cause it to decrease. We can consider a model to be *fully* tuned when adjustments to the hyper-parameters can no longer make improvements to the validation score. With practice, we'll develop an intuition for which parameters need adjusting. Refer to the API documentation for each learner for a description of each hyper-parameter. In the example below, we'll tune the *radius* parameter of [Radius Neighbors Regressor](regressors/radius-neighbors-regressor.md) by iterating over the following block of code with a different setting each time. At first, we can start by choosing radius from a set of values and then honing in on the best value once we have obtained the settings with the highest [SMAPE](cross-validation/metrics/smape.md) score.

 ```php
 use Rubix\ML\Regressors\RadiusNeighborsRegressor;
 use Rubix\ML\CrossValidation\Metrics\SMAPE;

 [$training, $testing] = $dataset->randomize()->split(0.8);

-$metric = new SMAPE();
-
 $estimator = new RadiusNeighborsRegressor(0.5); // 0.1, 0.5, 1.0, 2.0, 5.0

 $estimator->train($training);

 $predictions = $estimator->predict($testing);

+$metric = new SMAPE();
+
 $score = $metric->score($predictions, $testing->labels());

 echo $score;
5 changes: 2 additions & 3 deletions docs/installation.md
@@ -15,8 +15,7 @@ $ composer require rubix/ml
 **Optional**

 - [Extras Package](https://github.com/RubixML/Extras) for experimental features
-- [SVM extension](https://php.net/manual/en/book.svm.php) for Support Vector Machine engine (libsvm)
-- [Mbstring extension](https://www.php.net/manual/en/book.mbstring.php) for fast multibyte string manipulation
 - [GD extension](https://php.net/manual/en/book.image.php) for image manipulation
+- [Mbstring extension](https://www.php.net/manual/en/book.mbstring.php) for fast multibyte string manipulation
+- [SVM extension](https://php.net/manual/en/book.svm.php) for Support Vector Machine engine (libsvm)
 - [Redis extension](https://github.com/phpredis/phpredis) for persisting to a Redis DB
-- [Igbinary extension](https://github.com/igbinary/igbinary) for binary serialization of persistables