Recent Changes

H2O

Zeno (3.30.1.1) - 8/10/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zeno/1/index.html

Bug

[PUBDEV-7119] - H2OFrames with fields containing double quotes/line breaks can now be converted to Pandas dataframe.
[PUBDEV-7489] - Fixed an issue that made it impossible to set `max_depth` to unlimited in DRF classifiers.
[PUBDEV-7635] - MOJO/POJO generation is now disabled for GLM models that use interaction columns.
[PUBDEV-7646] - Reproducibility Information Table now hidden in H2O-Flow.

New Feature

[PUBDEV-4915] - Added support for `offset_column` in the Stacked Ensemble metalearner.
[PUBDEV-4916] - Added support for `weights_column` in the Stacked Ensemble metalearner.
[PUBDEV-6807] - Continued adding support for Generalized Additive Models (GAMs) in H2O.
[PUBDEV-7237] - The value of model parameters can be retrieved at the end of training, allowing users to retrieve an automatically chosen value when a parameter is set to AUTO.
[PUBDEV-7283] - H2O Frames can now be saved to a Hive table.
[PUBDEV-7467] - XGBoost can now be executed on an external Hadoop cluster.
[PUBDEV-7640] - Added the `contamination` parameter to Isolation Forest which is used to mark anomalous observations.
[PUBDEV-7641] - Introduced the `validation_response_column` parameter for Isolation Forest which allows users to name the response column in the validation frame.
[PUBDEV-7647] - Added official support for Java 14 in H2O.
[PUBDEV-7697] - Added a startup timeout for the external XGBoost cluster.
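The `contamination` entry (PUBDEV-7640) describes flagging a fixed share of observations as anomalous. The plain-Python sketch below illustrates one plausible way to do that: take the cutoff from the training anomaly scores so that roughly a `contamination` fraction of them fall at or above it. H2O's exact thresholding rule is not described in these notes, so the quantile rule and the helper names are assumptions for illustration only.

```python
def anomaly_threshold(train_scores, contamination):
    """Hypothetical helper: return a score cutoff such that roughly
    `contamination` of the training scores lie at or above it."""
    if not 0.0 < contamination < 1.0:
        raise ValueError("contamination must be in (0, 1)")
    ordered = sorted(train_scores)
    n = len(ordered)
    # Number of training observations to flag (at least one).
    n_flag = max(1, round(contamination * n))
    return ordered[n - n_flag]


def mark_anomalies(scores, threshold):
    """Hypothetical helper: flag each observation whose score reaches the cutoff."""
    return [s >= threshold for s in scores]


scores = [0.1, 0.2, 0.15, 0.9, 0.3, 0.25, 0.22, 0.18, 0.12, 0.95]
cutoff = anomaly_threshold(scores, 0.2)
print(cutoff, mark_anomalies(scores, cutoff))
```

With 10 scores and a contamination of 0.2, the two highest scores (0.9 and 0.95) are flagged.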

Task

[PUBDEV-7649] - The Hadoop Docker image now runs independently of S3.
[PUBDEV-7673] - Upgraded the build/test environment to support R 4.0 and Roxygen2 7.1.1.

Improvement

[PUBDEV-6938] - Implemented TF-IDF algorithm to reflect how important a word is to a document or collection of documents.
[PUBDEV-6946] - GridSearch R API test added for Isolation Forest.
[PUBDEV-7444] - Added an `AUTO` option for the GLM and GAM `family` parameter.
[PUBDEV-7496] - XGBoost Variable Importances now computed using a Java predictor.
[PUBDEV-7547] - StackedEnsemble can now be created using only monotone models if user specifies `monotone_constraints` in AutoML.
[PUBDEV-7567] - Enabled using imported models as base models in Stacked Ensembles.
[PUBDEV-7651] - Removed deprecated H2O-Scala module.
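The TF-IDF entry (PUBDEV-6938) weights a word by how often it occurs in a document and by how rare it is across the collection. The sketch below uses one common textbook formulation, raw term count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the word. This weighting is an assumption for illustration; H2O's implementation may use different smoothing or normalization.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw term count; IDF is log(N / df). This is one common
    formulation, not necessarily the one H2O uses.
    """
    n_docs = len(documents)
    # Document frequency: the number of documents each word appears in.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        term_counts = Counter(tokens)
        weights.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        })
    return weights


docs = [["a", "b", "b"], ["a", "c"]]
print(tf_idf(docs))
```

A word that appears in every document (here `a`) gets a weight of 0, because it does not help distinguish one document from another.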

Technical Task

[PUBDEV-7185] - Added Java backend to support MOJO in GAM.
[PUBDEV-7611] - Added support for `early_stopping` parameter in GAM and GLM.

Engineering Story

[PUBDEV-7701] - Sparkling Water Booklet removed from the H2O-3 repository.

Docs

[PUBDEV-7556] - Added an H2O Client chapter to the User Guide, which includes a section on Sklearn integration.
[PUBDEV-7639] - Added documentation in the Isolation Forest section for the `contamination` parameter.
[PUBDEV-7648] - Updated the GLM and GAM chapters and the `family` and `link` parameter appendix entries to describe how `family` can now be set to AUTO.
[PUBDEV-7655] - Added `gainslift_bins` to the parameter appendix and added an example for this parameter to the Python documentation. Also added an example for the Kolmogorov-Smirnov metric to the Python documentation.
[PUBDEV-7656] - Updated GAM and GLM documentation to include support for `early_stopping`.
[PUBDEV-7661] - Added the Kolmogorov-Smirnov metric formula to the Performance and Prediction chapter.
[PUBDEV-7679] - Added the `negativebinomial` value to the `family` parameter page.
[PUBDEV-7680] - Added the `ordinal` and `modified_huber` values to the `distribution` parameter page.
[PUBDEV-7682] - Updated deprecated parameter `loading_name` to `representation_name` and fixed the broken init link in the GLRM section of the User Guide.
[PUBDEV-7684] - Added a note in the User Guide Stacked Ensemble section about building a monotonic Stacked Ensemble.
[PUBDEV-7699] - Added documentation for how `balance_classes` is triggered.

Zahradnik (3.30.0.7) - 7/21/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/7/index.html

New Feature

[PUBDEV-7430] - Added support for partitionBy column in partitioned parquet or CSV files.

Task

[PUBDEV-7645] - Users now receive a warning if both a lambda value and lambda search are specified in GLM.

Improvement

[PUBDEV-5808] - Added `max_runtime_secs` parameter to Stacked Ensemble.
[HEXDEV-758] - Upgraded to Jetty 9 and switched the default web server to Jetty 9.

Zahradnik (3.30.0.6) - 6/30/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/6/index.html

Bug

[PUBDEV-7630] - GLM Plug values are now propagated to MOJOs/POJOs.
[PUBDEV-7631] - In the Python documentation, the HGLM example now references `random_columns` by indices rather than by column name.
[PUBDEV-7642] - Fixed a link to H2O blogs in the R documentation.

New Feature

[PUBDEV-7404] - Added support for the Kolmogorov-Smirnov metric for binary classification models.
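For a binary classifier, the Kolmogorov-Smirnov statistic is the largest gap between the cumulative score distributions of the positive and negative classes. Equivalently, it is the maximum of TPR minus FPR over all thresholds. The sketch below computes this exact, unbinned value in plain Python. H2O's own computation may bin the scores (for example, through the gains/lift table), so its reported value can differ slightly. The function name is hypothetical.

```python
def ks_statistic(labels, scores):
    """Exact two-class KS statistic: the maximum over thresholds of TPR - FPR.

    labels: actual classes (0 or 1); scores: predicted probability of class 1.
    """
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    tp = fp = 0
    best = 0.0
    i = 0
    # Move the threshold from the highest score to the lowest. Tied scores
    # are consumed together so they fall on the same side of the cutoff.
    while i < len(pairs):
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, tp / n_pos - fp / n_neg)
    return best


print(ks_statistic([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # perfectly separated classes
```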

Docs

[PUBDEV-7625] - Added documentation in the Performance and Prediction chapter for the Kolmogorov-Smirnov metric.

Zahradnik (3.30.0.5) - 6/18/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/5/index.html

Bug

[PUBDEV-7329] - Fixed an issue that denied all requests to display H2O Flow in an iframe.
[PUBDEV-7563] - Importing with `use_temp_table=False` now works correctly on Teradata.
[PUBDEV-7588] - Building a GLM model with `interactions` and `lambda = 0` no longer produces a "Categorical value out of bounds" error.
[PUBDEV-7590] - Fixed an inconsistency that occurred when using `predict_leaf_node_assignment` with a path and with a terminal node. For trees with a max_depth of up to 63, the results now match. For max_depth of 64 or higher (for path and nodes that are "too deep"), H2O will no longer produce incorrect results. Instead it will return "NA" for tree paths and "-1" for node IDs.
[PUBDEV-7596] - Leaf node assignment now works correctly for trees with a depth >= 31. Note that for trees with a max_depth of 64 or higher, H2O will return "NA" for tree paths and "-1" for node IDs.
[PUBDEV-7599] - `allow_insecure_xgboost` now works correctly on Hadoop.

New Feature

[PUBDEV-7431] - HTML documentation is now available as a downloadable zip file.
[PUBDEV-7601] - Users can now retrieve the prediction contributions when running `mojo_predict_pandas` in Python.
[PUBDEV-7614] - H2O documentation is now available in an h2odriver distribution zip file.
[PUBDEV-7615] - Quantiles models built during the training of other models are now recognized as regular models.
[PUBDEV-7621] - The H2O-SCALA module is deprecated and will be removed in a future release.

Improvement

[PUBDEV-6424] - Added support for models built with any `family` when running makeGLMModel.
[PUBDEV-7586] - K8S Docker images for h2o-3 are now available.
[PUBDEV-7616] - Warnings are now produced during model building when using the Python client.

Docs

[PUBDEV-7144] - Added examples for saving and loading grids in the User Guide.
[PUBDEV-7587] - Improved the examples in the Performance and Prediction chapter.
[PUBDEV-7589] - In the AutoML Random Grid Search Parameters topic, removed the no-longer-supported `min_sum_hessian_in_leaf` parameter from the XGBoost table. Also clarified how GLM models are handled in an AutoML random grid search run.
[PUBDEV-7603] - In the Python documentation, added examples for Grid Metrics.
[PUBDEV-7626] - The value of T as described in the description for `categorical_encoding="enum_limited"` is 10, not 1024.

Zahradnik (3.30.0.4) - 6/1/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/4/index.html

Bug

[PUBDEV-7362] - h2o.merge() now works correctly when joining an H2O frame to another frame on a column.
[PUBDEV-7454] - Fixed an issue that caused h2o.get_leaderboard to fail for a re-attached H2OAutoML object after creating an AutoML object, disconnecting the client, starting a new session, and then reconnecting to the running H2O cluster.
[PUBDEV-7491] - Stacked Ensemble now inherits distributions/families supported by the metalearner.
[PUBDEV-7501] - Fixed an issue that caused AutoML to fail when the target included special characters.
[PUBDEV-7565] - CAcert is now supported with the Python API.
[PUBDEV-7569] - Water Meter and Form Login now work correctly.
[PUBDEV-7572] - In Aggregator, added support for retrieving the Mappings Frame.
[PUBDEV-7582] - Added support for using monotone constraints with Tweedie distribution in GBM.

New Feature

[PUBDEV-3292] - Added a new drop_duplicates function to drop duplicate observations from an H2O frame.
[PUBDEV-6250] - Partial dependence plots are now available for multiclass problems.
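The `drop_duplicates` entry (PUBDEV-3292) removes rows that repeat the values of an earlier row in a chosen set of columns. The plain-Python sketch below illustrates that idea on a list of dicts. It does not use the H2O API. The helper `dedupe_rows` is hypothetical, and modeling `keep` as a choice between the first and the last occurrence is an assumption based on typical deduplication options.

```python
def dedupe_rows(rows, columns, keep="first"):
    """Hypothetical helper: drop rows whose values in `columns` repeat.

    rows: list of dicts; columns: names of the key columns;
    keep: retain the "first" or the "last" occurrence of each key.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    ordered = rows if keep == "first" else list(reversed(rows))
    seen = set()
    result = []
    for row in ordered:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    # When keeping last occurrences, restore the original row order.
    return result if keep == "first" else list(reversed(result))


rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
print(dedupe_rows(rows, ["id"], keep="last"))
```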

Improvement

[PUBDEV-7504] - Users now receive a warning if they try to get variable importances in Stacked Ensemble.
[PUBDEV-7527] - In XGBoost, removed the min_sum_hessian_in_leaf and min_data_in_leaf options, which are no longer supported by XGBoost. Also added the `colsample_bynode` option.
[PUBDEV-7549] - data.table warning messages are now suppressed inside h2o.automl() in R.

Docs

[PUBDEV-7518] - Added a "Training Models" section to the User Guide, which describes train() and train_segments().
[PUBDEV-7525] - Updated XGBoost to indicate that this version requires CUDA 9, and included information showing users how to check their CUDA version.
[PUBDEV-7526] - Added information about GAM support to the missing_values_handling parameter appendix entry.
[PUBDEV-7531] - Updated the Minio Instance topic.
[PUBDEV-7574] - `monotone_constraints` can now be used with `distribution=tweedie`.
[PUBDEV-7576] - Updated the PDP topic to include support for multinomial problems and updated the examples.
[PUBDEV-7585] - In the API-related Changes topic, noted that `min_sum_hessian_in_leaf` and `min_data_in_leaf` are no longer supported in XGBoost.

Zahradnik (3.30.0.3) - 5/12/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/3/index.html

Bug

[PUBDEV-7492] - Improved validation and error messages for CoxPH.
[PUBDEV-7498] - In XGBoost, the `predict_leaf_node_assignment` parameter now works correctly with multiclass.
[PUBDEV-7517] - Fixed an issue that caused GBM to fail when it encountered a bin that included a single value and the rest NAs.

Improvement

[PUBDEV-7103] - Updated the AutoML example in the R package.
[PUBDEV-7438] - PDPs now allow y-axis scaling options.
[PUBDEV-7463] - Improved speed for training and prediction of Stacked Ensembles.

Docs

[PUBDEV-6003] - Added tables showing parameter values and random grid space ranges to the AutoML chapter.
[PUBDEV-7343] - Improved the Hive import documentation.
[PUBDEV-7500] - Improved documentation for Quantiles in the User Guide.
[PUBDEV-7505] - Fixed the documented default value for `min_split_improvement` parameter in XGBoost.

Zahradnik (3.30.0.2) - 4/28/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/2/index.html

Bug

[PUBDEV-7400] - Fixed an issue that caused H2O to crash while debugging Python code using IntelliJ/PyCharm.
[PUBDEV-7426] - Fixed an issue that caused an assertion error while running Grid Search.
[PUBDEV-7434] - Training of a model based on a data frame that includes Target Encodings no longer fails due to a locked frame.
[PUBDEV-7439] - Added train_segments() to the R html documentation.
[PUBDEV-7441] - Target Encoder now unlocks the output frame.
[PUBDEV-7456] - Fixed the BiasTerm in XGBoost Contributions after upgrading to XGBoost 1.0.
[PUBDEV-7486] - GBM and XGBoost no longer ignore a column that includes a constant and NAs.

New Feature

[PUBDEV-7353] - Added the following options for customizing and retrieving threshold values.
- `threshold` allows you to specify the threshold value used for calculating the confusion matrix.
- `default_threshold` allows you to change the threshold that is used to binarise the predicted class probabilities.
- `reset_model_threshold` allows you to reset the model threshold.
[PUBDEV-7376] - Introduced Kubernetes integration. Docker image tests are now available on K8S and published to Docker Hub.
[PUBDEV-7408] - A progress bar is now available during SHAP contributions calculations.
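The `threshold` and `default_threshold` options in PUBDEV-7353 both rely on the same step: a predicted probability becomes a class label by comparison with a cutoff, and the confusion matrix is counted from those labels. The plain-Python sketch below shows only that step. It does not call the H2O API, and the function name is hypothetical. Treating a probability exactly equal to the threshold as the positive class (`>=`) is an assumption of this sketch.

```python
def confusion_matrix_at(labels, probabilities, threshold):
    """Binarise P(class 1) at `threshold` and count the outcomes.

    A probability at or above the threshold is predicted as class 1
    (the >= comparison is an assumption). Returns tp/fp/tn/fn counts.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, p in zip(labels, probabilities):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1:
            counts["fp"] += 1
        elif actual == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


labels = [1, 0, 1, 0]
probs = [0.9, 0.6, 0.4, 0.2]
print(confusion_matrix_at(labels, probs, 0.5))
print(confusion_matrix_at(labels, probs, 0.3))
```

Lowering the threshold from 0.5 to 0.3 turns the false negative at 0.4 into a true positive. This is the trade-off that changing a model's threshold controls.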

Improvement

[PUBDEV-6417] - An H2O Frame containing weights can now be specified when running `make_metrics`.
[PUBDEV-7274] - Added POJO and MOJO support for all encodings in GBM.
[PUBDEV-7446] - Users will now receive an error if they attempt to run https in h2o.init() when starting a local cluster.
[PUBDEV-7465] - Added an `-allow_insecure_xgboost` option to h2o and h2odriver that allows XGBoost multinode to run in a secured cluster.
[PUBDEV-7469] - Only the leader node is exposed on K8S.

Docs

[PUBDEV-7020] - Updated the Target Encoding topic and examples based on the improved API.
[PUBDEV-7344] - Added a new "Supported Data Types" topic to the Algorithms chapter.
[PUBDEV-7442] - Added a new "Kubernetes Integration" topic to the Welcome chapter.
[PUBDEV-7443] - Fixed the links for the constrained k-means Python demos.
[PUBDEV-7447] - Fixed the R example in the GAM chapter.
[PUBDEV-7451] - Added clarification for when `min_mem_size` and `max_mem_size` are set to NULL/None in h2o.init().
[PUBDEV-7459] - The link to the slideshare in the DRF chapter now points to https instead of http.
[PUBDEV-7461] - Added information about the h2o.get_leaderboard() function to the AutoML chapter of the User Guide.
[PUBDEV-7462] - Updated the MOJO Quickstart showing how to use PrintMojo to visualize MOJOs without requiring Graphviz.
[PUBDEV-7470] - The import_mojo() function now uses "path" instead of "dir" when downloading, importing, uploading, and saving models. Updated the examples in the documentation.

Zahradnik (3.30.0.1) - 4/3/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-zahradnik/1/index.html

Bug

[PUBDEV-7002] - Fixed an issue that caused multiple h2o.init() calls to fail with R on Windows.
[PUBDEV-7095] - Increased the default clouding time to avoid timeouts that resulted in a `Cloud 1 under 4` error.
[PUBDEV-7341] - Removed obsolete exactLambdas parameter from GLM.

New Feature

[PUBDEV-6037] - Added support for a fractional response in GLM.
[PUBDEV-6807] - Added support for Generalized Additive Models (GAMs) in H2O. Documentation for this newly added algorithm is available in the User Guide.
[PUBDEV-6976] - Added support for parallel training (e.g. spark_apply in rsparkling or Python/R).
[PUBDEV-7229] - Added support for Continuous Bag of Words (CBOW) models in Word2Vec.
[PUBDEV-7266] - H2O can now predict out-of-memory errors (OOME) during parsing and stops the job if an OOME is imminent.
[PUBDEV-7304] - Added GBM POJO support for SortByResponse and enumlimited.
[PUBDEV-7347] - Added support for Leaf Node Assignments in XGBoost and Isolation Forest MOJOs.
[PUBDEV-7352] - Added support for importing Stacked Ensemble MOJO models for scoring. (Note that this only applies to Stacked Ensembles that include algos with MOJO support.)
[PUBDEV-7405] - Added support for the `single_node_mode` parameter in CoxPH.
[PUBDEV-7409] - H2O now provides the original algorithm name for MOJO import.
[PUBDEV-7422] - Created a segmented model training interface in R.
[PUBDEV-7423] - Added a print method for the H2OSegmentModel object type in R.

Task

[PUBDEV-7232] - Removed the previously deprecated DeepWater Estimator function.
[PUBDEV-7385] - Now using Java-based scoring for XGBoostModels.

Improvement

[PUBDEV-4639] - In the H2O R package, `data.table` is now enabled by default (if installed).
[PUBDEV-6293] - In AutoML, users can try tuning the learning rate for the best model found during exploration in XGBoost and GBM. Note that the new `exploitation_ratio` parameter is still experimental.
[PUBDEV-6852] - Added out-of-the-box support for starting an h2o cluster on Kubernetes. Refer to this README for more information.
[PUBDEV-7087] - Improved the way AUC-PR is calculated.
[PUBDEV-7202] - Added an option to upload binary models from Python and R.

Docs

[PUBDEV-7076] - Added examples for Grid Search in the Python Module documentation.
[PUBDEV-7158] - Added examples to the R Reference Guide.
[PUBDEV-7350] - Added documentation for the fractional binomial family in the GLM section.
[PUBDEV-7351] - Added documentation for the new GAM algorithm.
[PUBDEV-7388] - Updated tab formatting for the `cluster_size_constraints` parameter appendix entry.
[PUBDEV-7406] - Updated the Target Encoding R example.
[PUBDEV-7407] - Included confusion matrix threshold details for binary and multiclass classification in the Performance and Prediction chapter.
[PUBDEV-7410] - Added documentation for new `upload_model` function.
[PUBDEV-7416] - Improved documentation around citing H2O in publications.
[PUBDEV-7428] - Added documentation for `single_node_mode` in CoxPH.

Yule (3.28.1.3) - 4/2/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yule/3/index.html

Bug

[PUBDEV-7337] - Fixed an issue that occurred during Hive SQL import with `fetch_mode=SINGLE`; improved Hive SQL import speed; added an option to specify the number of chunks to parse.
[PUBDEV-7386] - Hive delegation token refresh now recognizes `-runAsUser`.
[PUBDEV-7394] - Fixed `base_model` selection for Stacked Ensembles in Flow.
[PUBDEV-7396] - The Parquet parser now supports arbitrary precision decimal types.

Story

[PUBDEV-7391] - The H2O Hive parser now recognizes varchar column types.

Task

[PUBDEV-7414] - Hive tokens are now refreshed without distributing the Steam keytab.

Improvement

[PUBDEV-7171] - Users can now specify the `max_log_file_size` when starting H2O. The log file size currently defaults to 3MB.
[PUBDEV-7358] - Fixed the of parameters for TargetEncoder in Flow.
[PUBDEV-7390] - HostnameGuesser.isInetAddressOnNetwork is now public.
[PUBDEV-7402] - Improved mapper-side Hive delegation token acquisition. Now when H2O is started from Steam, the Hive delegation token will already be acquired when the cluster is up.

Docs

[PUBDEV-7380] - Documented that `transform` only works on numerical columns.
[PUBDEV-7419] - Added documentation for the new num_chunks_hint option that can be specified with `import_sql_table`.
[PUBDEV-7420] - Added documentation for the new `max_log_file_size` H2O starting parameter.

Yule (3.28.1.2) - 3/17/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yule/2/index.html

Bug

[PUBDEV-6787] - The `base_models` attribute in Stacked Ensembles is now populated in both Python and R.
Note that in Python, if there are no `base_models` in `_parms`, then `actual_params` is used to retrieve base_models, and it contains the names of the models. In R, `ensemble@model$base_models` is populated with a vector of base model names.
[PUBDEV-7293] - Fixed an issue that caused the leader node to be overloaded when parsing 30k+ Parquet files.
[PUBDEV-7305] - Fixed an issue that caused `model end_time` and `run_time` properties to return a value of 0 in client mode.
[PUBDEV-7357] - TargetEncoderModel's summary no longer prints the fold column as a column that is going to be encoded by this model.
[PUBDEV-7364] - When h2omapper fails before discovering SELF (ip & port), the log messages are no longer lost.

New Feature

[PUBDEV-7167] - Added DeepLearning MOJO support in Generic Models.

Improvement

[PUBDEV-6599] - Changed the output format of `get_automl` in Python from a dictionary to an object.
[PUBDEV-7371] - Users can now specify `-hdfs_config` multiple times to specify multiple Hadoop config files.
[PUBDEV-7373] - Fixed an issue that caused the clouding process to time out for the Target Encoding module and resulted in a `Cloud 1 under 4` error.

Docs

[PUBDEV-7187] - Improved FAQ describing how to use the H2O-3 REST API from Java.

Yule (3.28.1.1) - 3/5/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yule/1/index.html

Bug

[PUBDEV-7323] - Added missing AutoML global functions to the Python and R documentation.
[PUBDEV-7325] - In the Python client, improved the H2OFrame documentation and properly labeled deprecated functions.
[PUBDEV-7334] - Fixed an issue that caused imported MOJOs to produce different predictions than the original model.

Engineering Story

[PUBDEV-7327] - Removed Sparkling Water external backend code from H2O.

Docs

[PUBDEV-7328] - In the R client docs for h2o.head() and h2o.tail(), added an example showing how to control the number of columns to display in a data frame when using a Jupyter notebook with the R kernel.

Yu (3.28.0.4) - 2/23/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yu/4/index.html

Bug

[PUBDEV-6615] - DeepLearning MOJOs are now thread-safe.
[PUBDEV-7227] - Fixed an issue that caused h2oframe.apply to fail when run in Python 3.7. Note that Python 3.7 is not yet officially supported, but support is a work in progress.
[PUBDEV-7260] - XGBoost now correctly respects monotonicity constraints for all tree_methods.
[PUBDEV-7262] - Decision tree descriptions no longer include more than `max_depth` splits.
[PUBDEV-7270] - Fixed an issue that caused `import_hive_table` to fail with a JDBC source and a partitioned table.
[PUBDEV-7271] - Improved the DKVManager sequential removal mechanism.
[PUBDEV-7279] - In XGBoost, added a message indicating that the `exact` tree method is not supported in multinode.
[PUBDEV-7308] - XGBoost ContributionsPredictor is now serializable.
[PUBDEV-7309] - Fixed a CRAN warning related to ellipsis within arguments in the R package.
[PUBDEV-7312] - Added support for specifying AWS session tokens.

New Feature

[PUBDEV-6447] - Added support for Constrained K-Means clustering.
[PUBDEV-6965] - In Stacked Ensembles, added support for "xgboost" and "naivebayes" in the `metalearner_algorithm` parameter.
[PUBDEV-7303] - Added support for `build_tree_one_node` in XGBoost.

Improvement

[PUBDEV-7136] - In the R client, users can now optionally specify the number of columns to display in `h2o.frame`, `h2o.head`, and `h2o.tail`.
[PUBDEV-7189] - Fixed an issue that caused AutoML to fail to run if XGBoost was disabled.
[PUBDEV-7253] - Stacktraces are no longer returned in `h2o.getGrid` when failed models are present.
[PUBDEV-7310] - Added `createNewChunks` with a "sparse" parameter in ChunkUtils.

Docs

[PUBDEV-6964] - Added an FAQ to the MOJO and POJO quick starts noting that MOJOs and POJOs are thread safe for all supported algorithms.
[PUBDEV-7213] - Added the new `cluster_size_constraints` parameter to the KMeans chapter.
[PUBDEV-7286] - Updated docs to specify that `mtries=-2` gives all features.
[PUBDEV-7314] - Updated EC2 and S3 Storage topic to include the new, optional AWS session token.

Yu (3.28.0.3) - 2/5/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yu/3/index.html

Bug

[PUBDEV-6745] - In the R client, fixed a parsing bug that occurred when using quotes with .csv files in as.data.frame().
[PUBDEV-6818] - Fixed an Unsupported Operation Exception in UDP-TCP-SEND.
[PUBDEV-7118] - GLM now supports coefficients on variable importance when model standardization is disabled.
[PUBDEV-7186] - In the Python client, rbind() can now be used on all numerical types.
[PUBDEV-7192] - In XGBoost, fixed an error that occurred during model prediction when OneHotExplicit was specified during model training.
[PUBDEV-7204] - Performing grid search over Target Encoding parameters now works correctly.
[PUBDEV-7244] - Fixed an issue that caused import_hive_table to not classload the JDBC driver.
[PUBDEV-7246] - MOJOs can now be built from XGBoost models built with an offset column.
[PUBDEV-7272] - Fixed an issue that caused the R and Python clients to return the wrong sensitivity metric value.
[PUBDEV-7273] - Fixed an incorrect sender port calculation in TimestampSnapshot.

New Feature

[PUBDEV-6502] - In AutoML, multinode XGBoost is now enabled by default.
[PUBDEV-7223] - Users can now specify a custom JDBC URL to retrieve the Hive Delegation token using hiveJdbcUrlPattern.

Task

[PUBDEV-7250] - In XGBoost, fixed a deprecation warning for `reg:linear`.

Improvement

[PUBDEV-7190] - import_folder() can now be used when running H2O in GCS.
[PUBDEV-7226] - Added support for registering custom servlets.
[PUBDEV-7258] - In XGBoost, when a parameter with a synonym is updated, the synonymous parameter is now also updated.

Engineering Story

[PUBDEV-7247] - AutoBuffer.getInt() is now public.

Docs

[PUBDEV-7221] - Python examples for plot method on binomial models now use the correct method signature.
[PUBDEV-7222] - Updated custom_metric_func description to indicate that it is not supported in GLM.
[PUBDEV-7239] - Updated the AutoML documentation to indicate that multinode XGBoost is now turned on by default.
[PUBDEV-7256] - Fixed the description for the Hadoop -nthreads parameter.

Yu (3.28.0.2) - 1/20/2020

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yu/2/index.html

Bug

[PUBDEV-7069] - Fixed an issue that resulted in a "DistributedException java.lang.ClassNotFoundException: BAD" message.
[PUBDEV-7140] - Users can now specify either a model or a model key when checkpointing.
[PUBDEV-7149] - Fixed an issue that resulted in an endless loop in CsvParser when a $ sign was enclosed in quotes.
[PUBDEV-7161] - In GBM and DRF, fixed an AIOOBE error that occurred when the dataset included negative zeros (-0.0).
[PUBDEV-7173] - Fixed a race condition in the addWarningP method on Model class.
[PUBDEV-7177] - h2odriver now gets correct version of Hadoop dependencies.
[PUBDEV-7193] - Fixed a race condition in addVec.
[PUBDEV-7197] - Parallel Grid Search threads now call the Hyperspace iterator one at a time.
[PUBDEV-7201] - sklearn wrappers now expose wrapped estimator as a public property.
[PUBDEV-7205] - Fixed an issue in reading user_splits in Java.
[PUBDEV-7212] - Fixed an issue that caused rank vectors of Spearman correlation to have different chunk layouts.

Task

[PUBDEV-7057] - Added a JSON option to PrintMojo.
[PUBDEV-7120] - Improved the error message that displays when a user attempts to import data from an HDFS directory that is empty.
[PUBDEV-7176] - H2O can now read Hive table metadata two ways: either via direct Metastore access or via JDBC.

Improvement

[PUBDEV-6460] - Improved heuristics used for finding IP addresses on Hadoop in order to select the right subnet automatically.
[PUBDEV-7029] - Added support for `offset_column` in XGBoost.
[PUBDEV-7089] - Users can now create tree visualizations without installing additional packages.
[PUBDEV-7135] - Added a new `download_model` function for downloading binary models in the R and Python clients.
[PUBDEV-7164] - Improved XGBoost performance.
[PUBDEV-7165] - When computing the correlation matrix of one or two H2OFrames (using `cor()`), users can now specify a method of either Pearson (default) or Spearman.
[PUBDEV-7194] - Users are now warned when they attempt to run AutoML with a validation frame and with nfolds > 0.
[PUBDEV-7196] - AutoML no longer trains a "Best of Family Stacked Ensemble" when only one family is specified.

Docs

[PUBDEV-6142] - Removed `ignored_columns` from the list of available parameters in AutoML.
[PUBDEV-6993] - Fixed a broken link in the JAVA FAQ.
[PUBDEV-7088] - Improved the documentation for Tree Class in the Python Client docs.
[PUBDEV-7155] - Clarified the difference between h2o.performance() and h2o.predict() in the Performance and Prediction chapter of the User Guide.
[PUBDEV-7159] - Incorporated HGLM documentation updates into the GLM booklet.
[PUBDEV-7191] - Added an FAQ for GC allocation failure in the FAQ > Clusters section.
[PUBDEV-7198] - In the Stacked Ensembles chapter, improved the metalearner support FAQ.
[PUBDEV-7214] - Added `offset_column` to the list of supported parameters in XGBoost.
[PUBDEV-7215] - Added information about recent API changes in AutoML to the API-Related Changes section in the User Guide.

Yu (3.28.0.1) - 12/16/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yu/1/index.html

Bug

[PUBDEV-5975] - AutoML reruns using, for example, the same project name, no project name, etc., now produce consistent results.
[PUBDEV-6708] - Fixed an issue that occurred when running an AutoML instance twice using the same project_name. AutoML no longer appends new models to the existing leaderboard, which caused the models from the first run to be rescored against the new leaderboard_frame.
[PUBDEV-6940] - Updated the list of stopping metric options for AutoML in Flow. Also added support for the aucpr stopping metric in AutoML.
[PUBDEV-6966] - When training a K-Means model, the framename is no longer missing in the training metrics.
[PUBDEV-6998] - In AutoML, the `project_name` is now restricted to the same constraints as h2o frames.
[PUBDEV-7064] - In GBM, fixed an NPE that occurred when sample rate < 1.
[PUBDEV-7065] - The AutoML backend no longer accepts `ignored_columns` that contain one of response column, fold column, or weights column.
[PUBDEV-7133] - XGBoost MOJO now works correctly in Spark.
[PUBDEV-7134] - The REST API ping thread now starts after the cluster is up.
[PUBDEV-7137] - Fixed an NPE at hex.tree.TreeHandler.fillNodeCategoricalSplitDescription(TreeHandler.java:272)

New Feature

[PUBDEV-5351] - Extended MOJO support for PCA
[PUBDEV-6509] - We are very excited to add HGLM (Hierarchical GLM) to our open source offering. As this is the first release, we only implemented the Gaussian family. However, stay tuned or better yet, tell us what distributions you want to see next. Try it out and send us your feedback!
[PUBDEV-6513] - MOJO Import is now available for XGBoost.
[PUBDEV-6715] - Improved integration of the H2O Python client with Sklearn.
[PUBDEV-6737] - Users can now specify monotonicity constraints in AutoML.
[PUBDEV-6749] - Users can now save and load grids to continue a Grid Search after a cluster restart.
[PUBDEV-6774] - Users can now specify a `parallelism` parameter when running grid search. A value of 1 indicates sequential building (default); a value of 0 is used for adaptive parallelism; and any value greater than 1 sets the exact number of models built in parallel.
[PUBDEV-6796] - Added a function to calculate Spearman Correlation.
[PUBDEV-6840] - Users can now specify the order in which training steps will be executed during an AutoML run. This is done using the new `modeling_plan` option.
[PUBDEV-6890] - The `calibration_frame` and `calibrate_model` options can now be specified in XGBoost.
[PUBDEV-6929] - Added support for OneHotExplicit categorical encoding in EasyPredictModelWrapper.
[PUBDEV-7072] - Added aucpr to the AutoML leaderboard, stopping_metric, and sort_metric.
[PUBDEV-7074] - An AutoML leaderboard extension is now available that includes model training time and model scoring time.
[PUBDEV-7082] - Exposed the location of Chunks in the REST API.
[PUBDEV-7096] - Added a `rest_api_ping_timeout` option, which can stop a cluster if nothing has touched the REST API for the specified timeout.
[PUBDEV-7105] - Added support for Java 13.
[PUBDEV-7127] - H2O no longer performs an internal self-check when converting trees in H2O.
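The Spearman correlation added in PUBDEV-6796 is the Pearson correlation of the rank-transformed values. As a result, it measures monotonic rather than strictly linear association. The plain-Python sketch below assumes that tied values receive the average of their ranks, which is the usual convention; H2O's tie handling is not described in these notes. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

Because `y = x ** 2` is monotonic for positive `x`, the Spearman correlation of the example is 1 (up to floating-point rounding), even though the relationship is not linear.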

Task

[PUBDEV-6793] - Fixed an XGBoost error on multinode with AutoML.
[PUBDEV-6815] - Added checkpointing to XGBoost.
[PUBDEV-6975] - Users can now perform random grid search over target encoding hyperparameters
[PUBDEV-7058] - Improved Grid Search testing in Flow.

Improvement

[PUBDEV-4986] - When specifying a `stopping_metric`, H2O now supports lowercase and uppercase characters.
[PUBDEV-6195] - Added a warning message to AutoML if the leaderboard is empty due to too little time for training.
[PUBDEV-6612] - In AutoML, blending frame details were added to event_log.
[PUBDEV-6754] - If early stopping is enabled, GBM can reset the ntree value. In these cases, added an `ntrees_actual` (Python)/`get_ntrees_actual` (R) method to provide the actual ntree value (whether CV is enabled or not) rather than the original ntree value set by the user before building a model.
[PUBDEV-6824] - Refactored AutoML to improve integration with Target Encoding.
[PUBDEV-6928] - Exposed `get_automl` from `h2o.automl` in the Python client.
[PUBDEV-6935] - For GBM POJOs that use one hot explicit encoding, EasyPredictModelWrapper now applies the encoding, so users no longer need to apply it explicitly.
[PUBDEV-6969] - Added support for numeric arrays to IcedHashMap.
[PUBDEV-7059] - Improved the AutoML Flow UI.
[PUBDEV-7066] - The `mae`, `rmsle`, and `aucpr` stopping metrics are now available in Grid Search.
[PUBDEV-7073] - When creating a hex.genmodel.easy.EasyPredictModelWrapper with contributions enabled, H2O now uses slf4j in the library, giving users more control over when and where warnings are printed.
[PUBDEV-7148] - Moved the order of AUCPR in the list of values for `stopping_metric` to right after AUC.

Engineering Story

[PUBDEV-7099] - Removed unused code in UDPClientEvent.

Docs

[PUBDEV-6675] - Added examples to the Python Module documentation DRF chapter.
[PUBDEV-6712] - Added examples to the Binomial Models section in the Python Module documentation.
[PUBDEV-6728] - Added examples to the Multinomial Models section in the Python Module documentation.
[PUBDEV-6730] - Added examples to the Clustering Methods section in the Python Module documentation.
[PUBDEV-6731] - Added examples to the Regression section in the Python documentation.
[PUBDEV-6741] - Added examples to the Autoencoder section in the Python documentation.
[PUBDEV-6742] - Added examples to the Tree Class section in the Python documentation.
[PUBDEV-6761] - Added examples to the Assembly section in the Python documentation.
[PUBDEV-6766] - Added examples to the Node, Leaf Node, and Split Leaf Node sections in the Python documentation.
[PUBDEV-6769] - Added examples to the H2O Module section in the Python documentation.
[PUBDEV-6812] - Added examples to the H2OFrame section in the Python documentation.
[PUBDEV-6828] - Documented support for `checkpointing` in XGBoost.
[PUBDEV-6830] - Added examples to the GroupBy section in the Python documentation.
[PUBDEV-6841] - Updated the supported platform table in the XGBoost chapter.
[PUBDEV-6849] - Added R/Python examples to the metrics in Performance and Prediction section of the User Guide.
[PUBDEV-6851] - Added Parameter Appendix entries for CoxPH parameters.
[PUBDEV-6891] - Added examples to the GBM section in the Python documentation.
[PUBDEV-6905] - Added a new Reference entry to the Target Encoding documentation.
[PUBDEV-6912] - Added examples to the KMeans section in the Python documentation.
[PUBDEV-6924] - Added examples to the CoxPH section in the Python documentation.
[PUBDEV-6939] - Added examples to the Deep Learning section in the Python documentation.
[PUBDEV-6972] - Added examples to the Stacked Ensembles section in the Python documentation.
[PUBDEV-6979] - Added new `use_spnego` option to the Starting H2O in R topic.
[PUBDEV-6986] - Added examples to the Target Encoding section in the Python documentation.
[PUBDEV-6988] - Added examples to the Aggregator section in the Python documentation.
[PUBDEV-6989] - Updated the XGBoost extramempercent FAQ.
[PUBDEV-7004] - Added examples to the PCA section in the Python documentation.
[PUBDEV-7019] - Added a new section for Installing and Starting H2O in the Python Client documentation.
[PUBDEV-7025] - Added examples to the SVD section in the Python documentation.
[PUBDEV-7030] - Improved the R and Python documentation for `search_criteria` in Grid Search.
[PUBDEV-7094] - Added an example using `predict_contributions` to the MOJO quick start.
[PUBDEV-7116] - Added examples to the PSVM section in the Python documentation.
[PUBDEV-7128] - Added documentation for HGLM in the GLM chapter.
[PUBDEV-7141] - Improved AutoML documentation:
- aucpr is now an available stopping metric and sort metric for AutoML.
- monotone_constraints can now be specified in AutoML.
- Added modeling_plan option to list of AutoML parameters.
[PUBDEV-7142] - MOJOs are now available for PCA.
[PUBDEV-7143] - MOJO models are now available for XGBoost.
[PUBDEV-7145] - calibration_frame and calibrate_model are now available in XGBoost.
[PUBDEV-7146] - Added Java 13 to list of supported Java versions.

Yau (3.26.0.11) - 12/05/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yau/11/index.html

Bug

[PUBDEV-6580] - The Python client now fails with a descriptive message when attempting to run on an unsupported Java version.
[PUBDEV-6895] - Fixed an issue that caused H2O to fail when running on Hadoop with `-internal_secure_connections`.
[PUBDEV-6911] - H2OGenericEstimator can now be instantiated with no parameters.
[PUBDEV-6945] - Multi-node H2O XGBoost now returns reproducible results.
[PUBDEV-6995] - Fixed the backend default values for the `inflection_point` and `smoothing` parameters in Target Encoder.
[PUBDEV-7006] - Users can now specify the `noise` parameter when running Target Encoding in the R client or in Flow.
[PUBDEV-7036] - MOJO reader now uses stderr instead of stdout to show warnings.
[PUBDEV-7056] - Fixed an issue that allowed SPNEGO authentication to pass with any HTTP-Basic header.
[PUBDEV-7062] - When connecting to H2O via the Python client, users can now specify `allowed_properties="cacert"`.

New Feature

[PUBDEV-6213] - Added BroadcastJoinForTargetEncoding.

Task

[PUBDEV-6970] - Introduced AllCategorical and Threshold TE application strategies.

Improvement

[PUBDEV-7052] - Added a test to check XGBoost variable importance when trained on frames with shuffled input columns.
[PUBDEV-7053] - The package name for ai.h2o.org.eclipse.jetty.jaas.spi is now independent of the Jetty version.
[PUBDEV-7060] - The `offset_column` is now propagated to MOJO models.

Docs

[PUBDEV-7070] - Improved documentation for `stopping_metric` as it pertains to AutoML.

Yau (3.26.0.10) - 11/7/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yau/10/index.html

Bug

[HEXDEV-743] - Fixed an issue that caused H2O to ignore security configurations when running on Hadoop 3.x.

New Feature

[PUBDEV-7026] - Added a `disable_flow` option that can be specified when starting H2O to disable access to H2O Flow.
[PUBDEV-7040] - Version details are now exposed in cloud information.

Improvement

[PUBDEV-6831] - Removed duplicate definition for sample_rate in DRF, as this is already defined in shared tree model parameters.

Docs

[PUBDEV-7038] - Fixed documentation for Logloss scorer.

Yau (3.26.0.9) - 10/29/2019

Bug

[PUBDEV-6829] - Fixed an issue that caused sort on a multinode cluster (for example, 2 nodes) to be much slower than a single node cluster.
[PUBDEV-6934] - Fixed an issue that caused class conflicts between the released jars for h2o-genmodel-ext-xgboost and other Java packages.
[PUBDEV-6954] - Export checkpoint no longer fails to export all models created during a grid search.
[PUBDEV-7010] - In the Python client, H2OFrame.drop no longer modifies parameters.
[PUBDEV-7011] - Fixed an issue in the Python client that caused model.actual_params to sometimes return an object instead of a dict.

Task

[PUBDEV-6977] - Fixed an issue that caused XGBoost to exhaust all memory on a node (-xmx+(1.2*-xmx)) on wide datasets.

Improvement

[PUBDEV-6847] - Created a Technical Note (TN) describing how to use MOJO Import when importing models from a different H2O version. This TN is available here: https://0xdata.atlassian.net/browse/TN-14.

Yau (3.26.0.8) - 10/17/2019

Bug

[PUBDEV-6019] - Fixed an ESPC row_layout assertion error that occurred when running FrameTest.java.
[PUBDEV-6874] - In AutoML, fixed an issue that resulted in poor predictions from Stacked Ensembles on MNIST.
[PUBDEV-6893] - When saving files in Python, H2O now assumes the provided path is a directory even when an ending "/" is not included in the path.
[PUBDEV-6936] - The custom distribution function in GBM now works correctly for custom multinomial distributions.
[PUBDEV-6941] - In Target Encoding, added blending of posterior and prior during imputation of unseen values.
[PUBDEV-6949] - Removed H2O.STORE.clear() in FrameTest.java so that the deepSlice test could be enabled.
[PUBDEV-6956] - Python users can now specify `verify_ssl_certificates` and `cacert` when connecting to H2O.
[PUBDEV-6967] - Target Encoding now works correctly in Flow.
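
The blending mentioned in PUBDEV-6941 mixes a level's posterior (the mean response for that categorical level) with the prior (the overall mean response), trusting the posterior more as the level's row count grows. The sketch below assumes the common sigmoid weighting from Micci-Barreca, parameterized by `inflection_point` and `smoothing` (the parameter names from PUBDEV-6995). It illustrates the idea only and is not taken from H2O's code. The function names are hypothetical, and no default parameter values are implied.

```python
import math


def blend_weight(n, inflection_point, smoothing):
    """Sigmoid weight in (0, 1): small for rare levels, close to 1 for frequent ones."""
    return 1.0 / (1.0 + math.exp(-(n - inflection_point) / smoothing))


def blended_encoding(level_sum, level_count, prior, inflection_point, smoothing):
    """Blend a level's posterior mean with the global prior.

    An unseen level (level_count == 0) has no posterior, so the prior is used
    in its place and the result equals the prior.
    """
    posterior = level_sum / level_count if level_count > 0 else prior
    lam = blend_weight(level_count, inflection_point, smoothing)
    return lam * posterior + (1 - lam) * prior
```

At `n == inflection_point`, the weight is exactly 0.5, so the encoding is the midpoint of the posterior and the prior. Larger `smoothing` values make the transition between the two more gradual.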

New Feature

[PUBDEV-6494] - Base models can have different training_frames in blending mode in Stacked Ensembles.
[PUBDEV-6739] - Imported MOJO models now show parameters of the original model.
[PUBDEV-6825] - Added the ability to clone ModelBuilder.

Task

[PUBDEV-6899] - R client users can now specify the Boolean `use_spnego` parameter when starting H2O.

Improvement

[PUBDEV-6263] - System level proxy is now bypassed when connecting to an H2O instance on localhost from R/Python.
[PUBDEV-6922] - Improved performance by sending data to and from external backend after a specified block size rather than by each item.
[PUBDEV-6948] - Disabled HTTP TRACE requests.

Docs

[PUBDEV-6951] - Removed the "experimental" note in the AutoML chapter.
[PUBDEV-6953] - Fixed a broken link in XGBoost documentation.

Yau (3.26.0.6) - 10/1/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yau/6/index.html

Bug

[PUBDEV-6865] - download_csv/download_all_logs now works correctly on HTTPS when using the Python client.
[PUBDEV-6870] - Fixed an error in PredictCSV unused column detection.
[PUBDEV-6896] - Fixed a potential deadlock issue with the AutoML leaderboard.
[PUBDEV-6902] - Model summary and model_performance output now display correctly in Zeppelin.
[PUBDEV-6903] - Added support for SPNEGO in h2odriver.
[PUBDEV-6904] - Fixed a missing chunk issue on the external backend.

New Feature

[PUBDEV-6821] - In AutoML, you can now retrieve the leader node using the REST API.
[PUBDEV-6866] - Added support for MapR 6.0 and 6.1.
[PUBDEV-6892] - Added support for CDH 6.3.
[PUBDEV-6897] - Added POJO support for one hot explicit encoding in GBM.

Task

[PUBDEV-6883] - The countingErrorConsumer example can now accumulate counters not only for each variable X, but also for each value of that variable.

Improvement

[PUBDEV-6780] - Added a new `melt` function. This is similar to Pandas `melt` and converts an H2OFrame to key-value representation while (optionally) skipping NA values. (This is the inverse operation to pivot.)
[PUBDEV-6863] - Added POJO support for XGBoost models.
[PUBDEV-6880] - Removed the x-h2o-context-path header from H2O.
[PUBDEV-6885] - Upgraded H2O Flow to 0.10.7.
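
The `melt` operation added in PUBDEV-6780 turns selected columns into key-value rows: each (row, column) pair becomes one output row holding the column name and its value. The plain-Python sketch below shows that reshaping on a list of dicts, with `None` standing in for NA. It is not the H2OFrame API. The parameter names are borrowed from pandas `melt` as an assumption, and `skipna` mirrors the optional NA skipping described above.

```python
def melt(rows, id_vars, value_vars, var_name="variable", value_name="value", skipna=False):
    """Reshape wide rows (dicts) into long key-value rows.

    Output is ordered like pandas.melt: all rows for the first value column,
    then all rows for the next one, and so on.
    """
    out = []
    for col in value_vars:
        for row in rows:
            val = row.get(col)
            # In this sketch, None represents a missing (NA) value.
            if skipna and val is None:
                continue
            rec = {k: row[k] for k in id_vars}
            rec[var_name] = col
            rec[value_name] = val
            out.append(rec)
    return out
```

For example, `melt([{"id": 1, "a": 10, "b": None}], ["id"], ["a", "b"], skipna=True)` keeps only the `a` entry. Pivoting on `variable` would restore the wide layout.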

Docs

[PUBDEV-6795] - Moved MOJO Models topic from the Algorithms chapter in the User Guide to the Productionizing chapter.
[PUBDEV-6861] - Added information about GPU usage for XGBoost.
[PUBDEV-6894] - Added MapR 6.0 and 6.1 to list of supported Hadoop platforms.
[PUBDEV-6908] - The documentation now lists all supported Java versions rather than saying Java 8 and greater.
[PUBDEV-6913] - Updated Deep Learning parameter descriptions for `rate`, `rate_annealing`, and `rate_decay`. Also added these to the Parameters Appendix.
[PUBDEV-6914] - Updated the User Guide to indicate that POJOs are available for XGBoost.
[PUBDEV-6925] - Added CDH 6.3 to list of supported Hadoop platforms.

Yau (3.26.0.5) - 9/16/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yau/5/index.html

Bug

[PUBDEV-6886] - Fixed a critical bug in Flow that allowed Flow to load but prevented the user from performing any action.

Yau (3.26.0.4) - 9/12/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yau/4/index.html

Bug

[PUBDEV-6366] - Fixed several broken metric methods in the Python and R clients.
[PUBDEV-6778] - The temp folder is no longer deleted after running `h2o.mojo_predict_df`.
[PUBDEV-6868] - Flow now works correctly in environments that already have a context path prefixed.

New Feature

[PUBDEV-6776] - Introduced a Transform operation together with Target Encoding Model.
[PUBDEV-6813] - Added support for CDH 5.15 and CDH 5.16.

Improvement

[PUBDEV-6794] - In the Flow > Models menu, moved MOJO Model to below the list of algorithms and re-labeled it "Import MOJO Model."
[PUBDEV-6803] - Removed unnecessary read confirmation timeout.
[PUBDEV-6839] - Unified the Target Encoding API arguments with other models - using (x,y).
[PUBDEV-6848] - Target Encoder now ignores non-categorical columns specified for encoding.
[PUBDEV-6854] - Added a "fetch mode" option to Flow. As a result, Hive users can now import tables from Hive 1.x from within Flow. Note that Hive 1.x doesn't support OFFSET, so for Hive 1.x, use import_hive_table or use non-distributed JDBC import (i.e., set the fetch mode to single).

Docs

[PUBDEV-6823] - Added `plug_values` to the Parameters Appendix.
[PUBDEV-6850] - Added Python examples to CoxPH options.
[PUBDEV-6871] - Added Teradata to list of supported JDBC types.
[PUBDEV-6879] - Added CDH 5.15 and CDH 5.16 to list of supported Hadoop platforms.

Yau (3.26.0.3) - 8/23/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yau/3/index.html

Bug

[PUBDEV-6143] - Fixed an issue that caused an H2OResponseError after initialization of H2OSingularValueDecompositionEstimator.
[PUBDEV-6319] - AstGroup is no longer inconsistent after running it multiple times.
[PUBDEV-6413] - Fixed an issue that caused a mismatch between manual standard deviation and reported standard deviation for cross validation scores.
[PUBDEV-6603] - H2OFrame.split_frame() no longer leaks a _splitter object.
[PUBDEV-6719] - h2o.scale no longer modifies a frame in place. Instead, it now returns a new frame.
[PUBDEV-6732] - Users can now export a model using java-rest-bindings.
[PUBDEV-6753] - Grid search results sorted by F2 or F0point5 now correctly match the corresponding model metric.
[PUBDEV-6756] - Fixed an issue that caused XGBoost cox2 benchmark to fail with an NPE.
[PUBDEV-6770] - Tables with long titles now display properly for users who have installed Pandas.
[PUBDEV-6782] - ModelParametersSchemaV3 now displays the correct help messages.
[PUBDEV-6785] - Fixed an issue that caused MRTask to fail due to a race condition when creating a new Frame. Note that this issue only occurred when assertions were enabled.
[PUBDEV-6789] - Fixed an issue that caused GLM plots to fail in the Python client.

New Feature

[PUBDEV-5504] - Added another mode for handling missing values: plug values. The plug values must be provided by the user.
[PUBDEV-6805] - Implemented a re-try mechanism for requesting the flatfile on Hadoop.

Task

[PUBDEV-6597] - Added Flow support for 2D partial plots.
[PUBDEV-6784] - In Flow, fixed issues with the NPM audit report.

Improvement

[PUBDEV-6567] - Added support for predict_leaf_node_assignment() in XGBoost.
[PUBDEV-6718] - In Isolation Forest, improved documentation for aggregate depth and split ratios and described how these two values are calculated.
[PUBDEV-6724] - Removed MissingValuesHandling from XGBoost.

Docs

[PUBDEV-6703] - Added links to custom distribution and custom loss function demos.
[PUBDEV-6727] - Removed Java 7 from list of supported Java versions.
[PUBDEV-6733] - Added Shapley example to predict_contributions documentation.
[PUBDEV-6744] - Added H2ONode, H2OLeafNode, and H2OSplitNode to the Python client documentation.
[PUBDEV-6750] - In XGBoost, removed "enum" from the list of available categorical_encoding options.
[PUBDEV-6760] - In GLM, improved the documentation for handling of categorical values.
[PUBDEV-6771] - Added predict_leaf_node_assignment to list of supported parameters in XGBoost.
[PUBDEV-6790] - The documentation for PSVM now indicates that it can be used for classification only.
[PUBDEV-6791] - The User Guide now includes all options that can be specified when running h2o.init() from the Python client.
[PUBDEV-6792] - Added bind_to_localhost to the list of parameters for h2o.init() in the Python client docs.
[PUBDEV-6816] - Updated GLM parameters. "PlugValues" can now be specified for missing_values_handling, and when specified, a new `plug_values` option is available.

Yau (3.26.0.2) - 7/26/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yau/2/index.html

Bug

[PUBDEV-6112] - Fixed an NPE that occurred on StackedEnsemble models in AutoML.
[PUBDEV-6587] - Improved the error message for rbind failures that occurred when rbinding datasets with long categorical levels.
[PUBDEV-6693] - In Flow, the scoring history deviance graph no longer displays if a custom distribution is not set.
[PUBDEV-6709] - pr_auc() now works correctly in the Python client.

New Feature

[PUBDEV-6255] - Added support for Target Encoding MOJOs.
[PUBDEV-6593] - Added support for Target Encoding transformation of data without a response column.
[PUBDEV-6640] - Added TargetEncoderBuilder (estimator) and TargetEncoderModel (transformer).
[PUBDEV-6682] - Added detailed MOJO metrics for DRF, Isolation Forest, and GLM MOJO models.
[PUBDEV-6684] - Added AUCPR to the list of available stopping_metric options.

Improvement

[PUBDEV-6436] - In Flow, users can now upload a MOJO, and a generic model will automatically be created from it.
[PUBDEV-6681] - Removed duplicated code for obtaining logs in Java.
[PUBDEV-6685] - Improved error handling in the downloadLogs method.
[PUBDEV-6690] - Disabled autocomplete on the Flow login form.

Docs

[PUBDEV-6674] - Added an entry for upload_custom_metric in the Parameters Appendix.
[PUBDEV-6678] - Added list of parameters that can be specified when building a Generic Model (MOJO import).
[PUBDEV-6687] - Updated documentation for MOJO Import.
[PUBDEV-6698] - Added "aucpr" to the list of available stopping_metric options.
[PUBDEV-6706] - Added an entry for export_checkpoints_dir in the Parameters Appendix.

Yau (3.26.0.1) - 7/15/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yau/1/index.html

Bug

[PUBDEV-5595] - Removed an unnecessary warning in the predict function that occurred when a test set was missing `fold_column`.
[PUBDEV-6359] - AutoML no longer continues training models after a job cancellation.
[PUBDEV-6453] - Fixed an issue that caused H2O Docker image builds to fail.
[PUBDEV-6552] - In XGBoost, parallel sparse matrix conversion no longer uses a non-threadsafe API.
[PUBDEV-6569] - AutoML uses a default value of 5 for `score_tree_interval` with all algorithms.
[PUBDEV-6576] - Fixed an issue that caused the Python client API to break when passing a frame to the constructor.
[PUBDEV-6601] - In Flow, you can now specify `blending_frame` and `max_runtime_per_model` when running AutoML.
[PUBDEV-6627] - Frame Summary is now available when running the Python client in Zeppelin.
[PUBDEV-6657] - Fixed an issue that caused H2O.CLOUD._memary(idx).getTimestamp to return 0 rather than the timestamp of the remote node.
[PUBDEV-6661] - Fixed a link function NPE in MOJOs.
[PUBDEV-6673] - Fixed the frame.tocsv signature. Instead of taking Boolean arguments (true, false), it now takes a CSVStreamParams object.

New Feature

[PUBDEV-4076] - Added support for a custom Loss Metric in GBM.
[PUBDEV-6089] - When running AutoML in R or Python, an EventLog is now available.
[PUBDEV-6090] - When polling an AutoML run, an EventLog is now displayed rather than a progress bar.
[PUBDEV-6108] - CoxPH is now available in the Python client.
[PUBDEV-6134] - Added support for SVM in the h2o-3 R and Python clients.
[PUBDEV-6492] - Added Isolation Forest to Flow.
[PUBDEV-6510] - In XGBoost, improved the performance of moving sparse matrices to off-heap memory.
[PUBDEV-6518] - Logs from H2O can now be downloaded in plain text format.

Task

[PUBDEV-6015] - Deprecated support for Java 7.
[PUBDEV-6611] - Fixed an issue that caused h2o.scale to corrupt the frame when run over a frame with categorical columns.
[PUBDEV-6619] - Removed the Deep Water booklet from H2O-3 builds.

Improvement

[PUBDEV-5316] - AutoML runtime information is now stored and available in an EventLog.
[PUBDEV-5885] - Users can now pass an ID to training_frame in h2o.StackedEnsemble.
[PUBDEV-6410] - Added early stopping options to Isolation Forest.
[PUBDEV-6438] - Users can now build 2D Partial Dependence plots with the R and Python clients.
[PUBDEV-6482] - When loading MOJOs that were trained on older versions of H2O-3 into newer versions of H2O-3, users can now access all the information that was saved in the model object and use the MOJO to score.
[PUBDEV-6543] - Users can now specify a `row_index` parameter when building PDPs. This allows partial dependence to be calculated for a row.
[PUBDEV-6553] - Users can now specify a `row_index` parameter when building PDPs in Flow.
[PUBDEV-6573] - Enabled Java scoring for XGBoost MOJOs.
[PUBDEV-6590] - Users can now delete an AutoML instance and all of its dependencies (including models) from any client.
[PUBDEV-6617] - h2o.mojo_predict_csv() and h2o.mojo_predict_pandas() now accept a setInvNumNA parameter.
[PUBDEV-6621] - Added support for TreeShap in DRF.
[PUBDEV-6633] - Added a `feature_frequencies` function in GBM, DRF, and IF, which retrieves the number of times a feature was used on a prediction path in a tree model.
[PUBDEV-6634] - Users can now retrieve variable split information in the Isolation Forest output.
[PUBDEV-6646] - Created a SharedTreeMojoModelWithContributions class, which provides a central location for the contributions (contribs) code of DRF and GBM MOJOs.
[PUBDEV-6647] - ScoreContributionsTask is no longer abstract.

Docs

[PUBDEV-6452] - Clarified in the GLM docs that h2o-3 determines the values of alpha and theta by minimizing the negative log-likelihood plus the same Regularization Penalty.
[PUBDEV-6500] - Created an initial, alpha version of the SVM documentation.
[PUBDEV-6554] - Added `upload_custom_distribution` to the Parameters Appendix.
[PUBDEV-6604] - Removed note in XGBoost documentation indicating that "Multi-node support is currently available as a Beta feature."
[PUBDEV-6608] - SVM R client documentation is now available.
[PUBDEV-6610] - Explained how the nthreads parameter can impact reproducibility.
[PUBDEV-6613] - Added stopping parameters to the Isolation Forest chapter.
[PUBDEV-6642] - Fixed the parameters listing display for predict and predict_leaf_node_assignment in the Python documentation.
[PUBDEV-6644] - DRF is now included in the list of supported algorithms for predict_contributions.
[PUBDEV-6648] - Added more examples to the Predict topic.
[PUBDEV-6650] - Improved Data Manipulation Python documentation.
[PUBDEV-6651] - Improved Modeling functions in the Python documentation.
[PUBDEV-6653] - Improved the tree_class Python documentation.
[PUBDEV-6654] - Improved the Model Metrics Python documentation.
[PUBDEV-6656] - Improved GLM documentation by informing users that they can only specify a list in the GLM `interactions` parameter.
[PUBDEV-6660] - Updated Flow documentation to include Isolation Forest.
[PUBDEV-6663] - Improved the Python documentation for h2o.frame().
[PUBDEV-6664] - Added examples to the TargetEncoding Python documentation.

Yates (3.24.0.5) - 6/18/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/5/index.html

Bug

[PUBDEV-6387] - Fixed a segmentation fault that occurred when running XGBoost with `booster=gblinear`.
[PUBDEV-6534] - Users can now rbind two frames when one frame contains all missing values in some of its columns.
[PUBDEV-6549] - ClearDKVTask now detects shared resources when deleting frames and models.
[PUBDEV-6592] - Fixed a TypeError in Python debugging.

New Feature

[PUBDEV-6515] - Fixed an issue that caused MOJO loading to fail when categorical values contained a newline character.
[PUBDEV-6530] - Users can now export a file directly to a compressed format (gzip) and choose a delimiter.
[PUBDEV-6548] - Users can now specify which certificate alias to use when starting H2O with SSL.
[PUBDEV-6582] - Added Conda install instructions to the download page.
[PUBDEV-6591] - Users can now specify a custom separator for CSV export.

Task

[PUBDEV-6457] - Fixed GLM std-error and Tweedie calculations.
[PUBDEV-6472] - Implemented dispersion factor optimization for Tweedie GLM.

Improvement

[PUBDEV-6458] - The MOJO Tree Visualizer and Tree API no longer show categorical splits as numeric and string.
[PUBDEV-6508] - Improved the user experience with Target Encoding in R by providing more meaningful error messages.
[PUBDEV-6520] - Users can now tokenize a frame from the Scala API, which enables them to use H2O's Word2Vec.
[PUBDEV-6525] - Defined several default values in the R API for Target Encoding.
[PUBDEV-6527] - Improved the user experience with Target Encoding in Python by providing more meaningful error messages.
[PUBDEV-6529] - Set default values for blending hyperparameters in Target Encoding when using the Python client.
[PUBDEV-6533] - Fixed an issue that resulted in a "NaN undefined" label in the Flow cluster status.
[PUBDEV-6538] - Exposed ClearDKVTask via REST API.
[PUBDEV-6547] - H2O-3 now provides a warning when using MOJO prediction with a test/validation dataset that has missing columns.
[PUBDEV-6575] - Upgraded the JTransforms library.

Docs

[PUBDEV-6392] - Added a Best Practices sub section to Starting H2O in the User Guide.
[PUBDEV-6473] - Added Target Encoding options to the Parameters appendix.
[PUBDEV-6516] - Updated the description for the Tweedie family in the User Guide and in the GLM booklet.
[PUBDEV-6537] - Removed ologlog and oprobit from list of `link` options that can be specified in GLM.
[PUBDEV-6568] - Updated documentation to indicate that predict_leaf_node_assignment is not supported with XGBoost.
[PUBDEV-6596] - Added the new `-jks_alias` option to list of options that can be specified when starting H2O.

Yates (3.24.0.4) - 5/28/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/4/index.html

Bug

[PUBDEV-4305] - Fixed an error that occurred when applying as.matrix() to an h2o dataframe with numeric values of size ~ 600K x 300.
[PUBDEV-5937] - Introduced a new xgboost.predict.native.enable property, which ensures that H2OXGBoostEstimator no longer always predicts the same value.
[PUBDEV-6440] - Users can now parse files from S3 by specifying an S3 directory URL with the s3 protocol.
[PUBDEV-6475] - Fixed an issue that caused h2o.getModelTree to produce an "invalid object for slot nas" error when XGBoost produced a root-node only decision tree.
[PUBDEV-6476] - Improved performance of H2OXGBoost on OS X.
[PUBDEV-6479] - In Stacked Ensembles, fixed a categorical encoding mismatch error when building the ensemble. Users can now use SE on top of base models that are trained with categorical encoding.
[PUBDEV-6483] - In Isolation Forest, `mtries` can now be set equal to the number of features.
[PUBDEV-6488] - Fixed an issue that caused XGBoost to produce a tree with split features being all NA.
[PUBDEV-6489] - In h2o.getModelTree, when retrieving a threshold for values that are all NAs, updated the description to state that the "Split value is NA."
[PUBDEV-6490] - Fixed an issue that caused trivial features with NAs to be given inflated importance when monotonicity constraints were enabled. As a result, variable importance values were incorrect.
[PUBDEV-6491] - Fixed an NPE issue at water.init.HostnameGuesser when trying to launch a Sparkling Water cluster.
[PUBDEV-6496] - Removed internal_cv_weights from h2o.predict_contributions() output when the prediction was used on a fold column from a model run with nfolds.
[PUBDEV-6521] - Models that use Label Encoding no longer predict incorrectly on test data.
[PUBDEV-6523] - Predictions now work correctly on a subset of training features when using categorical_encoding.
[PUBDEV-6532] - Fixed an issue with how XGBoost formatted non-integer numbers (doubles, floats). These numbers were formatted using Locale.ENGLISH to ensure that a decimal point "." was used instead of a comma ",", but this locale setting also grouped large numbers by thousands and separated the groups with ",", which XGBoost could not parse.

New Feature

[PUBDEV-6478] - Added support for CDH 6.2.
[PUBDEV-6503] - Users can now specify an external IP for h2odriver callback.

Improvement

[PUBDEV-6519] - Added a "toCategoricalCol" helper function for column type conversion.
[PUBDEV-6522] - Renamed "Generic Models" to "MOJO Import" in the documentation.

Docs

[PUBDEV-6486] - Added CDH 6.2 to list of supported Hadoop platforms.
[PUBDEV-6511] - Added the import_hive_table() and import_mojo() functions to the R HTML documentation.

Yates (3.24.0.3) - 5/7/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/3/index.html

Bug

[PUBDEV-5969] - Updated H2O-3 Plotting Functionality to be Compatible with Matplotlib Version 3.0.0.
[PUBDEV-6384] - Flow now shows the correct long value of a seed.
[PUBDEV-6394] - Fixed an issue that caused Rapids string operations on enum (categorical) columns to yield counterintuitive results.
[PUBDEV-6402] - Fixed an issue that caused monotonicity constraint in XGBoost to fail with certain parameters
[PUBDEV-6408] - Fixed an ArrayIndexOutOfBounds error that occurred when parsing quotes in CSV files.
[PUBDEV-6416] - Fixed an issue with Grid Search that caused the API to print errors from all previous failures rather than only the errors related to the model currently being added to the grid. This occurred even when the model was not added to the grid due to failure.
[PUBDEV-6431] - Fixed an exception that occurred when requesting Jobs from h2o.
[PUBDEV-6439] - When using Python 2.7, fixed an issue with non-ascii character handling in the as_data_frame() method.
[PUBDEV-6449] - Predicting on a dataset that has a response column with domain in a different order no longer leads to memory leaks.
[PUBDEV-6451] - Fixed an issue with retrieving details of a GLM model in Flow due to lack of support for long seeds.

Improvement

[PUBDEV-6419] - Simplified the directory structure of logs within downloaded zip archives.
[PUBDEV-6428] - Upgraded XGBoost to the latest stable build.
[PUBDEV-6435] - Users can now import and upload MOJOs in R and Python using `import_mojo()` and `upload_mojo()`.
[PUBDEV-6450] - It is now possible to retrieve a list of features from a trained model.

Docs

[PUBDEV-6024] - Enhanced the GBM Reproducibility FAQ.
[PUBDEV-6456] - Added information about the Target Encoding smoothing parameter to the User Guide.

Yates (3.24.0.2) - 4/16/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/2/index.html

Bug

[PUBDEV-6221] - In the R client, fixed a caching issue that caused tests to fail when running commands line by line after running the entire test at once.
[PUBDEV-6369] - Fixed an issue that caused h2o.upload_custom_metric to fail when using Python 3.
[PUBDEV-6370] - Fixed an issue that caused h2o.upload_custom_metric to fail on data that includes strings.
[PUBDEV-6371] - Fixed an issue with the K-Means_Example.flow.
[PUBDEV-6372] - The IP:port that is shown for logging now matches the IP:port that is described in the makeup of the cluster.
[PUBDEV-6377] - In XGBoost, fixed an AIOOB issue that occurred when running large data.
[PUBDEV-6390] - H2O-hive is now published to Maven central.
[PUBDEV-6393] - The Rapids as.factor operation no longer automatically converts non-ASCII strings to sanitized forms.
[PUBDEV-6395] - Fixed an AIOOB error in the AUC builder.
[PUBDEV-6399] - AUCBuilder now finds the first bin to merge when merging per-chunk histograms.
[PUBDEV-6409] - When running H2O on Hadoop, Hadoop now writes only to its container directory.
[PUBDEV-6418] - Users now receive a warning if two different versions of H2O are trying to communicate on the same node.
[PUBDEV-6421] - Fixed an issue that caused the H2O Python package to fail to load on a fresh install from pip.
[PUBDEV-6433] - Fixed an error that occurred when running multiple concurrent Group-By operations.

Improvement

[PUBDEV-6310] - The new GCP Marketplace offering contains the option to add a network tags script.

Docs

[PUBDEV-6040] - Added Python examples to the Target Encoding topic.
[PUBDEV-6401] - Fixed links to Sparkling Water topics in the Sparkling Water FAQ.
[PUBDEV-6425] - In CoxPH chapter, changed the link for the available R demo.

Yates (3.24.0.1) - 3/31/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html

Bug

[PUBDEV-6159] - The AutoMLTest.java test suite now runs correctly on a local machine.
[PUBDEV-6189] - Fixed an issue in as_date that occurred when the column included NAs.
[PUBDEV-6208] - AutoML no longer fails if one of the Stacked Ensemble models is deleted.
[PUBDEV-6230] - Removed ellipses after the H2O server link when launching the Python client.
[PUBDEV-6231] - In Deep Learning, fixed an issue that occurred when running one-hot-encoding on categoricals.
[PUBDEV-6262] - When running GBM in R without specifically setting a seed, users can now extract the seed that was used to build the model and reproduce that model.
[PUBDEV-6266] - In predictions, fixed an issue that resulted in a "Categorical value out of bounds error" when calling a model.
[PUBDEV-6284] - The Python API no longer reverses the labels for positive and negative values in the standardized coefficients plot legend.
[PUBDEV-6346] - In R, fixed an issue that caused group_by mean to only calculate one column when multiple columns were specified.
[PUBDEV-6350] - Fixed an issue that caused the confusion_matrix method to return matrices for other metrics.
[PUBDEV-6357] - Fixed an issue that resulted in a "Categorical value out of bounds error" when calling a model using Python.
[PUBDEV-6360] - Improved the error message that displays when a user attempts to modify an Enum/categorical column as if it were a string.
[PUBDEV-6367] - Rows that start with a # symbol are no longer dropped during the import process.
[PUBDEV-6368] - Fixed an SVM import failure.
[PUBDEV-6376] - Fixed an issue that caused the default StackedEnsemble prediction to fail when applied to a test dataset without a response column.
[PUBDEV-6379] - Fixed handling of BAD state in CategoricalWrapperVec.

New Feature

[PUBDEV-4680] - Added Blending mode to Stacked Ensembles, which can be specified with the `blending_frame` parameter. In Blending mode, the metalearner is not trained on cross-validation predictions. Instead, the base models are scored on a holdout set, and those predicted values are used to train the metalearner.
[PUBDEV-5801] - Model output now includes column names and types.
[PUBDEV-5809] - AutoML now includes a max_runtime_secs_per_model option.
[PUBDEV-5925] - In GLM, added support for negative binomial family.
[PUBDEV-5980] - Exposed Java target encoding to R.
[PUBDEV-6056] - For GBM and XGBoost models, users can now generate feature contributions (SHAP values).
[PUBDEV-6136] - Added support for Generic Models, which provide a means to use external, pretrained MOJO models in H2O for scoring. Currently only GBM, DRF, IF, and GLM MOJO models are supported.
[PUBDEV-6180] - Added the blending_frame parameter to Stacked Ensembles in Flow.
[PUBDEV-6196] - Added an include_algos parameter to AutoML in the R and Python APIs. Note that in Flow, users can specify exclude_algos only.
[PUBDEV-6339] - In the R and Python clients, added a function that calculates the chunk size based on raw size of the data, number of CPU cores, and number of nodes.
[PUBDEV-6344] - Added ability to import from Hive using metadata from Metastore.
[PUBDEV-6358] - Users can now choose the database where import_sql_select creates a temporary table.
[PUBDEV-6365] - Added support for monotonicity constraints for binomial GBMs.
[PUBDEV-6374] - Users can now define custom HTTP headers using an `-add_http_header` option.
[PUBDEV-6386] - XGBoost MOJO now uses Java predictor by default.
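
Blending mode (PUBDEV-4680) swaps cross-validated predictions for predictions on a holdout frame as the metalearner's training data. The stdlib-only Python sketch below illustrates that idea with two toy base models and a two-weight least-squares metalearner. All function names are hypothetical; this is not H2O's implementation, whose base learners and default metalearner differ.

```python
from statistics import mean


def fit_mean_model(xs, ys):
    """Toy base model 1: always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_slope_model(xs, ys):
    """Toy base model 2: least-squares line through the origin, y = b * x."""
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: b * x


def fit_linear_metalearner(f1, f2, ys):
    """Solve the 2x2 normal equations for weights (w1, w2), no intercept."""
    a11 = sum(u * u for u in f1)
    a12 = sum(u * v for u, v in zip(f1, f2))
    a22 = sum(v * v for v in f2)
    b1 = sum(u * y for u, y in zip(f1, ys))
    b2 = sum(v * y for v, y in zip(f2, ys))
    det = a11 * a22 - a12 * a12
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


def blend(train, blending_frame):
    """Train base models on `train`, then fit the metalearner on holdout scores."""
    xs, ys = train
    m1, m2 = fit_mean_model(xs, ys), fit_slope_model(xs, ys)
    bx, by = blending_frame
    # The base models never saw these rows, so their scores act as
    # out-of-sample features for the metalearner (no cross-validation needed).
    f1 = [m1(x) for x in bx]
    f2 = [m2(x) for x in bx]
    w1, w2 = fit_linear_metalearner(f1, f2, by)
    return lambda x: w1 * m1(x) + w2 * m2(x)
```

For example, `blend(([1, 2, 3, 4], [2, 4, 6, 8]), ([5, 6, 7], [10, 12, 14]))` puts all the weight on the slope model, because it fits the holdout rows exactly. The trade-off versus cross-validation is that some data is held out from base-model training in exchange for training each base model only once.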

Task

[PUBDEV-4982] - Fixed an issue that caused the pyunit_lending_club_munging_assembly_large.py and pyunit_assembly_munge_large.py tests to sometimes fail when run inside a Docker container.
[PUBDEV-5876] - Simplified and improved the GLM COD implementation.

Improvement

[PUBDEV-5491] - SQLite support is available via any JDBC driver in streaming mode.
[PUBDEV-5993] - Updated Retrofit and okHttp dependencies.
[PUBDEV-6129] - Target Encoding is now available in the Python client.
[PUBDEV-6176] - Moved StackedEnsembleModel to hex.ensemble packages. In prior versions, this was in a root hex package.
[PUBDEV-6188] - Secret key ID and secret key are available for s3:// AWS protocol.
- This can be done in the R client using:
  h2o.setS3Credentials(accessKeyId, accessSecretKey)
- and in the Python client using:
  from h2o.persist import set_s3_credentials
  set_s3_credentials(access_key_id, secret_access_key)
[PUBDEV-6217] - Users can now specify AWS credentials at runtime.
[PUBDEV-6254] - The new blending_frame parameter is now available in AutoML.
[PUBDEV-6334] - Fixed an error in the Javadoc for the Frame.java sort function.
[PUBDEV-6363] - Fixed Hive delegation token generation.
[PUBDEV-6388] - Reordered the algorithms trained in AutoML and prioritized hardcoded XGBoost models.

Docs

[PUBDEV-4977] - Removed FAQ indicating that Java 9 was not yet supported.
[PUBDEV-6136] - Added a "Generic Models" chapter to the Algorithms section.
[PUBDEV-6179] - Added the blending_frame parameter to Stacked Ensembles documentation.
[PUBDEV-6280] - Added information about the Negative Binomial family to the GLM booklet and the user guide.
[PUBDEV-6289] - Improved the R and Python client documentation for the `sum` function.
[PUBDEV-6331] - Added include_algos, exclude_algos, max_models, and max_runtime_secs_per_model examples to the Parameters appendix.
[PUBDEV-6362] - In the User Guide and R and Python documentation, replaced references to "H2O Cloud" with "H2O Cluster".
[PUBDEV-6375] - Added information about predict_contributions to the Performance and Prediction chapter.
[PUBDEV-6381] - In the GBM chapter, noted that monotone_constraints is available for Bernoulli distributions in addition to Gaussian distributions.
Improved the GBM Reproducibility FAQ.

Xu (3.22.1.6) - 3/13/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/6/index.html

Bug

[PUBDEV-6335] - In GBM, added a check to ensure that monotonicity constraints can only be used when distribution="gaussian".
[PUBDEV-6342] - Fixed an issue that caused decreasing monotonic constraints to fail to work correctly. Min-Max bounds are now properly propagated to the subtrees.

Improvement

[PUBDEV-6343] - Added internal validation of monotonicity of GBM trees.
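
PUBDEV-6343 adds internal validation of monotonic GBM trees. A minimal external sanity check is sketched below in Python, assuming `predict` is any callable that scores a dict of feature values (a hypothetical stand-in, not an H2O API). It varies one feature over a sorted grid while holding the others fixed and confirms that predictions never move against the constraint. The `direction` values 1 and -1 mirror how `monotone_constraints` encodes increasing and decreasing constraints.

```python
def is_monotone(predict, row, feature, grid, direction=1):
    """Check that predict() is monotone in one feature along a sorted grid.

    predict:   callable taking a dict of feature values (hypothetical).
    row:       baseline values for all other features, held fixed.
    grid:      values for `feature`, sorted ascending.
    direction: 1 for non-decreasing, -1 for non-increasing.
    Flat segments (ties) are allowed, as with tree-based step functions.
    """
    preds = []
    for value in grid:
        candidate = dict(row)  # copy so the baseline row is not mutated
        candidate[feature] = value
        preds.append(predict(candidate))
    return all(direction * (b - a) >= 0 for a, b in zip(preds, preds[1:]))


def toy_model(r):
    """Toy step-function 'tree' that is increasing in x."""
    return (1.0 if r["x"] >= 3 else 0.0) + 0.1 * r["z"]
```

Passing a single baseline row only checks one slice of the feature space; a more thorough check repeats this over many baseline rows.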

Docs

[PUBDEV-6337] - Updated the description of monotone_constraints for GBM. This option can only be used for gaussian distributions.
[PUBDEV-6347] - Improved documentation for the EC2 and S3 storage topic for AWS Standalone instances (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/cloud-integration/ec2-and-s3.html#aws-standalone-instance).

Xu (3.22.1.5) - 3/4/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/5/index.html

Bug

[PUBDEV-6283] - Fixed an issue that caused stratified_split to fail when run on the same column twice.
[PUBDEV-6290] - Fixed an error that occurred when retrieving the AutoML leader model with max_models = 1 in R.
[PUBDEV-6292] - Fixed an issue that resulted in an extra NA row in the GLM variable importance frame.
[PUBDEV-6298] - h2odriver now works correctly on MapR.
[PUBDEV-6300] - Flow no longer displays an error when searching for a file without first providing a path.
[PUBDEV-6303] - GBM monotonicity constraints now correctly preserve the exact monotonicity.
[PUBDEV-6304] - Fixed the warning message that displays for categorical data with more than 10,000,000 values.
[PUBDEV-6305] - Users can now download logs from R after connecting via Steam.
[PUBDEV-6313] - In AutoML, created new partition rules for generating new validation and leaderboard frames when cross validation is disabled and validation/leaderboard frames are not provided:
- If only the validation frame is missing: training/validation = 90/10.
- If only the leaderboard frame is missing: training/leaderboard = 90/10.
- If both the validation and leaderboard frames are missing: training/validation/leaderboard = 80/10/10.
[PUBDEV-6321] - Fixed resolution of `spark-shell --packages "ai.h2o:h2o-algos:"` by Spark Ivy resolver.
[PUBDEV-6333] - Fixed an issue that caused h2o driver to fail to start when Hive was not configured.
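
The PUBDEV-6313 partition rules (which apply only when cross-validation is disabled) can be restated as a small lookup. The Python sketch below uses hypothetical helper names, not H2O functions. Because H2O assigns rows to partitions randomly, real row counts only approximate the numbers returned here.

```python
def automl_split_ratios(have_validation, have_leaderboard):
    """Return (train, validation, leaderboard) fractions of the training frame."""
    if have_validation and have_leaderboard:
        return (1.0, 0.0, 0.0)  # nothing needs to be carved out
    if have_leaderboard:        # only the validation frame is missing
        return (0.9, 0.1, 0.0)
    if have_validation:         # only the leaderboard frame is missing
        return (0.9, 0.0, 0.1)
    return (0.8, 0.1, 0.1)      # both are missing


def split_counts(n_rows, ratios):
    """Approximate row counts for each partition; training gets the remainder."""
    _, valid_r, lb_r = ratios
    valid = round(n_rows * valid_r)
    lb = round(n_rows * lb_r)
    return (n_rows - valid - lb, valid, lb)
```

For instance, `split_counts(1000, automl_split_ratios(False, False))` gives roughly 800 training, 100 validation, and 100 leaderboard rows.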

Improvement

[PUBDEV-6271] - In Isolation Forest, fixed an issue that caused the minimum and maximum path length to not be correctly calculated when there are no OOB observations.
[PUBDEV-6294] - A `check_constant_response` option is available in DRF and GBM. When enabled (the default), an exception is thrown if the response column is a constant value.

Docs

[PUBDEV-5554] - When running XGBoost on Hadoop, users are now advised to set -extramempercent to 120.
[PUBDEV-6287] - Added the new check_constant_response option to the GBM and DRF chapters. Also added an example usage to the Parameters Appendix.
[PUBDEV-6301] - Added a description of the AUCPR metric to the Model Performance section in the User Guide.
[PUBDEV-6314] - Fixed the Random Grid Search in Python example in the Grid Search chapter.

Xu (3.22.1.4) - 2/15/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/4/index.html

Bug

[PUBDEV-6242] - Users can now save and load Isolation Forest models.
[PUBDEV-6264] - In K-Means, fixed an issue in which time columns were treated as if they were categorical.
[PUBDEV-6267] - Fixed Autoencoder `calculateReconstructionErrorPerRowData` error and set the default value of the result MSE to -1.

Improvement

[HEXDEV-733] - When using h2o.import_sql_table to read from a Hive table, the username and password no longer appear in the logs.
[PUBDEV-6207] - Monotone constraints are now exposed in Flow.
[PUBDEV-6277] - The check for constants in response columns is now optional for all models.

Docs

[PUBDEV-6032] - Added to the documentation that MOJO/POJO predict cannot parse columns enclosed in double quotes (for example, ""2"").
[PUBDEV-6174] - Updated the description for Gini in the User Guide.
[PUBDEV-6183] - Fixed the equation for Tweedie Deviance in the GLM booklet and in the User Guide.
[PUBDEV-6199] - Added a "Tokenize Strings" topic to the Data Manipulation chapter.
[PUBDEV-6245] - Added `predict_leaf_node_assignment` information to the User Guide in the Performance and Prediction chapter.
[PUBDEV-6253] - Noted in the documentation that the `custom` and `custom_increasing` stopping metric options are not available in the R client.

Xu (3.22.1.3) - 1/25/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/3/index.html

Bug

[PUBDEV-6186] - Improved error handling when the wrong Hive JDBC connector is used.
[PUBDEV-6233] - Fixed an issue that caused H2O clusters to fail to come up on Cloudera 6 with HTTPS.

New Feature

[PUBDEV-6216] - Added Hive with Kerberos support for H2O on Hadoop.

Docs

[PUBDEV-6219] - Updated the default value for min_rows in the User Guide when used with XGBoost, DRF, and Isolation Forest.

Xu (3.22.1.2) - 1/18/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/2/index.html

Bug

[PUBDEV-6109] - In Flow, fixed an issue that caused POJOs, MOJOs, and genmodel.jar to fail to download. This occurred when Flow was launched via Enterprise Steam and in any deployment where user_context was specified.
[PUBDEV-6177] - Fixed an issue that caused H2OTree to fail with Isolation Forest models trained on data with categorical columns.
[PUBDEV-6178] - When a new tree is assembled from a model, the root node now includes information about the split feature in the description array.
[PUBDEV-6181] - Fixed an issue where Flow failed to provide the ability to ignore certain columns.
[PUBDEV-6192] - In Flow, fixed an issue where users were not able to select a frame when splitting a dataset.
[PUBDEV-6197] - Setting the `ignored_columns` parameter via the Python API now works correctly.
[PUBDEV-6198] - Fixed an issue that caused H2O to hang in Sparkling Water deployments.
[PUBDEV-6200] - Splitting frames now works correctly in Flow.
[PUBDEV-6201] - Import SQL Table now works correctly in Flow.
[PUBDEV-6203] - Fixed an issue with imports in Flow.
[PUBDEV-6204] - Fixed interaction pairs for GLM in Flow.
[PUBDEV-6206] - Fixed broken "Combine predictions with frame" in Flow.

New Feature

[PUBDEV-6146] - Added support for HDP 3.1.

Task

[PUBDEV-6171] - Fixed the pyunit_pubdev_3500_max_k_large.py unit test.
[PUBDEV-6172] - Fixed the runit_PUBDEV_5705_drop_columns_parser_gz.R unit test.

Improvement

[PUBDEV-6167] - Increased the XGBoost stress test timeout.
[PUBDEV-6188] - Implemented secret key credentials for s3:// AWS protocol.
[PUBDEV-6205] - Renamed .jade files to .pug.

Docs

[PUBDEV-6165] - Added HDP 3.0 and 3.1 to list of supported Hadoop versions.
[PUBDEV-6190] - Updated wording for the K-Means Scoring History Graph. This graph shows the number of iterations vs. the within-cluster sum of squares.

Xu (3.22.1.1) - 12/28/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/1/index.html

Bug

[PUBDEV-5236] - PCA tests now work correctly with the "from h2o.estimators.pca import H2OPrincipalComponentAnalysisEstimator" import statement.
[PUBDEV-5956] - Fixed an AutoMLTest test that was leaking keys in KeepCrossValidationFoldAssignment test.
[PUBDEV-6081] - Reduced the Invocation JMH level setup/teardown to only the training model.
[PUBDEV-6124] - In XGBoost, the default value of L2 regularization for tree models is now 1, which is consistent with native XGBoost.
[PUBDEV-6157] - Fixed an issue that caused Stacked Ensembles to fail with GLM metalearner when the same H2O instance was used to train a GLM multinomial classification model with more classes than what is used in Stacked Ensembles.

New Feature

[PUBDEV-5261] - Users can now specify `custom` and `custom_increasing` when setting the `stopping_metric` parameter in GBM and DRF.
[PUBDEV-5770] - Checkpoints can now be exported when running Grid Search or AutoML.

Task

[PUBDEV-5894] - Added support for CDH 6.0, which includes Hadoop 3 support. Be sure to review https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_release_notes.html for more information.
[PUBDEV-5953] - Fixed an AutoMLTest that was leaking keys.
[PUBDEV-6085] - Added a test that runs multiple `nfolds>0` DRF models in parallel.
[PUBDEV-6153] - Added support for CDH 6.1.

Improvement

[PUBDEV-5820] - Hadoop builds now work with Jetty 8 and 9.
[PUBDEV-5897] - R examples in the R package docs now use Hadley's style guide.

Docs

[PUBDEV-6048] - Added documentation for the new stopping_metric options in GBM and DRF.
[PUBDEV-6154] - Added CDH 6 and 6.1 to list of supported Hadoop versions.
[PUBDEV-6156] - In the XGBoost chapter, updated the default value for reg_lambda to be 1.

Xia (3.22.0.5) - 1/16/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/5/index.html

Bug

[PUBDEV-6198] - Fixed an H2O hang issue in Sparkling Water deployments.

Xia (3.22.0.4) - 1/4/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/4/index.html

Bug

[PUBDEV-6109] - In Flow, fixed an issue that caused POJOs, MOJOs, and genmodel.jar to fail to download. This occurred when Flow was launched via Enterprise Steam and in any deployment where user_context was specified.
[PUBDEV-6166] - On the external backend, H2O now explicitly passes the timestamp from the Spark Driver node.

Xia (3.22.0.3) - 12/21/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/3/index.html

Bug

[PUBDEV-5829] - Fixed an issue with the REST API. Calling "get model" no longer returns 0 for the timestamp of the model.
[PUBDEV-5959] - The PySparkling client no longer hangs after re-connecting to the H2O external backend.
[PUBDEV-5990] - Fixed an OOM issue in h2o.arrange.
[PUBDEV-6059] - Fixed an issue that caused importing Parquet files with large Double data to fail.
[PUBDEV-6076] - After applying group_by to a time stamped column, the original time stamp format is now retained.
[PUBDEV-6079] - In AutoML, cross-validation metrics are now used for early stopping by default. Because of this, the validation_frame argument is now ignored unless nfolds==0 and, in that case, will be used for early stopping.
[PUBDEV-6098] - Fixed an issue that caused the MOJO visualizer to fail for Isolation Forest models.
[PUBDEV-6101] - StackedEnsembleMojoModel is now serializable.
[PUBDEV-6107] - In the R client, fixed an error that occurred when running getModelTree.
[PUBDEV-6109] - In Flow, fixed an issue that caused POJOs, MOJOs, and genmodel.jar to fail to download. This occurred when Flow was launched via Enterprise Steam and in any deployment where user_context was specified.
[PUBDEV-6111] - Fixed the formula used for calculating L2 distance.
[PUBDEV-6117] - The Python client now allows users to compare H2O XGBoost with native XGBoost using any H2O frame. The convert_H2OFrame_2_DMatrix method accepts any H2O frame and converts it to valid data for native XGBoost.
[PUBDEV-6120] - H2O XGBoost now reports correct variable importances. The variable importances are computed from the gains of their respective loss functions during tree construction.
[PUBDEV-6122] - Users can now save PDP plots.
[PUBDEV-6123] - Fixed an issue that resulted in a SQL exception when connecting H2O to a SQL server and importing a table.
[PUBDEV-6137] - Fixed an issue with GCS support on Hadoop environments.
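
PUBDEV-6120 states that XGBoost variable importances are computed from the gains of the loss function during tree construction. The aggregation step can be sketched in Python as summing the gain of every split per feature. The scaled column (relative to the top feature) and percentage column (relative to the total) follow the layout of H2O's variable importance table, but the function and its `(feature, gain)` input format are hypothetical.

```python
from collections import defaultdict


def gain_importance(splits):
    """Aggregate per-split gains into a variable importance table.

    splits: iterable of (feature, gain) pairs, one per split across all trees.
    Returns rows of (feature, relative, scaled, percentage), most important first.
    """
    relative = defaultdict(float)
    for feature, gain in splits:
        relative[feature] += gain  # total loss reduction attributed to the feature
    top = max(relative.values())
    total = sum(relative.values())
    rows = [(f, r, r / top, r / total) for f, r in relative.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Gain-based importance rewards features that reduce the loss the most, rather than features that are merely split on often.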

New Feature

[PUBDEV-1984] - Added monotonic variables for GBM.
[PUBDEV-6030] - EasyPredictModelWrapper now calculates reconstruction errors for AutoEncoder.
[PUBDEV-6091] - When running a grid search, a timestamp column was added that shows when each model was added to the grid summary table.

Improvement

[PUBDEV-5865] - In GBM, users can now specify the `monotone_constraints` parameter.
[PUBDEV-6106] - Per-tree prediction contributions from the MOJO are now exposed through EasyPredictModelWrapper.
[PUBDEV-6110] - Updated Gradle to version 5.0.
[PUBDEV-6115] - Fixed the output of rankTsv in the AutoML leaderboard.

Docs

[PUBDEV-4377] - Updated the Prediction section to include information on how the prediction threshold is selected for classification problems.
[PUBDEV-6105] - Updated the description of enum_limited to indicate that T=1024.
[PUBDEV-6148] - In the GBM chapter, added `monotone_constraints` to list of available parameters.

Xia (3.22.0.2) - 11/21/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/2/index.html

Bug

[PUBDEV-3281] - Fixed an issue that caused the ARFF parser to parse some files incorrectly.
[PUBDEV-4737] - When performing a grid search in Python, fixed an issue that caused all models to return a model.type of "supervised."
[PUBDEV-5352] - When running DRF in the Python client, checkpointing on new data now works correctly.
[PUBDEV-5869] - Fixed an issue that caused the confusion matrix recall and precision values to be switched.
[PUBDEV-6036] - In the Python client, fixed an issue that caused the `offset_column` parameter to be ignored when it was passed in the GLM train statement.
[PUBDEV-6042] - The H2O Tree Handler now works correctly on Isolation Forest models.
[PUBDEV-6046] - When running AutoML, fixed an issue that resulted in a "Failed to get metric: auc from ModelMetrics type BinomialGLM" message.
[PUBDEV-6050] - In Flow, Precision and Recall definitions are no longer inverted in the confusion matrix.
[PUBDEV-6052] - Fixed the error message that displays when converting from a pandas dataframe to an h2oframe in Python 3.6.
[PUBDEV-6054] - In XGBoost, fixed an issue that resulted in a "Maximum amount of file descriptors hit" message.
[PUBDEV-6060] - Fixed the description of sample_rate in Isolation Forest.
[PUBDEV-6063] - Cross validation models are no longer deleted by default.
[PUBDEV-6065] - When viewing an AutoML leaderboard, fixed an issue that resulted in an ArrayIndexOutOfBoundsException if `sort_metric` was specified but no model was built.

New Feature

[PUBDEV-5766] - Added monotonicity constraints to H2O XGBoost.

Task

[PUBDEV-6039] - When generating MOJOs, h2o-genmodel.jar now includes a check for MOJO version 1.3 to determine whether the h2o-genmodel.jar and the MOJO version can work together. Prior versions of h2o-3 did not include MOJO 1.3, and as a result, MOJOs silently returned predicted values executed on an empty vector.

Improvement

[PUBDEV-5705] - With a new `skipped_columns` option, users can now specify to drop specific columns before parsing. Note that this functionality is not supported for SVMLight or Avro file formats.
[PUBDEV-6062] - The GLM multinomial coefficient table now includes the original levels as column names.

Docs

[PUBDEV-3216] - Created new Performance & Prediction and Variable Importance sections in the User Guide.
[PUBDEV-5313] - Updated the default value of `categorical_encoding` for XGBoost. This defaults to Auto (which is one_hot_encoding).
[PUBDEV-6012] - In the parameter entry for `weights_column`, updated the example to exclude the weight column in the list of predictors.
[PUBDEV-6016] - In the DRF FAQ, updated the "What happens when you try to predict on a categorical level not seen during training?" question.
[PUBDEV-6025] - TargetingEncoder is now included in the Python module docs.
[PUBDEV-6041] - In GLM, updated the documentation to indicate that coordinate_descent is no longer experimental.
[PUBDEV-6064] - Added default values for `max_depth`, `sample_size`, and `sample_rate`. Also added a parameter description entry for `sample_size`, showing an Isolation Forest example.
[PUBDEV-6086] - Added the new `monotone_constraints` option to the XGBoost chapter.

Xia (3.22.0.1) - 10/26/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/1/index.html

Bug

[PUBDEV-5023] - In Python, the metalearner method is only available for Stacked Ensembles.
[PUBDEV-5658] - Fixed an issue that caused micro benchmark tests to fail to run in the jmh directory.
[PUBDEV-5663] - Fixed an issue that caused H2O to fail to export dataframes to S3.
[PUBDEV-5745] - Added the `keep_cross_validation_models` argument to Grid Search.
[PUBDEV-5746] - Improved efficiency of the `keep_cross_validation_models` parameter in AutoML.
[PUBDEV-5777] - Simplified the comparison of H2OXGBoost with native XGBoost when using the Python client.
[PUBDEV-5780] - Fixed JDBC ingestion for Teradata databases.
[PUBDEV-5824] - In the Python client and the Java API, multiple runs of the same AutoML instance no longer fail training new "Best Of Family" SE models that would include the newly generated models.
[PUBDEV-5873] - Fixed an issue that resulted in an AssertionError when calling `cbind` from the Python client.
[PUBDEV-5881] - AutoML now enforces case for the `sort_metric` option when using the Java API.
[PUBDEV-5903] - In AutoML, Stacked Ensemble models are now always trained, even if the `max_runtime_secs` limit has been reached.
[PUBDEV-5904] - In the R client, added documentation for helper functions.
[PUBDEV-5922] - Renamed `x` to `X` in the H2O-sklearn fit method to be consistent with the sklearn API.
[PUBDEV-5924] - Merging datasets now works correctly.
[PUBDEV-5931] - Building on Maven with h2o-ext-xgboost on versions later than 3.18.0.11 no longer results in a dependency error.
[PUBDEV-5933] - Fixed a Java 11 ORC file parsing failure.
[PUBDEV-5954] - Upgraded the version of the lodash package used in H2O Flow.
[PUBDEV-5967] - `-ip localhost` now works correctly on WSL.
[PUBDEV-5971] - CSV/ARFF Parser no longer treats blank lines as data lines with NAs.
[PUBDEV-5976] - Starting h2o-3 from the Python Client no longer fails on Java 10.0.2.
[PUBDEV-5995] - Fixed an issue that caused StackedEnsemble MOJO model to return an "IllegalArgumentException: categorical value out of range" message.
[PUBDEV-5996] - Removed the "nclasses" parameter from tree traversal routines.
[PUBDEV-5998] - Exposed H2OXGBoost parameters used to train a model to the Python API. Previously, this information was visible in the Java backend but was not passed back to the Python API.
[PUBDEV-5999] - Removed "illegal reflective access" warnings when starting H2O-3 with Java 10.
[PUBDEV-6004] - In Stacked Ensembles, changes made to data during scoring now apply to all models.
[PUBDEV-6005] - When running AutoML in Flow, updated the list of algorithms that can be selected in the "Exclude These Algorithms" section.

New Feature

[PUBDEV-5170] - Individual predictions of GBM trees are now exposed in the MOJO API.
[PUBDEV-5378] - Exposed target encoding in the Java API.
[PUBDEV-5399] - The `keep_cross_validation_fold_assignment` option is now available in AutoML.
[PUBDEV-5609] - Added support for the Isolation Forest algorithm in H2O-3. Note that this is a Beta version of the algorithm.
[PUBDEV-5668] - Added the `keep_cross_validation_fold_assignment` option to AutoML in Flow.
[PUBDEV-5681] - `h2o.connect` no longer ignores `strict_version_check=FALSE` when connecting to a Steam cluster.
[PUBDEV-5695] - Created an R demo for CoxPH.
[PUBDEV-5775] - It is now possible to combine two models into one MOJO, with the second model using the prediction from the first model as a feature. These models can be from any algorithm or combination of algorithms except Word2Vec.
[PUBDEV-5852] - Implemented h2oframe.fillna(method='backward').
[PUBDEV-5977] - Sped up AutoML training on smaller datasets in client mode (Sparkling Water).
[PUBDEV-5979] - Exposed Java Target Encoding in the Python client.
[PUBDEV-5988] - Users can now specify a `-features` parameter when starting h2o from the command line. This allows users to remove experimental or beta algorithms when starting H2O-3. Available options for this parameter include `beta`, `stable`, and `experimental`.
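
PUBDEV-5852 implements `h2oframe.fillna(method='backward')`. The idea of a backward fill can be sketched on a plain Python list with `None` standing in for NA: each missing value takes the next non-missing value after it. The optional `maxlen` cap on how many consecutive NAs are filled borrows its name from `H2OFrame.fillna`, but treat the exact behavior of this hypothetical helper as an assumption rather than a description of H2O's.

```python
def backward_fill(values, maxlen=None):
    """Fill None entries with the next non-None value (backward fill).

    maxlen: if given, fill at most this many consecutive NAs in each run,
    counting backward from the valid value that follows the run.
    """
    out = list(values)
    next_valid = None
    run = 0  # consecutive NAs seen since the last valid value (scanning right to left)
    for i in range(len(out) - 1, -1, -1):
        if out[i] is None:
            run += 1
            if next_valid is not None and (maxlen is None or run <= maxlen):
                out[i] = next_valid
        else:
            next_valid = out[i]
            run = 0
    return out
```

Trailing NAs have no later value to copy, so they stay missing.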

Task

[PUBDEV-4507] - Added XGBoost to AutoML.
[PUBDEV-5696] - Added an option to allow users to use a user-specified JDBC driver.
[PUBDEV-5722] - Exposed `pr_auc` wherever AUC is reported, including the scoring history and model summary. Also added h2o.pr_auc() in R.
[PUBDEV-5901] - Added support for Java 11.
[PUBDEV-6001] - Improved the AutoML documentation in the User Guide.

Improvement

[PUBDEV-5590] - Added a `MAX_USR_CONNECTIONS_KEY` argument to limit number of sessions for import_sql_table.
[PUBDEV-5669] - Reduced the performance gap when importing data using Hive2.
[PUBDEV-5719] - Improved and cleaned up output for the h2o.mojo_predict_csv and h2o.mojo_predict_df functions.
[PUBDEV-5743] - Users can now visualize XGBoost trees when running predictions.
[PUBDEV-5761] - Added weights to partial dependence plots. Also added a level for missing values.
[PUBDEV-5822] - Users can now download the genmodel.jar in Flow for completed models.
[PUBDEV-5886] - In AutoML, changed the default for `keep_cross_validation_models` and `keep_cross_validation_predictions` from True to False.
[PUBDEV-5888] - Added support for predicting using the XGBoost Predictor.
[PUBDEV-5909] - In XGBoost, optimized the matrix exchange between Java and native C++ code.
[PUBDEV-5913] - Improved the h2o-3 README for installing in R and IntelliJ IDEA.
[PUBDEV-5927] - Introduced a simple "streaming" mode that allows H2O to read from a table using basic SQL:92 constructs.
[PUBDEV-5929] - In AutoML, `stopping_metric` is now based on `sort_metric`.
[PUBDEV-5952] - The requirements.txt file now includes the Colorama version.
[PUBDEV-5961] - In lockable.java, delete is now final in order to prevent inconsistent overrides.
[PUBDEV-5964] - Reverted AutoML naming change from Auto.Algo to Auto.algo.
[PUBDEV-6000] - In AutoML, automatic partitioning of the validation frame now uses 10% of the training data instead of 20%.
[PUBDEV-6002] - Changed model and grid indexing in autogenerated model names in AutoML to be 1-indexed instead of 0-indexed.
[PUBDEV-6017] - Allowed public access to H2O instances started from R/Python. This can be done with the new `bind_to_localhost` (Boolean) parameter, which can be specified in `h2o.init()`.

Docs

[PUBDEV-4505] - Added Scala and Java examples to the Building and Extracting a MOJO topic.
[PUBDEV-4590] - Added a Scala example to the Stacked Ensembles topic.
[PUBDEV-5949] - Added Tree class method to the Python module documentation.
[PUBDEV-5641] - Removed references to UDP in the documentation.
[PUBDEV-5664] - Removed Sparkling Water topics from H2O-3 User Guide. These are in the Sparkling Water User Guide.
[PUBDEV-5674] - Added a Resources section to the Overview and included links to the awesome-h2o repository, H2O.ai blogs, and customer use cases.
[PUBDEV-5693] - Updated GCP Installation documentation with information about quota limits.
[PUBDEV-5709] - Updated Gains/Lift documentation. 16 groups are now used by default.
[PUBDEV-5756] - Added Python examples to the Cross-Validation topic in the User Guide.
[PUBDEV-5762] - Added `loss_by_col` and `loss_by_col_idx` to list of GLRM parameters.
[PUBDEV-5810] - Updated documentation for `class_sampling_factors`. `balance_classes` must be enabled when using `class_sampling_factors`.
[PUBDEV-5839] - Added a Python example for initializing and starting h2o-3 in Docker.
[PUBDEV-5857] - Updated the Admin menu documentation in Flow after adding "Download Gen Model" option.
[PUBDEV-5905] - In GBM and DRF, `enum_limited` is a supported option for `categorical_encoding`.
[PUBDEV-5962] - Added the -notify_local flag to list of flags available when starting H2O-3 from the command line.
[PUBDEV-5982] - Added documentation for Isolation Forest (beta).
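
PUBDEV-5709 notes that Gains/Lift tables now use 16 groups by default. The core computation is sketched below in Python: sort rows by predicted score, split them into roughly equal groups, and divide each group's response rate by the overall response rate. This is a simplified, hypothetical helper; H2O derives group boundaries from quantiles of the predictions and reports more columns, so group sizes and tie handling may differ.

```python
def gains_lift(scores, labels, groups=16):
    """Per-group and cumulative lift for binary labels (1 = positive)."""
    rows = sorted(zip(scores, labels), key=lambda r: r[0], reverse=True)
    n = len(rows)
    overall = sum(label for _, label in rows) / n
    if overall == 0:
        raise ValueError("lift is undefined when there are no positive labels")
    table, cum_pos, cum_n = [], 0, 0
    for g in range(groups):
        chunk = rows[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer rows than groups
        pos = sum(label for _, label in chunk)
        cum_pos += pos
        cum_n += len(chunk)
        table.append({
            "group": g + 1,
            "lift": (pos / len(chunk)) / overall,
            "cumulative_lift": (cum_pos / cum_n) / overall,
        })
    return table
```

A perfect ranker concentrates positives in the first groups, so lift starts high and the cumulative lift falls to 1.0 by the last group.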

Wright (3.20.0.10) - 10/16/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/10/index.html

Bug

[PUBDEV-5613] - AutoML now correctly respects the max_runtime_secs setting.
[PUBDEV-5856] - Fixed a multinomial COD solver bug.
[PUBDEV-5919] - Fixed an issue that caused importing of ARFF files to fail if the header was too large and/or with large datasets with categoricals.

Wright (3.20.0.9) - 10/1/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/9/index.html

Bug

[PUBDEV-5930] - Fixed an issue that caused H2O to fail when loading a GLRM model.

Improvement

[PUBDEV-5938] - log4j.properties can be loaded from classpath.
[PUBDEV-5939] - Buffer configuration is now available for http/https connections.

Wright (3.20.0.8) - 9/21/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/8/index.html

Bug

[PUBDEV-5855] - Fixed an issue that occurred when parsing columns that include double quotation marks.
[PUBDEV-5880] - The `max_runtime_secs` option is no longer ignored when using the Python client.
[PUBDEV-5906] - Fixed an XGBoost Sparsity detection test to make it deterministic.
[PUBDEV-5907] - Hadoop driver class no longer fails to parse new Java version string.

New Feature

[PUBDEV-5861] - Added a GBM/DRF Tree walker API in the R client.
[PUBDEV-5862] - The R API for obtaining and traversing model trees in GBM/DRF is available in Python.

Improvement

[PUBDEV-5706] - Added support for user defined split points in partial dependence plots.
[PUBDEV-5748] - Confusion matrices can now be generated in Flow.
[PUBDEV-5900] - Java version error messages now reference versions 7 and 8 instead of 1.7 and 1.8.
[PUBDEV-5902] - A Python tree traversal demo is available at https://github.com/h2oai/h2o-3/blob/master/h2o-py/demos/tree_demo.ipynb.

Wright (3.20.0.7) - 8/31/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/7/index.html

Bug

[PUBDEV-5826] - Fixed an issue that caused a mismatch between GLRM MOJO predict and GLRM predict.
[PUBDEV-5841] - Fixed an issue that caused H2O XGBoost grid search to fail even when sizing the sessions 4x the data size and using extramempercent of 150.
[PUBDEV-5848] - When performing multiple AutoML runs using the H2O R client, viewing the first AutoML leaderboard no longer results in an error.
[PUBDEV-5864] - H2O now only binds to the local interface when started from R/Python.
[PUBDEV-5871] - Fixed an issue that caused DeepLearning and XGBoost MOJOs to get a corrupted input row. This occurred when GenModel's helper functions that perform one-hot encoding did not correctly handle cases where useAllFactorLevels = false, which corrupted the first categorical value in the input row.
[PUBDEV-5872] - Added gamma, tweedie, and poisson objective functions to the XGBoost Java Predictor.
[PUBDEV-5877] - Fixed an issue in HDFS file import. In rare cases the import could fail due to temporarily inconsistent state of H2O distributed memory.

Wright (3.20.0.6) - 8/24/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/6/index.html

Bug

[PUBDEV-5724] - H2oApi.frameColumn in h2o-bindings.jar now correctly parses responses.
[PUBDEV-5751] - biz.k11i:xgboost-predictor:0.3.0 is now ported to the h2oai repo and released to Maven Central. This allows for easier deployment of H2O and Sparkling Water.
[PUBDEV-5786] - In GLM, the coordinate descent solver is now disabled only when family=multinomial.
[PUBDEV-5792] - Fixed an issue that caused the H2O parser to hang when reading a Parquet file.
[PUBDEV-5803] - Fixed an issue that resulted in an AutoML "Unauthorized" Error when running through Enterprise Steam via R.
[PUBDEV-5818] - Leaf node assignment no longer produces the wrong paths for degenerate trees.
[PUBDEV-5823] - Updated the list of Python dependencies on the release download page and in the User Guide.
[PUBDEV-5826] - Fixed an issue that resulted in a mismatch between GLRM predict and GLRM MOJO predict.
[PUBDEV-5844] - Launching H2O on a machine with more than 2TB of memory no longer results in an integer overflow error.
[PUBDEV-5847] - The HTTP parser no longer reads fewer rows when the data is compressed.
[PUBDEV-5851] - AstFillNA Rapids expression now returns H2O.unimp() on backward methods.

New Feature

[PUBDEV-5735] - In GBM and DRF, tree traversal and information is now accessible from the R and Python clients. This can be done using the new h2o.getModelTree function.
[PUBDEV-5779] - In GBM, added a new staged_predict_proba function.
[PUBDEV-5812] - MOJO output now includes terminal node IDs.
[PUBDEV-5832] - In GBM/DRF, the H2OTreeClass function now allows you to specify categorical levels.

Task

[PUBDEV-5845] - Updated the XGBoost dependency to ai.h2o:xgboost-predictor:0.3.1.

Improvement

[PUBDEV-5837] - Terminal node IDs can now be retrieved in the predict_leaf_node_assignment function.

Docs

[PUBDEV-5836] - The User Guide now indicates that only Hive versions 2.2.0 or greater are supported for JDBC drivers. Hive 2.1 is not currently supported.
[PUBDEV-5838] - In GLM, the documentation for the Coordinate Descent solver now notes that Coordinate Descent is not available when family=multinomial.

Wright (3.20.0.5) - 8/8/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/5/index.html

Bug

[PUBDEV-5543] - Hive smoke tests no longer time out on HDP.
[PUBDEV-5793] - AutoML now correctly ignores columns specified in Flow.
[PUBDEV-5794] - In Flow, the Import SQL Table button now works correctly.
[PUBDEV-5806] - XGBoost cross validation now works correctly.
[PUBDEV-5811] - Fixed an issue that caused AutoML to fail in Flow due to the keep_cross_validation_fold_assignment option.
[PUBDEV-5814] - Multinomial Stacked Ensemble no longer fails when either XGBoost or Naive Bayes is the base model.
[PUBDEV-5816] - Fixed an issue that caused XGBoost to generate the wrong metrics for multinomial cases.
[PUBDEV-5819] - Increased the client_disconnect_timeout value when ClientDisconnectCheckThread searches for connected clients.

Improvement

[PUBDEV-5813] - Added automated Flow test for AutoML.

Wright (3.20.0.4) - 7/31/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/4/index.html

Bug

[PUBDEV-5555] - In Flow, increased the height of the summary section for the column summary.
[PUBDEV-5720] - Cross-validation now works correctly in XGBoost.
[PUBDEV-5739] - Documentation for the MOJO predict functions (mojo_predict_pandas and mojo_predict_csv) is now available in the Python User Guide.
[PUBDEV-5744] - Regression comparison tests no longer fail between H2OXGBoost and native XGBoost.
[PUBDEV-5760] - GBM/DRF MOJO scoring no longer allocates unnecessary objects for each scored row.

New Feature

[PUBDEV-5736] - In GBM, added point estimation as a metric.

Task

[PUBDEV-5730] - Reduced the size of the h2o.jar.

Improvement

[PUBDEV-5429] - The h2o.importFile([List of Directory Paths]) function will now import all the files located in the specified folders.
[PUBDEV-5637] - Added Standard Error of Mean (SEM) to Partial Dependence Plots.
[PUBDEV-5718] - Added two new formatting options to hex.genmodel.tools.PrintMojo. The --decimalplaces (or -d) option allows you to set the number of places after the decimal point. The --fontsize (or -f) option allows you to set the font size. The default font size is 14.
[PUBDEV-5733] - Optimized the performance of ingesting a large number of small Parquet files by using sequential parse.
[PUBDEV-5749] - Added support for weights in a calibration frame.
[PUBDEV-5752] - Added a new port_offset option. This parameter lets you specify the relationship between the API port ("web port") and the internal communication port. The previous implementation expected h2o port = api port + 1. Because parts of the code assume that the h2o port and the API port can be derived from each other, the two cannot be fully decoupled. Instead, this new option lets the user specify an offset such that h2o port = api port + offset. This enables the user to move the communication port to a specific range, which can be firewalled.
[PUBDEV-5765] - Improved speed of ingesting data from HTTP/HTTPS data sources in standalone H2O.
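As a rough illustration of the port relationship described in PUBDEV-5752, the sketch below derives the internal communication port from the API port. The helper name `internal_port` is hypothetical, the default offset of 1 is assumed to mirror the previous "h2o port = api port + 1" behavior, and 54321 is used only as an example API port.

```python
def internal_port(api_port: int, port_offset: int = 1) -> int:
    """Hypothetical helper: internal (node-to-node) port = API port + offset.

    The default offset of 1 is an assumption that mirrors the
    previous "h2o port = api port + 1" behavior.
    """
    return api_port + port_offset


# Previous behavior: the internal port sits directly above the API port.
print(internal_port(54321))  # 54322
# With a larger offset, the internal port moves into a separate range
# that can be firewalled independently of the API port.
print(internal_port(54321, port_offset=10000))  # 64321
```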

Docs

[PUBDEV-5694] - The User Guide now specifies that XLS/XLSX files must be BIFF 8 format. Other formats are not supported.
[PUBDEV-5731] - The documentation now notes that when downloading MOJOs/POJOs, users must specify the full path, not a relative path.
[PUBDEV-5774] - Added documentation for the new port_offset option when starting H2O.

Wright (3.20.0.3) - 7/10/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/3/index.html

Bug

[PUBDEV-5353] - The `fold_column` option now works correctly in XGBoost.
[PUBDEV-5560] - Calling `describe` on an empty H2O frame no longer results in an error in Python.
[PUBDEV-5576] - In XGBoost, when performing a grid search from Flow, the correct cross validation AUC score is now reported back.
[PUBDEV-5612] - Fixed an issue that caused XGBoost to fail with Tesla V100 drivers 70 and above and with CUDA 9.
[PUBDEV-5654] - H2O's XGBoost results no longer differ from native XGBoost when dmatrix_type="sparse".
[PUBDEV-5672] - In the R documentation, fixed the description for h2o.sum to state that this function indicates whether to return an H2O frame or one single aggregated sum.
[PUBDEV-5673] - H2O data import for Parquet files no longer fails on numeric decimalTypes.
[PUBDEV-5683] - Fixed an error that occurred when viewing the AutoML Leaderboard in Flow before the first model was completed.
[PUBDEV-5686] - When connecting to a Linux H2O Cluster from a Windows machine using Python, the `import_file()` function can now correctly locate the file on the Linux Server.
[PUBDEV-5692] - H2O now reports the project version in the logs.
[PUBDEV-5700] - In CoxPH, fixed an issue that caused training to fail to create JSON output when the dataset included too many features.
[PUBDEV-5707] - Users can now switch between edit and command modes on Scala cells.
[PUBDEV-5721] - Fixed an issue with the way that RMSE was calculated for cross-validated models.
[PUBDEV-5727] - In GLRM, fixed an issue that caused differences between the result of h2o.predict and MOJO predictions.

New Feature

[PUBDEV-5680] - Added a new `-report_hostname` flag that can be specified along with `-proxy` when starting H2O on Hadoop. When this flag is enabled, users can replace the IP address with the machine's host name when starting Flow.
[PUBDEV-5697] - Added support for the Amazon Redshift data warehouse.
[PUBDEV-5725] - Added support for CDH 5.9.

Task

[PUBDEV-5628] - Accessing secured (Kerberized) HDFS from a standalone H2O instance works correctly.
[PUBDEV-5656] - AutoML Python tests always use max models to avoid running out of time.
[PUBDEV-5682] - CoxPH now validates that a `stop_column` is specified. `stop_column` is a required parameter.
[PUBDEV-5688] - Fixed an issue that caused a GCS Exception to display when H2O was launched offline.

Improvement

[PUBDEV-5572] - In Flow, improved the display of the confusion matrix for multinomial cases.
[PUBDEV-5665] - Users will now see a Precision-Recall AUC when training binomial models.
[PUBDEV-5666] - Synchronous and Asynchronous Scala Cells are now allowed in H2O Flow.
[PUBDEV-5687] - H2O now autodetects string columns and skips them before calculating `groupby`. H2O also warns the user when this happens.

Docs

[PUBDEV-5424] - The h2o.mojo_predict_csv and h2o.mojo_predict_df functions now appear in the R HTML documentation.
[PUBDEV-5702] - In GLM, documented that the Poisson family uses the -log(maximum likelihood function) for deviance.
[PUBDEV-5710] - Fixed the R example in the "Replacing Values in a Frame" data munging topic. Columns and rows do not start at 0; R has a 1-based index.
[PUBDEV-5711] - Fixed the R example in the "Group By" data munging topic. Specify the "Month" column instead of the "NumberOfFlights" column when finding the number of flights in a given month based on origin.
[PUBDEV-5714] - Added the new `-report_hostname` flag to the list of Hadoop launch parameters.
[PUBDEV-5715] - Added Amazon Redshift to the list of supported JDBC drivers.
[PUBDEV-5726] - Added CDH 5.9 to the list of supported Hadoop platforms.

Wright (3.20.0.2) - 6/15/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/2/index.html

Bug

[PUBDEV-3950] - Fixed an issue that resulted in a null pointer exception for H2O ensembles.
[PUBDEV-5250] - In AutoML, ignored_columns are now passed in the API call when specifying both x and a fold_column during training.
[PUBDEV-5622] - Fixed a bug in documentation that incorrectly referenced 'calibrate_frame' instead of 'calibration_frame'.
[PUBDEV-5629] - java -jar h2o.jar no longer fails on Java 7.
[PUBDEV-5634] - Fixed a typo in the AutoML pydocs for sort_metric.
[PUBDEV-5651] - Exported CoxPH functions in R.

Task

[PUBDEV-5621] - Added balance_classes, class_sampling_factors, and max_after_balance_size options to AutoML in Flow.

Improvement

[PUBDEV-3754] - Updated the project URL, bug reports link, and list of authors in the h2o R package DESCRIPTION file.
[PUBDEV-5542] - Updated the description of the h2o R package in the DESCRIPTION file.
[PUBDEV-5570] - AutoML now produces an error message when a response column is missing.
[PUBDEV-5623] - Fixed intermittent test failures for AutoML.
[PUBDEV-5625] - Removed frame metadata calculation from AutoML.
[PUBDEV-5635] - Removed the keep_cross_validation_models = False argument from the AutoML User Guide examples.
[PUBDEV-5636] - Users can now set a MAX_CM_CLASSES parameter to set a maximum number of confusion matrix classes.

Docs

[PUBDEV-5619] - Updated the AutoML screenshot in Flow to show the newly added parameters.

Wright (3.20.0.1) - 6/6/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wright/1/index.html

Bug

[PUBDEV-4299] - In Scala, the `new H2OFrame()` API no longer fails when using http/https URL-based data sources.
[PUBDEV-4865] - Fixed an issue that caused the Java client JVM to get stuck with a latch/lock leak on the server.
[PUBDEV-5342] - Fixed an issue that caused intermittent NPEs in AutoML.
[PUBDEV-5357] - In parse, each lock now includes the owner rather than locking with null.
[PUBDEV-5359] - LDAP documentation now contains the correct name of the Auth module.
[PUBDEV-5426] - h2o.jar no longer includes a Jetty 6 dependency.
[PUBDEV-5462] - `model_summary` is now available when running Stacked Ensembles in R.
[PUBDEV-5478] - XGBoost now correctly respects the H2O `nthreads` parameter.
[PUBDEV-5488] - Fixed an invalid invariant in the recall calculation.
[PUBDEV-5497] - h2o-genmodel.jar can now be loaded into Spark's spark.executor.extraClassPath.
[PUBDEV-5501] - AutoML now correctly detects the leaderboard frame in H2O Flow.
[PUBDEV-5524] - In XGBoost, fixed an issue that resulted in a "Check failed: param.max_depth < 16 Tree depth too large" error.
[PUBDEV-5551] - Zero decimal values and NAs are now represented correctly in XGBoost.
[PUBDEV-5552] - Response variable datatype checks are now extended to include TIME datatypes.
[PUBDEV-5598] - The `-proxy` argument is now available as part of the h2odriver.args file.
[PUBDEV-5605] - Fixed `stopping_metric` values in the User Guide. Abbreviated values should be specified using uppercase characters (for example, MSE, RMSE, etc.).
[PUBDEV-5610] - Proxy Mode of h2odriver now supports a notification file (specified with the `-notify` argument).
[PUBDEV-5617] - Fixed an issue that caused h2o.predict to throw an exception in H2OCoxPH models with interactions with stratum.

New Feature

[PUBDEV-3901] - Added MOJO support in Python (via jar file).
[PUBDEV-4927] - Added the `sort_metric` argument to AutoML.
[PUBDEV-4939] - Users now have the option to save CV predictions and CV models in AutoML.
[PUBDEV-4968] - Added an `h2o.H2OFrame.rename` method to rename columns in Python.
[PUBDEV-4991] - MOJO and POJO support are now available for AutoML.
[PUBDEV-5019] - Added support for the Cox Proportional Hazard (CoxPH) algorithm. Note that this is currently available in R and Flow only. It is not yet available in Python.
[PUBDEV-5177] - Added h2o.get_automl()/h2o.getAutoML function to R/Python APIs.
[PUBDEV-5377] - Added the `balance_classes`, `class_sampling_factors`, and `max_after_balance_size` arguments to AutoML.
[PUBDEV-5408] - When running GLM in Flow, users can now see the InteractionPairs option.
[PUBDEV-5424] - Added support for MOJO scoring on a CSV or data frame in R.
[PUBDEV-5452] - Added an "export model as MOJO" button to Flow for supported algorithms.
[PUBDEV-5520] - Added support for XGBoost MOJO deployment on Windows 10.
[PUBDEV-5529] - GBM and DRF MOJOs and POJOs now return leaf node assignments.
[PUBDEV-5599] - Added the `sort_metric` option to AutoML in Flow.
[PUBDEV-5600] - keep_cross_validation_predictions and keep_cross_validation_models are now available when running AutoML in Flow.
[PUBDEV-5615] - Deep Learning MOJO now extends Serializable.

Story

[PUBDEV-5398] - In CoxPH, when a categorical column is only used for a numerical-categorical interaction, the algorithm will enforce useAllFactorLevels for that interaction.

Task

[PUBDEV-4570] - When running AutoML and XGBoost, fixed an issue that caused the adapted test frame to differ from the training frame.
[PUBDEV-4826] - Removed Domain length check for Stacked Ensembles.
[PUBDEV-5058] - GLRM predict no longer generates different outputs when performing predictions on training and testing dataframes.
[PUBDEV-5368] - Added support for ingesting data from Hive2 using SQLManager (JDBC interface). Note that this is experimental and is not yet suitable for large datasets.

Improvement

[PUBDEV-4375] - Replaced the Jama SVD computation in PCA with netlib-java library MTJ.
[PUBDEV-4518] - Created more tests in AutoML to ensure that all fold_assignment values and fold_column work correctly.
[PUBDEV-4571] - Fixed an NPE that occurred when clicking on the View button while running AutoML.
[PUBDEV-4581] - Bundled Windows XGBoost libraries.
[PUBDEV-4618] - Search-based models are no longer duplicated when AutoML is run again on the same dataset with the same seed.
[PUBDEV-4718] - When running Stacked Ensembles in R, added support for a vector of base_models in addition to a list.
[PUBDEV-4956] - Added support for Java 9.
[PUBDEV-5388] - Fixed an issue that resulted in an additional progress bar when running h2o.automl() in R.
[PUBDEV-5411] - Fixed an issue that resulted in an additional progress bar when running AutoML in Python.
[PUBDEV-5440] - The runint_automl_args.R test now always builds at least 2 models.
[PUBDEV-5459] - Improved XGBoost speed by not recreating DMatrix in each iteration (during training).
[PUBDEV-5476] - `offset_column` is now exposed in EasyPredictModelWrapper.
[PUBDEV-5477] - Improved single node XGBoost performance.
[PUBDEV-5486] - Added support for pip 10.0.0.
[PUBDEV-5495] - In GLM, gamma distribution with 0's in the response results in an improved message: "Response value for gamma distribution must be greater than 0."
[PUBDEV-5499] - Added metrics to AutoML leaderboard. Binomial models now also show mean_per_class_error, rmse, and mse. Multinomial problems now also show logloss, rmse and mse. Regression models now also show mse.
[PUBDEV-5533] - Exposed `model dump` in XGBoost MOJOs.
[PUBDEV-5538] - Improved rebalance for Frames.
[PUBDEV-5553] - Introduced the precise memory allocation algorithm for XGBoost sparse matrices.
[PUBDEV-5577] - Improved SSL documentation.
[PUBDEV-5601] - The Exclude Algorithms section in Flow AutoML is now always visible, even if you have not yet selected a training frame.
[PUBDEV-5606] - Removed unused parameters, fields, and methods from AutoML. Also exposed buildSpec in the AutoML REST API.

Docs

[PUBDEV-4977] - Updated documentation to indicate support for Java 9.
[PUBDEV-5154] - Added the new `pca_impl` parameter to PCA section of the user guide.
[PUBDEV-5164] - Added a Checkpointing Models section to the User Guide. This describes how checkpointing works for each supported algorithm.
[PUBDEV-5401] - In the "Getting Data into H2O" section, added a link to the new Hive JDBC demo.
[PUBDEV-5407] - The Import File example now also shows how to import from HDFS.
[PUBDEV-5436] - Fixed markdown headings in the example Flows.
[PUBDEV-5474] - All installation examples use H2O version 3.20.0.1.
[PUBDEV-5494] - Added a "Data Manipulation" topic for target encoding in R.
[PUBDEV-5496] - Added new keep_cross_validation_models and keep_cross_validation_predictions options to the AutoML documentation.
[PUBDEV-5509] - Added an example of using XGBoost MOJO with Maven.
[PUBDEV-5513] - In the XGBoost chapter, added information describing how to disable XGBoost.
[PUBDEV-5554] - When running XGBoost on Hadoop, added a note that users should set -extramempercent to a much higher value.
[PUBDEV-5579] - Added a section for the CoxPH (Cox Proportional Hazards) algorithm.
[PUBDEV-5581] - Added a topic describing how to install H2O-3 from the Google Cloud Platform offering.

Wolpert (3.18.0.11) - 5/24/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/11/index.html

New Feature

[PUBDEV-5584] - Enabled Java 10 support for CRAN release.

Task

[PUBDEV-5585] - GLM tests no longer fail on Java 10.

Wolpert (3.18.0.10) - 5/22/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/10/index.html

Bug

[PUBDEV-5558] - Fixed an issue with adding Double.NaN to IntAryVisitor via addValue().

Task

[PUBDEV-5559] - Removed all code that referenced Google Analytics.
[PUBDEV-5565] - Disabled version check in H2O-3.
[PUBDEV-5567] - Removed all Google Analytics references and code from Flow.
[PUBDEV-5568] - Removed all Google Analytics references and code from Documentation.

Docs

[PUBDEV-5545] - The Security chapter in the User Guide now describes how to enforce system-level command-line arguments in h2odriver when starting H2O.

Wolpert (3.18.0.9) - 5/11/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/9/index.html

Bug

[PUBDEV-5290] - Fixed an issue that prevented distributed XGBoost from being registered in the REST API.
[PUBDEV-5325] - Fixed an issue that caused XGBoost to crash due to "too many open files."
[PUBDEV-5444] - Frames are now rebalanced correctly on multinode clusters.
[PUBDEV-5464] - Fixed an issue that prevented H2O libraries from loading in DBC.
[PUBDEV-5507] - Added more robust checks for Colorama version.
[PUBDEV-5510] - Added more robust checks for Colorama version in H2O Python client.
[PUBDEV-5518] - A response column is no longer required when performing Deep Learning grid search with autoencoder enabled.
[PUBDEV-5527] - Fixed a KeyV3 error message that incorrectly referenced KeyV1.
[PUBDEV-5544] - The external backend now stores sparse vector values correctly.

New Feature

[PUBDEV-5456] - Added a new rank_within_group_by function in R and Python for ranking groups and storing the ranks in a new column.
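The idea behind rank_within_group_by can be sketched in plain Python: rows are grouped, sorted within each group, and assigned 1-based ranks in a new column. The helper `rank_within_group` below is hypothetical and operates on a list of dicts; it is not the H2O API, and H2O's handling of ties, sort direction, and NAs may differ.

```python
from collections import defaultdict
from typing import Dict, List


def rank_within_group(rows: List[Dict], group_col: str, sort_col: str,
                      new_col: str = "rank") -> List[Dict]:
    """Return copies of rows with a 1-based rank of sort_col within each group."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[group_col]].append(i)

    out = [dict(r) for r in rows]  # copy so the input rows are not modified
    for indices in groups.values():
        # Ascending sort within the group; ties keep their original order (stable sort).
        ordered = sorted(indices, key=lambda i: rows[i][sort_col])
        for rank, i in enumerate(ordered, start=1):
            out[i][new_col] = rank
    return out


data = [
    {"dept": "a", "salary": 300},
    {"dept": "b", "salary": 100},
    {"dept": "a", "salary": 100},
    {"dept": "b", "salary": 200},
]
for row in rank_within_group(data, "dept", "salary"):
    print(row)
```

Each department is ranked independently, so both departments contain a rank 1 and a rank 2.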

Improvement

[PUBDEV-5500] - Improved warning messages in AutoML.
[PUBDEV-5537] - System administrators can now create a configuration file with implicit arguments of h2odriver and use it to make sure the h2o cluster is started with proper security settings.

Wolpert (3.18.0.8) - 4/19/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/8/index.html

Task

[PUBDEV-5465] - Release for CRAN submission.

Wolpert (3.18.0.7) - 4/14/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/7/index.html

Bug

[PUBDEV-5485] - Fixed a MOJO/POJO scoring issue caused by a serialization bug in EasyPredictModelWrapper.

Wolpert (3.18.0.6) - 4/13/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/6/index.html

Bug

[PUBDEV-5484] - In XGBoost, fixed a memory issue that caused training to fail even when running on small datasets.
[PUBDEV-5441] - When files have a Ctrl-M character as part of the data in a row and Ctrl-M also signifies the end of line in that file, the file is now parsed correctly.
[PUBDEV-5458] - H2O-3 no longer displays the server version in HTTP response headers.
[PUBDEV-5460] - Updated the Mockito library.

Task

[PUBDEV-5449] - Conda packages are now available on S3, enabling installation for users who cannot access anaconda.org.

Improvement

[PUBDEV-5473] - Added an offset to the predictBinomial method of the Easy wrapper.

Docs

[PUBDEV-5227] - Updated the AutoML chapter of the User Guide to include a link to H2O World AutoML Tutorials and updated code examples that do not use leaderboard_frame.
[PUBDEV-5457] - Fixed links to POJO/MOJO tutorials in the GBM FAQ > Scoring section.

Wolpert (3.18.0.5) - 3/28/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/5/index.html

Bug

[PUBDEV-4933] - AutoML no longer trains a Stacked Ensemble with only one model.
[PUBDEV-5028] - GBM and GLM grids no longer fail in AutoML for multinomial problems.
[PUBDEV-5266] - Users can now merge/sort frames that contain string columns.
[PUBDEV-5303] - Fixed an issue that occurred with multinomial GLM POJO/MOJO models.
[PUBDEV-5334] - Users can no longer specify a value of 0 for the col_sample_rate_change_per_level parameter. The value for this parameter must be greater than 0 and <= 2.0.
[PUBDEV-5336] - The H2O-3 Python client no longer returns an incorrect answer when running a conditional statement.
[PUBDEV-5365] - Added support for CDH 5.14.
[PUBDEV-5366] - Fixed an issue that caused XGBoost to fail when running the airlines dataset on a single-node H2O cluster.
[PUBDEV-5370] - The H2O-3 parser can now handle utf-8 characters that appear in the header.
[PUBDEV-5394] - The H2O-3 parser no longer treats the "Ctrl-M" character as an end of line on Linux.
[PUBDEV-5414] - H2O no longer generates a warning when predicting without a weights column.

New Feature

[PUBDEV-5402] - The AutoML leaderboard no longer prints NaNs for non-US locales.

Task

[PUBDEV-5235] - Added a demo of XGBoost in Flow.
[PUBDEV-5386] - Improved the ordinal regression parameter optimization by changing the implementation.

Improvement

[PUBDEV-3978] - In Flow, improved the vertical scrolling for training and validation metrics for thresholds.
[PUBDEV-5364] - Added more logging regarding the WatchDog client.
[PUBDEV-5383] - Replaced unknownCategoricalLevelsSeenPerColumn with ErrorConsumer events in POJO log messages.
[PUBDEV-5400] - Improved the logic that triggers rebalance.
[PUBDEV-5404] - AutoML now uses correct datatypes in the AutoML leaderboard TwoDimTable.

Docs

[PUBDEV-5292] - Added ``beta constraints`` and ``prior`` entries to the Parameters Appendix, along with examples in R and Python.
[PUBDEV-5369] - Added CDH 5.14 to the list of supported Hadoop platforms in the User Guide.
[PUBDEV-5413] - Updated the documentation for the Ordinal ``family`` option in GLM based on the new implementation. Also added new solvers to the documentation: GRADIENT_DESCENT_LH and GRADIENT_DESCENT_SQERR.
[PUBDEV-5416] - Added information about Extremely Randomized Trees (XRT) to the DRF chapter in the User Guide.
[PUBDEV-5421] - On the H2O-3 and Sparkling Water download pages, the link to documentation site now points to the most updated version.
[PUBDEV-5432] - The ``target_encode_create`` and ``target_encode_apply`` functions are now included in the R HTML documentation.

Fault

[PUBDEV-5367] - Fixed an issue that caused SQLManager import to break on cluster with over 100 nodes.

Wolpert (3.18.0.4) - 3/8/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/4/index.html

Fixed minor release process issue preventing Sparkling Water release.

Wolpert (3.18.0.3) - 3/2/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/3/index.html

Bug

[PUBDEV-5102] - In Flow, the metalearner_fold_column option now correctly displays a drop-down of column names.
[PUBDEV-5282] - Fixed an issue that caused data import and model building to fail when using Flow in IE 11.1944 on Windows 10 Enterprise.
[PUBDEV-5323] - Stacked Ensemble no longer fails when using a grid or list of GLMs as the base models.
[PUBDEV-5330] - Fixed an issue that caused an error during Parquet data ingest.
[PUBDEV-5335] - In Random Forest, added back the distribution and offset_column options for backward compatibility. Note that these options are deprecated and will be ignored if used.
[PUBDEV-5339] - MOJO export to a file now works correctly.
[PUBDEV-5343] - Fixed an NPE that occurred when checking if a request is Xhr.

New Feature

[PUBDEV-5008] - Added support for ordinal regression in GLM. This is specified using the `family` option.
[PUBDEV-5274] - Added the exclude_algos option to AutoML in Flow.
[PUBDEV-5308] - Added a Leave-One-Out Target Encoding option to the R API. This can help improve supervised learning results when there are categorical predictors with high cardinality. Note that a similar function for Python will be available at a later date.
[PUBDEV-5324] - POJO now logs error messages for all incorrect data types and includes default values rather than NULL when a data type is unexpected.
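Leave-one-out target encoding (PUBDEV-5308) replaces each categorical level with the mean response of the other rows that share that level, so a row's own target never leaks into its encoding. The sketch below is a minimal, hypothetical Python illustration of that idea, not the R API mentioned above; the fallback to the global mean for levels that occur only once is an assumption.

```python
from collections import defaultdict
from typing import List, Sequence


def loo_target_encode(categories: Sequence[str], targets: Sequence[float]) -> List[float]:
    """Leave-one-out mean target encoding for a single categorical column."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1

    global_mean = sum(targets) / len(targets)
    encoded = []
    for c, y in zip(categories, targets):
        n_other = counts[c] - 1
        if n_other == 0:
            # Assumption: a level seen only once falls back to the global mean.
            encoded.append(global_mean)
        else:
            # Mean of the level's targets, excluding the current row's own target.
            encoded.append((sums[c] - y) / n_other)
    return encoded


cats = ["x", "x", "x", "y", "z"]
ys = [1.0, 0.0, 1.0, 1.0, 0.0]
print(loo_target_encode(cats, ys))  # [0.5, 1.0, 0.5, 0.6, 0.6]
```

Note that rows with the same level can receive different encodings, because each row excludes only its own response.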

Improvement

[PUBDEV-5344] - Moved AutoML to the top of the Model menu in Flow.

Docs

[PUBDEV-5306] - In the GLM chapter, added Ordinal to the list of `family` options. Also added Ologit, Oprobit, and Ologlog to the list of `link` options, which can be used with the Ordinal family.
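For context on the Ologit link, the sketch below computes class probabilities under a cumulative-logit model, P(y <= j) = sigmoid(theta_j - eta), where eta is the linear predictor and theta_j are strictly increasing thresholds. This is one common parameterization assumed for illustration; the sign convention and parameter names used by H2O's GLM may differ.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_logit_probs(eta: float, thresholds: Sequence[float]) -> List[float]:
    """Class probabilities for a cumulative-logit (ologit) model.

    Assumed parameterization: P(y <= j) = sigmoid(theta_j - eta).
    K strictly increasing thresholds define K + 1 ordered classes.
    """
    if any(t2 <= t1 for t1, t2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(y = j) = P(y <= j) - P(y <= j - 1)
        previous = c
    return probs


p = ordinal_logit_probs(eta=0.0, thresholds=[-1.0, 1.0])
print([round(x, 3) for x in p])  # [0.269, 0.462, 0.269]
```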

Wolpert (3.18.0.2) - 2/20/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/2/index.html

Bug

[PUBDEV-5301] - Distributed XGBoost no longer fails silently when expanding a 4G dataset on a 1TB cluster.
[PUBDEV-5254] - Fixed an issue that caused GLM Multinomial to not work properly.
[PUBDEV-5278] - In XGBoost, when the first domain of a categorical is parseable as an Int, the remaining columns are no longer automatically assumed to also be parseable as an Int. As a result of this fix, the default value of categorical_encoding in XGBoost is now AUTO rather than label_encoder.
[PUBDEV-5294] - Fixed an issue that caused XGBoost models to fail to converge when an unknown decimal separator existed.
[PUBDEV-5326] - Fixed an issue in ParseTime that caused parsing to fail.

Docs

[PUBDEV-5313] - In the User Guide, the default value for categorical_encoding in XGBoost is now AUTO rather than label_encoder.

Wolpert (3.18.0.1) - 2/12/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/1/index.html

Bug

[PUBDEV-4585] - Fixed an issue that caused XGBoost binary save/load to fail.
[PUBDEV-4593] - Fixed an issue that caused a Levenshtein Distance Normalization Error. Levenshtein distance is now implemented directly in H2O.
[PUBDEV-5112] - The Word2Vec Python API for pretrained models no longer requires a training frame. In addition, a new `from_external` option was added, which creates a new H2OWord2vecEstimator based on an external model.
[PUBDEV-5128] - Fixed an issue in which the show function of the metrics base class did not check for the custom_metric_name key and raised an exception.
[PUBDEV-5129] - The fold column in Kmeans is no longer required to be in x.
[PUBDEV-5130] - The date is now parsed correctly when parsed from H2O-R.
[PUBDEV-5133] - In Flow, the scoring history plot is now available for GLM models.
[PUBDEV-5135] - The Parquet parser no longer fails if one of the files to parse has no records.
[PUBDEV-5145] - Added error checking and logging on all the uses of `water.util.JSONUtils.parse()`.
[PUBDEV-5155] - In AutoML, fixed an exception in Python binding that occurred when the leaderboard was empty.
[PUBDEV-5156] - In AutoML, fixed an exception in R binding that occurred when the leaderboard was empty.
[PUBDEV-5159] - Removed Pandas dependency for AutoML in Python.
[PUBDEV-5167] - In PySparkling, reading Parquet/Orc data with time type now works correctly in H2O.
[PUBDEV-5174] - Fixed a maximum recursion depth error when using `isin` in the H2O Python client.
[PUBDEV-5175] - When running getJobs in Flow, fixed a ClassNotFoundException that occurred when AutoML jobs existed.
[PUBDEV-5179] - Fixed an issue that caused a list of columns to be truncated in PySparkling. Light endpoint now returns all columns.
[PUBDEV-5186] - In AutoML, fixed a deadlock issue that occurred when two AutoML runs came in the same second, resulting in matching timestamps.
[PUBDEV-5191] - The offset_column and distribution parameters are no longer available in Random Forest.
[PUBDEV-5195] - Fixed an issue in XGBoost that caused MOJOs to fail to work without manually adding the Commons Logging dependency.
[PUBDEV-5203] - Fixed an issue that caused XGBoost to mangle the domain levels for datasets that have string response domains.
[PUBDEV-5213] - In Flow, the separator drop down now shows 3-digit decimal values instead of 2.
[PUBDEV-5215] - Users can now specify interactions when running GLM in Flow.
[PUBDEV-5228] - FrameMetadata code no longer uses hardcoded keys. Also fixed an issue that caused AutoML to fail when multiple AutoML runs are executed simultaneously.
[PUBDEV-5229] - A frame can potentially have a null key. If there is a Frame with a null key (just a container for vecs), H2O no longer attempts to track a null key.
[PUBDEV-5256] - Users can now successfully build an XGBoost model as compile chain. XGBoost no longer fails to provide the compatible artifact for an Oracle Linux environment.
[PUBDEV-5265] - GLM no longer fails when a categorical column exists in the dataset along with an empty value on at least one row.
[PUBDEV-5286] - Fixed an issue that caused GBM grid to fail on some datasets when specifying `sample_rate` in the grid.
[PUBDEV-5287] - The x argument is no longer required when performing a grid search.
[PUBDEV-5297] - Fixed an issue that caused the Parquet parser to fail on Spark 2.0 (SW-707).
[PUBDEV-5315] - Fixed an issue that caused XGBoost OpenMP to fail on Ubuntu 14.04.
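PUBDEV-4593 notes that Levenshtein distance is now implemented directly in H2O. For reference, the sketch below shows the standard dynamic-programming edit distance in Python, plus one possible normalization (1 minus the distance divided by the length of the longer string). The normalization and both function names are assumptions for illustration, not H2O's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(round(normalized_similarity("kitten", "sitting"), 3))  # 0.571
```

Keeping only two rows of the table makes memory use proportional to the length of the second string rather than the product of both lengths.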

New Feature

[PUBDEV-4111] - Added support for INT96 timestamp to the Parquet parser.
[PUBDEV-4652] - Added support for XGBoost multinode training in H2O. Note that this is still a BETA feature.
[PUBDEV-4980] - Users can now specify a list of algorithms to exclude during an AutoML run. This is done using the new `exclude_algos` parameter.
[PUBDEV-5204] - In GLM, users can now specify a list of interactions terms to include when building a model instead of relying on the default action of including all interactions.

Task

[PUBDEV-5230] - The Python PCA code examples in github and in the User Guide now use the h2o.estimators.pca.H2OPrincipalComponentAnalysisEstimator method instead of the h2o.transforms.decomposition.H2OPCA method.
[PUBDEV-5251] - Upgraded the XGBoost version. This now supports RHEL 6.

Improvement

[PUBDEV-5086] - Stacked Ensemble allows you to specify the metalearning algorithm to use when training the ensemble. When an algorithm is specified, Stacked Ensemble runs with the specified algorithm's default hyperparameter values. The new ``metalearner_params`` option allows you to pass in a dictionary/list of hyperparameters to use for that algorithm instead of the defaults.
[PUBDEV-5224] - Users can now specify a seed parameter in Stacked Ensemble.
[PUBDEV-5310] - Documented clouding behavior of an H2O cluster. This is available at https://github.com/h2oai/h2o-3/blob/master/h2o-docs/devel/h2o_clouding.rst.

Docs

[PUBDEV-5149] - Updated the documentation to indicate that datetime parsing from R and Flow is now UTC by default.
[PUBDEV-5151] - R documentation on docs.h2o.ai is now available in HTML format.
[PUBDEV-5172] - Added a new Cloud Integration topic for using H2O with AWS.
[PUBDEV-5221] - In the XGBoost chapter, noted that XGBoost in H2O supports multicore.
[PUBDEV-5242] - Added `interaction_pairs` to the list of GLM parameters.
[PUBDEV-5283] - Added `metalearner_algorithm` and `metalearner_params` to the Stacked Ensembles chapter.
[PUBDEV-5311] - The H2O-3 download site now includes a link to the HTML version of the R documentation.
[PUBDEV-5312] - Updated the XGBoost documentation to indicate that multinode support is now available as a Beta feature.
[PUBDEV-5314] - Added the seed parameter to the Stacked Ensembles section of the User Guide.

Wheeler (3.16.0.4) - 1/15/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/4/index.html

Bug

[PUBDEV-5206] - Fixed several client deadlock issues.

[PUBDEV-5212] - When verifying that a supported version of Java is available, H2O no longer checks for version 1.6.

[PUBDEV-5216] - The H2O-3 download site has an updated link for the Sparkling Water README.

[PUBDEV-5220] - In Aggregator, fixed the way that a created mapping frame is populated.

New Feature

[PUBDEV-5209] - XGBoost can now be used in H2O on Hadoop with a single node.

Improvement

[PUBDEV-5210] - Deep Water is now disabled in AutoML.

[PUBDEV-5211] - This release of H2O includes an upgraded XGBoost version.

Wheeler (3.16.0.3) - 1/8/2018

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/3/index.html

Technical Task

[PUBDEV-5184] - H2O-3 now allows custom functions to be defined directly in Python notebooks and supports iterative updates to those functions.

Bug

[PUBDEV-4863] - When a frame name includes numbers followed by alphabetic characters (for example, "250ML"), Rapids no longer parses the frame name as two tokens.
[PUBDEV-4897] - Fixed an issue that caused Partial Dependence Plots to use a different order of categorical values after calling as.factor.
[PUBDEV-5148] - Added support for CDH 5.13.
[PUBDEV-5180] - Fixed an issue that caused a Python 2 timestamp to be interpreted as two tokens.
[PUBDEV-5196] - Aggregator supports categorical features. Fixed a discrepancy in the Aggregator documentation.

New Feature

[PUBDEV-4622] - In GBM, users can now specify quasibinomial distribution.
[PUBDEV-4965] - H2O-3 now supports the Netezza JDBC driver.

Improvement

[PUBDEV-5171] - Users can now optionally export the mapping of rows in an aggregated frame to that of the original raw data.

Docs

[PUBDEV-5120] - Revised the S3/S3N documentation to recommend that S3 be used for data ingestion and S3N be used for data export.
[PUBDEV-5150] - The H2O User Guide has been updated to indicate support for CDH 5.13.
[PUBDEV-5162] - Updated the Anaconda section with information specifically for Python 3.6 users.
[PUBDEV-5178] - The H2O User Guide has been updated to indicate support for the Netezza JDBC driver.
[PUBDEV-5190] - Added "quasibinomial" to the list of `distribution` options in GBM.
[PUBDEV-5192] - Added the new `save_mapping_frame` option to the Aggregator documentation.

Wheeler (3.16.0.2) - 11/30/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/2/index.html

Bug

[PUBDEV-5115] - In AutoML, fixed an issue that caused the leaderboard_frame to be ignored when nfolds > 1.

[PUBDEV-5117] - Improved the warning that displays when mismatched jars exist.

[PUBDEV-5126] - The correct H2O version now displays in setup.py for sdist.

Improvement

[PUBDEV-5111] - Incorporated final improvements to the Sparkling Water booklet.

[PUBDEV-5127] - Automated Anaconda releases.

[PUBDEV-5131] - This version of H2O introduces lightweight REST endpoints for obtaining frames in the Python client.

Wheeler (3.16.0.1) - 11/24/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/1/index.html

Technical Task

[PUBDEV-5087] - A backend Java API is now available for custom evaluation metrics.

Bug

[PUBDEV-1465] - Users can now save models to and download models from S3.
[PUBDEV-3567] - When running h2o.merge in the R client, the status line indicator no longer returns before the merge finishes, so users can no longer enter new commands until the merge process is complete.
[PUBDEV-4172] - In the R client help strings, `training_frame` is no longer described as an optional parameter.
[PUBDEV-4672] - The H2OFrame.mean method now works in Python 3.6.
[PUBDEV-4697] - Early stopping now works with perfectly predictive data.
[PUBDEV-4727] - h2o.group_by now works correctly when specifying a median() value.
[PUBDEV-4778] - In XGBoost, fixed an issue that caused prediction on a dataset without a response column to return an error.
[PUBDEV-4853] - When running AutoML in Flow, users can now specify a project name.
[PUBDEV-4857] - h2odriver in proxy mode now correctly forwards the authentication headers to the H2O node.
[PUBDEV-4900] - H2O can ingest Parquet 1.8 files created by Spark.
[PUBDEV-4906] - Loading models and exporting models to/from AWS S3 now works correctly.
[PUBDEV-4907] - Fixed an issue that caused binary model imports and exports from/to S3 to fail.
[PUBDEV-4930] - Users can now load data from s3n resources after setting core-site.xml correctly.
[PUBDEV-4953] - Fixed an error that occurred when exporting data to S3.
[PUBDEV-4985] - Fixed an issue that caused H2O to "forget" that a column is of factor type if it contains only NA values.
[PUBDEV-4996] - The download instructions for Python now indicate that version 3.6 is supported.
[PUBDEV-5002] - In Flow, fixed an issue with retaining logs from the client node.
[PUBDEV-5003] - H2O can now handle the case where the current node is a client and the md5 check should be ignored.
[PUBDEV-5005] - h2o.residual_deviance now works correctly.
[PUBDEV-5017] - h2o.predict no longer returns an error when the user does not specify an offset_column.
[PUBDEV-5033] - Fixed an issue with Spark string chunks.
[PUBDEV-5037] - Logs now display correctly on Hadoop, and downloaded logs no longer contain an empty folder when the cluster is up.
[PUBDEV-5038] - Added an option for handling empty strings. If `compare_empty` is set to FALSE, empty strings will be handled as NaNs.
[PUBDEV-5040] - HTTP logs can now be obtained in the Flow UI.
[PUBDEV-5048] - Fixed an issue with the progress bar that occurred when running PySparkling + Databricks.
[PUBDEV-5067] - Fixed reporting of clients with the wrong md5.
[PUBDEV-5070] - In the R and Python clients, updated the strings for max_active_predictors to indicate that the default is now 5000.
[PUBDEV-5072] - h2o.merge now works correctly for one-to-many when all.x=TRUE.
[PUBDEV-5074] - Fixed an issue that caused GLM predict to fail when a weights column was not specified.
[PUBDEV-5081] - Reduced the number of URLs that get sent to Google Analytics.
[PUBDEV-5095] - When building a Stacked Ensemble model, the fold_column from AutoML is now piped through to the stacked ensemble.
[PUBDEV-5096] - Fixed an issue that caused GLM scoring to produce incorrect results for sparse data.

Epic

[PUBDEV-4684] - This version of H2O includes support for Python 3.6.

New Feature

[PUBDEV-3877] - MOJOs are now supported for Stacked Ensembles.
[PUBDEV-3743] - Users can now specify the metalearner algorithm type that Stacked Ensembles should use. This can be AUTO, GLM, GBM, DRF, or Deep Learning.
[PUBDEV-3971] - Added a metalearner_folds option in Stacked Ensembles, enabling cross validation.
[PUBDEV-4085] - In GBM, endpoints are now exposed that allow for custom evaluation metrics.
[PUBDEV-4882] - When running AutoML through the Python or R clients, users can now specify the nfolds argument.
[PUBDEV-4891] - Added another Stacked Ensemble to AutoML, built from the top model of each algorithm family.
[PUBDEV-5071] - The AutoML leaderboard now uses cross-validation metrics (new default).
[PUBDEV-4914] - K-Means POJOs and MOJOs now expose distances to cluster centers.
[PUBDEV-4957] - Multiclass stacking is now supported in AutoML. Removed the check that caused AutoML to skip stacking for multiclass.
[PUBDEV-5043] - Users can now specify a number of folds when running AutoML in Flow.
[PUBDEV-5084] - Added a metalearner_fold_column option in Stacked Ensembles, allowing for custom folds during cross validation.
[PUBDEV-4994] - The Aggregator Function is now exposed in the R client.
[PUBDEV-4995] - The Aggregator Function is now available in the Python client.

Story

[PUBDEV-5044] - Fixed a Jaro-Winkler dependency.

Task

[PUBDEV-4803] - The current version of h2o-py is now published to PyPI.
[PUBDEV-4896] - Changed the behavior of the automatic generation of validation and leaderboard frames in AutoML.
[PUBDEV-4931] - Updated the download site and the end user documentation to indicate that Python 3.6 is now supported.
[PUBDEV-4935] - PyPI/Anaconda descriptors now indicate support for Python 3.6.

Improvement

[PUBDEV-4791] - Enabled lambda search for the GLM metalearner in Stacked Ensembles: `lambda_search` is set to TRUE, and `early_stopping` is set to FALSE.
[PUBDEV-4831] - Running `pip install` now installs the latest version of H2O-3.
[PUBDEV-4963] - In EasyPredictModelWrapper, preamble(), predict(), and fillRawData() are now protected rather than private.
[PUBDEV-5082] - MOJOs/POJOs will not be created for unsupported categorical_encoding values.
[PUBDEV-5109] - An AutoML run now outputs two StackedEnsemble model IDs. These are labeled StackedEnsemble_AllModels and StackedEnsemble_BestOfFamily.
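
The "best of family" ensemble added in PUBDEV-4891 and PUBDEV-5109 keeps only the top-ranked model of each algorithm, while the "all models" ensemble keeps every model. The pure-Python sketch below illustrates that selection step only; it assumes a leaderboard given as hypothetical (model_id, algorithm, metric) tuples where a lower metric is better, and does not reflect AutoML's actual data structures or ranking metric.

```python
from typing import Dict, List, Tuple

# Hypothetical leaderboard row: (model_id, algorithm, metric); lower metric is better.
LeaderboardRow = Tuple[str, str, float]


def best_of_family(leaderboard: List[LeaderboardRow]) -> List[str]:
    """Return the ID of the best-scoring model for each algorithm family, best first."""
    best: Dict[str, LeaderboardRow] = {}
    for row in leaderboard:
        _, algo, metric = row
        # Keep the row with the lowest metric seen so far for this algorithm.
        if algo not in best or metric < best[algo][2]:
            best[algo] = row
    return [model_id for model_id, _, _ in sorted(best.values(), key=lambda r: r[2])]


def all_models(leaderboard: List[LeaderboardRow]) -> List[str]:
    """Return every model ID, as an 'all models' ensemble would use."""
    return [model_id for model_id, _, _ in leaderboard]


board = [
    ("GBM_1", "GBM", 0.21),
    ("GBM_2", "GBM", 0.19),
    ("DRF_1", "DRF", 0.24),
    ("DRF_2", "DRF", 0.23),
    ("GLM_1", "GLM", 0.30),
]
print(best_of_family(board))  # ['GBM_2', 'DRF_2', 'GLM_1']
```

Selecting one model per family trades some raw ensemble size for diversity among the base learners, which is the motivation usually given for a best-of-family ensemble.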

Docs

[PUBDEV-4298] - In the Data Manipulation chapter, added a topic for pivoting tables.
[PUBDEV-4662] - Added a topic to the Data Manipulation chapter describing the h2o.fillna function.
[PUBDEV-4747] - Added MOJO and POJO Quick Start sections directly into the Productionizing H2O chapter. Previously, this chapter included links to quick start files.
[PUBDEV-4810] - In the GBM booklet when describing nbins_cat, clarified that factors rather than columns get grouped together.
[PUBDEV-4816] - The description for the GLM lambda_max option now states that this is the smallest lambda that drives all coefficients to zero.
[PUBDEV-4833] - Updated the installation instructions for PySparkling.
[PUBDEV-4864] - Clarified that in H2O-3, sampling is without replacement.
[PUBDEV-4878] - Updated documentation to state that multiclass classification is now supported in Stacked Ensembles.
[PUBDEV-4879] - Updated documentation to state that multiclass stacking is now supported in AutoML.
[PUBDEV-4895] - Added an Early Stopping section to the Algorithms > Common chapter.
[PUBDEV-4945] - Added a note in Word2vec stating that binary format is not supported.
[PUBDEV-4946] - In the Parameters Appendix, updated the description for histogram_type=random.
[PUBDEV-4958] - In the Using Flow > Models > Run AutoML section, updated the AutoML screenshot to show the new Project Name field.
[PUBDEV-4971] - Added a Sorting Columns data munging topic describing how to sort a data frame by column or columns.
[PUBDEV-5000] - In KMeans, updated the list of model summary statistics and training metrics that are output.
[PUBDEV-5011] - Removed SortByResponse from the list of categorical_encoding options for Aggregator and K-Means.
[PUBDEV-5026] - Updated the Sparkling Water links on docs.h2o.ai to point to the latest release.
[PUBDEV-5032] - Added a section in the Algorithms chapter for Aggregator.
[PUBDEV-5056] - Updated the description for Saving and Loading Models to indicate that H2O binary models are not compatible across H2O versions.
[PUBDEV-5057] - Added ignored_columns and 'x' parameters to AutoML section. Also added the 'x' parameter to the Parameters Appendix.
[PUBDEV-5062] - In DRF, added FAQs describing splitting criteria.
[PUBDEV-5085] - Added the new metalearner_folds and metalearner_fold_assignment parameters to the Defining a Stacked Ensemble Model section in the User Guide.
[PUBDEV-5089] - Updated the Sparkling Water booklet. (Also PUBDEV-5004.)
[PUBDEV-5092] - Added the new metalearner_algorithm parameter to Defining a Stacked Ensemble Model section in the User Guide.
[PUBDEV-5097] - The User Guide and the POJO/MOJO Javadoc have been updated to indicate that MOJOs are supported for Stacked Ensembles.
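
PUBDEV-4816 describes the GLM `lambda_max` value as the smallest lambda that drives all coefficients to zero. For the Gaussian family this value has a closed form. The derivation below is a sketch that assumes the common elastic-net objective with an unpenalized intercept; it is not taken from H2O's source, and x_ij denotes the predictor values after any standardization.

```latex
% Assumed elastic-net objective (Gaussian family, unpenalized intercept \beta_0):
\min_{\beta_0,\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
  + \lambda\Bigl(\alpha\lVert\beta\rVert_1+\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2\Bigr)

% At \beta = 0 the optimal intercept is \beta_0 = \bar{y}. The ridge term has zero
% gradient there, so the subgradient (KKT) condition keeps coefficient j at zero iff
\Bigl|\tfrac{1}{N}\sum_{i=1}^{N} x_{ij}\,(y_i-\bar{y})\Bigr| \le \lambda\alpha

% All coefficients are therefore zero exactly when \lambda \ge \lambda_{\max}, where
\lambda_{\max} = \frac{1}{N\alpha}\,\max_{j}\Bigl|\sum_{i=1}^{N} x_{ij}\,(y_i-\bar{y})\Bigr|,
  \qquad \alpha > 0
```

For other families, the residual term is replaced by the gradient of the log-likelihood at the intercept-only model; with alpha = 0 (pure ridge) no finite lambda sets every coefficient exactly to zero.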

Weierstrass (3.14.0.7) - 10/20/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/7/index.html

Bug

[PUBDEV-4987] - Fixed an issue that caused h2o.H2OFrame.any() and h2o.H2OFrame.all() to work incorrectly when a frame contained only True values.
[PUBDEV-4988] - H2O no longer checks the H2O client hash code.

Task

[PUBDEV-4003] - Generated Python API tests for Python Module Data in H2O and Data Manipulation.

Weierstrass (3.14.0.6) - 10/9/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/6/index.html

Bug

[SW-542] - Fixed an issue that prevented Sparkling Water from importing Parquet files.

Weierstrass (3.14.0.5) - 10/9/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/5/index.html

Bug

[PUBDEV-4870] - Fixed an issue that caused sorting to be done incorrectly.
[PUBDEV-4917] - Only relevant clients (the ones with the same cloud name) are now reported to H2O.
[PUBDEV-4954] - Improved error messaging in the case where H2O fails to parse a valid Parquet file.
[PUBDEV-4959] - Fixed an issue that allowed nodes from one cluster to kill a different H2O cluster.
[PUBDEV-4979] - Fixed an issue that caused K-Means to improperly calculate scaled distance.

Task

[PUBDEV-4925] - Nightly and stable releases will now have published sha256 hashes.
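
With sha256 hashes published for nightly and stable releases (PUBDEV-4925), a downloaded artifact can be checked locally before use. A minimal sketch using Python's standard `hashlib` module; the file name and how the published hash is obtained are placeholders, not H2O's actual artifact layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, published_hash: str) -> bool:
    """Return True if the file's digest matches the published hash (case-insensitive)."""
    return sha256_of(path) == published_hash.strip().lower()
```

For example, `verify_download(Path("h2o-x.y.z.zip"), "<published hash>")` (a hypothetical file name) returns False if the download was truncated or altered.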

Improvement

[PUBDEV-4404] - The h2o.sort() function now includes an `ascending` parameter that allows you to specify whether a numeric column should be sorted in ascending or descending order.
[PUBDEV-4964] - H2O no longer terminates when an incompatible client tries to connect.
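
The `ascending` option from PUBDEV-4404 lets each sort column have its own direction. The pure-Python sketch below illustrates those semantics on a list of row dictionaries (it does not call H2O, and `sort_rows` is a hypothetical helper). It relies on Python's stable sort: one pass per key, from the least significant key to the most significant.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def sort_rows(rows: List[Row], by: Sequence[str], ascending: Sequence[bool]) -> List[Row]:
    """Sort rows by several columns, each with its own direction.

    Python's sort is stable (also with reverse=True), so sorting by the last key
    first and the first key last yields a correct multi-key ordering.
    """
    if len(by) != len(ascending):
        raise ValueError("`by` and `ascending` must have the same length")
    result = list(rows)
    for column, asc in reversed(list(zip(by, ascending))):
        result.sort(key=lambda r: r[column], reverse=not asc)
    return result


rows = [
    {"g": "a", "x": 1},
    {"g": "b", "x": 3},
    {"g": "a", "x": 2},
    {"g": "b", "x": 1},
]
ordered = sort_rows(rows, by=["g", "x"], ascending=[True, False])
print([(r["g"], r["x"]) for r in ordered])  # [('a', 2), ('a', 1), ('b', 3), ('b', 1)]
```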

Docs

[PUBDEV-4949] - Updated the list of required packages for the H2O-3 R client on the H2O Download site and in the User Guide.
[PUBDEV-4966] - Added an entry to the User Guide FAQ describing how Java 9 users can switch to a supported Java version.

Weierstrass (3.14.0.3) - 9/18/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/3/index.html

Technical Task

[PUBDEV-4873] - Introduced a Python client-side AST optimization.

Bug

[PUBDEV-3525] - In R, `h2o.arrange()` can now sort on a float column.
[PUBDEV-4723] - The `as_data_frame()` function no longer drops rows with NAs when `use_pandas` is set to TRUE.
[PUBDEV-4735] - In Deep Learning POJOs, fixed an issue in the sharing stage between threads.
[PUBDEV-4739] - Fixed an issue in R that caused `h2o.sub` to fail to retain the column names of the frame.
[PUBDEV-4757] - Running ifelse() on a constant column no longer results in an error.
[PUBDEV-4846] - Using + on string columns now works correctly.
[PUBDEV-4848] - Fixed an issue that caused a POJO and a MOJO to return different column names with the `getNames()` method.
[PUBDEV-4849] - The R and Python clients now have consistent timeout numbers.
[PUBDEV-4868] - Fixed an issue that resulted in an AIOOB error when predicting with GLM. NA responses are now removed prior to GLM scoring.
[PUBDEV-4909] - The set_name method now works correctly in the Python client.
[PUBDEV-4921] - Replaced the deprecated Clock class in timing.gradle.
[PUBDEV-4937] - The MOJO Reader now closes open files after reading.

New Feature

[PUBDEV-4628] - MOJO support has been extended to include the Deep Learning algorithm.
[PUBDEV-4845] - Added the ability to import an encrypted (AES128) file into H2O. This can be configured globally by specifying the `-decrypt_tool` option and installing the tool in DKV.
[PUBDEV-4904] - The Decryption API is now exposed in the REST API and in the R client.

Docs

[PUBDEV-4811] - Updated the MOJO Quick Start Guide to show separator differences between Linux/OS X and Windows. Also updated the R example to match the Python example.

Weierstrass (3.14.0.2) - 8/21/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/2/index.html

Bug

[PUBDEV-4804] - Fixed a broken link to the Hive tutorials from the Productionizing section in the User Guide.
[PUBDEV-4822] - Sparkling Water can now pass a data frame with a vector for conversion into H2OFrame. In prior versions, the vector was not properly expanded and resulted in a failure.

Task

[PUBDEV-4802] - Added more tests to ensure that, when max_runtime_secs is set, the returned model works correctly.

Improvement

[PUBDEV-4812] - This version of H2O includes an option to force toggle (on/off) a specific extension. This allows users to enable the XGBoost REST API on a system that does not support XGBoost.
[PUBDEV-4829] - A warning now displays when the minimal XGBoost version is used.

Weierstrass (3.14.0.1) - 8/10/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/1/index.html

Bug

[PUBDEV-2767] - In the R client, making a copy of a factor column and then changing the factor levels no longer causes the levels of the original column to change.
[PUBDEV-4584] - Added a **Leaderboard Frame** option in Flow when configuring an AutoML run.
[PUBDEV-4586] - The `h2o.performance` function now works correctly on XGBoost models.
[PUBDEV-4625] - In the Python client, improved the help string for `h2o.import_file`. This string now indicates that setting `parse=False` will return a list instead of an H2OFrame.
[PUBDEV-4654] - Removed the unneeded Ecko dependency.
[PUBDEV-4683] - Fixed an issue that caused the parquet parser to store numeric/float values in a string column. This issue occurred when specifying an unsupported type conversion in Parse Setup (for example, numeric -> string). Users will now encounter an error when attempting this. Additionally, users can now change Enums->Strings in parse setup.
[PUBDEV-4686] - Deep Learning POJOs are now thread safe.
[PUBDEV-4688] - Fixed the default print method for H2OFrame in Python. Now when a user types the H2OFrame name, a new line is added, and the header is pushed to the next line.
[PUBDEV-4702] - Fixed an issue that caused the `max_runtime_secs` parameter to not work correctly when run through the Python client. As a result of this fix, the `max_runtime_secs` parameter was added to Word2vec.
[PUBDEV-4704] - Fixed an issue that caused XGBoost grid search to fail when using the Python client.
[PUBDEV-4724] - When running with weighted data and columns that are constant after applying weights, a GLM lambda search no longer results in an AIOOB error.
[PUBDEV-4730] - The XGBoost `max_bin` parameter has been renamed to `max_bins`, and its default value is now 256.
[PUBDEV-4731] - XGBoost Python documentation is now available.
[PUBDEV-4732] - In XGBoost, the `learning_rate` (alias: `eta`) parameter now has a default value of 0.3.
[PUBDEV-4734] - In XGBoost, the `max_depth` parameter now has a default value of 6.
[PUBDEV-4735] - Downloaded POJOs now support multi-threading.
[PUBDEV-4751] - The XGBoost `min_rows` (alias: `min_child_weight`) parameter now has a default value of 1.
[PUBDEV-4752] - The XGBoost `max_abs_leafnode_pred` (alias: `max_delta_step`) parameter now has a default value of 0.
[PUBDEV-4753] - H2O XGBoost default options are now consistent with XGBoost default values. This fix involved the following changes:
- num_leaves has been renamed max_leaves, and its default value is 0.
- The default value for reg_lambda is 0.
[PUBDEV-4756] - Removed the Guava dependency from the Deep Water API.
[PUBDEV-4776] - In XGBoost, the default value for sample_rate (alias: subsample) is now 1.0.
[PUBDEV-4777] - In XGBoost, the default value for colsample_bylevel (alias: colsample_bytree) has been changed to 1.0.
[PUBDEV-4783] - Hidden files are now ignored when reading from HDFS.

New Feature

[PUBDEV-4446] - Added a `verbose` option to Deep Learning, DRF, GBM, and XGBoost. When enabled, this option will display scoring histories as a model job is running.
[PUBDEV-4682] - Added an `extra_classpath` option, which allows users to specify a custom classpath when starting H2O from the R and Python client.
[PUBDEV-4685] - Users can now override the type of a Str/Cat column in a Parquet file when the parser attempts to auto detect the column type.
[PUBDEV-4738] - Users can now run a standalone H2O instance and read from a Kerberized cluster's HDFS.
[PUBDEV-4745] - Added support for CDH 5.10.
[PUBDEV-4750] - Added support for MapR 5.2.

Improvement

[PUBDEV-3947] - Fixed an issue that caused PCA to take 39 minutes to run on a wide dataset. The wide dataset method for PCA is now only enabled if the dataset is very wide.
[PUBDEV-4596] - XGBoost-specific WARN messages have been converted to TRACE.
[PUBDEV-4624] - When printing frames via `head()` or `tail()`, the `nrows` option now allows you to specify more than 10 rows. With this change, you can print the complete frame, if desired.
[PUBDEV-4630] - Improved the speed of converting a sparse matrix to an H2OFrame in R.
[PUBDEV-4664] - Added the following parameters to the XGBoost R/Py clients:
- categorical_encoding
- sample_type
- normalize_type
- rate_drop
- one_drop
- skip_drop
[PUBDEV-4676] - H2O can now handle sparse vectors as the input of the external frame handler.
[PUBDEV-4692] - Added MOJO support for Spark SVM.
[PUBDEV-4701] - When running AutoML from within Flow, the default `stopping_tolerance` is now NULL instead of 0.001.
[PUBDEV-4748] - Removed dependency on Reflections.

Docs

[PUBDEV-4522] - Updated the list of Python requirements in the README.md, on the download site, and in the User Guide.
[PUBDEV-4553] - Updated the FAQ for Saving and Loading a Model in K-Means.
[PUBDEV-4566] - Added a Run AutoML subsection in the Flow section of the User Guide.
[PUBDEV-4600] - Continued improvements to XGBoost documentation.
[PUBDEV-4629] - Added documentation for using Sparkling Water with Databricks.
[PUBDEV-4632] - In the FAQ topic at http://docs.h2o.ai/h2o/latest-stable/h2o-docs/faq/general.html, updated the example for scoring using an exported POJO.
[PUBDEV-4649] - In the About POJOs and MOJOs topic, added text describing the h2o-genmodel jar file.
[PUBDEV-4656] - The User Guide now indicates that Hive files can be saved in ORC format and then imported.
[PUBDEV-4689] - For topics that indicate support for Avro-formatted data, updated the User Guide to reflect that only Avro version 1.8.0 is supported.
[PUBDEV-4720] - A new H2O Python / Pandas Munging Parity document is now available at https://github.com/h2oai/h2o-3/tree/master/h2o-docs/src/cheatsheets
[PUBDEV-4733] - Added parameter defaults to the XGBoost section in the User Guide.

Vapnik (3.12.0.1) - 6/6/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vapnik/1/index.html

Epic

[PUBDEV-4273] - AutoML is now available in H2O. AutoML can be used to automatically train and tune a number of models within a user-specified time limit or model limit. It is designed to run with as few parameters as possible, and the top-performing models can be viewed on a leaderboard.

New Feature

[PUBDEV-4451] - With the addition of the AutoML feature, a new **Run AutoML** option is available in Flow under the **Models** dropdown menu.

Vajda (3.10.5.4) - 7/17/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/4/index.html

Bug

[PUBDEV-4694] - Fixed an issue that caused tree algorithms to waste memory by storing categorical values in every tree.

Vajda (3.10.5.3) - 6/30/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/3/index.html

Bug

[PUBDEV-4026] - Fixed an issue that resulted in "Unexpected character after column id:" warnings when parsing an SVMLight file.
[PUBDEV-4445] - h2o.predict now displays a warning if the test frame does not contain all of the features (columns) used by the model.
[PUBDEV-4572] - The XGBoost REST API is now only registered when the backend library exists.
[PUBDEV-4595] - H2O no longer displays an error if there is a "/" in the user-supplied model name. Instead, a message will display indicating that the "/" is replaced with "_".

Improvement

[PUBDEV-3941] - Added support for autoencoder POJOs in the EasyPredictModelWrapper.
[PUBDEV-4269] - H2O now warns Python client users about the minimum required Colorama version. Note that the current minimum version is 0.3.8.
[PUBDEV-4537] - Removed deprecation warnings from the H2O build.
[PUBDEV-4548] - Moved the initialization of XGBoost into the H2O core extension.

Docs

[PUBDEV-4515] - Added a link to a paper describing balance classes in the balance_classes parameter topic.
[PUBDEV-4610] - Removed `laplace`, `huber`, and `quantile` from list of supported distributions in the XGBoost documentation.
[PUBDEV-4612] - Added heuristics to the FAQ > General Troubleshooting topic.

Vajda (3.10.5.2) - 6/19/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/2/index.html

Bug

[PUBDEV-3860] - In PCA, fixed an issue that resulted in errors when specifying `pca_method=glrm` on wide datasets. In addition, the GLRM algorithm can now be used with wide datasets.
[PUBDEV-4416] - Fixed issues with streamParse in ORC parser that caused a NullPointerException when parsing multifile from Hive.
[PUBDEV-4438] - Fixed an issue that occurred with H2O data frame indexing for large indices that resulted in off-by-one errors. Now, when indexing is set to a value greater than 1000, indexing between left and right sides is no longer inconsistent.
[PUBDEV-4456] - In DRF, fixed an issue that resulted in an AssertionError when run on certain datasets with weights.
[PUBDEV-4579] - Removed an incorrect Python example from the Sparkling Water booklet. Python users must start Spark using the H2O pysparkling egg on the Python path. Using `--package` when running the pysparkling app is not advised, as the pysparkling distribution already contains the required jar file.
[PUBDEV-4594] - In GLM, fixed an issue that caused a runtime exception when specifying the quasibinomial family with `nfolds = 2`.

New Feature

[PUBDEV-3624] - Added top and bottom N functions, which allow users to grab the top or bottom N percent of a numerical column. The returned frame contains the original row indices of the top/bottom N percent values extracted into the second column.
[PUBDEV-4096] - When building Stacked Ensembles in R, the base_models parameter can accept models rather than just model IDs. Updated the documentation in the User Guide for the base_models parameter to indicate this.
[PUBDEV-4523] - Added the following new GBM and DRF parameters to the User Guide: `calibrate_frame` and `calibrate_model`.
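
The top and bottom N functions from PUBDEV-3624 return the extracted values together with their original row indices in a second column. The pure-Python sketch below mirrors that output shape as (value, original_row_index) pairs. It is an illustration only: `top_n_percent` is a hypothetical name, and rounding the row count up with `math.ceil` and skipping NaN values are assumptions that may differ from H2O's behavior.

```python
import math
from typing import List, Sequence, Tuple


def top_n_percent(values: Sequence[float], n_percent: float,
                  bottom: bool = False) -> List[Tuple[float, int]]:
    """Return (value, original_row_index) pairs for the top (or bottom) n_percent of values."""
    if not 0 < n_percent <= 100:
        raise ValueError("n_percent must be in (0, 100]")
    # Pair each non-missing value with its original row index.
    indexed = [(v, i) for i, v in enumerate(values) if not math.isnan(v)]
    k = math.ceil(len(indexed) * n_percent / 100)
    # Largest values first for "top", smallest first for "bottom" (stable for ties).
    indexed.sort(key=lambda pair: pair[0], reverse=not bottom)
    return indexed[:k]


column = [5.0, 1.0, 9.0, float("nan"), 7.0, 3.0]
print(top_n_percent(column, 40))               # [(9.0, 2), (7.0, 4)]
print(top_n_percent(column, 40, bottom=True))  # [(1.0, 1), (3.0, 5)]
```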

Improvement

[PUBDEV-4531] - Improved PredictCsv.java as follows:
- Enabled PredictCsv.java to accept arbitrary separator characters in the input dataset file if the user includes the optional flag `--separator` in the input arguments. If a user enters a special Java character as the separator, H2O escapes it by prepending "\".
- Enabled PredictCsv.java to perform setConvertInvalidNumbersToNa(setInvNumNA) if the optional flag `--setConvertInvalidNum` is included in the input arguments.
[PUBDEV-4578] - Fixed the R package so that a "browseURL" NOTE no longer appears.
[PUBDEV-4583] - In the R package documentation, improved the description of the GLM `alpha` parameter.
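
The separator handling in PUBDEV-4531 addresses a common pitfall: Java's `String.split` treats its argument as a regular expression, so a separator such as "|" or "." must be escaped. Python's `re.split` behaves the same way. The sketch below is a Python analogy using `re.escape` rather than manually prepending a backslash; it is not PredictCsv's code, and `split_fields` is a hypothetical helper.

```python
import re
from typing import List


def split_fields(line: str, separator: str) -> List[str]:
    """Split a delimited line on a literal separator, even if it is a regex metacharacter."""
    # Unescaped, "|" would be an empty alternation and "." would match any character.
    return re.split(re.escape(separator), line)


print(split_fields("1.5|2.0|abc", "|"))  # ['1.5', '2.0', 'abc']
```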

Docs

[PUBDEV-4524] - In the "Using Flow - H2O’s Web UI" section of the User Guide, updated the Viewing Models topic to include that users can download the h2o-genmodel.jar file when viewing models in Flow.
[PUBDEV-4549] - The `group_by` function accepts a number of aggregate options, which were documented in the User Guide and in the Python package documentation. These aggregate options are now described in the R package documentation.
[PUBDEV-4575] - Added an initial XGBoost topic to the User Guide. Note that this is still a work in progress.

Vajda (3.10.5.1) - 6/9/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/1/index.html

Technical Task

[PUBDEV-1584] - Fixed a GLM persist test.
[PUBDEV-4124] - Disabled R tests for OS X.

Bug

[PUBDEV-1457] - PCA no longer reports incorrect values when multiple eigenvectors exist.
[PUBDEV-1571] - Users can now specify the weights_column as a numeric index in R.
[PUBDEV-1578] - Fixed an issue that caused GLM models returned by h2o.glm() and h2o.getModel(..) to be different.
[PUBDEV-1616] - Fixed an issue that caused PCA with GLRM to display incorrect results on data.
[PUBDEV-2286] - Fixed an issue that caused `df.show(any_int)` to always display 10 rows.
[PUBDEV-2415] - Starting an H2O cloud from R no longer results in "Error in as.numeric(x["max_mem"]) : (list) object cannot be coerced to type 'double'"
[PUBDEV-2656] - `h2o::ifelse` now handles NA values the same way that `base::ifelse` does.
[PUBDEV-2715] - Fixed an issue in PCA that resulted in incorrect standard deviation and components results for non-standardized data.
[PUBDEV-2759] - When performing a grid search with a `fold_assignment` specified and with `cross_validation` disabled, Python unit tests now display a Java error message. This is because a fold assignment is meaningless without cross validation.
[PUBDEV-2816] - The Python `h2o.get_grid()` function is now in the base h2o object, allowing you to use it the same way as `h2o.get_model()`, `h2o.get_frame()` etc.
[PUBDEV-3196] - The `.mean()` function can now be applied to a row in `H2OFrame.apply()`.
[PUBDEV-3350] - Fixed an issue that caused a negative value to display in the H2O cluster version.
[PUBDEV-3396] - GLM now checks to see if a response is encoded as a factor and warns the user if it is not.
[PUBDEV-3470] - Fixed an issue that resulted in an `h2o.init()` fail message even though the server had actually been started. As a result of that issue, H2O did not shut down automatically upon exit.
[PUBDEV-3502] - Fixed an issue that caused PCA to hang when run on a wide dataset using the Randomized `pca_method`. Note that it is still not recommended to use Randomized with wide datasets.
[PUBDEV-3520] - `h2o.setLevels` now works correctly when wrapped into invisible.
[PUBDEV-3651] - Added a dependency for the roxygen2 package.
[PUBDEV-3711] - `h2o.coef` in R is now functional for multinomial models.
[PUBDEV-3729] - When converting a column to `type = string` with `.ascharacter()` in Python, the `structure` method now correctly recognizes the change.
[PUBDEV-3759] - Fixed an issue that caused GBM Grid Search to hang.
[PUBDEV-3777] - Subsetting an H2O frame now allows a 0-row subset, just as data.frame does.
[PUBDEV-3815] - Fixed an issue that caused the R `apply` method to fail to work with `h2o.var()`.
[PUBDEV-3859] - PCA no longer reports errors when using PCA on wide datasets with `pca_method = Randomized`. Note that it is still not recommended to use Randomized with wide datasets.
[PUBDEV-3900] - Jenkins builds no longer all share the same R package directory, and new H2O R libraries are installed during testing.
[PUBDEV-3905] - When trimming is done, H2O now checks if it passes the beginning of the string. This check prevents the code from going further down the memory with negative indexes.
[PUBDEV-3973] - Stacked Ensembles no longer fails when the `fold_assignment` for base learners is not `Modulo`.
[PUBDEV-3988] - Fixed an issue that caused H2O to generate invalid code in POJO for PCA/SVM.
[PUBDEV-4079] - Instead of relying on an arbitrary default charset when getting bytes from strings, the source code now centralizes "byte extraction" in StringUtils. This prevents different build machines from using different default encodings.
[PUBDEV-4090] - When performing a Random Hyperparameter Search, if the model parameter seed is set to the default value but a search_criteria seed is not, then the model parameter seed will now be set to search_criteria seed+0, 1, 2, ..., model_number. Seeding the built models makes random hyperparameter searches more repeatable.
[PUBDEV-4100] - Fixed a bad link that was included in the "A K/V Store for In-Memory Analytics, Part 2" blog.
[PUBDEV-4138] - Comments are now permitted in the Content-Type header for the application/json MIME type. As a result, specifying a content-type charset no longer causes the request body to be ignored.
[PUBDEV-4143] - The name of the count column produced by the Python `group_by` option now matches the R client.
[PUBDEV-4146] - Fixed broken links in the "Hacking Algorithms into H2O" blog post.
[PUBDEV-4156] - The Python API now provides a method to extract parameters from `cluster_status`.
[PUBDEV-4171] - Fixed incorrect parsing of input parameters. Previously, system property parsing logic added the value of any system property other than "ga_opt_out" to the arguments list if a property was prefixed with "ai.h2o.". This caused an attempt to parse the value of a system property as if it were itself a system property and at times resulted in an "Unknown Argument" error.
[PUBDEV-4174] - Fixed the intermittently failing pyunit_javapredict_dynamic_data_paramsDR test.
[PUBDEV-4177] - Fixed the ORC parser test by setting the timezone to local time.
[PUBDEV-4185] - H2O now correctly handles preflight OPTIONS calls, specifically when (1) the request is a CORS request and (2) the request has a content type other than text/plain, application/x-www-form-urlencoded, or multipart/form-data.
[PUBDEV-4202] - In the REST API, POST requests with an application/json body no longer fail when the endpoint expects required fields.
[PUBDEV-4216] - The R client `impute` function now checks for categorical values and returns an error if none exist.
[PUBDEV-4231] - Fixed a filepath issue that occurred on Windows 7 systems when specifying a network drive.
[PUBDEV-4234] - Added a response column to Stacked Ensembles so that it can be exposed in the Flow UI.
[PUBDEV-4235] - Updated the list of required packages on the H2O download page for the Python client.
[PUBDEV-4250] - Updated the header in the Confusion Matrix to make the list of actual vs predicted values more clear.
[PUBDEV-4300] - Explicit 1-hot encoding in FrameUtils no longer generates an invalid order of column names. MissingLevel is now the last column.
[PUBDEV-4304] - Fixed an issue that caused ModelBuilder to leak xval frames if hyperparameter errors existed.
[PUBDEV-4311] - Fixed an issue that caused PCA model output to fail to display the Importance of Components.
[PUBDEV-4314] - When using the H2O Python client, the varimp() function can now be used in PCA to retrieve the Importance of Components details.
[PUBDEV-4315] - Fixed an issue that caused an ArrayIndexOutOfBoundsException in GLM.
[PUBDEV-4316] - When a main model is cloned to create the CV models, clearValidationMessages() is now called. Messages are no longer all thrown into a single bucket, which previously caused confusion with `error_count()`.
[PUBDEV-4317] - ModelBuilder.message(...) now correctly bumps the error count when the message is an error.
[PUBDEV-4319] - Fixed an issue with unseen categorical levels handling in GLM scoring. Prediction with "skip" missing value handling in GLM with more than one variable no longer fails.
[PUBDEV-4321] - ModelMetricsRegression._mean_residual_deviance is now exposed. For all algorithms except GLM, this is the mean residual deviance. For GLM, this is the total residual deviance.
[PUBDEV-4326] - Fixed an issue that caused the `~` operator to fail when used in the Python client. Now, all logical operators set their results as Boolean.
[PUBDEV-4328] - Fixed an issue that caused an assertion error in GLM.
[PUBDEV-4330] - Fixed an issue that caused GLM to fail when `quasibinomial` was specified with a link other than the default. Specifying an incorrect link for the quasibinomial family now results in an error message.
[PUBDEV-4350] - Improved the doc strings for `sample_rate_per_class` in R and Python.
[PUBDEV-4351] - Fixed a bug in the cosine distance formula.
[PUBDEV-4352] - Fixed an issue with setting a CBSChunk value using a long argument.
[PUBDEV-4363] - C0DChunk with con == NaN now works with strings.
[PUBDEV-4378] - When retrieving a Variable Importance plot using the H2O Python client, the default number of features shown is now 10 (or all if < 10 exist). Also reduced the top and bottom margins of the Y axis.
[PUBDEV-4381] - When retrieving a Variable Importance plot using the H2O R client, the default number of features shown is now 10 (or all if < 10 exist).
[PUBDEV-4416] - Fixed an issue with parsing ORC streams.
[PUBDEV-4429] - Added support for appending a constant string to a frame.
[PUBDEV-4495] - Fixed an issue with the View Log option in Flow.
[PUBDEV-4499] - The h2o.deepwater.available function is now working in the R API.
[PUBDEV-4542] - Fixed a bug with Log.info that resulted in bypassing log initialization.
[PUBDEV-4543] - LogsHandler now checks whether logging at a specific level is enabled before accessing the particular log.
[PUBDEV-4546] - Fixed a logging issue that caused PID values to be set incorrectly. H2O now initializes PID before initializing SELF_ADDRESS. This change was necessary because initializing SELF_ADDRESS triggers buffered log messages to be written, and PID is part of the log header.
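The seeding rule from PUBDEV-4090 can be sketched in plain Python. This is not H2O's implementation: the function name `per_model_seeds` is hypothetical, and treating `-1` as the "not set" default seed is an assumption made for illustration.

```python
# Hypothetical sketch of the PUBDEV-4090 seeding rule; not H2O source code.
UNSET_SEED = -1  # assumption: -1 marks a seed the user did not set


def per_model_seeds(model_seed, search_seed, n_models):
    """Return the seed each model in a random grid search would be built with."""
    if model_seed == UNSET_SEED and search_seed != UNSET_SEED:
        # Model number i (0, 1, 2, ...) gets the search_criteria seed + i.
        return [search_seed + i for i in range(n_models)]
    # Otherwise every model keeps the model-level seed unchanged.
    return [model_seed] * n_models


print(per_model_seeds(UNSET_SEED, 42, 4))  # [42, 43, 44, 45]
```

Because each model's seed is derived deterministically from the search seed and the model's position, rerunning the same search with the same `search_criteria` seed gives each model the same seed as before.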
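For context on PUBDEV-4185, the browser-side rule that decides when a preflight OPTIONS call is sent can be sketched as follows. This is a simplified reading of the CORS rules in the Fetch standard (it ignores custom request headers, which can also trigger a preflight), not H2O code; `needs_preflight` is a hypothetical helper.

```python
# Simplified CORS preflight rule; custom request headers are not considered.
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_CONTENT_TYPES = {
    "text/plain",
    "application/x-www-form-urlencoded",
    "multipart/form-data",
}


def needs_preflight(method, content_type=None, cross_origin=True):
    """Return True if a browser would send an OPTIONS preflight first."""
    if not cross_origin:
        return False  # same-origin requests are never preflighted
    if method.upper() not in SAFELISTED_METHODS:
        return True
    if content_type is None:
        return False
    # Compare only the media type; parameters such as charset are ignored.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type not in SAFELISTED_CONTENT_TYPES
```

For example, a cross-origin `POST` with `application/json` is preflighted, which is the case the server must answer correctly.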
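To make the total-versus-mean distinction in PUBDEV-4321 concrete, the sketch below computes residual deviance for the gaussian and poisson families using the textbook formulas. For the gaussian family, the mean residual deviance equals the mean squared error. The function names are hypothetical and this is not H2O's implementation.

```python
import math


def residual_deviance(y, mu, family="gaussian"):
    """Total residual deviance for the gaussian or poisson family."""
    if family == "gaussian":
        return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    if family == "poisson":
        total = 0.0
        for yi, mi in zip(y, mu):
            # The y * log(y / mu) term is defined as 0 when y == 0.
            term = yi * math.log(yi / mi) if yi > 0 else 0.0
            total += term - (yi - mi)
        return 2.0 * total
    raise ValueError(f"unsupported family: {family}")


def mean_residual_deviance(y, mu, family="gaussian"):
    """Total residual deviance divided by the number of rows."""
    return residual_deviance(y, mu, family) / len(y)
```

Per the note above, GLM reports the quantity returned by `residual_deviance`, while the other algorithms report the quantity returned by `mean_residual_deviance`.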
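PUBDEV-4351 does not say what was wrong with the earlier cosine distance formula. For reference, the standard definition is one minus the cosine similarity, sketched below; this is not taken from H2O source.

```python
import math


def cosine_distance(a, b):
    """1 - cos(theta) between vectors a and b; ranges from 0 to 2."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

Orthogonal vectors have distance 1, parallel vectors 0, and opposite vectors 2.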

Epic

[PUBDEV-3367] - Added support for iSAX 2.0. This algorithm is a time series indexing strategy that reduces the dimensionality of a time series along the time axis. For example, if a time series had 1000 unique values with data across 500 rows, you can reduce this dataset to a time series that uses 100 unique values across 10 buckets along the time span. The following demos are available for more information:
- Python - https://github.com/h2oai/h2o-3/blob/master/h2o-py/demos/isax2.ipynb
- R - https://github.com/h2oai/h2o-3/blob/master/h2o-r/demos/isax.R
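The dimensionality reduction behind iSAX can be sketched with classic SAX: z-normalize the series, average it into equal-width segments (Piecewise Aggregate Approximation, PAA), then map each segment mean to a symbol using breakpoints that split the standard normal distribution into equiprobable regions. This is a simplified sketch, not H2O's implementation: iSAX 2.0 additionally lets symbols vary in cardinality for indexing, which is not shown, the function names are hypothetical, and the series length is assumed to be a multiple of the number of segments.

```python
from statistics import NormalDist, mean, pstdev


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-width segment."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    size = len(series) // n_segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_segments)]


def sax_word(series, n_segments, alphabet_size):
    """Return one integer symbol (0 .. alphabet_size - 1) per segment."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize so the N(0, 1) breakpoints below are meaningful.
    z = [(x - mu) / sigma for x in series] if sigma > 0 else [0.0] * len(series)
    # Breakpoints split N(0, 1) into alphabet_size equiprobable regions.
    normal = NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    # A segment's symbol is the number of breakpoints its mean exceeds.
    return [sum(1 for b in breakpoints if m > b) for m in paa(z, n_segments)]


print(sax_word(list(range(8)), 4, 4))  # [0, 1, 2, 3]
```

Eight time points collapse into four symbols from a four-letter alphabet, which is the kind of reduction along the time axis described in PUBDEV-3367.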

New Feature

[PUBDEV-47] - Generated R bindings are now available for the REST API.
[PUBDEV-103] - Flow: Implemented test infrastructure for Jenkins/CI.
[PUBDEV-525] - The R client now reports to the user when memory limits have been exceeded.
[PUBDEV-2022] - Added support to impute missing elements for RandomForest.
[PUBDEV-2348] - Added a probability calibration plot function.
[PUBDEV-2535] - A new h2o.pivot() function is available to allow pivoting of tables.
[PUBDEV-3666] - MOJO support has been extended to K-Means models.
[PUBDEV-3840] - Added two new options in GBM and DRF: `calibrate_model` and `calibration_frame`. These options allow you to retrieve calibrated probabilities for binary classification problems.
[PUBDEV-3850] - In Stacked Ensembles, added support for passing in models instead of model IDs when using the R client.
[PUBDEV-3970] - Added support for saving and loading binary Stacked Ensemble models.
[PUBDEV-4104] - Added support for `idxmax` and `idxmin` in the Python H2OFrame to get the index of max/min values.
[PUBDEV-4105] - Added support for `which.max` and `which.min` in the R H2OFrame to get the index of max/min values.
[PUBDEV-4134] - A new h2o.sort() function is available in the H2O Python client. This returns a new Frame that is sorted by column(s) in ascending order. The column(s) to sort by can be either a single column name, a list of column names, or a list of column indices.
[PUBDEV-4147] - Word2vec can now be used with the H2O Python client.
[PUBDEV-4151] - Missing values are filled sequentially for time series data.
[PUBDEV-4168] - Enabled a `cors` option flag, behind the `sys.ai.h2o.` prefix, for debugging.
[PUBDEV-4266] - Added support for converting a Word2vec model to a Frame.
[PUBDEV-4280] - Created a Capability REST endpoint that gives the client an overview of registered extensions.
[PUBDEV-4329] - When viewing a model in Flow, a new **Download Gen Model** button is available, allowing you to save the h2o-genmodel.jar file locally.
[PUBDEV-4425] - Added an `h2o.flow()` function to base H2O. This allows users to open up a Flow window from within R and Python.
[PUBDEV-4472] - The `parse_type` parameter is now case insensitive.
[PUBDEV-4478] - Added automatic reduction of categorical levels for Aggregator. This can be done by setting `categorical_encoding=EnumLimited`.
[NA] - In GBM and DRF, added two new `categorical_encoding` schemes: SortByResponse and LabelEncoding.
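A probability calibration plot (PUBDEV-2348), also called a reliability diagram, bins predicted probabilities and compares each bin's mean prediction with the observed fraction of positives. The sketch below computes those points for a binary classifier. It does not reproduce the H2O plotting function; the name `calibration_points` and the equal-width binning are assumptions.

```python
def calibration_points(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed positive rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # Equal-width bins over [0, 1]; prob == 1.0 falls into the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((label, prob))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            frac_pos = sum(y for y, _ in members) / len(members)
            points.append((mean_pred, frac_pos))
    return points
```

A well-calibrated model produces points close to the diagonal, where the mean prediction matches the observed rate.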
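One common way to fill missing time series values sequentially (PUBDEV-4151) is a forward fill that carries the last observed value forward, up to a limit on consecutive fills. The sketch assumes `None` marks a missing value and uses a hypothetical `maxlen` limit; the release note does not specify how H2O bounds the fill.

```python
def forward_fill(values, maxlen=1):
    """Replace None with the last observed value, at most `maxlen` times in a row."""
    filled = []
    last_seen = None
    run = 0  # number of consecutive fills since the last observed value
    for v in values:
        if v is not None:
            filled.append(v)
            last_seen, run = v, 0
        elif last_seen is not None and run < maxlen:
            filled.append(last_seen)
            run += 1
        else:
            filled.append(None)  # leading gap, or gap longer than maxlen
    return filled


print(forward_fill([None, 1, None, None, 4, None]))  # [None, 1, 1, None, 4, 4]
```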
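The level reduction behind `categorical_encoding=EnumLimited` (PUBDEV-4478) can be approximated by keeping only the most frequent levels of a column. The sketch below is an assumption about the general idea, not H2O's code: the function name, the `max_levels` argument, and the "other" bucket label are hypothetical.

```python
from collections import Counter


def limit_levels(values, max_levels, other="other"):
    """Keep the max_levels most frequent levels; map the rest to `other`."""
    counts = Counter(v for v in values if v is not None)
    keep = {level for level, _ in counts.most_common(max_levels)}
    # Missing values (None) are passed through unchanged.
    return [v if v is None or v in keep else other for v in values]
```

Collapsing rare levels bounds the number of distinct categories a downstream step has to handle.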

Story

[PUBDEV-3927] - Added support for Leave One Covariate Out (LOCO). This calculates row-wise variable importances by re-scoring a trained supervised model and measuring the impact of setting each variable to missing or to its most central value (the mean or median for numeric columns, the mode for categoricals).
[PUBDEV-4049] - Removed support for Java 6.
[PUBDEV-4274] - Integrated XGBoost with H2O core as a separate extension module.
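The LOCO calculation from PUBDEV-3927 can be sketched for a single row: score the row, then re-score it once per column with that column replaced by a central value, and report the change in prediction. This sketch covers only the central-value variant (mean for numeric columns, mode for categoricals), not the set-to-missing variant, and uses a plain Python callable as a stand-in model. The function names are hypothetical and this is not H2O's implementation.

```python
from statistics import mean, mode


def column_baselines(rows, numeric_columns):
    """Mean for numeric columns, mode for categorical columns."""
    baselines = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        baselines[col] = mean(values) if col in numeric_columns else mode(values)
    return baselines


def loco(predict, row, baselines):
    """Change in prediction when each column is replaced by its baseline."""
    reference = predict(row)
    deltas = {}
    for col, value in baselines.items():
        perturbed = dict(row)
        perturbed[col] = value
        deltas[col] = predict(perturbed) - reference
    return deltas


# Usage with a toy linear "model".
model = lambda r: 2 * r["x"] + (5 if r["c"] == "a" else 0)
data = [{"x": 1, "c": "a"}, {"x": 3, "c": "b"}, {"x": 2, "c": "b"}]
print(loco(model, data[0], column_baselines(data, {"x"})))  # {'x': 2, 'c': -5}
```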

Task

[PUBDEV-4062] - Users can now run predictions in R using a MOJO or POJO without running H2O.
[PUBDEV-4087] - Created a test to verify that random grid search honors the `max_runtime_secs` parameter.
[PUBDEV-4193] - Removed javaMess.txt from scripts
[PUBDEV-4238] - A new `node()` function is available for retrieving node information from an H2O Cluster.
[PUBDEV-4353] - Improved the R/Py doc strings for the `sample_rate_per_class` parameter.
[PUBDEV-4412] - Users can now optionally build h2o.jar with a visualization data server using the following: `./gradlew -PwithVisDataServer=true -PvisDataServerVersion=3.14.0 :h2o-assemblies:main:projects`
[PUBDEV-4454] - Removed support for the following Hadoop platforms: CDH 5.2, CDH 5.3, and HDP 2.1.
[PUBDEV-4466] - Added the ability to go from String to Enum in PojoUtils.
[PUBDEV-4479] - Continued modularization of H2O by removing reflection utilities and replacing them with SPI.
[PUBDEV-4481] - Removed the deprecated `h2o.importURL` function from the R API.
[PUBDEV-4490] - Stacked Ensembles now removes any unnecessary frames, vecs, and models that were produced when compiled.
[PUBDEV-4494] - Updated R and Python doc strings to indicate that users can save and load Stacked Ensemble binary models. In the User Guide, updated the FAQ that previously indicated users could not save and load stacked ensemble models.

Improvement

[PUBDEV-3088] - Improved error handling when users receive the following error: `Error: lexical error: invalid char in json text.`
[PUBDEV-3500] - In PCA, when the user specifies a value for k that is <=0, then all principal components will automatically be calculated.
[PUBDEV-3908] - Exposed metalearner and base model keys in R/Py StackedEnsemble object.
[PUBDEV-4072] - The `h2o.download_pojo()` function now accepts a `jar_name` parameter, allowing users to create custom names for the downloaded file.
[PUBDEV-4103] - Added port and ip details to the error logs for h2o cloud.
[PUBDEV-4141] - When using Hadoop with SSL Internode Security, the `-internal_security` flag is now deprecated in favor of the `-internal_security_conf` flag.
[PUBDEV-4169] - The Scala version of UDF now serializes properly on multi-node clusters.
[PUBDEV-4181] - Fixed an NPM warning message.
[PUBDEV-4184] - Updated the documentation for using H2O with Anaconda and included an end-to-end example.
[PUBDEV-4190] - Arguments in h2o.naiveBayes in R are now the same as Python/Java.
[PUBDEV-4207] - Stacked Ensembles is now stable rather than experimental.
[PUBDEV-4256] - Introduced latest_stable_R and latest_stable_py links, making it easy to point users to the current stable version of H2O for Python and R.
[PUBDEV-4267] - In the R client, the default for `nthreads` is now -1. The documentation examples have been updated to reflect this change.
[PUBDEV-4307] - ModelMetrics can sort models by a different Frame.
[PUBDEV-4331] - The application type is now reported in YARN manager, and H2O now overrides the default MapReduce type to H2O type.
[PUBDEV-4419] - Added a title option to the PrintMOJO utility.
[PUBDEV-4431] - Flow now uses ip:port for identifying the node as part of LogHandler.
[PUBDEV-4465] - Reduced the frequency of Hadoop heartbeat logging.
[PUBDEV-4484] - In GLM, quasibinomial models produce binomial metrics when scoring.
[PUBDEV-4492] - Implemented methods to get registered H2O capabilities in Python client.
[PUBDEV-4493] - Implemented methods to get registered H2O capabilities in R client.
[PUBDEV-4498] - Upgraded Flow to version 0.7.0
[PUBDEV-4511] - Removed the `selection_strategy` argument from Stacked Ensembles.
[PUBDEV-4533] - In Stacked Ensembles, added support for passing in models instead of model IDs when using the Python client.
[PUBDEV-4536] - Provided a file that contains a list of licenses for each H2O dependency. This can be acquired using com.github.hierynomus.license.
[PUBDEV-4540] - H2O now explicitly checks whether the port and baseport are within the allowed port range.
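A port range check of the kind described in PUBDEV-4540 can be sketched as follows. The upper bound comes from the 16-bit TCP port space. Reserving room for the next port (value + 1) is an assumption based on H2O typically using the following port for internal communication; the function name is hypothetical and this is not H2O's code.

```python
MAX_PORT = 65535  # largest valid TCP port number


def validate_ports(port, baseport=None):
    """Raise ValueError if port or baseport leaves no room for the next port."""
    for name, value in (("port", port), ("baseport", baseport)):
        if value is None:
            continue
        # Assumption: value + 1 is also used, so it must be a valid port too.
        if not 1 <= value <= MAX_PORT - 1:
            raise ValueError(f"{name} {value} is outside the allowed range 1-{MAX_PORT - 1}")
```

Failing fast on an invalid value gives a clear error at startup instead of a later bind failure.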

Docs

[PUBDEV-2864] - Added documentation describing how to call Rapids expressions from Flow.
[PUBDEV-3944] - Added parameter descriptions for Naive Bayes parameters.
[PUBDEV-3945] - Added examples for Naive Bayes parameters.
[PUBDEV-4075] - Added `label_encoder` and `sort_by_response` to the list of available `categorical_encoding` options.
[PUBDEV-4095] - Added support for KMeans in MOJO documentation.
[PUBDEV-4078] - Added a topic to the Data Manipulation section describing the `group_by` function.
[PUBDEV-4140] - In the Productionizing H2O section of the User Guide, added an example showing how to read a MOJO as a resource from a jar file.
[PUBDEV-4182] - Improved the R and Python documentation for coef() and coef_norm().
[PUBDEV-4183] - In the GLM section of the User Guide, added a topic describing how to extract coefficient table information. This new topic includes Python and R examples.
[PUBDEV-4184] - Added information about Anaconda support to the User Guide. Also included an IPython Notebook example.
[PUBDEV-4194] - Added Word2vec to list of supported algorithms on docs.h2o.ai.
[PUBDEV-4201] - Uncluttered the H2O User Guide. Combined several topics on the left navigation/TOC. Some changes include the following:
- Moved AWS, Azure, DSX, and Nimbix to a new Cloud Integration section.
- Added a new **Getting Data into H2O** topic and moved the Supported File Formats and Data Sources topics into this.
- Moved POJO/MOJO topic into the **Productionizing H2O** section.
[PUBDEV-4206] - In the Security topic of the User Guide, added a section about using H2O with PAM authentication.
[PUBDEV-4211] - Documentation for `h2o.download_all_logs()` now informs the user that the supplied file name must include the .zip extension.
[PUBDEV-4218] - Added an FAQ describing how to use third-party plotting libraries to plot metrics in the H2O Python client. This FAQ is available in the FAQ > Python topic.
[PUBDEV-4230] - Added an "Authentication Options" section to **Starting H2O > From the Command Line**. This section describes the options that can be set for all available supported authentication types. This section also includes flags for setting the newly supported Pluggable Authentication Module (PAM) authentication as well as Form Authentication and Session timeouts for H2O Flow.
[PUBDEV-4232] - Updated documentation to indicate that Word2vec is now supported for Python.
[PUBDEV-4253] - Added support for HDP 2.6 in the Hadoop Users section.
[PUBDEV-4258] - Added two FAQs within the GLM section describing why H2O's glm differs from R's glm and the steps to take to get the two to match. These FAQs are available in the GLM > FAQ section.
[PUBDEV-4268] - Updated R examples in the User Guide to reflect that the default value for `nthreads` is now -1.
[PUBDEV-4281] - Updated the POJO Quick Start markdown file and Javadoc.
[PUBDEV-4290] - Added the `-principal` keyword to the list of Hadoop launch parameters.
[PUBDEV-4294] - In the Deep Learning topic, deleted the Algorithm section. The information included in that section has been moved into the Deep Learning FAQ.
[PUBDEV-4297] - Documented support for using H2O with Microsoft Azure Linux Data Science VM. Note that this is currently still a BETA feature.
[PUBDEV-4309] - Added an FAQ describing YARN resource usage. This FAQ is available in the FAQ > Hadoop topic.
[PUBDEV-4336] - Added parameter descriptions for PCA parameters.
[PUBDEV-4337] - Added examples for PCA parameters.
[PUBDEV-4348] - A new h2o.sort() function is available in the H2O Python client. This returns a new Frame that is sorted by column(s) in ascending order. The column(s) to sort by can be either a single column name, a list of column names, or a list of column indices. Information about this function is available in the Python and R documentation.
[PUBDEV-4349] - Updated the "Using H2O with Microsoft Azure" topics.
[PUBDEV-4362] - Updated the "What is H2O" section in each booklet.
[PUBDEV-4387] - A Deep Water booklet is now available. A link to this booklet is on docs.h2o.ai.
[PUBDEV-4396] - Updated GLM documentation to indicate that GLM supports both multinomial and binomial handling of categorical values.
[PUBDEV-4397] - Added an FAQ describing the steps to take if a user encounters a "Server error - server 127.0.0.1 is unreachable at this moment" message. This FAQ is available in the FAQ > R topic.
[PUBDEV-4401] - Fixed documentation that described estimating k in K-means.
[PUBDEV-4403] - Updated the documentation that described how to download a model in Flow.
[PUBDEV-4444] - The Data Sources topic, which describes that data can come from local file system, S3, HDFS, and JDBC, now also includes that data can be imported by specifying the URL of a file.
[PUBDEV-4467] - H2O now supports GPUs. Updated the FAQ that previously indicated H2O did not, and added a pointer to Deep Water.

Ueno (3.10.4.8) - 5/21/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/8/index.html

Bug

[PUBDEV-4123] - Python: Frame summary does not return Python object
[PUBDEV-4315] - AIOOB with GLM
[PUBDEV-4330] - glm : quasi binomial with link other than default causes an h2o crash

Improvement

[PUBDEV-4332] - Create new /3/SteamMetrics REST API endpoint
[PUBDEV-4436] - Steam hadoop user impersonation

Ueno (3.10.4.7) - 5/8/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/7/index.html

Bug

[PUBDEV-4392] - h2o on yarn: H2O does not respect the cloud name in case of flatfile mode

Ueno (3.10.4.6) - 4/26/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/6/index.html

Bug

[PUBDEV-4265] - Problem with h2o.uploadFile on Windows
[PUBDEV-4339] - glm: get AIOOB exception on attached data
[PUBDEV-4341] - External cluster always reports "Timeout for confirmation exceeded!"

Ueno (3.10.4.5) - 4/19/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/5/index.html

Bug

[PUBDEV-4293] - Problem with h2o.merge in python
[PUBDEV-4306] - Failing SVM parse
[PUBDEV-4308] - Rollups computation errors sometimes get wrapped in an unhelpful exception and the original cause is hidden.

Ueno (3.10.4.4) - 4/15/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/4/index.html

Technical task

[PUBDEV-4244] - Add documentation on how to create a config file

Bug

[PUBDEV-2807] - PCA Rotations not displayed in Python API
[PUBDEV-4081] - Sparse matrix cannot be converted to H2O
[PUBDEV-4229] - Flow/Schema problem, predicting on frame without response returns empty model metrics
[PUBDEV-4246] - Proportion of variance in GLRM for single component has a value > 1
[PUBDEV-4251] - HDP 2.6 add to the build
[PUBDEV-4252] - Set timeout for read/write confirmation in ExternalFrameWriter/ExternalFrameReader
[PUBDEV-4261] - GLM default solver gets AIOOB when run on dataset with 1 categorical variable and no intercept
[PUBDEV-4285] - Correct exit status reporting ( when running on YARN )
[PUBDEV-4287] - Documentation: Update GLM FAQ and missing_values_handling parameter regarding unseen categorical values

New Feature

[PUBDEV-4175] - H2O Flow UI Authentication
[PUBDEV-4226] - Implement session timeout for Flow logins
[PUBDEV-4289] - Document new parameters for h2odriver.

Task

[PUBDEV-4180] - Wrap R examples in code so that they don't run on Mac OS
[PUBDEV-4215] - Export polygon function to fix CRAN note in h2o R package
[PUBDEV-4248] - Add a parameter that ignores the config file reader when h2o.init() is called

Improvement

[PUBDEV-4239] - Extend Watchdog client extension so cluster is also stopped when the client doesn't connect in specified timeout
[PUBDEV-4288] - Set hadoop user from h2odriver

Ueno (3.10.4.3) - 3/31/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/3/index.html

Bug

[PUBDEV-3281] - ARFF parser parses attached file incorrectly
[PUBDEV-4097] - Proxy warning message displays proxy with username and password.
[PUBDEV-4165] - h2o.import_sql_table works in R but on python gives error
[PUBDEV-4167] - java.lang.IllegalArgumentException with PCA
[PUBDEV-4187] - Impute does not handle categoricals when values is specified
[PUBDEV-4219] - Increase number of bins in partial plots

New Feature

[PUBDEV-4162] - h2o.transform can produce incorrect aggregated sentence embeddings

Improvement

[PUBDEV-3858] - Errors with PCA on wide data for pca_method = Power
[PUBDEV-4102] - Introduce mode in which failure of H2O client ensures the whole H2O cloud goes down
[PUBDEV-4178] - Add support for IBM IOP 4.2
[PUBDEV-4186] - Placeholder for: [SW-334]
[PUBDEV-4191] - Remove minor version from hadoop distribution in buildinfo.json file

Ueno (3.10.4.2) - 3/18/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/2/index.html

Bug

[PUBDEV-4119] - Deep Learning: mini_batch_size >>> 1 causes OOM issues
[PUBDEV-4135] - head(df) and tail(df) results in R are inconsistent for datetime columns
[PUBDEV-4144] - GLM with family = multinomial, intercept=false, and weights or SkipMissing produces error
[PUBDEV-4155] - glm hot fix: fix model.score0 for multinomial

New Feature

[PUBDEV-4133] - Add option to specify a port range for the Hadoop driver callback
[PUBDEV-4139] - Support reading MOJO from a classpath resource

Improvement

[PUBDEV-4056] - Arff Parser doesn't recognize spaces in @attribute
[PUBDEV-4099] - How to generate Precision Recall AUC (PRAUC) from the scala code

Docs

[PUBDEV-3977] - Documentation: Add documentation for word2vec
[PUBDEV-4118] - Documentation: Add topic for using with IBM Data Science Experience
[PUBDEV-4149] - Document "driverportrange" option of H2O's Hadoop driver

Ueno (3.10.4.1) - 3/3/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/1/index.html

Technical task

[PUBDEV-3943] - Documentation: Naive Bayes links to parameters section

Bug

[PUBDEV-3817] - Error in predict, performance functions caused by fold_column
[PUBDEV-3820] - Kmeans Centroid info not Rendered through Python API
[PUBDEV-3827] - PCA "Importance of Components" returns "data frame with 0 columns and 0 rows"
[PUBDEV-3866] - Stratified sampling does not split minority class
[PUBDEV-3885] - R Kmean's user_point doesn't get used
[PUBDEV-3903] - Setting -context_path doesn't change REST API path
[PUBDEV-3932] - K-means Training Metrics do not match Prediction Metrics with same data
[PUBDEV-3938] - h2o-py/tests/testdir_hdfs/pyunit_INTERNAL_HDFS_timestamp_date_orc.py failing
[PUBDEV-4017] - gradle update broke the build
[PUBDEV-4019] - H2O config (~/.h2oconfig) should allow user to specify username and password
[PUBDEV-4032] - Flow/R/Python - H2O cloudInfo should show if cluster is secured or not
[PUBDEV-4039] - FLOW fails to display custom models including Word2Vec
[PUBDEV-4040] - Import json module as different alias in Python API
[PUBDEV-4041] - Stacked Ensemble docstring example is broken
[PUBDEV-4042] - The autogen R bindings have an incorrect definition for the y argument
[PUBDEV-4047] - AIOOB while training an H2OKMeansEstimator
[PUBDEV-4065] - Fix bug in randomgridsearch and Fix intermittent pyunit_gbm_random_grid_large.py
[PUBDEV-4066] - Typos in Stacked Ensemble Python H2O User Guide example code
[PUBDEV-4073] - StackedEnsemble: stacking fails if combined with ignore_columns
[PUBDEV-4083] - AIOOB in GLM

New Feature

[PUBDEV-3852] - Documentation: Add Data Munging topic for file name globbing
[PUBDEV-4009] - Integration to add new top-level Plot menu to Flow
[PUBDEV-4038] - Add stddev to PDP computation

Task

[PUBDEV-3685] - Update h2o-py README
[PUBDEV-3797] - Generate Python API tests for H2O Cluster commands
[PUBDEV-3914] - Add documentation for python GroupBy class
[PUBDEV-3915] - Document python's Assembly and ConfusionMatrix classes, add python API tests as well
[PUBDEV-3937] - Clean up R docs
[PUBDEV-3986] - Documentation: Summarize the method for estimating k in kmeans and add to docs
[PUBDEV-4006] - Update links to Stacking on docs.h2o.ai
[PUBDEV-4021] - H2O config (~/.h2oconfig) should allow user to specify username and password
[PUBDEV-4067] - Check if strict_version_check is TRUE when checking for config file

Improvement

[PUBDEV-3781] - Documentation: Add info about sparse data support
[PUBDEV-3784] - h2o doc deeplearning: clarify what the (heuristics)defaults for auto are in categorical_encoding
[PUBDEV-3919] - Saving/serializing currently existing, detailed model information
[PUBDEV-3961] - Py/R: Remove unused 'cluster_id' parameter
[PUBDEV-3983] - Update GBM FAQ
[PUBDEV-3994] - Documentation: Add info about imputing data in Flow and in Data Manipulation
[PUBDEV-3998] - Documentation: Add instructions for running demos
[PUBDEV-4005] - AIOOB Exception with fold_column set with kmeans
[PUBDEV-4055] - Modify h2o#connect function to accept config with connect_params field
[PUBDEV-4059] - Change of h2o.connect(config) interface to support Steam

Tverberg (3.10.3.5) - 2/16/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/5/index.html

Bug

[PUBDEV-3848] - GLM with interaction parameter and cross-validation cause Exception
[PUBDEV-3916] - pca: hangs on attached data
[PUBDEV-3964] - StepOutOfRangeException when building GBM model
[PUBDEV-3976] - py unique() returns frame of integers (since epoch) instead of frame of unique dates
[PUBDEV-3979] - py date comparisons don't work for rows > 1
[PUBDEV-3980] - AstUnique drops column types
[PUBDEV-4013] - In R, the confusion matrix at the end doesn’t say: vertical: actual, across: predicted
[PUBDEV-4014] - AIOOB in GLM with hex.DataInfo.getCategoricalId(DataInfo.java:952) is the error with 2 fold cross validation
[PUBDEV-4036] - Parse fails when trying to parse large number of Parquet files
[HEXDEV-683] - POJO doesn't include Forest classes
[PUBDEV-4044] - moment producing wrong dates

Tverberg (3.10.3.4) - 2/3/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/4/index.html

Bug

[PUBDEV-3965] - Importing data in python returns error - TypeError: expected string or bytes-like object

Tverberg (3.10.3.3) - 2/2/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/3/index.html

Bug

[PUBDEV-3835] - Standard Errors in GLM: calculating and showing specifically when called

Improvement

[PUBDEV-3989] - Decrease size of h2o.jar

Tverberg (3.10.3.2) - 1/31/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/2/index.html

Bug

Hotfix: Remove StackedEnsemble from Flow UI. Training is only supported from Python and R interfaces. Viewing is supported in the Flow UI.

Tverberg (3.10.3.1) - 1/30/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/1/index.html

Bug

[PUBDEV-2464] - Using asfactor() in Python client cannot allocate to a variable
[PUBDEV-3111] - R API's h2o.interaction() does not use destination_frame argument
[PUBDEV-3694] - Errors with PCA on wide data for pca_method = GramSVD which is the default
[PUBDEV-3742] - StackedEnsemble should work for regression
[PUBDEV-3865] - h2o gbm : for an unseen categorical level, discrepancy in predictions when score using h2o vs pojo/mojo
[PUBDEV-3883] - Negative indexing for H2OFrame is buggy in R API
[PUBDEV-3894] - Relational operators don't work properly with time columns.
[PUBDEV-3966] - java.lang.AssertionError when using h2o.makeGLMModel

Story

[PUBDEV-3739] - StackedEnsemble: put ensemble creation into the back end

New Feature

[PUBDEV-2058] - Implement word2vec in h2o
[PUBDEV-3635] - Ability to Select Columns for PDP computation in Flow
[PUBDEV-3881] - Add PCA Estimator documentation to Python API Docs
[PUBDEV-3902] - Documentation: Add information about Azure support to H2O User Guide (Beta)

Task

[PUBDEV-3336] - h2o.create_frame(): if randomize=True, `value` param cannot be used
[PUBDEV-3740] - REST: implement simple ensemble generation API
[PUBDEV-3843] - Modify R REST API to always return binary data
[PUBDEV-3844] - Safe GET calls for POJO/MOJO/genmodel
[PUBDEV-3864] - Import files by pattern
[PUBDEV-3884] - StackedEnsemble: Add to online documentation
[PUBDEV-3940] - Add Stacked Ensemble code examples to R docs

Improvement

[PUBDEV-3257] - Documentation: As a K-Means user, I want to be able to better understand the parameters
[PUBDEV-3741] - StackedEnsemble: add tests in R and Python to ensure that a StackedEnsemble performs at least as well as the base_models
[PUBDEV-3857] - Clean up the generated Python docs
[PUBDEV-3895] - Filter H2OFrame on pandas dates and time (python)
[PUBDEV-3912] - Provide way to specify context_path via Python/R h2o.init methods
[PUBDEV-3933] - Modify gen_R.py for Stacked Ensemble
[PUBDEV-3972] - Add Stacked Ensemble code examples to Python docstrings

Tutte (3.10.2.2) - 1/12/2017

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/2/index.html

Bug

[PUBDEV-3876] - Enable HDFS-like filesystems

Task

[PUBDEV-3816] - import functions required for r-release check

Tutte (3.10.2.1) - 12/22/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/1/index.html

Bug

[PUBDEV-3291] - Summary() doesn't update stats values when asfactor() is applied
[PUBDEV-3498] - rectangular assign to a categorical column does not work (should be possible to assign either an existing level, or a new one)
[PUBDEV-3618] - Numerical Column Names in H2O and R
[PUBDEV-3690] - pred_noise_bandwidth parameter is not reproducible with seed
[PUBDEV-3723] - Fix mktime() referencing from 0 base to 1 base for month and day
[PUBDEV-3728] - Binary loss functions return error in GLRM
[PUBDEV-3747] - python hist() plotted bars overlap
[PUBDEV-3750] - Python set_levels doesn't change other methods
[PUBDEV-3753] - h2o doc: glm grid search hyper parameters missing/incorrect listing. Presently glrm's is marked as glm's
[PUBDEV-3764] - Partial Plot incorrectly calculates for constant categorical column
[PUBDEV-3778] - h2o.proj_archetypes returns error if constant column is dropped in GLRM model
[PUBDEV-3788] - GLRM loss by col produces error if constant columns are dropped
[PUBDEV-3796] - isna() overwrites column names
[PUBDEV-3812] - NullPointerException with Quantile GBM, cross validation, & sample_rate < 1
[PUBDEV-3819] - R h2o.download_mojo broken - writes a 1 byte file
[PUBDEV-3831] - Seed definition incorrect in R API for RF, GBM, GLM, NB
[PUBDEV-3834] - h2o.glm: get AIOOB exception with xval and lambda search

New Feature

[PUBDEV-3482] - Supporting GLM binomial model to allow two arbitrary integer values
[PUBDEV-3376] - Implement ISAX calculations per ISAX word
[PUBDEV-3377] - Optimizations and final fixes for ISAX
[PUBDEV-3664] - Implement GLM MOJO
[PUBDEV-3501] - Variance metrics are missing from GLRM that are available in PCA
[PUBDEV-3541] - py h2o.as_list() should not return headers
[PUBDEV-3715] - Modify sum() calculation to work on rows or columns
[PUBDEV-3737] - make sure that the generated R bindings work with StackedEnsemble
[PUBDEV-3833] - Add HDP 2.5 Support

Task

[PUBDEV-3012] - Remove grid.sort_by method in Python API
[PUBDEV-3695] - Documentation: Add GLM to list of algorithms that support MOJOs
[PUBDEV-3791] - Documentation: Add quasibinomial family in GLM
[PUBDEV-3676] - Add SLURM cluster documentation
[PUBDEV-3692] - Add memory check for GLRM before proceeding
[PUBDEV-3765] - Check to make sure hinge loss works for GLRM
[PUBDEV-3803] - Add parameters from _upload_python_object to H2OFrame constructor
[PUBDEV-3804] - Refer to .h2o.jar.env when detaching R package
[PUBDEV-3805] - Call on proper port when exiting R/detaching package
[PUBDEV-3806] - Modify search for config file in R api
[PUBDEV-3818] - Properly handle URLs in R docs generated by autogen

Improvement

[PUBDEV-3256] - Documentation: As a GLM user, I want to be able to better understand the parameters
[PUBDEV-3758] - Fix bad/inconsistent/empty categorical (bitset) splits for DRF/GBM
[PUBDEV-3793] - Auto-generate R bindings

Turnbull (3.10.1.2) - 12/14/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turnbull/2/index.html

Bug

[PUBDEV-2801] - Starting h2o server from R ignores IP and port parameters
[PUBDEV-3484] - Treat 1-element numeric list as acceptable when numeric input required
[PUBDEV-3509] - h2o's cor() breaks R's native cor()
[PUBDEV-3592] - h2o.get_grid isn't working
[PUBDEV-3607] - `cor` function should properly pass arguments
[PUBDEV-3629] - Avoid confusing error message when column name is not found.
[PUBDEV-3631] - overwrite_with_best_model fails when using checkpoint
[PUBDEV-3633] - plot.h2oModel in R no longer supports metrics with uppercase names (e.g. AUC)
[PUBDEV-3642] - Fix citibike R demo
[PUBDEV-3697] - Create an Attribute for Number of Internal Trees in Python
[PUBDEV-3704] - Error with early stopping and score_tree_interval on GBM
[PUBDEV-3735] - Python's coef() and coef_norm() should use column name not index
[PUBDEV-3757] - Perfbar does not work for hierarchical path passed via -h2o_context

New Feature

[PUBDEV-3474] - Show Partial Dependence Plots in Flow
[PUBDEV-3620] - Allow setting nthreads > 255.
[PUBDEV-3700] - Add RMSE, MAE, RMSLE, and lift_top_group as stopping metrics
[PUBDEV-3719] - Update h2o.mean in R to match Python API

Task

[PUBDEV-3579] - Document Partial Dependence Plot in Flow
[PUBDEV-3621] - Add R endpoint for cumsum, cumprod, cummin, and cummax
[PUBDEV-3649] - Modify correlation matrix calculation to match R
[PUBDEV-3657] - Remove max_confusion_matrix_size from booklets & py doc

Improvement

[HEXDEV-645] - aggregator should calculate domain for enum columns in aggregated output frames & member frames based on current output or member frame
[HEXDEV-658] - Naive Bayes (and maybe GLM): Drop limit on classes that can be predicted (currently 1000)
[PUBDEV-3625] - Speed up GBM and DRF
[PUBDEV-3756] - Support `-context_path` to change servlet path for REST API

IT Help

[PUBDEV-3279] - Adding a custom loss-function

Turing (3.10.0.10) - 11/7/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/10/index.html

Bug

[PUBDEV-3484] - Treat 1-element numeric list as acceptable when numeric input required
[PUBDEV-3675] - Cannot determine file type

Turing (3.10.0.9) - 10/25/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/9/index.html

Bug

[PUBDEV-3546] - h2o.year() method does not return year
[PUBDEV-3559] - Regression Training Metrics: Deviance and MAE were swapped
[PUBDEV-3568] - h2o.max returns NaN even when na.rm is set to TRUE
[PUBDEV-3593] - Fix display of array-valued entries in TwoDimTables such as grid search results

Improvement

[PUBDEV-3585] - Optimize algorithm for automatic estimation of K for K-Means
[HEXDEV-646] - Include the Aggregator model, accessible from Flow and the /3/ REST API, in h2o-3

Turing (3.10.0.8) - 10/10/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/8/index.html

Technical task

[PUBDEV-3363] - R binding for new MOJO

Bug

[PUBDEV-3384] - S3 API method PersistS3#uriToKey breaks expected contract
[PUBDEV-3437] - GLM multinomial with defaults fails on attached dataset
[PUBDEV-3441] - .structure() encounters list index out of bounds when nan is encountered in column
[PUBDEV-3455] - max_active_predictors option in GLM no longer works
[PUBDEV-3461] - Printed PCA model metrics in R are missing
[PUBDEV-3477] - R - Unnecessary JDK requirement on Windows
[PUBDEV-3505] - UUID columns with mostly missing values cause parse to fail.
[HEXDEV-599] - Fold Column not available in h2o.grid

New Feature

[PUBDEV-1943] - Compute partial dependence data
[PUBDEV-3422] - Create Method to Return Columns of Specific Type
[PUBDEV-3491] - Find optimal number of clusters in K-Means
[PUBDEV-3492] - Add optional categorical encoding schemes for GBM/DRF

Task

[PUBDEV-3327] - Tasks for completing MOJO support
[PUBDEV-3444] - Ensure functions have `h2o.*` alias in R API

Improvement

[PUBDEV-3465] - Sync up functionality of download_mojo and download_pojo in R & Py
[PUBDEV-3499] - Improve the stopping criterion for K-Means Lloyds iterations
[HEXDEV-596] - Encryption of H2O communication channels
[HEXDEV-636] - add option to Aggregator model to show ignored columns in output frame

Turing (3.10.0.7) - 9/19/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/7/index.html

Bug

[PUBDEV-3300] - NPE during categorical encoding with cross-validation (possibly Windows 8 runit only)
[PUBDEV-3306] - H2OFrame arithmetic/statistical functions return inconsistent types
[PUBDEV-3315] - Multi file parse fails with NPE
[PUBDEV-3374] - h2o.hist() does not respect breaks
[PUBDEV-3401] - importFiles, with s3n, gives NullPointerException
[PUBDEV-3409] - Python Structure() Breaks When Applied to Entire Dataframe

New Feature

[PUBDEV-2707] - Diff operation on column in H2O Frame
[HEXDEV-619] - Calculate residuals in h2o-3 and in Flow, and create a new frame with a column that contains the residuals

Task

[PUBDEV-2785] - Clean up Python booklet code in repo

Improvement

[PUBDEV-3296] - In R, allow x to be missing (meaning use all columns except y) for all supervised algorithms
[PUBDEV-3329] - median() should return a list of medians from an entire frame
[PUBDEV-3334] - Conduct rbind and cbind on multiple frames
[PUBDEV-3387] - Add argument to H2OFrame.print in R to specify number of rows
[PUBDEV-3418] - Suppress chunk summary in describe()

Turing (3.10.0.6) - 8/25/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/6/index.html

Bug

[HEXDEV-608] - Hashmap in H2OIllegalArgumentException fails to deserialize & throws FATAL
[PUBDEV-2879] - NPE in MetadataHandler
[PUBDEV-3086] - hist() fails for constant numeric columns
[PUBDEV-3173] - Client mode: flatfile requires list of all nodes, but a single entry node should be sufficient
[PUBDEV-3207] - Make CreateFrame reproducible for categorical columns.
[PUBDEV-3208] - Fix intermittency of categorical encoding via eigenvector.
[PUBDEV-3211] - isBitIdentical is returning true for two Frames with different content
[PUBDEV-3222] - AssertionError for DL train/valid with categorical encoding
[PUBDEV-3237] - Wrong MAE for observation weights other than 1.
[PUBDEV-3244] - H2ODriver for CDH5.7.0 does not accept memory settings
[PUBDEV-3276] - H2OFrame.drop() leaves the frame in inconsistent state

New Feature

[PUBDEV-3007] - Implement skewness calculation for H2O Frames
[PUBDEV-3008] - Implement kurtosis calculation for H2O Frames
[PUBDEV-3128] - Add ability to do a deep copy in Python API
[PUBDEV-3163] - Add docs for h2o.make_metrics() for R and Python
[PUBDEV-3218] - Add RMSLE to model metrics
[PUBDEV-3264] - Return unique values of a categorical column as a Pythonic list

Task

[PUBDEV-3235] - Refactor and simplify implementation of Pearson Correlation
[PUBDEV-3238] - Add MAE to CV Summary

Improvement

[PUBDEV-2702] - Create h2o.* functions for H2O primitives
[PUBDEV-3098] - Add methods to get actual and default parameters of a model
[PUBDEV-3132] - Add ability to drop a list of columns or a subset of rows from an H2OFrame
[PUBDEV-3138] - Ensure all is*() functions return a list

Turing (3.10.0.3) - 7/29/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turing/3/index.html

Bug

[PUBDEV-2805] - Error when setting a string column to a single value in R/Py
[PUBDEV-2965] - R h2o.merge() ignores by.x and by.y
[PUBDEV-3135] - Download Logs broken URL from Flow

New Feature

[PUBDEV-2958] - H2O Version Check
[PUBDEV-3022] - Add an h2o.concat function equivalent to pandas.concat
[PUBDEV-3050] - Add Huber loss function for GBM and DL (for regression)
[PUBDEV-3071] - Add RMSE to model metrics
[PUBDEV-3104] - Add Mean Absolute Error to Model Metrics
[PUBDEV-3108] - Add mean absolute error to scoring history and model plotting
[PUBDEV-3116] - Add categorical encoding schemes for DL and Aggregator
[PUBDEV-3155] - Compute supervised ModelMetrics from predicted and actual values in Java/R
[PUBDEV-3162] - Compute supervised ModelMetrics from predicted and actual values in Python

Improvement

[PUBDEV-1888] - Implement gradient checking for DL
[PUBDEV-2627] - Add better warning message to functions of H2OModelMetrics objects
[PUBDEV-3021] - Add demo datasets to Python package
[PUBDEV-3113] - Replace "MSE" with "RMSE" in scoring history table
[PUBDEV-3122] - Make all TwoDimTable Headers Pythonic in R and Python API
[PUBDEV-3129] - Achieve consistency between DL and GBM/RF scoring history in regression case
[PUBDEV-3131] - Disable R^2 stopping criterion in tree model builders
[PUBDEV-3149] - Remove R^2 from all model output except GLM

Turin (3.8.3.4) - 7/15/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turin/4/index.html

Bug

[PUBDEV-3040] - File parse from S3 extremely slow
[PUBDEV-3145] - Fix Deep Learning POJO for hidden dropout other than 0.5

Turin (3.8.3.2) - 7/1/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turin/2/index.html

Bug

[PUBDEV-898] - DRF: sample_rate=1 not permitted unless validation is performed
[PUBDEV-2087] - Create a set of tests that create large POJOs for each algorithm and compile them
[PUBDEV-2322] - Merge (method="radix") bug1
[PUBDEV-2325] - Merge (method="radix") bug2
[PUBDEV-2565] - Fold Column not available in h2o.grid
[PUBDEV-2964] - h2o.merge(,method="radix") failing 15/40 runs
[PUBDEV-3030] - Parse: java.lang.IllegalArgumentException: 0 > -2147483648
[PUBDEV-3032] - Cached errors are not printed if H2O exits
[PUBDEV-3072] - java.lang.ClassCastException for Quantile GBM
[PUBDEV-3077] - model_summary number of trees is too high for multinomial DRF/GBM models
[PUBDEV-3079] - NPE when accessing invalid null Frame cache in a Frame's vecs()
[PUBDEV-3081] - TwoDimTable version of a Frame prints missing value (NA) as 0
[PUBDEV-3089] - Fix tree split finding logic for some cases where min_rows wasn't satisfied and the entire column was no longer considered even if there were allowed split points
[PUBDEV-3093] - saveModel and loadModel don't work with Windows c:/ paths
[PUBDEV-3095] - getStackTrace fails on NumberFormatException
[PUBDEV-3096] - TwoDimTable for Frame Summaries doesn't always show the full precision
[PUBDEV-3097] - DRF OOB scoring isn't using observation weights
[PUBDEV-3099] - AIOOBE when calling 'getModel' in Flow while a GLM model is training

Task

[PUBDEV-2681] - Properly document the addition of missing_values_handling arg to GLM

Improvement

[PUBDEV-1617] - Matt's new merge (aka join) integrated into H2O
[PUBDEV-2822] - Improved handling of missing values in tree models (training and testing)
[PUBDEV-3060] - IPv6 documentation
[PUBDEV-3066] - Stop GBM models once the effective learning rate drops below 1e-6.
[PUBDEV-3094] - Log input parameters during boot of H2O

Turchin (3.8.2.9) - 6/10/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/9/index.html

Bug

[PUBDEV-2920] - Python apply() doesn't recognize % (modulo) within lambda function
[PUBDEV-2940] - Documentation: Add RoundRobin histogram_type to GBM/DRF
[PUBDEV-2957] - Add "seed" option to GLM in documentation
[PUBDEV-2973] - Documentation: Update supported Hadoop versions
[PUBDEV-2981] - Models hang when max_runtime_secs is too small
[PUBDEV-2982] - Default min/max_mem_size to gigabytes in h2o.init
[PUBDEV-2997] - Add "ignore_const_cols" argument to glm and gbm for Python API
[PUBDEV-2999] - AIOOBE in GBM if no nodes are split during tree building
[PUBDEV-3004] - Negative R^2 (now NaN) can prevent early stopping
[PUBDEV-3011] - Two grid sorting methods in Py API - only one works sometimes

New Feature

[PUBDEV-2743] - Add seed argument to GLM
[PUBDEV-2917] - Add cor() function to Rapids

Task

[PUBDEV-3005] - Verify checkpoint argument in h2o.gbm (for R)

Improvement

[PUBDEV-2040] - Sync up argument names in `h2o.init` between R and Python
[PUBDEV-2996] - Change `getjar` to `get_jar` in h2o.download_pojo in R
[PUBDEV-2998] - Change min_split_improvement default value from 0 to 1e-5 for GBM/DRF
[PUBDEV-3013] - Allow specification of "AUC" or "auc" or "Auc" for stopping_metrics, sorting of grids, etc.

Turchin (3.8.2.8) - 6/2/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/8/index.html

Bug

[PUBDEV-2985] - Make Random grid search consistent between clients for same parameters
[PUBDEV-2987] - Allow learn_rate_annealing to be passed to H2OGBMEstimator constructor in Python API
[PUBDEV-2989] - Fix typo in GBM/DRF Python API for col_sample_rate_change_per_level - was misnamed and couldn't be set

New Feature

[PUBDEV-2979] - Add a new metric: mean misclassification error for classification models

Improvement

[PUBDEV-2972] - No longer print negative R^2 values - show NaN instead
[PUBDEV-2984] - Add xval=True/False as an option to model_performance() in Python API

Turchin (3.8.2.6) - 5/24/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/6/index.html

Bug

[PUBDEV-1899] - Number of active predictors is off by 1 when Intercept is included
[PUBDEV-2942] - GLM with cross-validation AIOOBE (+ Grid-Search + Multinomial, may be related)
[PUBDEV-2943] - Improved accuracy for histogram_type="QuantilesGlobal" for DRF/GBM

New Feature

[PUBDEV-1705] - GLM needs 'seed' argument for new (random) implementation of n-folds
[PUBDEV-2743] - Add seed argument to GLM

Improvement

[PUBDEV-2928] - Remove _Dev from file name _DataScienceH2O-Dev
[PUBDEV-2945] - Clean up overly long and duplicate error message in KeyV3
[PUBDEV-2953] - Allow the user to pass column types of an existing H2OFrame during Parse/Upload in R and Python
[PUBDEV-2954] - Tweak Parser Heuristic
[PUBDEV-2955] - GLM improvements and fixes

Turchin (3.8.2.5) - 5/19/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/5/index.html

Technical task

[PUBDEV-2909] - Documentation update for relevel

Bug

[PUBDEV-2282] - DRF: cannot compile pojo
[PUBDEV-2304] - GBM pojo compile failures
[PUBDEV-2878] - Bug in h2o-py H2OScaler.inverse_transform()
[PUBDEV-2880] - Add NAOmit() to Rapids
[PUBDEV-2897] - AIOOBE in Vec.factor (due to Parse bug?)
[PUBDEV-2903] - In grid search, max_runtime_secs without max_models hangs
[PUBDEV-2933] - GBM's fold_assignment = "Stratified" breaks with missing values in response column

New Feature

[PUBDEV-2729] - Implement h2o.relevel, equivalent of base R's relevel function
[PUBDEV-2857] - Add Kerberos authentication to Flow
[PUBDEV-2893] - Summaries Fail in rdemo.citi.bike.small.R
[PUBDEV-2895] - DimReduction for EasyModelAPI
[PUBDEV-2915] - Make histograms truly adaptive (quantiles-based) for DRF/GBM

Task

[PUBDEV-2902] - Add a list of gridable parameters to the docs
[PUBDEV-2904] - Add relevel() to Python API

Improvement

[PUBDEV-2905] - Improve the progress bar based on max_runtime_secs & max_models & actual work
[PUBDEV-2908] - Improve GBM/DRF reproducibility for fixed parameters and hardware
[PUBDEV-2911] - Check sanity of random grid search parameters (max_models and max_runtime_secs)
[PUBDEV-2912] - Add Job's remaining time to Flow
[PUBDEV-2919] - Add enum option 'histogram_type' to DRF/GBM (and remove random_split_points)
[PUBDEV-2923] - JUnit: Separate POJO namespace during junit testing

Turchin (3.8.2.3) - 4/25/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/3/index.html

Bug

[PUBDEV-2852] - Incorrect sparse chunk getDoubles() extraction

New Feature

[PUBDEV-2825] - Create h2o.get_grid
[PUBDEV-2834] - Implement distributed Aggregator for visualization
[PUBDEV-2835] - Add col_sample_rate_change_per_level for GBM/DRF
[PUBDEV-2836] - Add learn_rate_annealing for GBM
[PUBDEV-2837] - Add random cut points for histograms in DRF/GBM (ExtraTreesClassifier)
[PUBDEV-2851] - Add limit on max. leaf node contribution for GBM

Task

[PUBDEV-2848] - Add tests for early stopping logic (stopping_rounds > 0)

Improvement

[PUBDEV-2877] - Make NA split decisions internally more consistent

Turchin (3.8.2.2) - 4/8/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/2/index.html

Bug

[PUBDEV-2820] - Implement max_runtime_secs to limit total runtime of building GLM models with and without cross-validation enabled

New Feature

[PUBDEV-2815] - Add stratified sampling per-tree for DRF/GBM

Turchin (3.8.2.1) - 4/7/2016

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/1/index.html

Bug

[PUBDEV-2766] - AIOOBE for quantile regression with stochastic GBM
[PUBDEV-2770] - Naive Bayes AIOOBE
[PUBDEV-2772] - AIOOBE for GBM if test set has different number of classes than training set
[PUBDEV-2775] - Number of CPUs incorrect in Flow when using a hypervisor
[PUBDEV-2796] - Grid search runtime isn't enforced for CV models
[PUBDEV-2819] - AIOOBE in GLM for dense rows in sparse data

New Feature

[PUBDEV-2540] - Compute and display statistics of cross-validation model metrics
[PUBDEV-2774] - Add keep_cross_validation_fold_assignment and more CV accessors
[PUBDEV-2776] - Set initial weights and biases for DL models
[PUBDEV-2791] - Control min. relative squared error reduction for a node to split (DRF/GBM)
[PUBDEV-2806] - On-the-fly interactions for GLM
[PUBDEV-2815] - Add stratified sampling per-tree for DRF/GBM

Task

[PUBDEV-2055] - Create test cases to show that POJO prediction behavior can be different than in-h2o-model prediction behavior

Improvement

[PUBDEV-2620] - Populate start/end/duration time in milliseconds for all models
[PUBDEV-2695] - Consistent handling of missing categories in GBM/DRF (and between H2O and POJO)
[PUBDEV-2736] - Alert the user if columns can't be histogrammed due to numerical extremities
[PUBDEV-2756] - GLM should generate an error if the user enters an alpha value greater than 1.
[PUBDEV-2763] - Create full holdout prediction frame for cross-validation predictions
[PUBDEV-2769] - Support Validation Frame and Cross-Validation for Naive Bayes
[PUBDEV-2810] - Add class_sampling_factors argument to DRF/GBM for R and Python APIs

Turan (3.8.1.4) - 3/16/16

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/4/index.html

Bug

[PUBDEV-542] - KMeans: Size of clusters in Model Output is different from the labels generated on the training set
[PUBDEV-1976] - GLM fails on negative alpha
[PUBDEV-2718] - countmatches bug
[PUBDEV-2727] - bug in processTables in communication.R
[PUBDEV-2742] - Allow strings to be set to NA

New Feature

[PUBDEV-2719] - Implement Shannon entropy for a string
[PUBDEV-2720] - Implement proportion of substrings that are valid English words
[PUBDEV-2733] - Add utility function, h2o.ensemble_performance for ensemble and base learner metrics
[PUBDEV-2741] - Add date/time and string columns to createFrame.

Task

[PUBDEV-58] - Certify Sparkling Water on CDH5.2

Improvement

[PUBDEV-277] - Make the Python equivalent of as.h2o() work for NumPy arrays and pandas arrays

Turan (3.8.1.3) - 3/6/16

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/3/index.html

Bug

[PUBDEV-2644] - Collinear columns cause NPE for P-values computation
[PUBDEV-2721] - Update default values in h2o.glm.wrapper from -1 and NaN to NULL
[PUBDEV-2722] - AIOOBE in NewChunk

New Feature

[PUBDEV-2111] - Hive UDF form for Scoring Engine POJO for H2O Models

Turan (3.8.1.2) - 3/4/16

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/2/index.html

Bug

[PUBDEV-2713] - /3/scalaint fails with a 404

New Feature

[PUBDEV-2711] - Allow DL models to be pretrained on unlabeled data with an autoencoder

Improvement

[PUBDEV-2708] - H2O Flow does not contain CodeMirror library
[PUBDEV-2710] - Model export fails: parent directory does not exist
[PUBDEV-2712] - Flow doesn't show DL AE error (MSE) plot
[PUBDEV-2717] - Do not compute expensive quantiles during h2o.summary call

Turan (3.8.1.1) - 3/3/16

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-turan/1/index.html

Technical task

[PUBDEV-2705] - implement random (stochastic) hyperparameter search

Bug

[PUBDEV-2639] - Parse: Incorrect assertion error caused by very large few column data
[PUBDEV-2649] - h2o::|,& operator handles NA's differently than base::|,&
[PUBDEV-2655] - h2o::as.logical behavior is different than base::as.logical
[PUBDEV-2682] - Importing CSV file is not working with "java -jar h2o.jar -nthreads -1"
[PUBDEV-2685] - Allow DL reproducible mode to work with user-given train_samples_per_iteration >= 0
[PUBDEV-2690] - Grid Search NPE during Flow display after grid was cancelled
[PUBDEV-2693] - NPE in initialMSE computation for GBM
[PUBDEV-2696] - DL checkpoint restart doesn't honor a change in stopping_rounds

New Feature

[PUBDEV-1883] - Add option to train with mini-batch updates for DL
[PUBDEV-2698] - Return leaf node assignments for DRF + GBM

Improvement

[PUBDEV-2674] - Change default functionality of as_data_frame method in Py H2O
[PUBDEV-2697] - Add method setNames for setting column names on H2O Frame
[PUBDEV-2703] - NPE in Log.write during cluster shutdown

Tukey (3.8.0.6) - 2/23/16

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tukey/6/index.html

Enhancements

The following changes are improvements to existing features (which includes changed default values):

System

PUBDEV-2362: Handling Sparsity with Missing Values
PUBDEV-2683: Fix for erroneous conversion of NaNs to zeros during rebalancing
PUBDEV-2684: Remove bigdata test file (not available)

Bug Fixes

The following changes resolve incorrect software behavior:

Algorithms

PUBDEV-2678: CV models during grid search get overwritten

R

PUBDEV-2648: Di/trigamma handle NA
PUBDEV-2679: Progress bar for grid search with N-fold CV is wrong when max_models is given

Tukey (3.8.0.1) - 2/10/16

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tukey/1/index.html

New Features

These changes represent features that have been added since the previous release:

API

PUBDEV-1798: Ability to conduct a randomized grid search with optional limit of max. number of models or max. runtime
PUBDEV-1822: Add score_tree_interval to GBM to score every nth tree
PUBDEV-2311: Make it easy for clients to sort by model metric of choice
PUBDEV-2548: Add ability to set a maximum runtime limit on all models
PUBDEV-2632: Return a grid search summary as a table with desired sort order and metric

Algorithms

HEXDEV-495: Added ability to calculate GLM p-values for non-regularized models
PUBDEV-853: Implemented gain/lift computation to allow using predicted data to evaluate the model performance
PUBDEV-2118: Compute the lift metric for binomial classification models
PUBDEV-2212: Add absolute loss (Laplace distribution) to GBM and Deep Learning
PUBDEV-2402: Add observations weights to quantile computation
PUBDEV-2469: For GBM/DRF, add ability to pick columns to sample from once per tree, instead of at every level
PUBDEV-2594: Quantile regression for GBM and Deep Learning
PUBDEV-2625: Add recall and specificity to default ROC metrics

Python

HEXDEV-399: Added support for Python 3.5 and better (in addition to existing support for 2.7 and better)

Enhancements

The following changes are improvements to existing features (which includes changed default values):

Algorithms

PUBDEV-2233: Adjust string substitution and global string substitution to perform in-place updates on a string column.

Python

PUBDEV-1981: Fix layout issues of Python docs.
PUBDEV-2335: as.numeric for a string column only converts strings to ints rather than reals
PUBDEV-2257: Table printout in Python doesn't warn the user about truncation
PUBDEV-2460: Version mismatch message directs user to get a matching download
HEXDEV-527: Implement secure Python h2o.init
PUBDEV-2504: Check and print a warning if a proxy environment variable is found

R

PUBDEV-2335: as.numeric for a string column only converts strings to ints rather than reals
PUBDEV-2257: Table printout in R doesn't warn the user about truncation
PUBDEV-2430: Improve R's reporting on quantiles
PUBDEV-2460: Version mismatch message directs user to get a matching download

Flow

PUBDEV-2407: Improve model convergence plots in Flow
PUBDEV-2596: Flow shows empty logloss box for regression models
PUBDEV-2617: Flow's histogram doesn't cover the full support

System

HEXDEV-436: exportFile should be a real job and have a progress bar
PUBDEV-2459: Improve parse chunk size heuristic for better use of cores on small data sets
PUBDEV-2606: Print all columns to stdout for Hadoop jobs for easier debugging

Bug Fixes

The following changes resolve incorrect software behavior:

API

PUBDEV-2633: Ability to extend grid searches with more models

Algorithms

PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
PUBDEV-2114: Set GLM to give an error when lower bound > upper bound in beta constraints
PUBDEV-2190: Set GLM to default to a value of rho = 0, if rho is not provided when beta constraints are used
PUBDEV-2210: Add check for epochs value when using checkpointing in deep learning
PUBDEV-2241: Warnings about slowness caused by wide column counts now appear before building a model, not after
PUBDEV-2278: Fix docstring reporting in IPython
PUBDEV-2366: Fix display of scoring speed for autoencoder
PUBDEV-2426: GLM gives different std. dev. and means than expected
PUBDEV-2595: Bad (perceived) quality of DL models during cross-validation due to internal weights handling
PUBDEV-2626: GLM with weights gives different answer h2o vs R

Python

PUBDEV-2319: sd not working inside group_by
PUBDEV-2403: Parser reads file of empty strings as 0 rows
PUBDEV-2404: Empty strings in Python objects parsed as missing

R

PUBDEV-2319: sd not working inside group_by
PUBDEV-2231: Fix bug in summary when zero-count categoricals were present.
PUBDEV-1749: Fix h2o.apply to correctly handle functions (so long as functions contain only H2O supported primitives)

System

PUBDEV-1872: Ability to ignore 0-byte files during parse
PUBDEV-2401: /Jobs fails if you build a Model and then overwrite it in the DKV with any other type
PUBDEV-2603: Improve progress bar for grid/hyper-param searches

Tibshirani (3.6.0.9) - 12/7/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/9/index.html

New Features

These changes represent features that have been added since the previous release:

API

PUBDEV-2189: H2O now allows selection of the non_negative flag in GLM for R and Python

Algorithms

PUBDEV-1540: Added Generalized Low-Rank Model (GLRM) algorithm
PUBDEV-2119: Added gains/lift computation
GitHub commit: Added remove_collinear_columns parameter to GLM

R

PUBDEV-2079: R now retrieves column types for an H2O Frame more efficiently

Python

PUBDEV-2294: Added Python equivalent for h2o.num_iterations
PUBDEV-2233: Added sub and gsub to Python client
GitHub commit: Added weighted quantiles to Python API
PUBDEV-1304: Added sapply operator to Python
PUBDEV-1969: H2O now plots decision boundaries for classifiers in Python

Enhancements

The following changes are improvements to existing features (which includes changed default values):

Algorithms

GitHub commit: Change in behavior in GLM beta constraints - when ignoring constant/bad columns, remove them from beta_constraints as well
GitHub commit: Added ignore_const_cols to all algos
PUBDEV-2311: Improved ability to sort by model metric of choice in client

Python

PUBDEV-2409: H2O now checks for the H2O_DISABLE_STRICT_VERSION_CHECK environment variable in Python (GitHub commit)
GitHub commit: H2O now allows l/r values to be null or an empty string
GitHub commit: H2O now accommodates LOAD_FAST and LOAD_GLOBAL in bytecode_to_ast

R

PUBDEV-1378: In R, h2o.getTimezone() previously returned a list of one, now it just returns the string

System

GitHub commit: Added more tweaks to help various low-memory configurations

Bug Fixes

The following changes resolve incorrect software behavior:

API

PUBDEV-2042: h2o.grid failed when REST API version was not default
PUBDEV-2401: /Jobs failed if you built a Model and then overwrote it in the DKV with any other type (GitHub commit)
PUBDEV-2392: /3/Jobs failed with exception after running /3/SplitFrame
GitHub commit: PUBDEV-2426 - Fixed error where sd and mean were adjusted to weights even if no observation weights were passed

Algorithms

PUBDEV-2396: GLRM validation frames must have the same number of rows as the training frame
PUBDEV-2053: Fixed assertion failure in Deep Learning
PUBDEV-2315: Could not compile POJO using K-means
PUBDEV-2317: Could not compile POJO using PCA
PUBDEV-2320: Could not compile POJO using Naive Bayes
GitHub commit: Fixed weighted mean and standard deviation computation in GLM
GitHub commit: Fixed stopping criteria for lambda search and multinomial in GLM

Python

PUBDEV-2262: H2OFrame indexing was no longer Pythonic on Bleeding Edge 10/23
PUBDEV-2278: Trying to get help in python client displayed the frame
PUBDEV-2371: Fixed ASTEQ str_op bug (GitHub commit)

R

PUBDEV-1749: h2o.apply did not correctly handle functions
PUBDEV-2335: R: as.numeric for a string column only converted strings to ints rather than reals
PUBDEV-2319: R: sd was not working inside group_by
PUBDEV-2397: R: Ignore Constant Columns was not an argument in Algos in R like it is in Flow
PUBDEV-2134: When a dataset was sliced, the int mapping of enums was returned
PUBDEV-2408: Improved handling in R when H2O has already been shut down (GitHub commit)
PUBDEV-2231: Fixed categorical levels mapping bug

System

PUBDEV-2403: Parser read file of empty strings as 0 rows (GitHub commit)
PUBDEV-2404: Empty strings in Python objects were parsed as missing (GitHub commit)
PUBDEV-2375: Save Model (Deep Learning): the filename for the model metrics file was too long for Windows to handle
GitHub commit: Fixed streaming load bug for large files
PUBDEV-2241: Column width slowness warning now prints before model build, not after

Tibshirani (3.6.0.7) - 11/23/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/7/index.html

Enhancements

The following changes are improvements to existing features (which includes changed default values):

Algorithms

GitHub commit: Added Iterations and Epochs to DL job status updates, added Iterations to scoring history
GitHub commit: Cleaned up iteration counter to work for checkpointing
GitHub commit: Cleaned up counter iteration logic

Bug Fixes

The following changes resolve incorrect software behavior:

Algorithms

GitHub commit: Fixed scoring speed display for autoencoder; it showed 0 because the wrong runtime was used (ms since 1970 instead of actual runtime)

Tibshirani (3.6.0.2) - 11/5/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/2/index.html

New Features

Algorithms

GitHub commit: Added support for grid search
PUBDEV-2272: Implemented GLRM grid search in R and Python
GitHub commit: PUBDEV-2289: Enabled early convergence-based stopping by default for Deep Learning
GitHub commit: Added L1+LBFGS solver for multinomial GLM

Python

GitHub commit: PUBDEV-2289: Added Python API for convergence-based stopping

R

GitHub commit: Added .Last to Delete InitID
GitHub commit: PUBDEV-2289: Enabled convergence-based early stopping for R API of Deep Learning

Enhancements

Algorithms

GitHub commit: Enable grid search for Deep Learning parameters overwrite_with_best_model, momentum_ramp, elastic_averaging, elastic_averaging_moving_rate, & elastic_averaging_regularization
GitHub commit: PUBDEV-2289: Stopping tolerance and stopping metric are no longer hidden if stopping_rounds is 0
GitHub commit: Added checks to verify the mean, median, nrow, var, and sd are calculated correctly in groupby
GitHub commit: mean and sd now return lists

Python

GitHub commit: [PUBDEV-2257] H2O now gives users [row x col] of Frame in __str__
GitHub commit: sd/var is now sampled for group_by
GitHub commit: Parameter checking is now split between float and strings/unicode
GitHub commit: H2O now only wipes src._ex if src_in_self
GitHub commit: Refactored default arg handling in astfun
GitHub commit: Added new parameters to estimators
GitHub commit: Added session start/end; Python now ends the session on exit
GitHub commit: src and self types are now checked for None
GitHub commit: H2O now passes caches through all prefix ops
GitHub commit: H2O now pushes cached types, names, and ncols forward if possible

R

PUBDEV-1951: Removed the R backward compatibility shim
GitHub commit: Added [rows x cols] to print.Frame in R
GitHub commit: sd can now alias sdev in group_by
GitHub commit: Changed .eval.driver to .fetch.data in h2o.getFrame
GitHub commit: Removed debug printing of ==Finalizer on in R
GitHub commit: Added metalearning function

System

HEXDEV-475: Added EasyPOJO comments and improvements
GitHub commit: [PUBDEV-2204] Enabled Vec#toCategoricalVec to convert string columns to categorical columns
GitHub commit: apply now works in

Bug Fixes

Algorithms

PUBDEV-2317: PCA: Could not compile POJO
GitHub commit: [PUBDEV-2317] Incorrect PCA code was generated

Python

GitHub commit: PUBDEV-2297: Python was not updating exception on job update
GitHub commit: Added missing arguments to DRF/GBM/DL in scikit-learn-like API
GitHub commit: Fixed impute in Python
GitHub commit: Restored ASTRename
GitHub commit: Fixed reference to _quoted in H2O module

R

GitHub commit: [PUBDEV-2301, PUBDEV-2314] Hidden grid parameter was passed incorrectly from R
GitHub commit: H2O now uses deep copy when using assign from one global to another
GitHub commit: Fixed getFrame and directory unlink

System

PUBDEV-1824: h2o.init() failed to launch on the Docker image
PUBDEV-2043: Deep Learning generated an assertion error
GitHub commit: Fixed rm handling of non-frames
GitHub commit: Fixed log_level
GitHub commit: Fixed eq2 slot assign
GitHub commit: Fixed a bug found during benchmarking for small data
GitHub commit: PUBDEV-2295: User-given weights were accidentally passed to N-fold CV models
GitHub commit: Fixed NPE in Grid Schema
GitHub commit: PUBDEV-2289: Convergence checks are now numerically stable

Slotnick (3.4.0.1)

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slotnick/1/index.html

New Features

API

GitHub commit: Added NumList and StrList
PUBDEV-674: Added REST API and R/Python support for grid search

Algorithms

GitHub commit: Added option in PCA to use randomized subspace iteration method for calculation
GitHub commit: Deep Learning: Added target_ratio_comm_to_comp to R and Python client APIs
GitHub commit: PUBDEV-1247: Added stochastic GBM parameters (sample_rate and col_sample_rate) to R/Py APIs
PUBDEV-1450: GLRM has been tested and removed from "experimental" status

Hadoop

GitHub commit: Added support for H2O with HDP2.3

Python

GitHub commit: Added _to_string method
PUBDEV-2166: Added Python grid client (GitHub commit)
PUBDEV-2098: Scoring history in Python is now visualized (GitHub commit)
GitHub commit: PUBDEV-2020: Python implementation and test for split_frame()

R

This software release introduces changes to the R API that may cause previously written R scripts to be inoperable. For more information, refer to the following link.

GitHub commit: Added h2o.getTypes() to the R wrapper
GitHub commit: Added ability to set col.types with a named list
GitHub commit: Added h2o.getId() to get the back-end distributed key/value store ID from a Frame
GitHub commit: Added column types to H2O frame in R, which allows R to set the correct column types when as.data.frame() is used on an H2O frame
GitHub commit: Added @export for exported R functions

System

GitHub commit: Added string length util for Enum columns
GitHub commit: Added pass-through version of toCategoricalVec(), toNumericVec(), and toStringVec() to Vec.java for code simplicity and backwards compatibility
GitHub commit: Added string column handling to StrSplit()

Web UI

PUBDEV-1977: Added grid search to Flow web UI

Enhancements

Algorithms

PUBDEV-467: Show Frames for DL weights/biases in Flow
PUBDEV-1847: DRF/GBM: nbins_top_level is now configurable
GitHub commit: Deep Learning: Scoring time is now shown in the logs
GitHub commit: Sped up GBM split finding by dynamically switching between single-threaded and multi-threaded execution based on workload
PUBDEV-1247: Implemented Stochastic GBM
GitHub commit: Parallelized split finding for GBM/DRF (useful for large numbers of columns and nbins).
GitHub commit: Added improvements to speed up DRF (up to 35% faster) and stochastic GBM (up to 5x faster)
GitHub commit: Added some straightforward optimizations for GBM histogram building
GitHub commit: GLRM is now deterministic between one vs. many chunks
GitHub commit: Input parameters are now immutable
GitHub commit: PUBDEV-2135: Cleaned up N-fold CV model parameter sanity checking and error message propagation; now checks all N-fold model parameters upfront and lets the main model carry the message to the user
GitHub commit: PUBDEV-2130: N-fold CV models are no longer deleted when the main model is deleted
GitHub commit: PUBDEV-2107: The title in plot.H2OBinomialMetrics is now editable
GitHub commit: Python lambdas are now parsed (bytecode -> AST -> Rapids)
GitHub commit: PUBDEV-1847: Cleaned up/refactored GBM/DRF
GitHub commit: Updated MeanSquare to Quadratic for DL
GitHub commit: PUBDEV-2133: Sped up Enum mapping between train/test from O(N^2) to O(N*log(N))
GitHub commit: Added GLRM scoring history with step size and average change in objective function value
GitHub commit: SVD now outputs the V matrix as a frame with a frame key, rather than a double array in the API
GitHub commit: Modified k-means++ initialization in GLRM to set X to inverse of cluster distance with sum normalized to one, for each observation in training data
GitHub commit: Increased GBM worker thread priority to avoid deadlock with high parallel GBM job counts
GitHub commit: Added input parameter svd_method to GLRM

Python

GitHub commit: centers_std is now returned as a list of columns
GitHub commit: str(Frame) no longer returns an ID; updated ExprNode _to_string to accommodate
GitHub commit: Changed default setting for _isAllAscii to false
GitHub commit: Fixed var to return scalar/frame based on nrow
GitHub commit: Python now checks ncol, not nrow
PUBDEV-1060: Python's h2o.import_frame() now matches R's importFile() parameters where applicable
PUBDEV-1960: Python now uses the streaming endpoint /3/DownloadDataset.bin
PUBDEV-2223: Added normalization and standardization coefficients to the model output in Python
GitHub commit: Renamed logging to h2o_logging to avoid conflict with original logging package
GitHub commit: H2O now recognizes additional parameters (such as column names) for Python objects
GitHub commit: head and tail no longer download the entire dataset
GitHub commit: Truncated DF in head and tail before calling /DownloadDataset
GitHub commit: head() and tail() now default to pretty printing in Python
GitHub commit: Moved setup functionality from parse to parse setup; col_types and na_strings can now be dictionaries
GitHub commit: Updated H2OColSelect to supply extra argument
GitHub commit: PUBDEV-2174: Relative tolerance is now used for floating point comparison
GitHub commit: Added more cloud health output to run.py
GitHub commit: When Pandas frames are returned, they are now wrapped to display nicely in IPython

R

GitHub commit: Added null check
PUBDEV-2185: When appending a vec to an existing data frame, H2O now creates a new data frame while still keeping the original frame in memory
PUBDEV-1959: R now uses the streaming endpoint /3/DownloadDataset.bin
PUBDEV-2020: h2o.splitFrame() in R/Python now uses the runif technique instead of the horizontal slice technique
GitHub commit: Changed T/F to TRUE/FALSE
GitHub commit: xml2 package is now required for rversions package
GitHub commit: Package dependencies are taken into account when installing R packages
GitHub commit: Metrics are now always computed if a dataset is provided (R h2o.performance call)
GitHub commit: Column names are now fetched from H2O
GitHub commit: PUBDEV-2150: Time columns in H2O are now imported as Date columns in R
GitHub commit: h2o.ls() now returns data.frame
GitHub commit: h2o.ls() now returns the whole frame
GitHub commit: Removed unnamed additional parameters (ellipses) in R algos
GitHub commit: Added as.character to Rapids implementation
GitHub commit: Updated plot.H2OModel in R
GitHub commit: Updated scoring history plot in R for training_frame only
GitHub commit: Instead of : and assign, attr is now used
GitHub commit: Raw strings are now used as accessors
GitHub commit: name.Frame and dimnames.Frame are now visible

System

GitHub commit: Added vertical prefetch of all chunks' worth of data for dense rows
PUBDEV-1426: Scoring is now a non-blocking job with a progress bar
GitHub commit: EasyPojo API is now serializable
GitHub commit: Changed parse setup guess when encountering large NA counts to not favor numeric over dates or UUIDs
GitHub commit: Refactored vector type conversion methods into a class called VecUtils
GitHub commit: Cleaned up ASTStrList to handle frames with more than one vector during column conversion; checks types before converting; added several new column type conversions
GitHub commit: Scoring is now canceled if the job is canceled
GitHub commit: Refactored doAll_numericResult() -> doAll(nout, type, frame) where all output vecs are of the given type
GitHub commit: Improved hash function
GitHub commit: The output of _train.get() is now passed to a Frame
GitHub commit: Refactored binary/col ops for aesthetics and maintainability
GitHub commit: Added correct types for new Vecs; CategoricalWrappedVec now exports a utility for enum conversions instead of a constructor
GitHub commit: Mean/sigma values are now printed to the logs after parsing
GitHub commit: PUBDEV-2174: Added some optimizations for some chunks (mostly integers) in RollupStats
GitHub commit: PUBDEV-2174: Added instantiations of Rollups for dense numeric chunks
GitHub commit: PUBDEV-2174: Implemented single-pass variance/stddev calculation for rollups
GitHub commit: PUBDEV-2174: Added hasNA() for chunks
GitHub commit: Reordered args in sub/gsub (astid > astparameter, add string -> numeric)
GitHub commit: Ensured all chunks get closed
GitHub commit: NewChunk.addString() now accepts a Java string or BufferedString, eliminating needless conversion to a BufferedString before inserting into the NewChunk buffer. Improves efficiency of several ASTStrOps as well as converting Categorical columns to String columns.
GitHub commit: Renamed enums to categoricals system-wide
GitHub commit: Renamed ValueString -> BufferedString
GitHub commit: Removed redundant frame creation; added Java comments to each string utility; changed RAPIDS name of gsub -> replaceall and sub -> replacefirst; added nchar utility to the R client; updated comments in Python and R client
GitHub commit: All-NA chunks are now handled in string ops
GitHub commit: Added ability for string utils to handle NA chunks
GitHub commit: Merge can now handle duplicate rows
GitHub commit: countMatches utilities now only work on string columns
GitHub commit: Changed names of SubStr and GSubStr to ReplaceFirst and ReplaceAll; both methods now only accept string columns as input
GitHub commit: Changed toUpper and toLower to only work on string columns; includes an optimized version of each method as well as a UTF-safe version
GitHub commit: CStrChunks now track whether they are pure ASCII to allow StringUtilities to use optimized versions of the utilities that operate directly on the string buffer
GitHub commit: Moved frame function to ArrayUtils
GitHub commit: Removed categorical versions of trim() and length()
GitHub commit: Changed the merge defaults to match the implementation
GitHub commit: Merge no longer uses a by argument
GitHub commit: Added trim and length functionality for string columns
GitHub commit: HEXDEV-442: Improved POJO handling
GitHub commit: Config files are now transferred using a hexstring to avoid issues with Hadoop XML parsing
GitHub commit: HEXDEV-445: Added isNA check
GitHub commit: Means, mults, modes, and size now do bulk rollups
GitHub commit: Increased priority of model builder Driver classes to prevent deadlock when bulk-launching parallel unrelated model builds
GitHub commit: Renamed Currents to Rapids
GitHub commit: CRAN-based R clients are now set to opt-out by default
GitHub commit: Assembly states are now saved in the DKV
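The PUBDEV-2174 entries above mention a single-pass variance/stddev calculation for rollups. One common way to get mean and variance in a single pass is Welford's online update, sketched below. This is an illustration of that technique only, not H2O's RollupStats code; the class name RunningVariance is hypothetical, and the use of the sample (n - 1) denominator is an assumption chosen to match R's sd().

```java
// Hypothetical illustration of single-pass (Welford) mean/variance;
// not H2O's RollupStats implementation.
public class RunningVariance {
    private long n = 0;        // number of values seen so far
    private double mean = 0.0; // running mean
    private double m2 = 0.0;   // running sum of squared deviations from the mean

    // Folds one value into the running statistics in O(1) time and memory.
    public void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // second factor uses the updated mean
    }

    public long count() { return n; }

    public double mean() { return mean; }

    // Sample variance (n - 1 denominator); NaN when fewer than two values were seen.
    public double variance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }

    public double sigma() { return Math.sqrt(variance()); }

    public static void main(String[] args) {
        RunningVariance rv = new RunningVariance();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            rv.add(x);
        }
        System.out.println("mean=" + rv.mean() + " variance=" + rv.variance() + " sigma=" + rv.sigma());
    }
}
```

Because each value is visited once and only three numbers are kept, the same update can run over a chunk of data without a second pass to subtract the mean, and it avoids the cancellation error of the naive sum-of-squares formula.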

Web UI

PUBDEV-1961: Flow now uses the streaming endpoint /3/DownloadDataset.bin

Bug Fixes

Algorithms

GitHub commit: Fixed bug with CategoricalWrappedVec
PUBDEV-1664: Corrected math for GBM Tweedie with offsets/weights
PUBDEV-1665: Corrected math for GBM Poisson with offsets/weights
PUBDEV-2130: Deleting Deep Learning n-fold models resulted in a java.lang.AssertionError
GitHub commit: Fixed GLM with nfolds
GitHub commit: Updated GLM InitTsk to run at +1 priority level to avoid deadlock when launching hundreds of GLMs in parallel
GitHub commit: Column names (feature names) are now named correctly for the exported weight matrix connecting the input to the first hidden layer
GitHub commit: Changed isEnum to isCategorical
GitHub commit: Cleaned up DRF and GBM; fixed checkpoint restart logic for trees and changed which parameters are configurable
GitHub commit: Fixed incorrect logistic and hinge loss functions; they now apply only to binary numeric columns in {0,1}
GitHub commit: Fixed a bug where Poisson loss function was calculated incorrectly for values of 0
GitHub commit: Fixed DL POJO for large input columns

Python

GitHub commit: nrow was not filling cache correctly
GitHub commit: Fixed typo in Python object upload (header -> col_header)
GitHub commit: Append now operates in place
GitHub commit: Seed was not being set
GitHub commit: Fixed group_by
GitHub commit: Corrected .fromPython
GitHub commit: Corrected Python dict col names
GitHub commit: Fixed null/npe in H2O's fit for sklearn (Windows only)
GitHub commit: get_params now keeps "algo" out of params
GitHub commit: Improved compatibility with sklearn by using "train" as a model build verb and reserving "fit" for sklearn; if "fit" method is attempted, a warning displays
GitHub commit: Fixed accessor in Python model predict

R

GitHub commit: Fixed is.numeric
GitHub commit: Fixed h2o.anyFactor and h2o.impute
GitHub commit: Fixed levels
PUBDEV-1808: h2o.splitFrame was not splitting randomly in R
GitHub commit: Fixed range in R
GitHub commit: PUBDEV-2020: Fixed variable name for case where destination_frame is provided.
PUBDEV-2198: h2o.table ran slower than h2o.groupby by magnitudes
GitHub commit: Fixed location of datafile for R example code
GitHub commit: Fixed length(column.names)==number_columns check
GitHub commit: Parse types can be specified by column index or column name, but not both
GitHub commit: Added connection (close HTTP header) to improve jetty connection pool behavior
GitHub commit: Added a sensible min on N
GitHub commit: Added Windows binaries to R package repo
GitHub commit: Fixed h2o.weights to show frame as output
GitHub commit: Fixed type conversion for time columns when ingested by as.data.frame()
GitHub commit: Fixed h2o.merge R interface
GitHub commit: head and tail now always return data.frame
GitHub commit: Fixed a bug in GLRM init in R
GitHub commit: Fixed bug in h2o.summary (constant categorical columns)
GitHub commit: Fixed bug in plot.H2OModel
PUBDEV-1974: When imputing columns from R, many temp files were created, which did not occur in Flow

System

PUBDEV-2250: During parsing, SVMLight-formatted files failed with an NPE (GitHub commit)
PUBDEV-2213: During parsing, alphanumeric data in a column was converted to missing values and the column was assigned a type of int
PUBDEV-1990: Spaces are now permitted in the Flow directory name
PUBDEV-1037: Space in the user name was preventing H2O from starting
GitHub commit: Fixed VecUtils.copyOver() to accept a column type for the resulting copy
GitHub commit: Fixed Vec.preWriting so that it does not use an anonymous inner task which causes the entire Vec header to be passed
GitHub commit: Fixed parse to mark categorical references in ParseWriter as transient (enums must be node-shared during the entire multiple parse task)
GitHub commit: PUBDEV-2182: Fixed DL checkpoint restart with given validation set after R (currents) behavior changed; now the validation set key no longer necessarily matches the file name
GitHub commit: Fixed makeCon memory leak when redistribute=T
GitHub commit: PUBDEV-2174: Fixed sigma calculation for sparse chunks
GitHub commit: Restored pre-existing string manipulation utilities for categorical columns
GitHub commit: Fixed syncRPackages task so it doesn't run during the normal build process
GitHub commit: Fixed intermittent failures caused by different default timezone settings on different machines; sets needed timezone before starting test
GitHub commit: Fixed error message for countmatches
GitHub commit: PUBDEV-1443: Fixed size computation in merge
GitHub commit: Fixed h2o.tabulate() to work in multi-node mode
GitHub commit: Fixed integer overflow in printout of CM to TwoDimTable

Slater (3.2.0.7) - 10/09/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/7/index.html

Bug Fixes

GitHub commit: Fix Java 6 compatibility

The Java 7 API call _rawChannel.setOption(StandardSocketOptions.TCP_NODELAY, true); has been replaced by the Java 6 API call _rawChannel.socket().setTcpNoDelay(true);

The Java 7 API call sock.getRemoteAddress() has been replaced by the Java 6 API call sock.socket().getRemoteSocketAddress()
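The two replacements above can be exercised in a small, self-contained program. This sketch opens a loopback connection and uses only Java 6-era calls: socket().setTcpNoDelay(true) in place of setOption(StandardSocketOptions.TCP_NODELAY, true), and socket().getRemoteSocketAddress() in place of getRemoteAddress(). The class TcpNoDelayDemo and its helper methods are hypothetical and are not H2O code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo of Java 6-compatible socket calls on an NIO channel.
public class TcpNoDelayDemo {

    // Java 6 form: go through the underlying java.net.Socket.
    // (Java 7 form: ch.setOption(StandardSocketOptions.TCP_NODELAY, true))
    static void enableNoDelay(SocketChannel ch) throws IOException {
        ch.socket().setTcpNoDelay(true);
    }

    // Java 6 form: ask the underlying Socket for the peer address.
    // (Java 7 form: ch.getRemoteAddress())
    static SocketAddress remoteAddress(SocketChannel ch) {
        return ch.socket().getRemoteSocketAddress();
    }

    public static void main(String[] args) throws IOException {
        // ServerSocketChannel.bind() is Java 7, so bind through socket() instead.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        int port = server.socket().getLocalPort();

        SocketChannel client = SocketChannel.open(new InetSocketAddress("127.0.0.1", port));
        SocketChannel accepted = server.accept(); // blocking accept of the client connection

        enableNoDelay(client);
        System.out.println("TCP_NODELAY enabled: " + client.socket().getTcpNoDelay());
        System.out.println("Remote address: " + remoteAddress(client));

        // Explicit close() calls instead of try-with-resources, which is also Java 7.
        accepted.close();
        client.close();
        server.close();
    }
}
```

Because the program avoids try-with-resources and ServerSocketChannel.bind(), both Java 7 additions, it should compile against a Java 6 target as well as on current JDKs.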

Slater (3.2.0.5) - 09/24/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/5/index.html

Enhancements

Algorithms

PUBDEV-2133: Enum test/train mapping is faster (GitHub commit)

PUBDEV-2030: Improved POJO support for DRF

Slater (3.2.0.3) - 09/21/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/3/index.html

New Features

R

PUBDEV-2078: H2O now returns per-feature reconstruction error for h2o.anomaly() (GitHub commit)

Enhancements

Algorithms

GitHub commit: Added back support for sparse activations in DL; this currently changes results because numerical values are only de-scaled, not standardized

Python

GitHub commit: Adjusted import_file in Python to accept the same parameters as import_file in R

R

GitHub commit: H2O now sets CRAN-based R clients to permanent opt-out.
GitHub commit: Modified output of h2o.tabulate in R
GitHub commit: Added default plotting for models in R
GitHub commit: Prepended graphics package to plot.H2OModel methods

Bug Fixes

Algorithms

PUBDEV-2091: All algos: when offset is the same as the response, all train errors should be zero (GitHub commit)
GitHub commit: Fixed DL POJO for large input columns

R

GitHub commit: Fixed bugs in model plotting in R
GitHub commit: Fixed bugs in R plot.H2OModel for DL
GitHub commit: Fixed bug in plot.H2OModel

System

PUBDEV-1850: Parse not setting NA strings properly (GitHub commit)
GitHub commit: H2O now escapes XML entities
GitHub commit: Fixed Java 6 build; replaced AutoCloseable with Closeable
GitHub commit: Restored code that was needed for detecting NA strings

Slater (3.2.0.1) - 09/12/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-slater/1/index.html

New Features

Algorithms

GitHub: PUBDEV-1888: Added loss function calculation for DL.
GitHub: Set more parameters for GLM to be gridable.
GitHub: [KMeans] Enable grid search with max_iterations parameter.
GitHub: Add kfold column builders
GitHub: Add stratified kfold method

Python

PUBDEV-684: Add nfolds to R/Python
GitHub: Improved group-by functionality
GitHub: Added Python example for downloading GLM POJO.
GitHub: Added countmatches to Python along with a test.
GitHub: Added support for getting false positive rates and true positive rates for all thresholds from binomial models; makes it easier to calculate custom metrics from ROC data (like weighted ROC)

R

PUBDEV-1788: Added a factor function that allows the user to set the levels for an enum column (GitHub)
PUBDEV-1881: Fixed bug in h2o.group_by for enumerator columns
GitHub: Refactor SVD method name and add svd_method option to R package to set preferred calculation method
PUBDEV-2071: Accept columns of type integer64 from R through as.h2o()

Sparkling Water

PUBDEV-282: Support Windows OS in Sparkling Water

System

HEXDEV-120: Switch from NanoHTTPD to Jetty
GitHub: Allow for "most" and "mode" in groupby
GitHub: Added NA check to checking for matches in categorical columns
PUBDEV-1470: Dropped UDP mode in favor of TCP
PUBDEV-1431: /3/DownloadDataset.bin is now a registered handler in JettyHTTPD.java. Allows streaming of large downloads from H2O. (GitHub)
PUBDEV-1865: Implemented per-row 1D, 2D and 3D DCT transformations for signal/image/volume processing
PUBDEV-1686: LDAP Integration
HEXDEV-381: LDAP Integration
HEXDEV-224: Added https support
GitHub: Added mapr5.0 version to builds
GitHub: Add Vec.Reader which replaces lost caching

Web UI

GitHub: Disallow N-fold CV for GLM when lambda-search is on.
GitHub: Added typeahead for http and https.
PUBDEV-1821: Added Save Model and Load Model

Enhancements

Algorithms

GitHub: Don't allocate input dropout helper if input_dropout_ratio = 0.
PUBDEV-1920: Datasets : Unbalanced sparse for binomial and multinomial
GitHub: Major code cleanup for DL: Remove dead code, deprecate sparse/col_major.
PUBDEV-1942: Use prior class probabilities to break ties when making labels (GitHub)
GitHub: Update DL perf Rmd file to get the overall CM error.
GitHub: Enable training data shuffling if train_samples_per_iteration==0 and reproducible==true
GitHub: Checkpointing for DL now follows the same convention as for DRF/GBM.
GitHub: No longer do sampling with replacement during training with shuffle_training_data
GitHub: Add printout of sparsity ratio for double chunks.
GitHub: Check memory footprint for Gram matrix in PCA and SVD initialization
GitHub: Print more fill ratio debugging.
GitHub: Fix the RNG for createFrame to be more random (since we are setting the seed for each row).
PUBDEV-2010: Improve reporting of unstable DL models (GitHub)
PUBDEV-2018: Improve auto-tuning for DL on large clusters / large datasets (GitHub)
GitHub: Add input parameter to h2o.glrm indicating whether to ignore constant columns
GitHub: Missing enums are imputed using the majority class of the column. For other types of missing categorical, just round the mean to the nearest integer.
GitHub: Skip rows in training frame with missing value(s) if requested
GitHub: Speed up direct SVD by working with transpose directly
GitHub: Fix a bug in initialization of SVD and change l2 norm to sum of squared error in convergence test.
GitHub: Use absolute value for mean weight and bias checks.
GitHub: No longer leak constant chunks during AE scoring/reconstruction.
GitHub: No longer differentiate between DL model instabilities (weights vs biases).
GitHub: Make method static, where possible.
GitHub: Make GLRM seeding independent of number of chunks.

API

GitHub: Added REST end-points for GLRM, SVD, PCA, and Naive Bayes algorithms.
GitHub: Added unicode to frame getter possibilities
GitHub: Added proper lookup of offset/weights/fold_column
GitHub: Data should be eagered before download_csv.
GitHub: Simplified model builder
GitHub: Added None as default for "on" field
GitHub: Removed all of the unnecessary calls to h2o.init and removed the unnecessary environment variable for version checking during testing
PUBDEV-2064: Rename the coordinate descent solvers in the REST API / Flow to (experimental)

Grid Search

GitHub: Added check that x is not null before verifying data in unsupervised grid search algorithm
GitHub: Made naivebayes parameters gridable.
PUBDEV-1933: Called drf as randomForest in algorithm option (GitHub)
GitHub: Validation of grid parameters against algo /parameters rest endpoint.
PUBDEV-1979: Train N-fold CV models in parallel (GitHub)
PUBDEV-1978: grid: would be good to add to h2o.grid R help example, how to access the individual grid models

Python

GitHub: Refactored into h2o.system_file so it's parallel to R client.
GitHub: Added h2o_deprecated decorator
GitHub: Use import_file in import_frame
GitHub: Handle a list of columns in Python group-by API
GitHub: Use pandas if available for twodimtables and h2oframes
GitHub: Transform the parameters list into a dict with keys being the parameter label
GitHub: Added pop option which does inplace update on a frame (Frame.remove)
GitHub: ncol, dim, shape, and friends are now all properties
PUBDEV-193: Write python version of h2o.init() which knows how to start h2o
PUBDEV-1903: Method to get parameters of model in Python API
GitHub: Allow a single alpha to be specified without a list
GitHub: Updated endpoint for python client download_csv
GitHub: Allow for enum in scale/mean/sd (ignore or give NA)
GitHub: Allow for n_jobs=-1 and n_jobs > 1 for Parallel jobs
GitHub: Added frame_id property to frame
GitHub: Removed remaining splats on dicts
GitHub: Removed need to splat pass thru args
GitHub: Added get_jar flag to download_pojo

R

PUBDEV-1866: Rewrote h2o.ensemble to utilize nfolds/fold_column in h2o base learners
GitHub: Added max_active_predictors.
GitHub: Updated REST call from R for model export
PUBDEV-1853: Removed addToNavbar from RequestServer (GitHub)
GitHub: Add "Open H2O Flow" message.
GitHub: Replaced additive float op by multiplication
GitHub: Reimplement checksum for Model.Parameters
GitHub: Remove debug prints.
PUBDEV-1857: Removed the need for String[] path_params in RequestServer.register() (GitHub)
PUBDEV-1856: Removed the writeHTML_impl methods from all the schemas
PUBDEV-1854: Made _doc_method optional in the Route constructors (GitHub)
PUBDEV-1858: Changed RequestServer so that only one handler instance is created for each Route
GitHub: Swapped out rjson for jsonlite for better handling of odd characters from dataset.
GitHub: Prettify R's grid output.
PUBDEV-1841: R now respects the TwoDimTable's column types
GitHub: Fixes show method for grid object when hyper_params is empty.
GitHub: h2o.levels returns R vector for single column
GitHub: Uses PredictCsv from genmodel now.
GitHub: Exposed stacktraces in R's summary() call.
GitHub: print type of failed value in $<-
GitHub: allow value to be integer in $<-
GitHub: Check for is_client being NULL since older H2O clusters may not have is_client.

Sparkling Water

GitHub: Copy content of h2o-dist into target directory.

System

GitHub: Rename label fields in prediction object.
GitHub: Uses the original Vec's domain in alignment
GitHub: Added columnName and unknownLevel to PredictUnknownCategoricalLevelException.
PUBDEV-1559: Added compression of 64-bit Reals (GitHub)
GitHub: Added time information to buildinfo.json.
GitHub: Put build metadata into a json file.
GitHub: Delete any prior main CV models of the same key if CV model building is cancelled before the main model started to build.
GitHub: Change loading name parameter to a String to address a Flow issue.
GitHub: Remove extra assertion to avoid NPEs after client call of bulk remove after done() is called but before the finally is done with updateModelOutput.
GitHub: Ensures that date time methods return year/month/day values in the currently set timezone.
GitHub: Frees memory from streamed zip reads after the chunk has been parsed.
GitHub: Unifies categorical strings to UTF-8 and warns the user about all conversion.
GitHub: add isNA checks to scale
GitHub: Do not start UDPReceiver thread (unless running with useUDP option)

Web UI

PUBDEV-1961: Flow: use streaming endpoint /3/DownloadDataset.bin

Bug Fixes

Algorithms

PUBDEV-1785: Deadlock while running GBM
GitHub: Fix name for standardized_coefficient_magnitudes.
PUBDEV-1774: Setting gbm's balance_classes to True produces suspect models
PUBDEV-1849: K-Means: negative sum-of-squares after mean imputation
GitHub: Set the iters counter during kmeans center initialization correctly
GitHub: fixed parenthesis in GLM POJO generation
GitHub: Should be updating model each iteration with the newly fitted kmeans clusters, not the old ones!
PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
PUBDEV-1666: GBM:Math correctness for Gamma with offsets/weights
PUBDEV-451: Trees in GBM change for identical models (GitHub)
PUBDEV-1924: R^2 stopping criterion isn't working (GitHub)
PUBDEV-1776: GLM: cross-validation bug (GitHub)
PUBDEV-1682: GLM: Lending club dataset => build GLM model => 100% complete => click on model => null pointer exception (GitHub)
PUBDEV-1987: error returned on prediction for xval model
PUBDEV-1928: Properly implement Maxout/MaxoutWithDropout (GitHub)
GitHub: print actual number of columns (was just #cols) in DRF init
PUBDEV-2026: Fix setting the proper job state in DL models (GitHub)
PUBDEV-1950: Splitframe with rapids is not blocking
PUBDEV-1995: nfold: when user cancels an nfold job, fold data still remains in the cluster memory
PUBDEV-1994: nfold: cancel results in a java.lang.AssertionError
PUBDEV-1910: Canceled GBM with CV keeps lock
GitHub: Fix DL checkpoint restart with new data.

API

PUBDEV-1955: Change Schema behavior to accept a single number in place of array (GitHub)
PUBDEV-1914: Iced deserialization fails for Enum Arrays

Grid

PUBDEV-1876: Grid: progress bar not working for grid jobs
PUBDEV-1875: Grid: the meta info should not be dumped on the R screen, once the grid job is over
GitHub: [PUBDEV-1876] Fix grid update.
PUBDEV-1874: Grid search: observe issues with model naming/overwriting and error msg propagation (GitHub)
HEXDEV-402: R: kmeans grid search doesn't work
PUBDEV-1901: Grid appends new models even though models already exist.
PUBDEV-1940: Grid: glm grid on alpha fails with error "Expected '[' while reading a double[], but found 1.0"
PUBDEV-1877: Grid: if the user specifies a value for the parameter the grid is running on, warn the user
PUBDEV-1938: Grid: randomForest: unsupported grid params and wrong error msg

Hadoop

PUBDEV-2036: importModel from hdfs doesn't work
PUBDEV-2027: Clicking shutdown in the Flow UI dropdown does not exit the Hadoop cluster

Python

PUBDEV-1789: Python client h2o.remove_vecs (ExprNode) makes bad ast
PUBDEV-1795: Unable to read H2OFrame from Python
PUBDEV-1764: Python importFile does not import all files in directory, only one file (GitHub)
GitHub: parameter name is "dir" not "path"
PUBDEV-1693: Python: Options for handling NAs in group_by is broken
PUBDEV-1415: Intermittent Unimplemented rapids exception: pyunit_var.py. The prior test also hit the unimplemented exception, but did not fail (the client was not notified)
PUBDEV-1119: Python: Need to be able to access resource genmodel.jar
GitHub: Fix download of pojo in Python.

R

GitHub: Fixed bug in h2o.ensemble .make_Z function
PUBDEV-1796: R: h2o.importFile doesn't allow user to choose column type during parse
PUBDEV-1768: R: Fails to return summary on subsetted frame (GitHub)
PUBDEV-1909: R: Adding column to frame changes string enums in column to numerics
PUBDEV-1936: R: h2o.levels return only the first factor of factor levels
PUBDEV-1869: R: sd function should convert enum column into numeric and calculate standard deviation (GitHub)
PUBDEV-1246: R: h2o.hist needs to run pretty function for pretty breakpoints to get same results as R's hist (GitHub)
PUBDEV-1868: R: h2o.performance returns error (not warning) when model is reloaded into H2O
PUBDEV-1723: R: When subsetting data, H2O removed the wrong columns when asked to delete more than one column
GitHub: fix h2o.levels issue
PUBDEV-1972: R: setting weights_column = NULL causes unwanted variables to be used as predictors

Sparkling Water

PUBDEV-1173: Create conversion tasks from primitive RDDs
GitHub: Fix return value issue in distribution script.

System

HEXDEV-360: getFrame fails on Parsed Data
PUBDEV-366: Fix parsing for high-cardinality categorical features GitHub
PUBDEV-1143: Parse: Canceling a parse is unreliable; it does not always work
PUBDEV-1872: Ability to ignore files during parse GitHub
PUBDEV-777: Parse: Parsing compressed files takes too long
PUBDEV-1916: Parse: A 2-node cluster takes 49 min vs. 40 sec on a 1-node cluster GitHub
PUBDEV-1431: Convert /3/DownloadDataset to streaming
PUBDEV-1995: nfold: when user cancels an nfold job, fold data still remains in the cluster memory
PUBDEV-1994: nfold: cancel results in a java.lang.AssertionError
PUBDEV-1910: Canceled GBM with CV keeps lock GitHub
PUBDEV-1992: CreateFrame isn't totally random
GitHub: Fixes a bug that allowed big buffers to be constantly reallocated when it wasn't needed. This saves memory and time.
GitHub: Fix print statement.
GitHub: Fixed orderly shutdown to work with flatfile.
PUBDEV-1998: Parse: Lending Club dataset parse => canceled by user
PUBDEV-2028: Shutdown => unimplemented error on curl -X POST 172.16.2.186:54321/3/Shutdown.html
PUBDEV-2070: Download frame brings down cluster
PUBDEV-2067: Cannot mix negative and positive array selection
PUBDEV-2024: Save model to HDFS fails

Web UI

PUBDEV-2012: Histograms in Flow are slightly off
PUBDEV-2029: exportModel from Flow to HDFS doesn't work

Simons (3.0.1.7) - 8/11/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/7/index.html

New Features

The following changes represent features that have been added since the previous release:

Python

PUBDEV-684: Add nfolds to R/Python

Web UI

HEXDEV-390: Print Flow to PDF / Printer

Enhancements

The following changes are improvements to existing features (which includes changed default values):

Algorithms

GitHub: Add a seed to model building that uses balance_classes, for determinism/repeatability
GitHub: Reduce the frequency at which tiny tree models are printed to stdout: Only print during the first 4 seconds if score_each_iteration is enabled.
GitHub: Only call the limited printout for TwoDimTables during Model.toString(), which prints all TwoDimTables of the model._output.
GitHub: Only print up to 10 rows of TwoDimTables in ASCII logs (first/last 5).
GitHub: Remove some overflow/underflow checks: Let exp(x) be small and log(x) be large.
GitHub: Add nbins_top_level parameter to DRF/GBM. Not yet in R.
GitHub: Disallow N-fold CV for GLM when lambda-search is on.
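The TwoDimTable logging rule above (print at most 10 rows: the first 5 and the last 5) can be sketched in Python as follows. `truncate_rows` and its `"..."` placeholder row are hypothetical illustrations of the idea, not H2O's Java implementation.

```python
from typing import Any, List, Sequence


def truncate_rows(rows: Sequence[Any], head: int = 5, tail: int = 5,
                  placeholder: Any = "...") -> List[Any]:
    """Keep every row if the table fits in head + tail rows; otherwise keep
    the first `head` rows, one placeholder row, and the last `tail` rows."""
    if len(rows) <= head + tail:
        return list(rows)
    # Slice from an explicit start index so that tail == 0 yields no tail rows.
    return list(rows[:head]) + [placeholder] + list(rows[len(rows) - tail:])


print(truncate_rows(list(range(12))))
```

For a 12-row table this prints `[0, 1, 2, 3, 4, '...', 7, 8, 9, 10, 11]`; tables of 10 rows or fewer are returned unchanged.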

API

GitHub: Cleanup of public API of Schema.java. Improve its JavaDoc a lot.

Python

PUBDEV-1765: Improve Python online documentation
PUBDEV-1497: Python: Port the weights tests for GLM/GBM/RF/DL from R
GitHub: Adjust to split frame job results
GitHub: Allow the update target to be a tuple (i.e., rows and columns)
GitHub: When starting the H2O JVM with h2o.init(), give the H2O child process a different ID than the parent, so it doesn't get killed on Ctrl-C
GitHub: Add option to turn off progress bar printout
GitHub: Add unicode to frame getter possibilities
GitHub: Remove remaining splats on dicts
GitHub: No need to splat pass-through args
GitHub: Proper lookup of offset/weights/fold_column
GitHub: Data should be evaluated eagerly before download_csv.
GitHub: Simplify model builder
GitHub: Use None as default for "on" field
GitHub: Add get_jar flag to download_pojo
GitHub: Remove all of the unnecessary calls to h2o.init and the unnecessary environment variable for version checking during testing

R

PUBDEV-1744: Improve help message of h2o.init function
GitHub: Add valid expression to list of accepted R CMD check outputs.
GitHub: Added h2o.anomaly demo to R package

System

GitHub: Add -JJ command line argument to allow extra JVM arguments to be passed.
GitHub: Refactored CSVStream to be more understandable. Fix empty chunk bug.
GitHub: Add hintFlushRemoteChunk to CSVStream.
GitHub: Add parameterized route for frame export
GitHub: allow string vecs to be toEnum'd (with a sensible cap)
GitHub: allow lists of numbers in reducer ops
GitHub: Add warning message during POJO export if offset_column is specified (is not supported)
PUBDEV-1853: cleanup: remove addToNavbar from RequestServer GitHub
GitHub: Add "Open H2O Flow" message.
GitHub: Code refactoring to allow GBM JUnits to work with H2OApp in multi-node mode.
GitHub: Replace additive float op by multiplication
GitHub: Reimplement checksum for Model.Parameters
GitHub: Remove debug prints.
PUBDEV-1857: cleanup: remove the need for String[] path_params in RequestServer.register() GitHub
PUBDEV-1856: cleanup: remove the writeHTML_impl methods from all the schemas
PUBDEV-1854: cleanup: make _doc_method optional in the Route constructors GitHub
PUBDEV-1858: cleanup: change RequestServer so that only one handler instance is created for each Route

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms

PUBDEV-1674: GBM with gamma: does not seem to split at all; all tree node predictions are 0 for attached data GitHub
PUBDEV-1760: GBM: Deviance testing for exponential family
PUBDEV-1714: GBM gamma: R vs. H2O same split variable, slightly different leaf predictions
PUBDEV-1755: DL: Math correctness for Tweedie with offsets/weights
PUBDEV-1758: DL: Deviance testing for exponential family
PUBDEV-1756: DL: Math correctness for Poisson with offsets/weights
PUBDEV-1651: Null/residual deviances don't match for various weights cases
PUBDEV-1757: DL: Math correctness for Gamma with offsets/weights
PUBDEV-1680: GBM gamma: training set MSE increases after some time
PUBDEV-1724: GBM with Tweedie: weird validation error behavior
PUBDEV-1774: setting gbm's balance_classes to True produces suspect models
PUBDEV-1849: K-Means: negative sum-of-squares after mean imputation
GitHub: Set the iters counter during kmeans center initialization correctly
GitHub: fixed parenthesis in GLM POJO generation
GitHub: Should be updating model each iteration with the newly fitted kmeans clusters, not the old ones!
PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
PUBDEV-1666: GBM:Math correctness for Gamma with offsets/weights

Python

PUBDEV-1779: Fixes intermittent failure seen when Model Metrics were looked at too quickly after a cross validation run.
PUBDEV-1409: Python h2o.locate() should stop and return "Not found" rather than passing path=None to H2O, which causes a confusing H2O message GitHub
PUBDEV-1630: GBM getting intermittent assertion error on iris scoring in pyunit_weights_api.py
PUBDEV-1770: SIGTERM caught by Python is killing H2O GitHub
HEXDEV-397: Python fold_column option requires fold column to be in the training data
HEXDEV-394: Python client occasionally throws attached error
GitHub: add missing args to kmeans
GitHub: add missing kmeans params in
GitHub: add missing checkpoint param
PUBDEV-1785: Deadlock while running GBM

R

PUBDEV-1830: h2o.glm throws an error when fold_column and validation_frame are both specified
PUBDEV-1660: h2oR: getting a slice of PCA eigenvectors gives a formatting error GitHub
GitHub: fix broken %in% in R
PUBDEV-1831: Cross-validation metrics are not displayed in R (and Python?)
PUBDEV-1840: Autoencoder model doesn't display properly in R (training metrics) GitHub

System

PUBDEV-1790: can't convert iris species column to a character column.
PUBDEV-1520: Kmeans pojo naming inconsistency
GitHub: fix parse of range ast
GitHub: Sets POJO file name to match the class name. Prior behavior would allow them to be different and give a compile error.

Web UI

PUBDEV-1754: Export frame not working in flow : H2OKeyNotFoundArgumentException

Simons (3.0.1.4) - 7/29/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/4/index.html

New Features

Algorithms

HEXDEV-220: Tweedie distribution for DL
HEXDEV-219: Poisson distribution for DL
HEXDEV-221: Gamma distribution for DL
PUBDEV-683: Enable nfolds for all algos (where reasonable) GitHub
PUBDEV-1791: Add toString() for all models (especially model metrics) GitHub
GitHub: Enabling model checkpointing for DRF
GitHub: Enable checkpointing for GBM.
PUBDEV-1698: fold assignment in N-fold cross-validation

Python

PUBDEV-386: Expose ParseSetup to user in Python
PUBDEV-1239: Python: getFrame and getModel missing
HEXDEV-334: support rbind in python
PUBDEV-1215: Python should have an exportFile call
GitHub: add cross-validation parameter to metric accessors and respective pyunit
PUBDEV-1729: Cross-validation metrics should be shown in R and Python for all models

R

PUBDEV-385: Expose ParseSetup to user in R
GitHub: add mean residual deviance accessor to R interface
GitHub: incorporate cross-validation metric access into the R client metric accessors
GitHub: R interface for checkpointing in RF enabled

System

PUBDEV-1735: Add 24-MAR-14 06.10.48.000000000 PM style date to autodetected
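As an illustration of the newly autodetected date style above (DD-MON-YY HH.MM.SS followed by a nine-digit fraction and AM/PM), here is a Python sketch. `parse_dd_mon_yy_time` and its regular expression are hypothetical, not H2O's Java parser; because `datetime` stores only microseconds, the sketch keeps the first six fractional digits.

```python
import re
from datetime import datetime

# Hypothetical pattern: day-MON-yy hh.mm.ss.fraction AM/PM (1 to 9 fraction digits).
_DATE_RE = re.compile(
    r"^(\d{2}-[A-Za-z]{3}-\d{2} \d{2}\.\d{2}\.\d{2})\.(\d{1,9}) ([AaPp][Mm])$"
)


def parse_dd_mon_yy_time(text: str) -> datetime:
    """Parse strings such as '24-MAR-14 06.10.48.000000000 PM'."""
    match = _DATE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in DD-MON-YY HH.MM.SS.fffffffff AM/PM form: {text!r}")
    base, fraction, am_pm = match.groups()
    # datetime supports microseconds only: keep at most 6 digits, pad if shorter.
    micros = fraction[:6].ljust(6, "0")
    return datetime.strptime(f"{base}.{micros} {am_pm}", "%d-%b-%y %I.%M.%S.%f %p")


print(parse_dd_mon_yy_time("24-MAR-14 06.10.48.000000000 PM"))
```

With the default C/English locale this prints `2014-03-24 18:10:48`; `%b` month names are locale-dependent.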

Enhancements

API

PUBDEV-1451: design for cross-validation APIs GitHub

Algorithms

GitHub: Add proper deviance computation for DL regression.
GitHub: Print GLM model details to the logs.
GitHub: Disallow categorical response for GLM with non-binomial family.
GitHub: Disallow models with more than 1000 classes, which can lead to excessively large values in the DKV due to memory usage of 8*N^2 bytes (the Metrics objects in the model output)
GitHub: DL: Don't train too long in single node mode with auto-tuning.
GitHub: Use mean residual deviance to do early stopping in DL.
GitHub: Add an "AUTO" setting for fold_assignment (which is Random). This allows the code to reject non-default user-given values if n-fold CV is not enabled.
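To put the 8*N^2-byte estimate above in perspective, the short Python calculation below evaluates it for a few class counts. Reading the figure as one N x N table of 8-byte doubles (for example, a confusion matrix) is our interpretation of where it comes from; `metrics_table_bytes` is a hypothetical helper.

```python
def metrics_table_bytes(n_classes: int) -> int:
    """Size of an N x N table of 8-byte doubles, i.e. the 8*N^2 estimate."""
    return 8 * n_classes ** 2


for n in (10, 1_000, 10_000):
    print(f"{n:>6} classes: {metrics_table_bytes(n):>13,} bytes")
```

At the 1000-class cutoff this is 8,000,000 bytes (8 MB) per table; at 10,000 classes it grows to 800,000,000 bytes (800 MB), which shows why the cap was introduced.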

Python

HEXDEV-317: Python has to play nicely in a polyglot, long-running environment
GitHub: simplify ast in python frame slicer
GitHub: add cross validation metrics and mean residual deviance to model show()
GitHub: Allow any to take a frame; simplify Python's __contains__

R

GitHub: On detaching h2o R package, only shut down H2O instance if it was started by the R client
GitHub: update h2o load

System

GitHub: Print a handy message (Open H2O Flow in your web browser) when the cluster comes up like Sparkling Water does.
GitHub: Replace memory leaky RCurl getURL with curlPerform.
GitHub: Add -disable_web parameter.
GitHub: allow numerics in match
GitHub: More refactoring of h2o start. Includes:
- H2OStarter - a generic class to start H2O. It does all dynamic registration
- H2OTestStarter - a generic class to start h2o-core tests
GitHub: Use typed key when it is necessary. Key.make() now returns a typed Key. The trick is that type T can be derived from the left side of the assignment. If it is not possible to derive the type of the Key, then the developer has to use the typed syntax: Key.<Frame>make("myframe.hex"). The change simplifies Scala code, which will be able to derive the key type.
PUBDEV-1793: Add Job state and start/end time to the model's output GitHub
GitHub: add more places to look when trying to start jar from python's h2o.init
GitHub: Cosmetic name changes
GitHub: Fetch local node differently from remote node.
GitHub: Don't clamp node_idx at 0 anymore.
GitHub: Added -log_dir option.

Bug Fixes

API

PUBDEV-776: Schema.parse() needs to be better behaved (like, not crash)

Algorithms

PUBDEV-1725: PCA with GLRM gives bad results for attached data (because of PlusPlus initialization)
GitHub: Fix deviance calculation, use the sanitized parameters from the model info, where Auto parameter values have been replaced with actual values
GitHub: Fix offset in DL for exponential family (that doesn't do standardization)
GitHub: Fix a bug where initial Y was set to all zeroes by kmeans++ when scaling was disabled
PUBDEV-1668: GBM: Math correctness for weights
PUBDEV-1783: dl: deviance off for large dataset GitHub
PUBDEV-1667: GBM: Math correctness for Offsets
PUBDEV-1778: drf: reporting incorrect mse on validation set GitHub
GitHub: Fix DRF scoring with 0 trees.

Python

PUBDEV-1260: Python: Requires asnumeric() function
GitHub: python interface: add folds_column to x, if it doesn't already exist in x
PUBDEV-1763: Python: Math correctness tests for Tweedie/Gamma/Poisson with offsets/weights
PUBDEV-1762: Python: Deviance tests for all algos in Python GitHub
PUBDEV-1671: intermittent: pyunit_weights_api.py, hex.tree.SharedTree$ScoreBuildOneTree@645acd60java.lang.AssertionError at hex.tree.DRealHistogram.scoreMSE(DRealHistogram.java:118), iris dataset GitHub

R

PUBDEV-1257: R: no is.numeric method for H2O objects
PUBDEV-1622: NPE in water.api.RequestServer (at water.util.RString.replace(RString.java:132)) was flagged as WARN in the log; all NPEs should probably be flagged as ERROR/fatal GitHub
PUBDEV-1655: h2o.strsplit needs isNA check
PUBDEV-1084: h2o.setTimezone NPE
PUBDEV-1738: R: cloud name creation can't handle user names with spaces

System

PUBDEV-1410: apply causes assertion errors mentioning deadlock in runit_small_client_mode; the build never completes after hours (deadlock?)
PUBDEV-1195: docker build fails
HEXDEV-362: Bug in /parsesetup data preview GitHub
PUBDEV-1766: H2O xval: deleting all models gives "Error evaluating future[6]: Error calling DELETE /3/Models/gbm_cv_13"
PUBDEV-1767: H2O: listing frames after removing most frames gives a "roll ups not possible vec deleted" error GitHub

Web UI

PUBDEV-1782: Flow: View Data fails when there is a UUID column (and maybe also a String column)
PUBDEV-1769: xval: cancel job does not work GitHub

Simons (3.0.1.3) - 7/24/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/3/index.html

New Features

Python

PUBDEV-1734: Add save and load model to python api
PUBDEV-1314: Python needs "str" operator, like R's
GitHub: turn on H2OFrame __repr__

Enhancements

API

GitHub: Increase sleep from 2 to 3 because h2o itself does a sleep 2 on the REST API before triggering the shutdown.

System

PUBDEV-1730: Make export file a job GitHub

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms

PUBDEV-1743: GBM Poisson with weights: deviance off
PUBDEV-1736: gbm poisson with offset: seems to be giving wrong leaf predictions

Python

PUBDEV-1731: Python get_frame() results in deleting a frame created by Flow
HEXDEV-389: Split frame from python
HEXDEV-388: python client H2OFrame constructor puts the header into the data (as the first row)

R

PUBDEV-1504: RUnit fails intermittently: runit_pub_180_ddply.R
PUBDEV-1678: Client mode jobs fail on runit_hex_1750_strongRules_mem.R

System

GitHub: Model parameters should be always public.

Simons (3.0.1.1) - 7/20/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-simons/1/index.html

New Features

Algorithms

HEXDEV-213: Tweedie distributions for GBM GitHub
HEXDEV-212: Poisson distributions for GBM GitHub
PUBDEV-1115: properly test PCA and mark it non-experimental

Python

PUBDEV-1437: Python needs "nlevels" operator like R
PUBDEV-1434: Python needs "levels" operator, like R
PUBDEV-1355: Python needs h2o.trim, like in R
PUBDEV-1354: Python needs h2o.toupper, like in R
PUBDEV-1352: Python needs h2o.tolower, like in R
PUBDEV-1350: Python needs h2o.strsplit, like in R
PUBDEV-1347: Python needs h2o.shutdown, like in R
PUBDEV-1343: Python needs h2o.rep_len, like in R
PUBDEV-1340: Python needs h2o.nlevels, like in R
PUBDEV-1338: Python needs h2o.ls, like in R
PUBDEV-1344: Python needs h2o.saveModel, like in R
PUBDEV-1337: Python needs h2o.loadModel, like in R
PUBDEV-1335: Python needs h2o.interaction, like in R
PUBDEV-1334: Python needs h2o.hist, like in R
PUBDEV-1351: Python needs h2o.sub, like in R
PUBDEV-1333: Python needs h2o.gsub, like in R
PUBDEV-1336: Python needs h2o.listTimezones, like in R
PUBDEV-1346: Python needs h2o.setTimezone, like in R
PUBDEV-1332: Python needs h2o.getTimezone, like in R
PUBDEV-1329: Python needs h2o.downloadCSV, like in R
PUBDEV-1328: Python needs h2o.downloadAllLogs, like in R
PUBDEV-1327: Python needs h2o.createFrame, like in R
PUBDEV-1326: Python needs h2o.clusterStatus, like in R
PUBDEV-1323: Python needs svd algo
PUBDEV-1322: Python needs prcomp algo
PUBDEV-1321: Python needs naiveBayes algo
PUBDEV-1320: Python needs model num_iterations accessor for clustering models, like R's
PUBDEV-1318: Python needs screeplot and plot methods, like R's. (should probably check for matplotlib)
PUBDEV-1317: Python needs multinomial model hit_ratio_table accessor, like R's
PUBDEV-1316: Python needs model scoreHistory accessor, like R's
PUBDEV-1315: R needs weights and biases accessors for deeplearning models
PUBDEV-1313: Python needs "as.Date" operator, like R's
PUBDEV-1312: Python needs "rbind" operator, like R's
PUBDEV-1345: Python needs h2o.setLevel and h2o.setLevels, like in R
PUBDEV-1311: Python needs "setLevel" operator, like R's
PUBDEV-1306: Python needs "anyFactor" operator, like R's
PUBDEV-1305: Python needs "table" operator, like R's
PUBDEV-1301: Python needs "as.numeric" operator, like R's
PUBDEV-1300: Python needs "as.character" operator, like R's
PUBDEV-1293: Python needs "signif" operator, like R's
PUBDEV-1292: Python needs "round" operator, like R's
PUBDEV-1291: Python needs transpose operator, like R's t operator
PUBDEV-1289: Python needs element-wise division and multiplication operators, like %/% and %*% in R
PUBDEV-1330: Python needs h2o.exportHDFS, like in R
PUBDEV-1357: Python and R need which operator GitHub
PUBDEV-1356: Python and R need isnumeric and ischaracter operators
PUBDEV-1342: Python needs h2o.removeVecs, like in R
PUBDEV-1324: Python needs h2o.assign, like in R GitHub
PUBDEV-1296: Python and R h2o clients need "any" operator, like R's
PUBDEV-1295: Python and R h2o clients need "prod" operator, like R's
PUBDEV-1294: Python and R h2o clients need "range" operator, like R's
PUBDEV-1290: Python and R h2o clients need "cummax", "cummin", "cumprod", and "cumsum" operators, like R's
PUBDEV-1325: Python needs h2o.clearLog, like in R
PUBDEV-1349: Python needs h2o.startLogging and h2o.stopLogging, like in R
PUBDEV-1341: Python needs h2o.openLog, like in R
PUBDEV-1348: Python needs h2o.startGLMJob, like in R
PUBDEV-1331: Python needs h2o.getFutureModel, like in R
PUBDEV-1302: Python needs "match" operator, like R's
PUBDEV-1298: Python needs "%in%" operator, like R's
PUBDEV-1310: Python needs "scale" operator, like R's
PUBDEV-1297: Python needs "all" operator, like R's
GitHub: Add start_glm_job() and get_future_model() to the Python client. Add H2OModelFuture class. Add respective pyunit.

R

PUBDEV-1273: Add h2oEnsemble R package to h2o-3
PUBDEV-1319: R needs centroid_stats accessor like Python, for clustering models

Rapids

PUBDEV-1635: The equivalent of R's "any" should probably be implemented in Rapids
PUBDEV-1634: The equivalent of R's cummin, cummax, cumprod, cumsum should probably be implemented in Rapids
PUBDEV-1633: The equivalent of R's "range" should probably be implemented in Rapids
PUBDEV-1632: The equivalent of R's "prod" should probably be implemented in Rapids
PUBDEV-1699: The equivalent of R's "unique" should probably be implemented in Rapids GitHub

System

GitHub: changed to new AMI
PUBDEV-679: Create cross-validation holdout sets using the per-row weights
GitHub: Add user_name. Add ExtensionHandler1.
GitHub: Added auth options to h2o.init().
GitHub: Added H2O.calcNextUniqueModelId().
GitHub: Add ldap arg.

Web UI

HEXDEV-231: Flow: Ability to change column type post-Parse

Enhancements

Algorithms

GitHub: use fixed seed to avoid bad splits with some seeds
GitHub: Change seed to avoid type flip from integer to double after row slicing, which leads to different split decisions
GitHub: Add option during kmeans scoring to return matrix of indicator columns for cluster assignment, which is necessary for initializing GLRM
GitHub: Output number of processed observations in PCA
GitHub: Add validation into PCA with GramSVD
GitHub: Code cleanup of distributions. Also rename _n_folds -> _nfolds for consistency
GitHub: Remove restriction to data frames with more than 1 column
GitHub: Add debugging output for DL auto-tuning.
PUBDEV-556: implement algo-agnostic cross-validation mechanism via a column of weights
GitHub: When initializing with kmeans++ set X to matrix of indicator columns corresponding to cluster assignments, unless closed form solution exists
GitHub: Always print DL auto-tuning info for now.
PUBDEV-1657: PCA: It would be good to remove the redundant std dev from the Flow PCA model object
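The weights-based cross-validation entries above (holdout sets built from per-row weights, and an algorithm-agnostic CV mechanism driven by a weights column) can be illustrated with a small Python sketch. This is one plausible way to realize the idea, not H2O's code: `random_fold_assignment` and `fold_weights` are hypothetical helpers, and uniform random fold assignment with a seed is an assumption.

```python
import random
from typing import List, Tuple


def random_fold_assignment(n_rows: int, nfolds: int, seed: int) -> List[int]:
    """Assign each row a fold id in [0, nfolds) uniformly at random.
    A fixed seed makes the assignment reproducible."""
    rng = random.Random(seed)
    return [rng.randrange(nfolds) for _ in range(n_rows)]


def fold_weights(weights: List[float], folds: List[int],
                 k: int) -> Tuple[List[float], List[float]]:
    """Weight columns for fold k: rows in fold k get weight 0 for training and
    keep their weight for holdout scoring; all other rows are the reverse."""
    train = [w if f != k else 0.0 for w, f in zip(weights, folds)]
    holdout = [w if f == k else 0.0 for w, f in zip(weights, folds)]
    return train, holdout


weights = [1.0, 2.0, 0.5, 1.0]
folds = random_fold_assignment(len(weights), nfolds=3, seed=42)
for k in range(3):
    train_w, holdout_w = fold_weights(weights, folds, k)
    print(k, train_w, holdout_w)
```

Because each row falls in exactly one fold, the training and holdout weights for any fold sum to the original weights, so any learner that honors per-row weights can be cross-validated on the same frame without copying data.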

API

GitHub: Set Content-Type: application/x-www-form-urlencoded for regular POST requests.
HEXDEV-272: Move response_column parameter above ignored_columns parameter GitHub
- All of the fields of a schema are now stored in the leaf child of the class hierarchy. Changed the implementation of fields() to simply return the fields variable of a schema. The function calls H2O.fail() if it attempts to access a field from a non-leaf child. response_column is now moved above ignored_columns for every applicable schema. 'own_fields' is also now renamed to 'fields'
GitHub: Don't use features from servlet api 3.0 or later anymore. Instead save the response status in a thread local variable and fish it out when needed.

Python

GitHub: don't use the header of the timezone table for a choice
GitHub: never delete models. ever.
GitHub: add na_rm argument
GitHub: add prod to python interface

System

GitHub: use Key instead of Vec in refcnter
GitHub: protect vecs in apply
GitHub: Allows for more than one column to remain unnamed. The new naming will fill in the blanks.
GitHub: Refactoring of hadoop mapper and driver.
GitHub: Remove -hdfs option.
GitHub: Adds more checks for a parse cancel at more stages during the post ingestion file parse.
GitHub: Refactor method name for clarification.
GitHub: Cleans up and comments the freeing of chunks from a parsed file.
GitHub: Since more startup logic is getting added, simplify H2OClientApp as much as possible. Remove H2OClient entirely.
GitHub: Add dedicated AddCommonResponseHeadersHandler handler to set common response headers up-front.
GitHub: More refactoring of startup. Pushed a bunch of code from H2OApp into H2O. Added H2O.configureLogging().
GitHub: Make Progress extend Keyed.
GitHub: Make createServer() protected.
GitHub: model_id should probably be a Key, not Key.
GitHub: Change Jetty version from 9 to 8 to get Java 6 compatibility back.
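The entry above about letting more than one column remain unnamed ("the new naming will fill in the blanks") can be sketched in Python. The `C<position>` default and the `_<n>` suffix used to avoid clashing with an existing name are assumptions for illustration; H2O's actual naming rule may differ, and `fill_column_names` is a hypothetical helper.

```python
from typing import List, Optional


def fill_column_names(names: List[Optional[str]]) -> List[str]:
    """Replace blank or missing names with "C<position>" (1-based), adding a
    numeric suffix if that default name is already used by another column."""
    used = {n for n in names if n}
    result = []
    for position, name in enumerate(names, start=1):
        if name:
            result.append(name)
            continue
        candidate = f"C{position}"
        suffix = 1
        while candidate in used:
            candidate = f"C{position}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result


print(fill_column_names(["a", None, "", "b"]))
print(fill_column_names(["C2", None]))
```

This prints `['a', 'C2', 'C3', 'b']` and then `['C2', 'C2_1']`: several blanks are filled at once, and a generated name never duplicates a user-supplied one.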

Web UI

PUBDEV-1521: show REST API and overall UI response times for each cell in Flow
HEXDEV-304: Flow: Emphasize run time in job-progress output
PUBDEV-1522: show wall-clock start and run times in the Flow outline
PUBDEV-1707: Hook up "Export" button for datasets (frames) in Flow.

Bug Fixes

Algorithms

PUBDEV-1641: GBM with Poisson: java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver.buildNextKTrees on attached data
PUBDEV-1672: K-Means: ArrayIndexOutOfBounds error with user-specified centroids GitHub
- Throw an error if the number of rows in the user-specified initial centers is not equal to k.
PUBDEV-1654: pca: gram-svd std dev differs for v2 vs v3 for attached data
GitHub: Fix DL
GitHub: Fix a bug in PCA utilities for k = 1
PUBDEV-1700: nfolds: in Flow, setting nfolds=1 makes the job hang forever; the terminal shows java.lang.AssertionError
PUBDEV-1706: GBM/DRF: is balance_classes=TRUE and nfolds>1 valid? GitHub
PUBDEV-806: GLM => runit_demo_glm_uuid.R : water.exceptions.H2OIllegalArgumentException
PUBDEV-1696: Client (model-build) is blocked when passing illegal nfolds value. GitHub
PUBDEV-1690: Cross Validation: if nfolds > number of observations, should it default to leave-one-out cross-validation?
PUBDEV-1537: pca: on airlines get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:219) GitHub
PUBDEV-1603: pca: glrm giving very different std dev than R and h2o's other methods for attached data
GitHub: Fix a potential race condition in tree validation scoring.
GitHub: Fix GLM parameter schema. Clean up hasOffset() and hasWeights()

Python

PUBDEV-1627: column name missing (python client)
PUBDEV-1629: python client's tail() header incorrect GitHub
PUBDEV-1413: intermittent assertion errors in pyunit_citi_bike_small.py/pyunit_citi_bike_large.py. Client apparently not notified
PUBDEV-1590: "Trying to unlock null" assertion during pyunit_citi_bike_large.py
PUBDEV-1400: match operator should take numerics

R

PUBDEV-1663: R CMD Check failures GitHub
PUBDEV-1695: R CMD Check failing on running examples GitHub
PUBDEV-1721: R: group_by causes h2o to hang on multinode cluster
PUBDEV-1501: Python and R h2o clients need "unique" operator, like R's GitHub - R GitHub - Python
PUBDEV-1711: is.numeric in R interface faulty GitHub
PUBDEV-1719: Intermittent: runit_deeplearning_autoencoder_large.R: gets wrong answer?
PUBDEV-1688: 2 nfolds tests fail intermittently: runit_RF_iris_nfolds.R and runit_GBM_cv_nfolds.R GitHub
PUBDEV-1718: Intermittent: runit_deeplearning_anomaly_large.R: training slows down to 0 samples/sec GitHub

Rapids

PUBDEV-1713: Rapids ASTAll faulty GitHub

Sparkling Water

PUBDEV-1562: Migration to Spark 1.4

System

PUBDEV-1551: Parser: Multifile Parse fails with 0-byte files in directory GitHub
HEXDEV-325: Empty reply when parsing dataset with mismatching header and data column length
PUBDEV-1509: Split frame: big datasets: on 186K rows and 3200 columns, split frame took 40 minutes, which is too long
PUBDEV-1438: Column naming can create duplicate column names
PUBDEV-1105: NPE in Rollupstats after failed parse
PUBDEV-1142: H2O parse: When cancel a parse job, key remains locked and hence unable to delete the file GitHub
GitHub: client mode deadlock issue resolution
PUBDEV-1670: Client mode fails, sometimes consistently: GBM_offset_tweedie.R.out.txt
GitHub: nbhm bug: K == TOMBSTONE not key == TOMBSTONE
GitHub: Pulls out a GAID from resource in jar if the GAID doesn't equal the default. Presumably the GAID has been changed by the jar baking program.

Web UI

PUBDEV-872: Flows: Not able to load saved flows from HDFS/local GitHub
PUBDEV-554: Flow: When parsing two different files simultaneously, Flow should either complain or fill the additional (incompatible) rows with NAs
PUBDEV-1527: missing .java extension when downloading pojo GitHub
PUBDEV-1642: Changing columns type takes column list back to first page of columns
PUBDEV-1508: Flow: Import file => Parse => Error compiling coffee-script: Maximum call stack size exceeded
PUBDEV-1606: Flow: Cannot save flow on HDFS
PUBDEV-1653: Flow: the column names do not modify when user changes the dataset in model builder

Shannon (3.0.0.26) - 7/4/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/26/index.html

New Features

Algorithms

PUBDEV-1592: Expose standardization shift/mult values in the Model output in R/Python. GitHub

Python

GitHub: add h2o.shutdown to python client
GitHub: add h2o.hist and respective pyunit
GitHub: gbm weight pyunit (variable importances)

R

HEXDEV-375: Github home for R demos

Web UI

PUBDEV-203: Change data type in flow
PUBDEV-1277: Flow needs as.factor and as.numeric after parse

Enhancements

Algorithms

PUBDEV-1494: GBM: Weights math correctness tests in R
PUBDEV-1523: GLM with Tweedie: for attached data, R gives a much better residual deviance than H2O
PUBDEV-1396: Offsets/Weights: Math correctness for GLM
PUBDEV-1496: RF: Weights math correctness tests in R
HEXDEV-366: remove weights option from DRF and GBM in REST API, Python, R
PUBDEV-1553: Threshold in GLM is hardcoded to 0
GitHub: Make min_rows a double instead of int: Is now weighted number of observations (min_obs in R).
GitHub: Don't use sample weighted variance, but full weighted variance.
GitHub: Fix R^2 computation.
GitHub: Skip rows with missing response in weighted mean computation.
_binomial_double_trees disabled by default for DRF (was enabled).
GitHub: Relax tolerance.
HEXDEV-329: Offset for GBM
HEXDEV-211: Tweedie distributions for GLM
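Two of the weighting changes above ("full weighted variance" instead of sample weighted variance, and skipping rows with a missing response) can be sketched together in Python. We assume "full" means the population form sum(w * (y - mean)^2) / sum(w) without a bias correction; `weighted_mean_var` is a hypothetical helper, not H2O code.

```python
import math
from typing import Optional, Sequence, Tuple


def weighted_mean_var(ys: Sequence[Optional[float]],
                      ws: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean and population ("full") weighted variance.
    Rows whose response is None or NaN are skipped entirely."""
    pairs = [(y, w) for y, w in zip(ys, ws) if y is not None and not math.isnan(y)]
    total_w = sum(w for _, w in pairs)
    if total_w <= 0:
        raise ValueError("sum of weights over non-missing rows must be positive")
    mean = sum(w * y for y, w in pairs) / total_w
    var = sum(w * (y - mean) ** 2 for y, w in pairs) / total_w
    return mean, var


print(weighted_mean_var([1.0, 3.0, None, float("nan")], [1.0, 1.0, 5.0, 5.0]))
```

The two missing responses are dropped together with their large weights, so this prints `(2.0, 1.0)`.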

API

PUBDEV-1491: generated REST API POJOS should be compiled and jar'd up as part of the build
GitHub: Change schema for PCA, SVD, and GLRM to version 99

Python

GitHub: is factor returns TRUE/FALSE cast to scalar 1/0
GitHub: take a slightly different syntactic approach to dropping column
GitHub: better list comp in interaction call
GitHub: if weights_column argument is specified, attach the column to the training and/or validation frame (if not already specified as part of x/validation_x). if weights_column is not already part of x/validation_x, then a training_frame/validation_frame needs to be provided and the weights column is taken from here. respective pyunit added

R

GitHub: better ref handling in the [<- for python and R
GitHub: Pass binomial_double_trees in the R wrapper for DRF.
GitHub: carefully format NAs and non NAs
GitHub: for loop over the x[[j]] to format NAs properly
GitHub: Added example to h2o-r/ensemble/create_h2o_wrappers.R

System

GitHub: allow for no y in model_builder
GitHub: Enable auto-flag for Java6 generation.
GitHub: better compression in split frame
PUBDEV-1594: All basic file accessors in PersistHDFS should check file permissions
PUBDEV-1518: getFrames should show a Parse button for raw frames

Web UI

PUBDEV-1545: Flow => Build model => ignored columns table => should have column width resizing based on column names width => looks odd if column names are short
PUBDEV-1546: Flow: Build model => Search for 1 column => select it => build model shows list of columns instead of 1 column
PUBDEV-1254: Flow: Add Impute

Bug Fixes

Algorithms

PUBDEV-1554: DL with offset: when the offset is the same as the response, MSE is not 0
PUBDEV-1555: h2oR: DL with offset gives: Error in args$x_ignore : object of type 'closure' is not subsettable
PUBDEV-1487: GBM weights: gives different terminal node predictions than R for attached data
PUBDEV-1569: Investigate effectiveness of _binomial_double_trees (DRF) GitHub
PUBDEV-1574: Actually pass 'binomial_double_trees' argument given to R wrapper to DRF.
PUBDEV-1444: DL: h2o.saveModel cannot save metrics when a deeplearning model has a validation_frame
PUBDEV-1579: GBM test time predictions without weights seem off when training with weights GitHub
PUBDEV-1533: GLM: doubled weights should produce the same result as doubling the observations GitHub
PUBDEV-1531: GLM: it appears that observations with 0 weights are not ignored, as they should be.
GitHub: Fix a bug in PCA scoring that was handling categorical NAs inconsistently
PUBDEV-1581: Regression 3060 fails on GLRM in R tests
PUBDEV-1586: change Grid endpoints and schemas to v99 since they are still in flux
PUBDEV-1589: GLM : build model => airlinesbillion dataset => IRLSM/LBFGS => fails with array index out of bound exception
PUBDEV-1607: gbm w offset: predict seems to be wrong
PUBDEV-1600: Frame name creation fails when file name contains csv or zip (not as extension)
PUBDEV-1577: DL predictions on test set require weights if trained with weights
PUBDEV-1598: Flow: After running pca when call get Model/ jobs get: Failed to find schema for version: 3 and type: PCA
PUBDEV-1576: Test variable importances for weights for GBM/DRF/DL
PUBDEV-1517: With R, deep learning autoencoder using all columns in frame, not just those specified in x parameter
PUBDEV-1593: DL variable importance: there is a .missing(NA) variable in DL variable importance even when the data has no NAs
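The two GLM weight expectations above (doubling a row's weight should match duplicating the row, and rows with weight 0 should be ignored) can be checked on the simplest weighted model. The sketch below uses closed-form weighted least squares for y = a + b*x in plain Python as a stand-in for a Gaussian/identity GLM; it is not H2O's solver, and `weighted_linreg` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def weighted_linreg(xs: Sequence[float], ys: Sequence[float],
                    ws: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for x, w in zip(xs, ws)) / sw
    my = sum(w * y for y, w in zip(ys, ws)) / sw
    sxx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws))
    if sxx == 0:
        raise ValueError("x has no weighted spread; slope is undefined")
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws))
    b = sxy / sxx
    return my - b * mx, b


# Weight 2 on the second row and weight 0 on the last row ...
weighted = weighted_linreg([0, 1, 2, 3], [1.0, 2.5, 2.0, 4.0], [1, 2, 1, 0])
# ... should match duplicating the second row and dropping the last one.
expanded = weighted_linreg([0, 1, 1, 2], [1.0, 2.5, 2.5, 2.0], [1, 1, 1, 1])
assert all(math.isclose(p, q) for p, q in zip(weighted, expanded))
print(weighted)
```

Both fits give intercept 1.5 and slope 0.5, so the script prints `(1.5, 0.5)`; the zero-weight row contributes nothing to any of the weighted sums.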

Python

PUBDEV-1538: h2o.save_model fails on Windows due to path handling
GitHub: python leaked key check for Vecs, Chunks, and Frames
PUBDEV-1609: frame dimension mismatch between upload/import method

R

PUBDEV-1601: h2o.loadModel() from hdfs
PUBDEV-1611: R CMD Check failing on : The Date field is over a month old.

System

PUBDEV-1514: Large number of columns (~30000) on importFile (flow) is slow / unresponsive for long time
PUBDEV-841: Split frame: Flow should not show raw frames for SplitFrame dialog (water.exceptions.H2OIllegalArgumentException)
PUBDEV-1459: bug in GLM POJO: seems threshold for binary predictions is always 0
PUBDEV-1566: Cannot save model on windows since Key contains '@' (illegal character to path)
GitHub: Fixes the timezone lists.
GitHub: R CMD check fix for date
GitHub: add ec2 back into project

Web UI

HEXDEV-54: Flow : Import file 100k.svm => Something went wrong while displaying page

Shannon (3.0.0.25) - 6/25/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/25/index.html

Enhancements

API

PUBDEV-1452: branch 3.0.0.2 to REGRESSION_REST_API_3 and cherry-pick the /99/Rapids changes to it

Web UI

PUBDEV-1545: Flow => Build model => ignored columns table => should have column width resizing based on column names width => looks odd if column names are short
PUBDEV-1546: Flow : Build model => Search for 1 column => select it => build model shows list of columns instead of 1 column

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms

PUBDEV-1487: GBM weights: gives different terminal node predictions than R for the attached data
GitHub: Fix offset for DL.
GitHub: Gracefully handle 0 weight for GBM.

Python

PUBDEV-1547: Weights API: weights column not found in python client

R

GitHub: Fix R wrapper for DL for weights/offset.

Web UI

PUBDEV-1528: Flow model builder: the na filter does not select all ignored columns; just the first 100.

Shannon (3.0.0.24) - 6/25/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/24/index.html

New Features

Algorithms

GitHub: Allow validation for unsupervised models.

R

GitHub: Added runit GBM weights
GitHub: Updated runit_GBM_weights.R

Python

GitHub: add h2o.set_timezone h2o.get_timezone and h2o.list_timezones to python client and respective pyunit.
GitHub: add h2o.save_model and h2o.load_model to python client and respective pyunit

Enhancements

Algorithms

GitHub: Skip rows with weight 0.
GitHub: x_ignore must be set when autoencoder is TRUE

System

GitHub: Fix Java bindings generator to generate code under project's location.
GitHub: Adds input parameter check to ParseSetup.

Bug Fixes

Algorithms

PUBDEV-1529: DL with autoencoder: get java.lang.UnsupportedOperationException: Trying to predict with an unstable model.
GitHub: Bring back accidentally removed hiding of classification-related fields for unsupervised models.

API

PUBDEV-1456: fix REST API POJO generation for enums, + java.util.map import

Shannon (3.0.0.23) - 6/19/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/23/index.html

New Features

Algorithms

HEXDEV-21: Offset for GLM
HEXDEV-208: Add observation weights to GLM (was HEXDEV-4)
PUBDEV-677: Add observation weights to all metrics
PUBDEV-675: Pass a weight Vec as input to all algos
HEXDEV-6: Add observation weights to GBM
HEXDEV-7: Add observation weights to DL
HEXDEV-10: Add observation weights to DRF
PUBDEV-291: Add observation weights to GLM, GBM, DRF, DL (classification)
HEXDEV-332: Support Offsets for DL GitHub
GitHub: Use weights/offsets in GBM.

API

PUBDEV-61: do back-end work to allow document navigation from one Schema to another
PUBDEV-133: doing summary means calling it with each columns name, index not supported?

Python

GitHub: add num_iterations accessor to python client and respective pyunit
GitHub: add score_history accessor to python client and respective pyunit
GitHub: add hit ratio table accessor to python interface and respective pyunit
GitHub: add h2o.naivebayes and respective pyunits
GitHub: add h2o.prcomp and respective pyunits.
PUBDEV-681: Add user-given input weight parameters to Python
GitHub: add h2o.create_frame to python client and respective pyunit
GitHub: add h2o.interaction and respective pyunit
GitHub: add h2o.strsplit to python client and respective pyunit
GitHub: add h2o.toupper and h2o.tolower to python client and respective pyunit
GitHub: add h2o.sub and h2o.gsub to python interface and respective pyunit
GitHub: add h2o.trim() to python client and respective pyunit
GitHub: add h2o.rep_len to python client and respective pyunit
GitHub: add h2o.svd to python client and respective golden pyunit
GitHub: add scree plot functionality to python client and respective pyunit
GitHub: add plotting functionality to python client and respective pyunit

R

GitHub: added h2o.weights and h2o.biases accessors to R client and update respective runit
GitHub: add h2o.centroid_stats to R client and respective runit
PUBDEV-680: Add user-given input weight parameters to R
GitHub: Add offset/weights to DRF/GBM R wrappers.

Web UI

PUBDEV-1513: Add cancelJob() routine to Flow

Enhancements

Algorithms

PUBDEV-676: Use the user-given weight Vec as observation weights for all algos
GitHub: Refactor the code to let the caller compute the weighted sigma.
GitHub: Modify prior class distribution to be computed from weighted response.
GitHub: Put back the defaultThreshold that's based on training/validation metrics. Was accidentally removed together with SupervisedModel.
GitHub: Always sample to at least #class labels when doing stratified sampling.
GitHub: Cutout for NAs in GLM score0(data[],...), same as for score0(Chunk[],…)

R

PUBDEV-856: All h2o things in R should have an h2o.something version so it's unambiguous GitHub
GitHub: export clusterIsUp and clusterInfo commands
GitHub: update accessors in the shim
GitHub: gbm with async exec

System

HEXDEV-361: Wide frame handling for model builders
GitHub: Remove application plugin from assembly to speedup build process.
GitHub: add byteSize to ls
GitHub: option to launch randomForest async
GitHub: Return HDFS persist manager for URIs starting with s3n and s3a
GitHub: quote strings when writing to disk

Bug Fixes

Algorithms

PUBDEV-1217: pca: when cancel the job the key remains locked
PUBDEV-1468: Error in GBM if response column is constant GitHub
PUBDEV-1476: dl with obs weights: nas in weights cause 'java.lang.AssertionError GitHub
PUBDEV-1458: pca: data with nas, v2 vs v3 slightly different results GitHub
PUBDEV-1477: dl w/obs wts: when all wts are zero, get java.lang.AssertionError GitHub
GitHub: Fix check for offset (allow offset for logistic regression).
GitHub: Gracefully handle exception when launching single-node DRF/GBM in client mode.
GitHub: Hack around the fact that hasWeights()/hasOffset() isn't available on remote nodes and that SharedTree is sent to remote nodes and its private internal classes need access to the above methods...
GitHub: Fix scoring when NAs are predicted.

Python

PUBDEV-1469: pyunit_citi_bike_large.py : test failing consistently on regression jobs
PUBDEV-1472: Regression job : Pyunit small tests groupie and pub_444_spaces failing consistently
PUBDEV-1372: Regression of pyunit_small, Groupby.py
PUBDEV-1386: intermittent fail in pyunit_citi_bike_small.py: -Unimplemented- failed lookup on token
PUBDEV-1471: pyunit_citi_bike_small.py : failing consistently on regression jobs
PUBDEV-1466: matplotlib.pyplot import failure on MASTER jenkins pyunit small jobs GitHub
GitHub: minor fix to python's h2o.create_frame
GitHub: update the path to jar in connection.py

R

PUBDEV-1475: Client mode failed tests : runit_GBM_one_node.R, runit_RF_one_node.R, runit_v_3_apply.R, runit_v_4_createfunctions.R GitHub
PUBDEV-1235: Split Frame causes AIOOBE on Chicago crimes data GitHub
PUBDEV-746: runit_demo_NOPASS_h2o_impute_R : h2o.impute() is missing. seems like we want that?
PUBDEV-582: H2O-R- does not give the full column summary
PUBDEV-1473: Regression : Runit small jobs failing on tests :
PUBDEV-741: runit_NOPASS_pub-668 R tests uses all() ...h2o says all is unimplemented
PUBDEV-1506: R: h2o.ls() needs to return data sizes
PUBDEV-1436: Intermittent runit fail: runit_GBM_ecology.R GitHub
PUBDEV-1464: R: toupper/tolower don't work GitHub GitHub
PUBDEV-1194: R: dataset is imported but can't return head of frame

Sparkling Water

PUBDEV-975: Download page for Sparkling Water should point to the right R-client and Python client
PUBDEV-1428: Sparkling water => Flow => Million song/KDD Cup path issues GitHub

Web UI

PUBDEV-1433: Flow UI: Change Help > FAQ link to h2o-docs/index.html#FAQ

Shannon (3.0.0.22) - 6/13/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/22/index.html

New Features

API

PUBDEV-633: Generate Java bindings for REST API: POJOs for the entities (schemas)

Python

GitHub: added h2o.anyfactor() and respective pyunit
GitHub: add h2o.scale and respective pyunit
GitHub: added levels, nlevels, setLevel and setLevels and respective pyunit...PUBDEV-1434 PUBDEV-1437 PUBDEV-1434 PUBDEV-1345 PUBDEV-1311
GitHub: add H2OFrame.as_date and pyunit addition. H2OFrame.setLevel should return a H2OFrame not a H2OVec.

Enhancements

Algorithms

GitHub: Add _build_tree_one_node option to GBM

API

HEXDEV-352: Additional attributes on /Frames and /Frames/foo/summary

R

PUBDEV-706: Release h2o-dev to CRAN
Adding parameter parse_type to upload/import file (GitHub)

Python

GitHub: print out where h2o jar is looked for
GitHub: add h2o.ls and respective pyunit

System

PUBDEV-717: refactor the duplicated code in FramesV2
PUBDEV-1281: Add horizontal pagination of frames to Flow GitHub
PUBDEV-607: Add Xmx reporting to GA
GitHub: Added support for Freezable[][][] in serialization (added addAAA to auto buffer and DocGen, DocGen will just throw H2O.fail())
GitHub: No longer set yyyy-MM-dd and dd-MMM-yy dates that precede the epoch to be NA. Negative time values are fine. This unifies these two time formats with the behavior of as.Date.
GitHub: Reduces the verbosity of parse tracing messages.
GitHub: Rename AUTO->GUESS for figuring out file type.

Web UI

HEXDEV-276: Add frame pagination
PUBDEV-1405: Flow : Decision to be made on display of number of columns for wider datasets for Parse and Frame summary
PUBDEV-1404: Usability improvements
PUBDEV-244: "View Data" display may need to be modified/shortened.

Bug Fixes

Algorithms

PUBDEV-1365: GLM: Buggy when likelihood equals infinity
PUBDEV-1394: GLM: Some offsets hang
PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
PUBDEV-1403: pca: h2o-3 reporting incorrect proportion of variance and cum prop GitHub
HEXDEV-281: GLM - beta constraints with categorical variables fails with AIOOB
HEXDEV-280: GLM - gradient not within tolerance when specifying beta_constraints w/ and w/o prior values

Python

PUBDEV-1425: Class Cast Exception ValStr to ValNum GitHub
PUBDEV-1421: python client parse fail on hdfs /datasets/airlines/airlines.test.csv
PUBDEV-1153: Demo: Airlines Demo in Python GitHub
PUBDEV-1286: Python ifelse on H2OFrame never finishes
PUBDEV-1435: Run.py modify to accept phantomjs timeout command line option GitHub

R

PUBDEV-1154: Demo: Chicago Crime Demo in R
PUBDEV-1240: Merge causes IllegalArgumentException
PUBDEV-1447: R: no argument parser_type in h2o.uploadFile/h2o.importFile (GitHub)

System

PUBDEV-1423: Phantomjs : Add timeout command line option
PUBDEV-1401: Flow : Import file 15 M Rows 2.2K cols=> Parse these files => Change first column type => Unknown => Try to change other columns => Kind of hangs
PUBDEV-1406: make the ParseSetup / Parse API more efficient for high column counts GitHub

Shannon (3.0.0.21) - 6/12/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/21/index.html

New Features

Python

HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API

Enhancements

Algorithms

GitHub: Made intercept option public and added it to field list in parameter schema
GitHub: GLM: Updated null model intercept fit.
GitHub: GLM: Updated null-model constant term fitting when running with offset
GitHub: glm update
GitHub: DL code refactoring to reduce file sizes

Python

GitHub: add h2o.round() and h2o.signif() and additional pyunit checks
GitHub: add h2o.all() and respective pyunit checks

R

GitHub: added intercept option to R

System

PUBDEV-607: Add Xmx reporting to GA GitHub

Web UI

GitHub: Add horizontal pagination of /Frames to handle UI navigation of wide datasets more efficiently.
GitHub: Only show the top 7 metrics for the max metrics table
GitHub: Make the max metrics table entries be called max f1, etc.

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms

PUBDEV-1365: GLM: Buggy when likelihood equals infinity GitHub
PUBDEV-1394: GLM: Some offsets hang
PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
PUBDEV-1382: pca: giving wrong std dev for mentioned data
PUBDEV-1383: pca: std dev numbers differ for v2 and v3 for attached data GitHub
PUBDEV-1381: GBM, RF: get an NPE when run with a validation set with no response GitHub
GitHub: GLM fix - fixed fitting of null model constant term
GitHub: Fix remote bug
GitHub: Remove elastic averaging parameters from Flow.
PUBDEV-1398: pca: predictions on the attached data from v2 and v3 differ

Python

PUBDEV-1286: Python ifelse on H2OFrame never finishes GitHub

R

PUBDEV-761: Save model and restore model (from R)
PUBDEV-1236: h2o-r/tests/testdir_misc/runit_mergecat.R failure (client mode only)

System

PUBDEV-1402: move Rapids to /99 since it's going to be in flux for a while GitHub
GitHub: Fixes an operator precedence issue, and replaces debug GA target with actual one.
GitHub: Fix log download bug where all nodes were getting the same zip file.

Shannon (3.0.0.18) - 6/9/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/18/index.html

New Features

System

PUBDEV-1163: implement h2o1-style model save/restore in h2o-3 GitHub

Python

GitHub: Added --h2ojar option

Enhancements

Python

PUBDEV-277: Make python equivalent of as.h2o() work for numpy array and pandas arrays

Bug Fixes

Algorithms

PUBDEV-1371: pca: get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:198)
PUBDEV-1376: pca: predictions from h2o-3 and h2o-2 differs for attached data
PUBDEV-1380: DL: when trying to access the training frame from the link in the DL model, get: Object not found

R

PUBDEV-761: Save model and restore model (from R) GitHub

Shannon (3.0.0.17) - 6/8/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/17/index.html

New Features

Algorithms

HEXDEV-209: Poisson distributions for GLM

Python

PUBDEV-1270: Python Interface needs H2O Cut Function GitHub
PUBDEV-1242: Need equivalent of as.Date feature in Python GitHub
PUBDEV-1165: H2O Python needs Modulus Operations
HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API
PUBDEV-1237: environment variable to disable the strict version check in the R and Python bindings

Web UI

PUBDEV-1175: Flow: Good interactive confusion matrix for binomial
PUBDEV-1176: Flow: Good confusion matrix for multinomial

Enhancements

Algorithms

GitHub: GLM weights fix: regularize by sum of weights rather than number of observations
GitHub: GLM fix: added line search (and limited number of iterations) to constant term model fitting with offset (could enter infinite loop)
GitHub: No longer warn if binomial_double_trees option is enabled for _nclass!=2
GitHub: Fix CM table to have integer entries unless there are real-valued entries
GitHub: Add extra assertion for train_samples_per_iteration
GitHub: Update model during runtime of algorithm.
GitHub: Changes to glm forloop to add offsets and add NOPASS/NOFEATURE functionality back to run.py

R

GitHub: month was off by one, runit test edited
GitHub: Comments to clarify the policy on dates in H2O.

System

HEXDEV-344: Logs should include JVM launch parameters

Web UI

PUBDEV-467: Show Frames for DL weights/biases in Flow
PUBDEV-1221: add a "I like this" style button with LinkedIn or Github (beside the Flow Assist Me button)
PUBDEV-1245: Flow: use new _exclude_fields query parameter to speed up REST API usage

Bug Fixes

Algorithms

PUBDEV-1353: GLM: model with weights different in R than in H2o for attached data
PUBDEV-1358: GLM: when run with negative weights, it would be good to tell the user that negative weights are not allowed instead of throwing an exception
PUBDEV-1264: GLM: reporting incorrect null deviance GitHub
PUBDEV-1362: GLM: when run with weights and offset get wrong ans
PUBDEV-1263: GLM: name ordering for the coefficients is incorrect GitHub
PUBDEV-1261: pca: wrong std dev for data with nas rest numeric cols GitHub
PUBDEV-1218: pca: progress bar not showing progress just the initial and final progress status GitHub
PUBDEV-1204: pca: when invoking build model from Flow, displays: ERROR FETCHING INITIAL MODEL BUILDER STATE
PUBDEV-1212: pca: with enum column reporting (some junk) wrong stdev/ rotation GitHub
PUBDEV-1228: pca: no std dev getting reported for attached data
PUBDEV-1233: pca: std dev for attached data differ when run on h2o-3 and h2o-2
PUBDEV-1258: h2o.glm with offset column: get Error in .h2o.startModelJob(conn, algo, params) : Offset column 'logInsured' not found in the training frame.

R

PUBDEV-1234: h2o.setTimezone throwing an error GitHub
PUBDEV-1229: R: Most GLM accessors fail GitHub
PUBDEV-1227: R: Cannot extract an enum value using data[row,col] GitHub
HEXDEV-339: Feature engineering: log (1+x) fails GitHub
PUBDEV-1249: h2o.glm: no way to specify offset or weights from h2o R GitHub
PUBDEV-1255: create_frame: hangs with following msg in the terminal, java.lang.IllegalArgumentException: n must be positive
PUBDEV-1361: runit_hex_1841_asdate_datemanipulation.R fails intermittently GitHub

Sparkling Water

PUBDEV-692: Upgrade SparklingWater to Spark 1.3

System

PUBDEV-1288: Confusion Matrix: java.lang.ArrayIndexOutOfBoundsException: 2 at hex.ConfusionMatrix.createConfusionMatrixHeader GitHub
HEXDEV-323: SVMLight Parse Bug GitHub
PUBDEV-1207: implement JSON field-filtering features: _exclude_fields
GitHub: Fix a missing field update in Job.
PUBDEV-65: Handling of strings columns in summary is broken
PUBDEV-1230: Parse: get AIOOBE when parsing the attached file with the first two columns as enum, while h2o-2 parses it fine
PUBDEV-1377: Get AIOOBE when parsing a file with fewer column names than columns GitHub
PUBDEV-1364: Variable importance Object

Web UI

PUBDEV-1198: Flow: Selecting "Cancel" for "Load Notebook" prompt clears current notebook anyway
PUBDEV-1172: Model builder takes forever to load the column names in Flow, hence cannot build any models
PUBDEV-1248: Flow GLM: from Flow the drop down with column names does not show up and hence not able to select the offset column
PUBDEV-1380: DL: when trying to access the training frame from the link in the DL model, get: Object not found GitHub

Shannon (3.0.0.13) - 5/30/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/13/index.html

New Features

Algorithms

HEXDEV-260: Add Random Forests for regression GitHub

Python

PUBDEV-1166: Converting H2OFrame into Python object
PUBDEV-1165: H2O Python needs Modulus Operations

R

PUBDEV-1188: Merge should handle non-numeric columns (github)
PUBDEV-1096: R: add weekdays() function in addition to month() and year()

Enhancements

Algorithms

github: Updated weights handling, test.
HEXDEV-324: Poor GBM performance on KDD Cup 2009 competition dataset (github)
HEXDEV-326: varImp() function for DRF and GBM (github)
github: Change some of the defaults

API

PUBDEV-669: have the /Frames/{key}/summary API call Vec.startRollupStats

R/Python

PUBDEV-479: Port MissingInserter to R/Python
PUBDEV-632: Display TwoDimTable of HitRatios in R/Python
github: minor change to h2o.demo()
github: add h2o.demo() facility to python package, along with some built-in (small) data
github: remove cols param

Bug Fixes

Algorithms

PUBDEV-1211: pca: descaled pca, std dev seems to be wrong for attached data github
PUBDEV-1213: pca: would be good to have the std devs numbered, because it is difficult to relate them to the principal components (github)
PUBDEV-1201: pca: get ArrayIndexOutOfBoundsException (github)
PUBDEV-1203: pca: giving wrong std dev/rotation-labels for iris with species as enum (github)
PUBDEV-1199: DL with <1 epochs has wrong initial estimated time (github)
github: Fix missing AUC for training data in DL.
github: Add the seed back to GBM imbalanced test (was set to 0 by default before, now explicit)

R

PUBDEV-1189: R: h2o.hist broken for breaks that is a list of the break intervals (github)
PUBDEV-1206: Frame summary from R and Python need to use the Frame summary endpoint (github)
PUBDEV-1177: R summary() is slow when large number of columns
PUBDEV-1097: R: R should be able to take a list of paths, similar to how Python does

Shannon (3.0.0.11) - 5/22/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/11/index.html

Enhancements

Algorithms

PUBDEV-1179: DRF: investigate if larger seeds giving better models
PUBDEV-1178: Add logloss/AUC/Error to GBM/DRF Logs & ScoringHistory
PUBDEV-1169: Use only 1 tree for DRF binomial (github)
PUBDEV-1170: Wrong ROC is shown for DRF (Training ROC, even though Validation is given)
PUBDEV-1162: Speed up sorting of histograms with O(N log N) instead of O(N^2)

System

PUBDEV-1152: Accept s3a URLs
HEXDEV-316: ImportFiles should not download files from HTTP

Bug Fixes

Algorithms

HEXDEV-253: model output consistency
HEXDEV-319: DRF in h2o 3.0 is worse than in h2o 2.0 for Airline
PUBDEV-1180: DRF has wrong training metrics when validation is given

API

PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset

Python

PUBDEV-1183: Python version check should fail hard by default
PUBDEV-1185: Python binding version mismatch check should fail hard and be on by default
HEXDEV-138: Port Python tests for Deep Learning

R

PUBDEV-1160: R: h2o.hist doesn't support breaks argument
PUBDEV-1159: R: h2o.hist takes too long to run
PUBDEV-1150: R CMD Check: URLs not working
PUBDEV-1149: R CMD check not happy with our use of .OnAttach
PUBDEV-1174: R: h2o.hist FD implementation broken
PUBDEV-1167: R: h2o.group_by broken
HEXDEV-318: the fix to H2O startup for the host unreachable from R causes a security hole
PUBDEV-1187: FramesHandler.summary() needs to run summary on all Vecs concurrently.

System

PUBDEV-862: Building a model without training file -> NPE
HEXDEV-315: importFile fails: Error in fromJSON(txt, ...) : unexpected character: A
PUBDEV-1137: Parse: upload and import gives different chunk compression on the same file
PUBDEV-1054: Parse: h2o parses arff file incorrectly
PUBDEV-1181: Rapids should queue and block on the back-end to prevent overlapping calls
PUBDEV-1184: importFile fails for paths containing spaces

Web UI

PUBDEV-1182: Flow: when upload file fails, the control does not come back to the flow screen, and have to refresh the whole page to get it back
PUBDEV-1131: GBM crashes after calling getJobs in Flow

Shannon (3.0.0.7) - 5/18/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/7/index.html

Enhancements

API

PUBDEV-711: take a final look at all REST API parameter names and help strings
PUBDEV-757: Rename DocsV1 + DocsHandler to MetadataV1 + MetadataHandler
PUBDEV-1138: Performance improvements for big data sets => getModels
PUBDEV-1126: Performance improvements for big data sets => Get frame summary

System

HEXDEV-316: ImportFiles should not download files from HTTP

Web UI

PUBDEV-1144: Update/Fix Flow API for CreateFrame

Bug Fixes

The following changes are to resolve incorrect software behavior:

API

PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset
PUBDEV-1047: API : Get frames and Build model => takes long time to get frames
HEXDEV-149: Allow JobsV3 to return properly typed jobs, not always instances of JobV3
PUBDEV-1036: rename straggler V2 schemas to V3

R

PUBDEV-1159: R: h2o.hist takes too long to run

System

PUBDEV-1034: Windows 7/8/2012 Multicast Error UDP
PUBDEV-862: Building a model without training file -> NPE
HEXDEV-253: model output consistency
PUBDEV-1135: While predicting, get: class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.ArrayIndexOutOfBoundsException: 5
PUBDEV-1090: POJO: Models with "." in key name (ex. pros.glm) can't access pojo endpoint
PUBDEV-1077: Getting an IcedHashMap warning from H2O startup

Web UI

PUBDEV-1133: getModels in Flow returns error
PUBDEV-926: Flow: When user hits build model without specifying the training frame, it would be good if Flow guides the user. It presently shows an NPE msg
PUBDEV-1131: GBM crashes after calling getJobs in Flow

Shannon (3.0.0.2) - 5/15/15

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-shannon/2/index.html

New Features

ModelMetrics

PUBDEV-411: ModelMetrics by model category

WebUI

PUBDEV-942: ModelMetrics by model category - Autoencoder

Enhancements

Algorithms

github: GLM update: skip lambda max during lambda search
github: removed higher accuracy option
github: Rename constant col parameter
github: GLM update: added stopping criteria to lbfgs, tweaked some internal constants in ADMM
github: Add support for ignore_const_col in DL

Python

PUBDEV-852: Binomial: show per-metric-optimal CM and per-threshold CM in Python
github: add filterNACols to python
github: h2o.delete replaced with h2o.removeFrameShallow
github: Add distribution summary to Python

R

github: add filterNACols to R
github: explicitly set cols=TRUE for R style str on frames
github: enable faster str, bulk nlevels, bulk levels, bulk is.factor
github: Add optional blocking parameter to h2o.uploadFile

System

PUBDEV-672: HTML version of the REST API docs should be available on the website
PUBDEV-827: class GenModel duplicates part of code of Model

Web UI

HEXDEV-181: Flow: Handle deep features prediction input and output
github: removed use_all_factor_levels from glm flows

Bug Fixes

Algorithms

HEXDEV-302: AIOOBE during Prediction with DL github
github: glm fix: don't force in null model for lambda search with user given list of lambdas
github: Fix domain in glm scoring output for binomial
github: GLM Fix - fix degrees of freedom when running without intercept (+/-1)
github: GLM fix: make valid data info be clone of train data info (needs exactly the same categorical offsets, ignore unseen levels)
github: Fix glm scoring, fill in default domain {0,1} for binary columns when scoring

R

PUBDEV-1116: R: Parse that works from flow doesn't work from R using as.h2o
PUBDEV-798: R: String Munging Functions Missing
PUBDEV-584: R: hist() doesn't currently work for H2O objects
PUBDEV-820: H2oR: model objects should return the CM when run classification like h2o1
PUBDEV-1113: Remove Keys : Parse => Remove => doesn't complete
PUBDEV-1102: R: h2o.rbind fails to join two dataset together
PUBDEV-899: R: all doesn't work
PUBDEV-555: H2O-R: str does not work
PUBDEV-1110: H2OR: while printing a gbm model object, get invalid format '%d'; use format %f, %e, %g or %a for numeric objects
PUBDEV-903: R: Errors from some rapids calls seem to fail to return an error
HEXDEV-311: Performance bug from R with Expect: 100-continue
PUBDEV-1030: h2o.performance: ignores the user specified threshold
PUBDEV-1071: R: regression models don't show in print statement r2 but it exists in the model object
PUBDEV-1072: R: missing accessors for glm specific fields
PUBDEV-1032: After running some R and Python demos, invoking build model from Flow gives a "rollup stats problem vec deleted" error
PUBDEV-1069: R: missing implementation for h2o.r2
PUBDEV-1064: Passing sep="," to h2o.importFile() fails with '400 Bad Request'
PUBDEV-1092: Get NPE while predicting

System

PUBDEV-1091: S3 gzip parse failure
PUBDEV-1081: Probably want to cleanly disable multicast (not retry) and print suggestion message, if multicast not supported on picked multicast network interface
PUBDEV-1112: User has no way to specify whether to drop constant columns
PUBDEV-1109: Change all extdata imports to uploadFile
PUBDEV-1104: .gz file parse exception from local filesystem

Web UI

PUBDEV-1134: getPredictions in Flow returns error
PUBDEV-1020: Flow : Drop NA Cols enable => Should automatically populate the ignored columns
PUBDEV-1041: Flow GLM: formatting needed for the model parameter listing in the model object github
PUBDEV-1108: Flow: When predict on data with no response get :Error processing POST /3/Predictions/models/gbm-a179db76-ba96-420f-a643-0e166aea3af3/frames/subset_1 'undefined' is not an object (evaluating 'prediction.model')

H2O-Dev

Shackleford (0.2.3.6) - 5/8/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-shackleford/6/index.html

New Features

Python

Set up POJO download for Python client (PUBDEV-908) (github)

Sparkling Water

Publish h2o-scala and h2o-app latest version to maven central (PUBDEV-443)

Enhancements

Algorithms

Use AUC's default threshold for label-making for binomial classifiers predict() (PUBDEV-1063) (github)
GLM update (github)
Cleanup AUC2, make incremental version (github)
Name change: override_with_best_model -> overwrite_with_best_model (github)
Couple of GLM updates (github)
Disable _replicate_training_data for data that's larger than 10GB (github)
Added replicate_training_data param for DL (github)
Change a few kmeans output parameters so no longer dividing by nrows or num_clusters (github)
GLMValidation Updated auc computation (github)
Do not delete model metrics at end of GBM/DRF (github)

API

Clean REST api for Parse (PUBDEV-993)
Removes is_valid, invalid_lines, and domains from REST api (github)
Annotate domains output field as expert level (github)

Python

Implement h2o.interaction() (PUBDEV-854) (github)
nice tables in ipython! (github)
added deeplearning weights and biases accessors and respective pyunit. (github)

R

Cleaner client POJO download for R (PUBDEV-907)
Implement h2o.interaction() (PUBDEV-854) (github)
R: h2o.impute missing (PUBDEV-796)
validation_frame is passed through to h2o (github)
Adding GBM accessor function runits (github)
Adding changes to h2o.hit_ratio_table to be like other accessors (i.e., no train) (github)
add h2o.getPOJO to R, fix impute ast build in python (github)

System

Change NA strings to an array in ParseSetup (PUBDEV-995)
Document way of passing S3 credentials for S3N (PUBDEV-947)
Add H2O-dev doc on docs.h2o.ai via a new structure (proposed below) (PUBDEV-355)
Rapids Ref Doc (PUBDEV-667)
Show Timestamp and Duration for all model scoring histories (PUBDEV-1018) (github)
Logs slow reads, mainly meant for noting slow S3 reads (github)
Make prediction frame column names non-integer (github)
Add String[] factor_columns instead of int[] factors (github)
change the runtime exception to a Log.info() if interface doesn't support multicast (github)
More robust way to copy Flow files to web root per Prithvi (github)
Switches na_string from a single value per column to an array per column (github)

Web UI

Model output improvements (HEXDEV-150)

Bug Fixes

Algorithms

H2O cloud shuts down with some H2O.fail error, while building some kmeans clusters (PUBDEV-1051) (github)
GLM:beta constraint does not seem to be working (PUBDEV-1083)
GBM - random attack bug (probably because max_after_balance_size is really small) (PUBDEV-1061) (github)
GLM: LBFGS objval java lang assertion error (PUBDEV-1042) (github)
PCA Cholesky NPE (PUBDEV-921)
GBM: H2o returns just 5525 trees, when ask for a much larger number of trees (PUBDEV-860)
CM returned by AUC2 doesn't agree with manual-made labels from F1-optimal threshold (HEXDEV-263)
AUC: h2o reporting wrong auc on a modified covtype data (PUBDEV-891)
GLM: Build model => Predict => Residual deviance/Null deviance different from training/validation metrics (PUBDEV-991)
KMeans metrics incomplete (PUBDEV-1029)
GLM: Java Assertion Error (PUBDEV-1025)
Random forest bug (PUBDEV-1015)
A particular random forest model has an empty (training) metric json max_criteria_and_metric_scores (PUBDEV-1001)
PCA results exhibit numerical inaccuracies compared to R (PUBDEV-550)
DRF: reporting wrong depth for attached dataset (PUBDEV-1006)
added missing "names" column name to beta constraints processing (github)
Fix balance_classes probability correction consistency between H2O and POJO (github)
Fix in GLM scoring - check actual for NaNs as well (github)
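The balance_classes fix above concerns rescaling predicted probabilities back to the original class distribution after the training data has been rebalanced. A minimal Python sketch of the standard prior-correction formula, assuming both the original and the post-balancing class distributions are known; `correct_probabilities` is a hypothetical name and not necessarily how H2O or its POJOs implement the correction:

```python
def correct_probabilities(scored, prior_dist, model_dist):
    """Rescale class probabilities from a model trained on rebalanced data.

    scored:     per-class probabilities predicted by the model
    prior_dist: class distribution of the original training data
    model_dist: class distribution the model saw after balancing
    """
    # Weight each probability by (true prior / sampled prior) ...
    weighted = [p * prior / sampled
                for p, prior, sampled in zip(scored, prior_dist, model_dist)]
    # ... then renormalize so the corrected probabilities sum to 1.
    total = sum(weighted)
    return [w / total for w in weighted]
```

When the two distributions are equal, the probabilities come back unchanged, which is a quick consistency check between two scoring paths.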

Python

Python interface: Cannot import_file with path=url (PUBDEV-1059)
head()/tail() should show labels, rather than number encoding, for enum columns (PUBDEV-1017)
h2o.py: For a binary response, prints the transposed (and hence wrong) confusion matrix (PUBDEV-1013)

R

Broken Summary in R (PUBDEV-1073)
h2oR summary: displaying no labels in summary (PUBDEV-1008)
R/Python impute bugs (PUBDEV-1055)
R: h2o.varimp doubles the print statement (PUBDEV-1068)
R: h2o.varimp returns NULL when model has no variable importance (PUBDEV-1078)
h2oR: h2o.confusionMatrix(my_gbm, validation=F) should not show a null (PUBDEV-849)
h2o.impute doesn't impute (PUBDEV-1024)
R: as.h2o cutting entries when trying to import data.frame into H2O (HEXDEV-293)
The default names are too long, for an R-datafile parsed to H2O, and needs to be changed (PUBDEV-976)
h2o.confusionMatrix: Gives an error when invoked with a threshold (PUBDEV-1010)
Removing train and adding error messages for valid = TRUE when there are no validation metrics (github)

System

Download logs is returning the same log file bundle for every node (PUBDEV-1056)
ParseSetup is useless and misleading for SVMLight (PUBDEV-994)
Fixes bug that was short circuiting the setting of column names (github)

Web UI

Flow: Predict should not show mse confusion matrix etc (PUBDEV-987) (github)
Flow: Raw frames left out after importing files from directory (PUBDEV-1046)

Shackleford (0.2.3.5) - 5/1/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-shackleford/5/index.html

New Features

API

Need a /Log REST API to log client-side errors to H2O's log (HEXDEV-291)

Python

add impute to python interface (github)

System

Job admission control (PUBDEV-536) (github)
Get Flow Exceptions/Stack Traces in H2O Logs (PUBDEV-920)

Enhancements

Algorithms

GLM: Name to be changed from normalized to standardized in output to be consistent between input/output (PUBDEV-954)
GLM: It would be really useful if the coefficient magnitudes are reported in descending order (PUBDEV-923)
PUBDEV-536: Limit DL models to 100M parameters (github)
PUBDEV-536: Add accurate memory-based admission control for GBM/DRF (github)
Relax the tolerance a little more (github)
Tree depth correction (github)
Comment out duration_in_ms for now, as it's always left at 0 (github)
Updated min mem computation for glm (github)
GLM update: added lambda search info to scoring history (github)

Python

python .show() on model and metric objects should match R/Flow as much as possible (HEXDEV-289)
GLM model output, details from Python (HEXDEV-95)
GBM model output, details from Python (HEXDEV-102)
Run GBM from Python (HEXDEV-99)
map domain to result from /Frames if needed (github)
added confusion matrix to metric output (github)
update metrics_base_confusion_matrices() (github)
fetch out string_data if type is string (github)

R

GBM model output, details from R (HEXDEV-101)
Run GBM from R (HEXDEV-98)
check if it's a frame then check NA (github)

System

Report MTU to logs (PUBDEV-614) (github)
Make parameter changes Log.info() instead of Log.warn() (github)

Web UI

Flow: Confusion matrix: good to have consistency in the column and row name (letter) case (PUBDEV-971)
Run GBM Multinomial from Flow (HEXDEV-111)
Run GBM Regression from Flow (HEXDEV-112)
Sort model types in alphabetical order in Flow (PUBDEV-1011)

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms

GLM: Model output display issues (PUBDEV-956)
h2o.glm: ignores validation set (PUBDEV-958)
DRF: reports wrong number of leaves in a summary (PUBDEV-930)
h2o.glm: summary of a prediction frame gives na's as labels (PUBDEV-959)
GBM: reports wrong max depth for a binary model on german data (PUBDEV-839)
GLM: Confusion matrix missing in R for binomial models (PUBDEV-950) (github)
GLM: On airlines(40g) get ArrayIndexOutOfBoundsException (PUBDEV-967)
GLM: Build model => Predict => Residual deviance/Null deviance different from training/validation metrics (PUBDEV-991)
Domains returned by GLM for binomial classification problem are integers, but should be mapped to their label (PUBDEV-999)
GLM: Validation on non training data gives NaN Res Deviance and AIC (PUBDEV-1005)
Confusion matrix has nan's in it (PUBDEV-1000)
glm fix: pass model_id from R (was being dropped) (github)

Python

H2OPy: Warns about a version mismatch even when the latest version from master is installed (PUBDEV-980)
Columns of type enum lose string label in Python H2OFrame.show() (PUBDEV-965)
Bug in H2OFrame.show() (HEXDEV-295) (github)

R

h2o.confusionMatrix for binary response gives not-found thresholds (PUBDEV-957)
GLM: model_id param is ignored in R (PUBDEV-1007)
h2o.confusionMatrix: Mixes letter cases for categorical labels while printing a multinomial CM (PUBDEV-996)
fix the dupe thresholds error (github)
extra arg in impute example (github)
fix missing param data (github)

System

Builds : Failing intermittently due to java.lang.StackOverflowError (PUBDEV-972)
Get H2O cloud hang with NPE and roll up stats problem, when click on build model glm from flow, on laptop after running a few python demos and R scripts (PUBDEV-963)

Web UI

Flow :=> Airlines dataset => Build models glm/gbm/dl => water.DException$DistributedException: from /172.16.2.183:54321; by class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.NullPointerException: null (PUBDEV-603)
Flow => Preview Pojo => collapse not working (PUBDEV-977)
Flow => Any algorithm => Select response => Select Add all for ignored columns => Try to unselect some from ignored columns => Build => Response column IsDepDelayed not found in frame: allyears_1987_2013.hex. (PUBDEV-978)
Flow => ROC curve select something on graph => Table is displayed for selection => Collapse ROC curve => Doesn't collapse table, collapses only graph (PUBDEV-1003)

Severi (0.2.2.16) - 4/29/15

New Features

Python

Release h2o-dev to PyPi (PUBDEV-762)
Python Documentation (PUBDEV-901)
Python docs Wrap Up (PUBDEV-966)
add getters for res/null dev, fix kmeans,dl getters (github)

Enhancements

Algorithms

Use partial-sum version of mat-vec for DL POJO (PUBDEV-936)
Always store weights and biases for DLTest Junit (github)
Show the DL model size in the model summary (github)
Remove assertion in hot loop (github)
Rename ADMM to IRLSM (github)
Added no intercept option to glm (github)
Code cleanup. Moved ModelMetricsPCAV3 out of H2O-algos (github)
Improve DL model checkpoint logic (github)
Updated glm output (github)
Renamed normalized coefficients to standardized coefficients in glm output (github)
Use proper tie breaking for NB (github)
Add check that DL parameters aren't modified by model training (github)
Reduce tolerances (github)
If there are no observations of a response level and the predictor is numeric, assume it is drawn from a standard normal distribution (mean 0, standard deviation 1). Add validation test with split frame for naive Bayes (github)
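The naive Bayes fallback described in the last item can be sketched as follows. This is a minimal Python illustration, assuming a Gaussian class-conditional density for numeric predictors; the helper `gaussian_likelihood` is hypothetical and uses `None` to stand for "no observations for this response level":

```python
import math


def gaussian_likelihood(x, mean=None, sd=None):
    """Class-conditional density of a numeric predictor under naive Bayes.

    If the response level had no observations (mean/sd unknown, passed
    as None), fall back to a standard normal: mean 0, sd 1.
    """
    if mean is None or sd is None:
        mean, sd = 0.0, 1.0
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
```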

Python

replaced H2OFrame.send_frame() calls with cbind Exprs so that lazy evaluation is enforced (github)
change default xmx/s behavior of h2o.init() (github)
better handling of single row return and print (github)

R

Added interpolation to quantile to match R type 7 (github)
Removed and tidied if's in quantile.H2OFrame since it now uses match.arg (github)
Connected validation dataset to glm in R (github)
Removing h2o.aic from seealso link (doesn't exist) and updating documentation (github)
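R's default quantile (type 7) interpolates linearly between order statistics at the 0-based position h = (n - 1)p. A minimal Python sketch of that rule; `quantile_type7` is a hypothetical helper, not part of the H2O R or Python API:

```python
import math


def quantile_type7(values, p):
    """Sample quantile using R's default (type 7) linear interpolation."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("values must be non-empty")
    h = (n - 1) * p            # 0-based fractional position
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)    # clamp at the last order statistic
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])
```

Python's `statistics.quantiles(data, n=4, method="inclusive")` uses the same interpolation, which makes it a convenient cross-check.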

System

Add number of rows (per node) to ChunkSummary (PUBDEV-938) (github)
allow nrow as alias for count in groupby (github)
Only launches task to fill in SVM zeros if the file is SVM (github)
Adds more log traces to track progress of post-ingest actions (github)
Adds svm as a file extension to the hex name cleanup (github)

Web UI

Flow: Inspect data => Round decimal points to 1 to be consistent with h2o1 (PUBDEV-453)
Setup POJO download method for Flow (PUBDEV-909)
Pretty-print POJO preview in flow (PUBDEV-940)
Flow: It would be good if 'get predictions' also shows the data (PUBDEV-883)
GBM model output, details in Flow (HEXDEV-103)
Display a linked data table for each visualization in Flow (PUBDEV-318)
Run GBM binomial from Flow (needs proper CM) (PUBDEV-943)

Bug Fixes

Algorithms

GLM: results from model and prediction on the same dataset do not match (PUBDEV-922)
GLM: when select AUTO as solver, for prostate, glm gives all zero coefficients (PUBDEV-916)
Large (DL) models cause oversize issues during serialization (PUBDEV-941)
Fixed name change for ADMM (github)

API

Fix schema warning on startup (PUBDEV-946) (github)

Python

H2OVec.row_select(H2OVec) fails on case where only 1 row is selected (PUBDEV-948)
fix pyunit (github)

R

R: Parse of zip file fails, Summary fails on citibike data (PUBDEV-835)
h2o.performance reports a different Null Deviance than the model object for the same dataset (PUBDEV-816)
h2o.glm: no example on h2o.glm help page (PUBDEV-962)
H2O R: Confusion matrices from R still confused (PUBDEV-904) (github)
R: h2o.confusionMatrix("H2OModel", ...) extra parameters not working (PUBDEV-953) (github)
h2o.confusionMatrix for binomial gives not-found thresholds on S3 -airlines 43g (PUBDEV-957)
H2O summary quartiles outside tolerance of (max-min)/1000 (PUBDEV-671)
fix space headers issue from R (was not url-encoding the column strings) (github)
R CMD fixes (github)
Fixed broken R interface - make validation_frame non-mandatory (github)

Sparkling Water

Sparkling Water: #UDP-Recv ERRR: UDP Receiver error on port 54322 java.lang.ArrayIndexOutOfBoundsException (PUBDEV-311)

System

MapR 3.1.1: Memory is not allocated as requested; instead, the cluster gets the default (PUBDEV-937)
GLM: AIOOBE with msg '-14' at water.RPC$2.compute2(RPC.java:593) (PUBDEV-917)
h2o.glm: model summary listing same info twice (PUBDEV-915)
Parse: Detect and reject UTF-16 encoded files (HEXDEV-285)
DataInfo Row categorical encoding AIOOBE (HEXDEV-283)
Fix POJO Preview exception (github)
Fix NPE in ChunkSummary (github)
fix global name collision (github)

Severi (0.2.2.15) - 4/25/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-severi/15/index.html

New Features

Python

added min, max, sum, median for H2OVecs and respective pyunit (github)
added min(), max(), and sum() functionality on H2OFrames and respective pyunits (github)

Web UI

View POJO in Flow (PUBDEV-781)
help > about page or add version on main page for easy bug reporting. (PUBDEV-804)
POJO generation: GLM (PUBDEV-712) (github)
GLM model output, details in Flow (HEXDEV-96)

Enhancements

Algorithms

K means output clean up (HEXDEV-187)
Add FNR/TNR/FPR/TPR to threshold tables, remove recall, specificity (github)
Add accessor for variable importances for DL (github)
Relax CM error tolerance for F1-optimal threshold now that AUC2 doesn't necessarily create consistent thresholds with its own CMs. (github)
Added scoring history to glm (github)
Added model summary to glm (github)
Add flag to support reading data from S3N (github)
Added degrees of freedom to GLM metrics schemas (github)
Allow DL scoring_history to be unlimited in length (github)
add plotting for binomial models (github)
Ignore certain parameters that are not applicable (class balancing, max CM size, etc.) (github)
Updated glm scoring, fill training/validation metrics in model output (github)
Rename gbm loss parameter to distribution (github)
Fix GBM naming: loss -> distribution (github)
GLM LBFGS update (github)
na.rm for quantile is default behavior (github)
GLM update: enabled max_predictors in REST, updated lbfgs (github)
Remove keep_cross_validation_splits for now from DL (github)
Get rid of sigma in the model metrics, instead show r2 (github)
Don't show score_every_iteration for DL (github)
Don't print too large confusion matrices in Tree models (github)
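The threshold-table rates added in this release (TPR, FNR, TNR, FPR) and the F1 score used to pick the F1-optimal threshold all derive from the confusion matrix at a given threshold. A minimal Python sketch, assuming 0/1 labels and that a row is predicted positive when its probability is at least the threshold; `threshold_metrics` is hypothetical:

```python
def threshold_metrics(actual, predicted_prob, threshold):
    """Confusion-matrix rates for a binary classifier at one threshold.

    actual:         0/1 labels
    predicted_prob: predicted probability of class 1 for each row
    """
    tp = fp = tn = fn = 0
    for y, p in zip(actual, predicted_prob):
        pred = 1 if p >= threshold else 0  # ">=" is an assumption
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(a, b):
        # Undefined rates (empty denominator) are reported as NaN.
        return a / b if b else float("nan")

    return {
        "tpr": ratio(tp, tp + fn),
        "fnr": ratio(fn, tp + fn),
        "tnr": ratio(tn, tn + fp),
        "fpr": ratio(fp, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

Scanning candidate thresholds and keeping the one with the largest `f1` gives an F1-optimal threshold.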

API

publish h2o-model.jar via REST API (PUBDEV-779)
move all schemas and endpoints to v3 (PUBDEV-471)
clean up routes (remove AddToNavbar, fix /Quantiles, etc) (PUBDEV-618) (github)
More data in chunk_homes call. Add num_chunks_per_vec. Add num_vec. (github)
Added chunk_homes route for frames (github)
Update to use /3 routes (github)

Python

Python client should check that version number == server version number (PUBDEV-799)
Add asfactor for month (github)
in Expr.show() only show 10 or fewer rows. remove locate from runit test because full path used (github)
change nulls to () (github)
sigma is no longer part of ModelMetricsRegressionV3 (github)

R

Fix integer -> int in R (github)
add autoencoder show method (github)
accessor is $ not @ (github)
add hit_ratio_table and varimp calls to R (github)
add h2o.predict as alternative (github)
update model output in R (github)

System

Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
Rapids: require a (put "key" %frame) (PUBDEV-868)
Need pojo base model jar file embedded in h2o-dev via build process (PUBDEV-780) (github)
Make .json the default (PUBDEV-619) (github)
Rename class for clarification (github)
Classifies all NA columns as numeric. Also improves preview sampling accuracy by trimming partial lines at end of chunk. (github)
Implements sampling of files within the ParseSetup preview. This prevents poor column type guesses from only sampling the beginning of a file. (github)
Rename fields drop_na20_col (github)
allow for many deletes as final statements in a block (github)
rename initF -> init_f, dropNA20Cols -> drop_na20_cols (github)
Removed tweedie param (github)
thresholds -> threshold (github)
JSON of TwoDimTable with all null values in the first column (no row headers) no longer has an empty column of "" or nulls. (github)
move H2O_Load, fix all the timezone functions (github)
Add extra verbose printout in case Frames don't match identically (github)
allow delayed column lookup (github)
add mixed type list (github)
Added WaterMeterIo to count persist info (github)
Remove special setChunkSize code in HDFS and NFS file vec (github)
add check for Frame on string parse (github)
Disable Memory Cleaner (github)
Handle '<' chars in Keys when swapping (github)
allow for colnames in slicing (github)
Adjusts parse type detection. If column is all one string value, declare it an enum (github)

Web UI

nice algo names in the Flow dropdown (full word names) (PUBDEV-707)
Compute and Display Hit Ratios (PUBDEV-630)
Limit POJO preview to 1000 lines (github)

Bug Fixes

Algorithms

GLM: lasso (i.e., alpha = 1) seems to be giving wrong answers (PUBDEV-769)
AUC: h2o reports .5 auc when actual auc is 1 (PUBDEV-879)
h2o.glm: No output displayed for the model (PUBDEV-858)
h2o.glm model object output needs a fix (PUBDEV-815)
h2o.glm model object says : fill me in GLMModelOutputV2; I think I'm redundant [1] FALSE (PUBDEV-765)
GLM : Build GLM Model => Java Assertion error (PUBDEV-686)
GLM :=> Progress shows -100% (PUBDEV-861)
GBM: Negative sign missing in initF value for ad dataset (PUBDEV-880)
K-Means takes a validation set but doesn't use it (PUBDEV-826)
Absolute_MCC is NaN (sometimes) (PUBDEV-848) (github)
GBM: A proper error msg should be thrown when the user sets the max depth =0 (PUBDEV-838) (github)
DRF Regression Assertion Error (PUBDEV-824)
h2o.randomForest: if h2o is not returning the mse for the 0th tree then it should not be reported in the model object (PUBDEV-811)
GBM: Got exception class java.lang.AssertionError with msg null java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver$GammaPass.map (PUBDEV-693)
GBM: Got exception class java.lang.AssertionError with msg null java.lang.AssertionError at hex.ModelMetricsMultinomial$MetricBuildMultinomial.perRow (HEXDEV-248)
GBM get java.lang.AssertionError: Coldata 2199.0 out of range C17:5086.0-19733.0 step=57.214844 nbins=256 isInt=1 (HEXDEV-241)
GLM: glmnet objective function better than h2o.glm (PUBDEV-749)
GLM: get AIOOB:-36 at hex.glm.GLMTask$GLMIterationTask.postGlobal(GLMTask.java:733) (PUBDEV-894) (github)
Fixed glm behavior in case no rows are left after filtering out NAs (github)
Fix memory leak in validation scoring in K-Means (github)

API

API unification: DataFrame should be able to accept URI referencing file on local filesystem (PUBDEV-709) (github)

Python

Python: describe returning all zeros (PUBDEV-875)
python/R & merge() (PUBDEV-834)
python Expr min, max, median, sum bug (PUBDEV-845) (github)

R

(R and Python) clients must not pass response to DL AutoEncoder model builder (PUBDEV-897) (github)
h2o.varimp, h2o.hit_ratio_table missing in R (PUBDEV-842)
GLM: No help for h2o.glm from R (PUBDEV-732)
h2o.confusionMatrix not working for binary response (PUBDEV-782) (github)
h2o.splitframe complains about destination keys (PUBDEV-783)
h2o.assign does not work (PUBDEV-784) (github)
H2oR: should display only first few entries of the variable importance in model object (PUBDEV-850)
R: h2o.confusion matrix needs formatting (PUBDEV-764)
R: h2o.confusionMatrix => No Confusion Matrices for H2ORegressionMetrics (PUBDEV-710)
h2o.deeplearning: model object output needs a fix (PUBDEV-821)
force gc more frequently (github)

System

MapR FS loads are too slow (PUBDEV-927)
ensure that HDFS works from Windows (PUBDEV-812)
Summary: on a time column throws,'null' is not an object (evaluating 'column.domain[level.index]') in Flow (PUBDEV-867)
Parse: An enum column gets parsed as int for the attached file (PUBDEV-606)
Parse => 40Mx1_uniques => class java.lang.RuntimeException (PUBDEV-729)
if there are fewer than 5 unique values in a dataset column, mins/maxs reports e+308 values (PUBDEV-150) (github)
Sparkling water - DataFrame[T_UUID] to SchemaRDD[StringType] (PUBDEV-771)
Sparkling water - DataFrame[T_NUM(Long)] to SchemaRDD[LongType] (PUBDEV-767)
Sparkling water - DataFrame[T_ENUM] to SchemaRDD[StringType] (PUBDEV-766)
Inconsistency in row and col slicing (HEXDEV-265) (github)
rep_len expects literal length only (HEXDEV-268) (github)
cbind and = don't work within a single rapids block (HEXDEV-237)
Rapids response for c(value) does not have frame key (HEXDEV-252)
S3 parse takes forever (PUBDEV-876)
Parse => Enum unification fails in multi-node parse (PUBDEV-718) (github)
All nodes are not getting updated with latest status of each other nodes info (PUBDEV-768)
Cluster creation is sometimes rejecting new nodes (post jenkins-master-1128+) (PUBDEV-807)
Parse => Multiple files 1 zip/ 1 csv gives Array index out of bounds (PUBDEV-840)
Parse => failed for X5MRows6KCols ==> OOM => Cluster dies (PUBDEV-836)
/frame/foo pagination weirded out (HEXDEV-277) (github)
Removed code that flipped enums to strings (github)

Web UI

Flow: It would be really useful to have the mse plots back in GBM (PUBDEV-889)
State change in Flow is not fully validated (PUBDEV-919)
Flows : Not able to load saved flows from hdfs (PUBDEV-872)
Save Function in Flow crashes (PUBDEV-791) (github)
Flow: should throw a proper error msg when user supplied response have more categories than algo can handle (PUBDEV-866)
Flow display of a summary of a column with all missing values fails. (HEXDEV-230)
Split frame UI improvements (HEXDEV-275)
Flow : Decimal point precisions to be consistent to 4 as in h2o1 (PUBDEV-844)
Flow: Prediction frame is outputting junk info (PUBDEV-825)
EC2 => Cluster of 16 nodes => Water Meter => shows blank page (PUBDEV-831)
Flow: Predict - "undefined is not an object (evaluating prediction.thresholds_and_metric_scores.name)" (PUBDEV-559)
Flow: inspect getModel for PCA returns error (PUBDEV-610)
Flow, RF: Can't get Predict results; "undefined is not an object (evaluating prediction.confusion_matrices.length)" (PUBDEV-695)
Flow, GBM: getModel is broken -Error processing GET /3/Models.json/gbm-b1641e2dc3-4bad-9f69-a5f4b67051ba null is not an object (evaluating source.length) (PUBDEV-800)

Severi (0.2.2.1) - 4/10/15

New Features

R

Implement /3/Frames/<my_frame>/summary (PUBDEV-6) (github)
add allparameters slot to allow default values to be shown (github)
add log loss accessor (github)

Enhancements

Algorithms

POJO generation: GBM (PUBDEV-713)
POJO generation: DRF (PUBDEV-714)
Compute and Display Hit Ratios (PUBDEV-630) (github)
Add DL POJO scoring (PUBDEV-585)
Allow validation dataset for AutoEncoder (PUBDEV-581)
PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
increase tolerance to 2e-3 (was 1e-3; failed with 0.001647 relative difference) (github)
change tolerance to 1e-3 (github)
Add option to export weights and biases to REST API / Flow. (github)
Add scree plot for H2O PCA models and fix Runit test. (github)
Remove quantiles from the model builders list. (github)
GLM update: added row filtering argument to line search task, fixed issues with dfork/asyncExec (github)
Updated rho-setting in GLM. (github)
No threshold 0.5; use the default (max F1) instead (github)
GLM update: updated initialization, NA row filtering, default lambda is now empty, will be picked based on the fraction of lambda_max. (github)
Updated ADMM solver. (github)
Added makeGLMModel call. (github)
Start with classification error NaN at t=0 for DL, not with 1. (github)
Relax DL POJO relative tolerance to 1e-2. (github)
Override nfeatures() method in DLModelOutput. (github)
Renaming of fields in GLM (github)
GLM: Take out Balance Classes (PUBDEV-795)
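Log loss, added to the binomial and multinomial model metrics in this release, is the mean negative log of the probability assigned to the true class. A minimal Python sketch; each row is a list of class probabilities (two entries for binomial), the clipping constant `eps` is an assumption to avoid log(0), and `log_loss` is a hypothetical helper rather than H2O's implementation:

```python
import math


def log_loss(actual, probs, eps=1e-15):
    """Mean negative log-likelihood of the true class.

    actual: integer class index per row (0/1 for binomial)
    probs:  per-row list of predicted class probabilities
    eps:    clipping bound that keeps log() finite
    """
    total = 0.0
    for y, row in zip(actual, probs):
        p = min(max(row[y], eps), 1.0 - eps)  # clip into [eps, 1 - eps]
        total -= math.log(p)
    return total / len(actual)
```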

API

schema metadata for Map fields should include the key and value types (PUBDEV-753) (github)
schema metadata should include the superclass (PUBDEV-754)
rest api naming convention: n_folds vs ntrees (PUBDEV-737)
Create REST Endpoint for exposing .java pojo models (PUBDEV-778)

Python

Run GLM from Python (including LBFGS) (HEXDEV-92)
added H2OFrame show(), as_list(), and slicing pyunits (github)
changed solver parameter to "L_BFGS" (github)
added multidimensional slicing of H2OFrames and Exprs. (github)
add h2o.groupby to python interface (github)
added H2OModel.confusionMatrix() to return confusion matrix of a prediction (github)

R

PUBDEV-578, PUBDEV-541, PUBDEV-566: The R client now sends the data frame column names and data types to ParseSetup, can get column names from a parsed frame or a list, and respects client requests for column data types (github)
R: Cannot create new columns through R (PUBDEV-571)
H2O-R: it would be more useful if h2o.confusion matrix reports the actual class labels instead of [,1] and [,2] (PUBDEV-553)
Support both multinomial and binomial CM (github)

System

Flow: Standardize max_iters/max_iterations parameters (PUBDEV-447) (github)
Add ERROR logging level for too-many-retries case (PUBDEV-146) (github)
Simplify checking of cluster health. Just report the status immediately. (github)
reduce timeout (github)
strings can have ' or " beginning (github)
Throw a validation error in flow if any training data cols are non-numeric (github)
Add getHdfsHomeDirectory(). (github)
Added --verbose. (github)

Web UI

PUBDEV-707: nice algo names in the Flow dropdown (full word names) (github)
Unbreak Flow's ConfusionMatrix display. (github)
POJO generation: DL (PUBDEV-715)

Bug Fixes

Algorithms

GLM : Build GLM model with nfolds brings down the cloud => FATAL: unimplemented (PUBDEV-731) (github)
DL : Build DL Model => FATAL: unimplemented: n_folds >= 2 is not (yet) implemented => SHUTSDOWN CLOUD (PUBDEV-727) (github)
GBM => Build GBM model => No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-723)
GBM: When run with loss = auto on a numeric column, gets error: No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-708) (github)
gbm: does not complain when min_rows > dataset size (PUBDEV-694) (github)
GLM: reports wrong residual degrees of freedom (PUBDEV-668)
H2O dev reports less accurate aucs than H2O (PUBDEV-602)
GLM : Build GLM model fails => ArrayIndexOutOfBoundsException (PUBDEV-601)
divide by zero in modelmetrics for deep learning (PUBDEV-568)
GBM: reports a 0th-tree MSE value for the validation set that differs from the train set when only a train set is provided (PUBDEV-561)
GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)
GLM : Build Model fails with Array Index Out of Bound exception (PUBDEV-454) (github)
Custom Functions don't work in apply() in R (PUBDEV-436)
GLM failure: got NaNs and/or Infs in beta on airlines (PUBDEV-362)
MetricBuilderMultinomial.perRow AssertionError while running GBM (HEXDEV-240)
Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226) (github)
AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
glm pyunit intermittent failure (HEXDEV-199)
Inconsistency in GBM results: gives different results even when run with the same set of params (HEXDEV-194)
get rid of nfolds= param since it's not supported in GLM yet (github)
Fixed degrees of freedom (off by 1) in glm, added test. (github)
GLM fix: fix filtering of rows with NAs and fix in sparse handling. (github)
Fix GLM job fail path to call Job.fail(). (github)
Full AUC computation, bug fixes (github)
Fix ADMM for upper/lower bounds. (updated rho settings + update u-vector in ADMM for intercept) (github)
Few glm fixes (github)
DL : KDD Algebra data set => Build DL model => ArrayIndexOutOfBoundsException (PUBDEV-696)
GBM: Dev vs H2O for depth 5, minrow=10, on prostate, give different trees (PUBDEV-759)
GBM param min_rows doesn't throw exception for negative values (PUBDEV-697)
GBM : Build GBM Model => Too many levels in response column! (java.lang.IllegalArgumentException) => Should display proper error message (PUBDEV-698)
GBM: Got exception 'class java.lang.AssertionError', with msg 'Something is wrong with GBM trees since returned prediction is Infinity' (PUBDEV-722)
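Several fixes above concern GLM degrees of freedom. Under the usual convention (the one R's glm() reports), the null model has n - 1 degrees of freedom when an intercept is fitted, and the residual degrees of freedom are n minus the number of estimated coefficients, intercept included. A minimal Python sketch of that arithmetic; `glm_degrees_of_freedom` is hypothetical, and whether H2O counts exactly this way is an assumption:

```python
def glm_degrees_of_freedom(n_obs, n_coefficients, has_intercept=True):
    """Conventional null and residual degrees of freedom for a GLM.

    n_coefficients counts estimated coefficients excluding the intercept.
    """
    intercept = 1 if has_intercept else 0
    null_df = n_obs - intercept                        # intercept-only model
    residual_df = n_obs - (n_coefficients + intercept)
    return null_df, residual_df
```

Forgetting to count the intercept on either side is exactly the kind of off-by-one error these fixes address.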

API

Cannot adapt numeric response to factors made from numbers (PUBDEV-620)
not specifying response_column gets NPE (deep learning build_model()) I think other algos might have same thing (PUBDEV-131)
NPE response has null msg, exception_msg and dev_msg (HEXDEV-225)
Flow :=> Save Flow => On Mac and Windows 8.1 => NodePersistentStorage failure while attempting to overwrite (?) a flow (HEXDEV-202) (github)
the can_build field in ModelBuilderSchema needs values[] to be set (PUBDEV-755)
value field in the field metadata isn't getting serialized as its native type (PUBDEV-756)

Python

python api asfactor() on -1/1 column issue (HEXDEV-203)

R

Rapids: Operations %/% and %% returns Illegal Argument Exception in R (PUBDEV-736)
quantile: H2oR displays wrong quantile values when calling the default quantile without specifying probs (PUBDEV-689) (github)
as.factor: If a user reruns as.factor on an already factor column, h2o should not show an exception (PUBDEV-622)
as.factor works only on positive integers (PUBDEV-617) (github)
H2O-R: model detail lists three mses, the first MSE slot does not contain any info about the model and hence, should be removed from the model details (PUBDEV-605) (github)
H2O-R: Strings: While slicing get Error From H2O: water.DException$DistributedException (PUBDEV-592)
R: h2o.confusionMatrix should handle both models and model metric objects (PUBDEV-590)
R: as.Date not functional with H2O objects (PUBDEV-583) (github)
R: some apply functions don't work on H2OFrame objects (PUBDEV-579) (github)
h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
R: slicing issues (PUBDEV-573)
R: length and is.factor don't work in h2o.ddply (PUBDEV-572) (github)
R: apply(hex, c(1,2), ...) doesn't properly raise an error (PUBDEV-570) (github)
R: Slicing negative indices to negative indices fails (PUBDEV-569) (github)
h2o.ddply: doesn't accept anonymous functions (PUBDEV-567) (github)
ifelse() cannot return H2OFrames in R (PUBDEV-543)
as.h2o loses track of headers (PUBDEV-541)
H2O-R not showing meaningful error msg (PUBDEV-502)
H2O.fail() had better fail (PUBDEV-470) (github)
fix issue in toEnum (github)
fix colnames and new col creation (github)
R: h2o.init() is posting warning messages of an unhealthy cluster when the cluster is fine. (PUBDEV-734)
h2o.split frame is failing (PUBDEV-560)

System

key type failure should fail the request, not the cloud (PUBDEV-739) (github)
Parse => Import Medicare supplier file => Parse = > Illegal argument for field: column_names of schema: ParseV2: string and key arrays' values must be quoted, but the client sent: " (PUBDEV-719)
Overwriting a constant vector with strings fails (PUBDEV-702)
H2O gets stuck while calculating quantile, with no error msg; it just keeps running a job that normally takes less than a sec (PUBDEV-685)
Summary and quantile on a column with all missing values should not throw an exception (PUBDEV-673) (github)
View Logs => class java.lang.RuntimeException: java.lang.IllegalArgumentException: File /home2/hdp/yarn/usercache/neeraja/appcache/application_1427144101512_0039/h2ologs/h2o_172.16.2.185_54321-3-info.log does not exist (PUBDEV-600)
Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)
Parse: Numbers completely parsed wrong (PUBDEV-574)
Flow: converting a column to enum while parsing does not work (PUBDEV-566)
Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540) (github)
toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)
Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
The quote stripper for column names should report when the stripped chars are not the expected quotes (PUBDEV-424)
import directory with large files, then Frames... really slow and disk grinds. Files are unparsed. Shouldn't be grinding (PUBDEV-98)
NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
h2o.exec won't be supported (github)
fixed import issue (github)
fixed init param (github)
fix repeat as.factor NPE (github)
startH2O set to False in init (github)
hang on glm job removal (PUBDEV-726)
Flow - changed column types need to be reflected in parsed data (HEXDEV-189)
water.DException$DistributedException while running kmeans in multinode cluster (PUBDEV-691)
Frame inspection prior to file parsing, corrupts parsing (PUBDEV-425)

Web UI

Flow, DL: Need better fail message if "Autoencoder" and "use_all_factor_levels" are both selected (PUBDEV-724)
When select AUTO while building a gbm model get ERROR FETCHING INITIAL MODEL BUILDER STATE (PUBDEV-595)
Flow : Build h2o-dev-0.1.17.1009 : Building GLM model gives java.lang.ArrayIndexOutOfBoundsException: (PUBDEV-205) (github)
Flow:Summary on flow broken for a long time (PUBDEV-785)

Serre (0.2.1.1) - 3/18/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-serre/1/index.html

New Features

Algorithms

Naive Bayes in H2O-dev (PUBDEV-158)
GLM model output, details from R (HEXDEV-94)
Run GLM Regression from Flow (including LBFGS) (HEXDEV-110)
PCA (PUBDEV-157)
Port Random Forest to h2o-dev (PUBDEV-455)
Enable DRF model output (github)
Add DRF to Flow (Model Output) (PUBDEV-533)
Grid for GBM (github)
Run Deep Learning Regression from Flow (HEXDEV-109)

Python

Add Python wrapper for DRF (PUBDEV-534)

R

Add R wrapper for DRF (PUBDEV-530)

System

Include uploadFile (PUBDEV-299) (github)
Added -flow_dir to hadoop driver (github)

Web UI

Add Flow packs (HEXDEV-190) (PUBDEV-247)
Integrate H2O Help inside Help panel (PUBDEV-108) (github)
Add quick toggle button to show/hide the sidebar (github)
Add New, Open toolbar buttons (github)
Auto-refresh data preview when parse setup input parameters are changed (PUBDEV-532)
Flow: Add playbar with Run, Continue, Pause, Progress controls (HEXDEV-192)
You can now stop/cancel a running flow

Enhancements

Algorithms

Display GLM coefficients only if available (PUBDEV-466)
Add random chance line to RoC chart (HEXDEV-168)
Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
Use getRNG for Dropout (github)
PUBDEV-598: Add tests for determinism of RNGs (github)
PUBDEV-598: Implement Chi-Square test for RNGs (github)
Add DL model output toString() (github)
Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
Print number of categorical levels once we hit >1000 input neurons. (github)
Updated the loss behavior for GBM. When loss is set to AUTO, if the response is an integer with 2 levels, then bernoulli (rather than gaussian) behavior is chosen. As a result, the do_classification flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use as.factor() on their response to get the desired bernoulli behavior. The score_each_iteration flag has been removed as well. (github)
Fully remove _convert_to_enum in all algos (github)
Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
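The AUTO loss rule described above can be sketched in Python. This is a minimal, hypothetical helper (`choose_auto_loss` is not part of the H2O API), assuming a numeric response passed as a plain sequence of values; it only illustrates the integer/two-level check and does not cover categorical responses or other loss options.

```python
from typing import Sequence


def choose_auto_loss(response: Sequence[float]) -> str:
    """Hypothetical helper mirroring the GBM AUTO loss rule for a numeric response.

    An integer-valued response with exactly two distinct levels maps to
    "bernoulli"; any other numeric response keeps the "gaussian" behavior.
    """
    # Every value must be a whole number (e.g. 0, 1, 1.0) to count as integer-valued.
    is_integer = all(float(v).is_integer() for v in response)
    # Count distinct levels; 1 and 1.0 compare equal, so they form one level.
    n_levels = len(set(response))
    if is_integer and n_levels == 2:
        return "bernoulli"
    return "gaussian"


print(choose_auto_loss([0, 1, 1, 0]))
print(choose_auto_loss([1, 2, 3]))
```

For example, a 0/1 response such as `[0, 1, 1, 0]` maps to `"bernoulli"`, while `[1, 2, 3]` stays `"gaussian"` because it has three levels.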

API

Display point layer for tree vs mse plots in GBM output (PUBDEV-504)
Rename API inputs/outputs (github)
Rename Inf to Infinity (github)

Python

Added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
Make H2OVec.levels() return the levels (github)
Added H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), and H2OVec.isna(), plus a demo pyunit (github)

System

Customize H2O web UI port (PUBDEV-483)
Make parse setup interactive (PUBDEV-532)
Added --verbose (github)
Added some H2OParseExceptions and removed all H2O.fail calls in parse (no parse issue should cause a failure) (github)
Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)

Web UI

Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
'Run' button selects next cell after running
ModelMetrics by model category: Clustering (PUBDEV-416)
ModelMetrics by model category: Regression (PUBDEV-415)
ModelMetrics by model category: Multinomial (PUBDEV-414)
ModelMetrics by model category: Binomial (PUBDEV-413)
Add ability to select and delete multiple models (github)
Add ability to select and delete multiple frames (github)
Flows now stop running when an error occurs
Print full number of mismatches during POJO comparison check. (github)
Make Grid multi-node safe (github)
Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)

Bug Fixes

Algorithms

GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
GBM predict fails without response column (PUBDEV-478)
GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
Inconsistency in GBM results: Gives different results even when run with the same set of params (HEXDEV-194)
PUBDEV-580: Fix some numerical edge cases (github)
Fix two missing float -> double conversion changes in tree scoring. (github)
Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
Old GLM Parameters Missing (PUBDEV-431)

API

SplitFrame on String column produces C0LChunk instead of CStrChunk (PUBDEV-468)
Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
Response from /ModelBuilders doesn't conform to standard error JSON shape when there are errors (HEXDEV-121) (github)

Python

Fixed Python syntax error (github)
Fixes handling of None in python for a returned na_string. (github)

R

R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
h2o.confusionmatrices does not work (PUBDEV-547)
How do I convert an enum column back to integer/double from R? (PUBDEV-546)
Summary in R is faulty (PUBDEV-539)
R: as.h2o should preserve R data types (PUBDEV-578)
NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
Custom Functions don't work in apply() in R (PUBDEV-436)
got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)
R-H2O Managing Memory in a loop (PUB-1125)
h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
H2O-R not showing meaningful error message

System

Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
3 JVMs, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
Not able to start h2o on hadoop (PUBDEV-487)
one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
The NY0 parse rule in summary doesn't appear to count the 0s as NAs like h2o (PUBDEV-154)
0 / Y / N parsing (PUBDEV-229)
NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
Flow : Import file 15M rows 2.2K Cols => Parse => Error fetching job on UI => Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
Flow : Build GLM Model => Family tweedie => class 'hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization' (PUBDEV-211)
Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
Check reproducibility on multi-node vs single-node (PUBDEV-557)
Parse : After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)

Web UI

Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
GBM Model : Params in flow show two times (PUBDEV-440)
Flow multinomial confusion matrix visualization (HEXDEV-204)
Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
[MapR] unable to give hdfs file name from Flow (PUBDEV-409)

Selberg (0.2.0.1) - 3/6/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-selberg/1/index.html

New Features

Algorithms

Naive Bayes in H2O-dev (PUBDEV-158)
GLM model output, details from R (HEXDEV-94)
Run GLM Regression from Flow (including LBFGS) (HEXDEV-110)
PCA (PUBDEV-157)
Port Random Forest to h2o-dev (PUBDEV-455)
Enable DRF model output (github)
Add DRF to Flow (Model Output) (PUBDEV-533)
Grid for GBM (github)
Run Deep Learning Regression from Flow (HEXDEV-109)

Python

Add Python wrapper for DRF (PUBDEV-534)

R

Add R wrapper for DRF (PUBDEV-530)

System

Include uploadFile (PUBDEV-299) (github)
Added -flow_dir to hadoop driver (github)

Web UI

Add Flow packs (HEXDEV-190) (PUBDEV-247)
Integrate H2O Help inside Help panel (PUBDEV-108) (github)
Add quick toggle button to show/hide the sidebar (github)
Add New, Open toolbar buttons (github)
Auto-refresh data preview when parse setup input parameters are changed (PUBDEV-532)
Flow: Add playbar with Run, Continue, Pause, Progress controls (HEXDEV-192)
You can now stop/cancel a running flow

Enhancements

The following changes are improvements to existing features (which includes changed default values):

Algorithms

Display GLM coefficients only if available (PUBDEV-466)
Add random chance line to RoC chart (HEXDEV-168)
Allow validation dataset for AutoEncoder (PUBDEV-581)
Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
Use getRNG for Dropout (github)
PUBDEV-598: Add tests for determinism of RNGs (github)
PUBDEV-598: Implement Chi-Square test for RNGs (github)
PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
Add DL model output toString() (github)
Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
Print number of categorical levels once we hit >1000 input neurons. (github)
Updated the loss behavior for GBM. When loss is set to AUTO, if the response is an integer with 2 levels, then bernoulli (rather than gaussian) behavior is chosen. As a result, the do_classification flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use as.factor() on their response to get the desired bernoulli behavior. The score_each_iteration flag has been removed as well. (github)
Fully remove _convert_to_enum in all algos (github)
Add DL POJO scoring (PUBDEV-585)

API

Display point layer for tree vs mse plots in GBM output (PUBDEV-504)
Rename API inputs/outputs (github)
Rename Inf to Infinity (github)

Python

Added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
Make H2OVec.levels() return the levels (github)
Added H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), and H2OVec.isna(), plus a demo pyunit (github)

R

R client now sends the data frame column names and data types to ParseSetup, can get column names from a parsed frame or a list, and respects client requests for column data types (PUBDEV-578, PUBDEV-541, PUBDEV-566) (github)

System

Customize H2O web UI port (PUBDEV-483)
Make parse setup interactive (PUBDEV-532)
Added --verbose (github)
Added some H2OParseExceptions and removed all H2O.fail calls in parse (no parse issue should cause a failure) (github)
Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)

Web UI

Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
'Run' button selects next cell after running
ModelMetrics by model category: Clustering (PUBDEV-416)
ModelMetrics by model category: Regression (PUBDEV-415)
ModelMetrics by model category: Multinomial (PUBDEV-414)
ModelMetrics by model category: Binomial (PUBDEV-413)
Add ability to select and delete multiple models (github)
Add ability to select and delete multiple frames (github)
Flows now stop running when an error occurs
Print full number of mismatches during POJO comparison check. (github)
Make Grid multi-node safe (github)
Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)

Bug Fixes

The following changes are to resolve incorrect software behavior:

Algorithms

GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
Inconsistency in GBM results: Gives different results even when run with the same set of params (HEXDEV-194)
GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
GBM predict fails without response column (PUBDEV-478)
GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
divide by zero in modelmetrics for deep learning (PUBDEV-568)
AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
GBM: reports 0th tree MSE value for the validation set, different from the train set, when only the train set is provided (PUBDEV-561)
PUBDEV-580: Fix some numerical edge cases (github)
Fix two missing float -> double conversion changes in tree scoring. (github)
Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226)
Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
Old GLM Parameters Missing (PUBDEV-431)
GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)

API

SplitFrame on String column produces C0LChunk instead of CStrChunk (PUBDEV-468)
Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
Response from /ModelBuilders doesn't conform to standard error JSON shape when there are errors (HEXDEV-121)

Python

Fixed Python syntax error (github)
Fixes handling of None in python for a returned na_string. (github)

R

R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
h2o.confusionmatrices does not work (PUBDEV-547)
How do I convert an enum column back to integer/double from R? (PUBDEV-546)
Summary in R is faulty (PUBDEV-539)
Custom Functions don't work in apply() in R (PUBDEV-436)
R: as.h2o should preserve R data types (PUBDEV-578)
as.h2o loses track of headers (PUBDEV-541)
NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
R: h2o.confusionMatrix should handle both models and model metric objects (PUBDEV-590)
H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)

System

Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
3 JVMs, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
Not able to start h2o on hadoop (PUBDEV-487)
one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
The NY0 parse rule in summary doesn't appear to count the 0s as NAs like h2o (PUBDEV-154)
0 / Y / N parsing (PUBDEV-229)
NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
Flow: converting a column to enum while parsing does not work (PUBDEV-566)
Parse: Numbers completely parsed wrong (PUBDEV-574)
NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540)(github)
Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
Flow : Import file 15M rows 2.2K Cols => Parse => Error fetching job on UI => Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
Flow : Build GLM Model => Family tweedie => class 'hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization' (PUBDEV-211)
Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
Check reproducibility on multi-node vs single-node (PUBDEV-557)
Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)

Web UI

Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
GBM Model : Params in flow show two times (PUBDEV-440)
Flow multinomial confusion matrix visualization (HEXDEV-204)
Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
[MapR] unable to give hdfs file name from Flow (PUBDEV-409)

Selberg (0.2.0.1) - 3/6/15

Download at: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-selberg/1/index.html

New Features

Web UI

Flow: Delete functionality to be available for import files, jobs, models, frames (PUBDEV-241)
Implement "Download Flow" (PUBDEV-407)
Flow: Implement "Run All Cells" (PUBDEV-110)

API

Create python package (PUBDEV-181)
as.h2o in Python (HEXDEV-72)

System

Add a README.txt to the hadoop zip files (github)
Build a cdh5.2 version of h2o (github)

Enhancements

Web UI

Flow: Job view should have info on start and end time (PUBDEV-267)
Flow: Implement 'File > Open' (PUBDEV-408)
Display IP address in ADMIN -> Cluster Status (HEXDEV-159)
Flow: Display alternate UI for splitFrames() (PUBDEV-399)

Algorithms

Added K-Means scoring (github)
Flow: Implement model output for Deep Learning (PUBDEV-118)
Flow: Implement model output for GLM (PUBDEV-120)
Deep Learning model output (HEXDEV-89, Flow), (HEXDEV-88, Python), (HEXDEV-87, R)
Run GLM Binomial from Flow (including LBFGS) (HEXDEV-90)
Flow: Display confusion matrices for multinomial models (PUBDEV-397)
During PCA, missing values in training data will be replaced with column mean (github)
Update parameters for best model scan (github)
Change Quantiles to match h2o-1; both Quantiles and Rollups now have the same default percentiles (github)
Massive cleanup and removal of old PCA, replacing with quadratically regularized PCA based on alternating minimization algorithm in GLRM (github)
Add model run time to DL Model Output (github)
Don't gather Neurons/Weights/Biases statistics (github)
Only store best model if override_with_best_model is enabled (github)
Added beta_eps and changed the affected tests (github)
For GLM, the default value of the max_iters parameter was changed from 1000 to 50.
For quantiles, probabilities are displayed.
Run Deep Learning Multinomial from Flow (HEXDEV-108)
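Column-mean imputation of missing values before PCA can be sketched as follows. This is a minimal, hypothetical Python helper (`impute_column_means` is not an H2O function), assuming the training data is a list of equal-length numeric rows in which `None` or `NaN` marks a missing value. The entry above does not say how an all-missing column is handled, so the sketch simply raises an error in that case.

```python
import math
from typing import List, Optional


def _is_missing(value: Optional[float]) -> bool:
    # Treat both None and NaN as missing (assumption for this sketch).
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_column_means(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace each missing entry with the mean of the observed values in its column."""
    if not rows:
        return []
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not _is_missing(row[j])]
        if not observed:
            # Behavior for an all-missing column is not specified; fail loudly instead of guessing.
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    # Build a new matrix; the input rows are left unchanged.
    return [
        [means[j] if _is_missing(row[j]) else row[j] for j in range(n_cols)]
        for row in rows
    ]


print(impute_column_means([[1.0, None], [3.0, 4.0], [None, 8.0]]))
```

Filling with the column mean leaves each column's mean unchanged, so the subsequent centering step for PCA is not shifted by the imputed values.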

API

Expose DL weights/biases to clients via REST call (PUBDEV-344)
Flow: Implement notification bar/API (PUBDEV-359)
Variable importance data in REST output for GLM (PUBDEV-359)
Add extra DL parameters to R API (average_activation, sparsity_beta, max_categorical_features, reproducible) (github)
Update GLRM API model output (github)
h2o.anomaly missing in R (PUBDEV-434)
No method to get enum levels (PUBDEV-432)

System

Improve memory footprint with latest version of h2o-dev (github)
For now, model.delete() of a DL model also deletes its best models, so R code does not leak when calling h2o.rm() only on the main model. (github)
Bind both TCP and UDP ports before clustering (github)
Round summary row numbers, which helps with percentiles for very small row counts; added a test that checks the result gets close to the 50th percentile on small row counts. (github)
Increase Max Value size in DKV to 256MB (github)
Flow: make parseRaw() do both import and parse in sequence (HEXDEV-184)
Remove notion of individual job/job tracking from Flow (PUBDEV-449)
Capability to name prediction results Frame in flow (PUBDEV-233)

Bug Fixes

Algorithms

GLM binomial prediction failing (PUBDEV-403)
DL: Predict with auto encoder enabled gives Error processing error (PUBDEV-433)
balance_classes in Deep Learning intermittent poor result (PUBDEV-437)
Flow: Building GLM model fails (PUBDEV-186)
summary returning incorrect 0.5 quantile for 5 row dataset (PUBDEV-95)
GBM missing variable importance and balance-classes (PUBDEV-309)
H2O Dev GBM first tree differs from H2O 1 (PUBDEV-421)
get glm model from flow fails to find coefficient name field (PUBDEV-394)
GBM/GLM build model fails on Hadoop after building 100% => Failed to find schema for version: 3 and type: GBMModel (PUBDEV-378)
Parsing KDD wrong (PUBDEV-393)
GLM AIOOBE (PUBDEV-199)
Flow : Build GLM Model with family poisson => java.lang.ArrayIndexOutOfBoundsException: 1 at hex.glm.GLM$GLMLambdaTask.needLineSearch(GLM.java:359) (PUBDEV-210)
Flow : GLM Model Error => Enum conversion only works on small integers (PUBDEV-365)
GLM binary response, do_classification=FALSE, family=binomial, prediction error (PUBDEV-339)
Epsilon missing from GLM parameters (PUBDEV-354)
GLM NPE (PUBDEV-395)
Flow: GLM bug (or incorrect output) (PUBDEV-252)
GLM binomial on benign.csv gets assertion error in predict (PUBDEV-132)
current summary default_pctiles doesn't have 0.001 and 0.999 like h2o1 (PUBDEV-94)
Flow: Build GBM/DL Model: java.lang.IllegalArgumentException: Enum conversion only works on integer columns (PUBDEV-213) (github)
ModelMetrics on cup98VAL_z dataset has response with many nulls (PUBDEV-214)
GBM : Predict model category output/inspect parameters shows as Regression when model is built with do classification enabled (PUBDEV-441)
Fix double-precision DRF bugs (github)

System

Null columnTypes for /smalldata/arcene/arcene_train.data (PUBDEV-406) (github)
Flow: Waiting for -1 responses after starting h2o on hadoop cluster of 5 nodes (PUBDEV-419)
Parse: airlines_all.csv => Airtime type shows as ENUM instead of Integer (PUBDEV-426) (github)
Flow: Typo - "Time" option displays twice in column header type menu in Parse (PUBDEV-446)
Duplicate validation messages in k-means output (PUBDEV-305) (github)
Fixes Parse so that it returns to supplying generic column names when no column names exist (github)
Flow: Import File: File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
Flow: Parse => 1m.svm hangs at 42% (HEXDEV-174)
Prediction NFE (PUBDEV-308)
NPE doing Frame to key before it's fully parsed (PUBDEV-79)
h2o_master_DEV_gradle_build_J8 #351 hangs for past 17 hrs (PUBDEV-239)
Sparkling water - container exited due to unavailable port (PUBDEV-357)

API

Flow: Splitframe => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-410) (github)
Incorrect dest.type, description in /CreateFrame jobs (PUBDEV-404)
Space in Windows filename on Python (PUBDEV-444) (github)
Python end-to-end data science example 1 runs correctly (PUBDEV-182)
3/NodePersistentStorage.json/foo/id should throw 404 instead of 500 for 'not-found' (HEXDEV-163)
POST /3/NodePersistentStorage.json should handle Content-Type:multipart/form-data (HEXDEV-165)
by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-92)
Sparkling water : val train:DataFrame = prostateRDD => Fails with ArrayIndexOutOfBoundsException (PUBDEV-392)
Flow : getModels produces error: Error calling GET /3/Models.json (PUBDEV-254)
ddply 'Could not find the operator' (HEXDEV-162) (github)
h2o.table AIOOBE during NewChunk creation (HEXDEV-161) (github)
Fix warning in h2o.ddply when supplying multiple grouping columns (github)

0.1.26.1051 - 2/13/15

New Features

Flow: Display alternate UI for splitFrames() (PUBDEV-399)

Enhancements

System

Embedded H2O config can now provide flat file (needed for Hadoop) (github)
Don't log GETs of individual jobs, to avoid filling up the logs (github)

Algorithms

Increase GBM/DRF factor binning back to historical levels. Had been capped accidentally at nbins (typically 20), was intended to support a much higher cap. (github)
Tweaked rho heuristic in glm (github)
Enable variable importances for autoencoders (github)
Removed group_split option from GBM
Flow: display varimp for GBM output (PUBDEV-398)
variable importance for GBM (github)
GLM in H2O-Dev may provide slightly different coefficient values when applying an L1 penalty in comparison with H2O1.

Bug Fixes

Algorithms

Fixed bug in GLM exception handling causing GLM jobs to hang (github)
Fixed a bug in kmeans input parameter schema where init was always being set to Furthest (github)
Fixed mean computation in GLM (github)
Fixed kmeans.R (github)
Flow: Building GBM model fails with Error executing javascript (PUBDEV-396)

System

DataFrame propagates absolute path to parser (github)
Fix flow shutdown bug (github)

0.1.26.1032 - 2/6/15

New Features

General Improvements

better model output
support for Python client
support for Maven
support for Sparkling Water
support for REST API schema
support for Hadoop CDH5 (github)

UI

Display summary visualizations by default in column summary output cells (PUBDEV-337)
Display AUC curve by default in binomial prediction output cells (PUBDEV-338)
Flow: Implement About H2O/Flow with version information (PUBDEV-111)
Add UI for CreateFrame (PUBDEV-218)
Flow: Add ability to cancel running jobs (PUBDEV-373)
Flow: warn when user navigates away while having unsaved content (PUBDEV-322)

Algorithms

Implement splitFrame() in Flow (PUBDEV-356)
Variable importance graph in Flow for GLM (PUBDEV-360)
Flow: Implement model building form init and validation (PUBDEV-102)
Added a shuffle-and-split-frame function; Use it to build a saner model on time-series data (github)
Added binomial model metrics (github)
Run KMeans from R (HEXDEV-105)
Be able to create a new GLM model from an existing one with updated coefficients (HEXDEV-48)
Run KMeans from Python (HEXDEV-106)
Run Deep Learning Binomial from Flow (HEXDEV-83)
Run KMeans from Flow (HEXDEV-104)
Run Deep Learning from Python (HEXDEV-85)
Run Deep Learning from R (HEXDEV-84)
Run Deep Learning Multinomial from Flow (HEXDEV-108)
Run Deep Learning Regression from Flow (HEXDEV-109)

API

Flow: added REST API documentation to the web ui (PUBDEV-60)
Flow: Implement visualization API (PUBDEV-114)

System

Dataset inspection from Flow (HEXDEV-66)
Basic data munging (Rapids) from R (HEXDEV-70)
Implement stack operator/stacking in Lightning (HEXDEV-128)

Enhancements

UI

Added better message when h2o.init() not yet called (No active connection to an H2O cluster. Try calling "h2o.init()") (github)

Algorithms

Updated column-based gradient task to use sparse interface (github)
Updated LBFGS (added progress monitor interface, updated some default params), added progress and job support to GLM lbfgs (github)
Added pretty print (github)
Added AutoEncoder to R model categories (github)
Added Coefficients table to GLM model (github)
Updated glm lbfgs to allow for efficient lambda-search (l2 penalty only) (github)
Removed splitframe shuffle parameter (github)
Simplified model builders and added deeplearning model builder (github)
Add DL model outputs to Flow (PUBDEV-372)
Flow: Deep Learning: Expert Mode (PUBDEV-284)
Flow: Display multinomial and regression DL model outputs (PUBDEV-383)
Display varimp details for DL models (PUBDEV-381)
Make binomial response "0" and "1" by default (github)
Update R GBM demos to reflect new input parameter names (github)
Rename GLM variable importance to normalized coefficient magnitudes (github)

API

Changed key to destination_key (github)
Cleaned up REST API schema interface (github)
Changed method name, cleaned setup, added a pyunit runner (github)

System

Allow changing column types during parse-setup (PUBDEV-376)
Display %NAs in model builder column lists (PUBDEV-375)
Figure out how to add H2O to PyPI (PUBDEV-178)

Bug Fixes

UI

Flow: Parse => 1m.svm hangs at 42% (PUBDEV-345)
cup98 Dataset has columns that prevent validation/prediction (PUBDEV-349)
Flow: predict step failed to function (PUBDEV-217)
Flow: Arrays of numbers (ex. hidden in deeplearning) require brackets (PUBDEV-303)
F
