`nira`: Near-infrared data analysis

Last update: 10.10.2020

Author: M.Sc André Dantas de Medeiros (Universidade Federal de viçosa). Email: [email protected]

This package uses functions developed from the following packages: pls, prospectr, dplyr, reshape2, ggplot2, ggpubr, factoextra, FactoMineR, readxl, plyr, caret, e1071. We thank the creators of these useful packages.

Warning:

The data structure must be in Dataframe format. Use the following command to transform it into a dataframe: as.data.frame().
The last column must contain the response variable in factor orcharacter format.
The package is under development and was made for the specific needs of our laboratory.

Installation and loading

install the latest version from GitHub as follow:

#Install
if(!require(devtools)) install.packages("devtools")
if(!require(nira)) devtools::install_github("admedeiros/nira")

#Load package
library(nira)

For package update

if(!require(devtools)) install.packages("devtools")
if(require(nira)) remove.packages("nira")
.rs.restartR()
if(!require(nira)) devtools::install_github("admedeiros/nira")
#Load package
library(nira)

Main functions:

import_csv(): Import and merge CSV NIR files

plotmean_df(): Create graphics with the average of the spectra class.

plotraw_df(): Create graphics with the raw of the spectra class.

center_df(): Center the data on the average.

autoscaling_df(): Autoscaling the data using the Z-score transformation.

snv_df(): Standard Normal Variate (SNV) on near-infrared (NIR) data.

msc_df(): Multiplicative Scatter Correction (MSC) on near-infrared (NIR) data.

der_df(): Applying derivatives with Savitzky-Golay smoothing..

pca_df(): Plots a graph of the principal component analysis.

classification_df(): Function to develop classification models

load_classifier_df(): Function to load and apply classifier created with the classification_df function

prediction_df_df(): Function to apply the classifier to new data without labels

Examples

Spectra Visualization

library(nira)
library(ggpubr) #for ggarrange
# raw spectra
a<-plotraw_df(nir_seed)
# mean spectra per class
b<-plotmean_df(nir_seed)
# center
c<-plotraw_df(center_df(nir_seed))
d<-plotmean_df(center_df(nir_seed))
# Autoscaling
e<-plotraw_df(autoscaling_df(nir_seed))
f<-plotmean_df(autoscaling_df(nir_seed))
# SNV
g<-plotraw_df(snv_df(nir_seed))
h<-plotmean_df(snv_df(nir_seed))
# MSC
i<-plotraw_df(msc_df(nir_seed))
j<-plotmean_df(msc_df(nir_seed))
# 1st derivative with Savitzky-Golay smoothing
k<-plotraw_df(der_SG(nir_seed,1))
l<-plotmean_df(der_SG(nir_seed,1))
# 2nd derivative with Savitzky-Golay smoothing
m<-plotraw_df(der_SG(nir_seed,2))
n<-plotmean_df(der_SG(nir_seed,2))

ggarrange(a, b,c,d,e,f,g,h,i,j,k,l,m,n,
  labels = c("Raw spectra", "Mean spectra", "Center", "Center","Autoscaling","Autoscaling","SNV","SNV","MSC", "MSC","1st derivative", "1st derivative", "2nd derivative", "2nd derivative" ),
  ncol = 2, nrow = 7)

View of the exploratory principal component analysis

pca_df(nir_seed,1,2)

Classification

data(nir_seed)
classification<-classification_df(nir_seed,metrics = 2)

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

Cross-validation

classification$`Cross-validation`

## Partial Least Squares 
## 
## 210 samples
## 312 predictors
##   3 classes: 'High_quality', 'Intermediate_quality', 'Low_quality' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 189, 189, 189, 189, 189, 189, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  logLoss    AUC        prAUC      Accuracy   Kappa      Mean_F1  
##    1     1.0659706  0.6503401  0.4292712  0.4650794  0.1976190        NaN
##    2     1.0266903  0.7137188  0.4718326  0.5952381  0.3928571  0.5609619
##    3     0.9235199  0.8267574  0.5867059  0.6285714  0.4428571  0.5957456
##    4     0.8862659  0.8396825  0.6121077  0.6666667  0.5000000  0.6461700
##    5     0.8254675  0.9018141  0.6930179  0.7761905  0.6642857  0.7666811
##    6     0.7982458  0.9240363  0.7366469  0.7952381  0.6928571  0.7891360
##    7     0.7491364  0.9557823  0.7852269  0.8761905  0.8142857  0.8730127
##    8     0.7309549  0.9699546  0.8073060  0.8904762  0.8357143  0.8879721
##    9     0.7056253  0.9797052  0.8254200  0.9015873  0.8523810  0.8999919
##   10     0.6879264  0.9858277  0.8337702  0.9317460  0.8976190  0.9304002
##   Mean_Sensitivity  Mean_Specificity  Mean_Pos_Pred_Value  Mean_Neg_Pred_Value
##   0.4650794         0.7325397               NaN            0.7591309          
##   0.5952381         0.7976190         0.5715248            0.8422302          
##   0.6285714         0.8142857         0.6160619            0.8359405          
##   0.6666667         0.8333333         0.6617188            0.8451206          
##   0.7761905         0.8880952         0.7884099            0.8958894          
##   0.7952381         0.8976190         0.8036552            0.9033291          
##   0.8761905         0.9380952         0.8847002            0.9417733          
##   0.8904762         0.9452381         0.8981834            0.9484829          
##   0.9015873         0.9507937         0.9072046            0.9531868          
##   0.9317460         0.9658730         0.9393342            0.9684278          
##   Mean_Precision  Mean_Recall  Mean_Detection_Rate  Mean_Balanced_Accuracy
##         NaN       0.4650794    0.1550265            0.5988095             
##   0.5715248       0.5952381    0.1984127            0.6964286             
##   0.6160619       0.6285714    0.2095238            0.7214286             
##   0.6617188       0.6666667    0.2222222            0.7500000             
##   0.7884099       0.7761905    0.2587302            0.8321429             
##   0.8036552       0.7952381    0.2650794            0.8464286             
##   0.8847002       0.8761905    0.2920635            0.9071429             
##   0.8981834       0.8904762    0.2968254            0.9178571             
##   0.9072046       0.9015873    0.3005291            0.9261905             
##   0.9393342       0.9317460    0.3105820            0.9488095             
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 10.

Train results

classification$Training_results

##               Accuracy                  Kappa                Mean_F1 
##              0.9666667              0.9500000              0.9665198 
##       Mean_Sensitivity       Mean_Specificity    Mean_Pos_Pred_Value 
##              0.9666667              0.9833333              0.9668364 
##    Mean_Neg_Pred_Value         Mean_Precision            Mean_Recall 
##              0.9835141              0.9668364              0.9666667 
##    Mean_Detection_Rate Mean_Balanced_Accuracy 
##              0.3222222              0.9750000

Test results

classification$Testing_results

##               Accuracy                  Kappa                Mean_F1 
##              0.9777778              0.9666667              0.9777778 
##       Mean_Sensitivity       Mean_Specificity    Mean_Pos_Pred_Value 
##              0.9777778              0.9888889              0.9777778 
##    Mean_Neg_Pred_Value         Mean_Precision            Mean_Recall 
##              0.9888889              0.9777778              0.9777778 
##    Mean_Detection_Rate Mean_Balanced_Accuracy 
##              0.3259259              0.9833333

PLS-DA Plot

classification$plsplot

Variable importance

classification$`Variable importance`

## kernelpls variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 312)
## 
##             High_quality Intermediate_quality Low_quality
## 2499.019759       21.486                86.56      100.00
## 2487.033849       19.938                41.06       65.81
## 2103.637378        6.449                51.50       38.96
## 2095.137688        5.984                49.68       37.92
## 2475.162977       16.837                31.14       47.99
## 2086.706843        5.483                45.98       35.71
## 1362.803133        3.437                45.85       35.52
## 2112.206314        6.868                42.66       32.62
## 2078.343145        4.991                41.36       33.61
## 1902.923499        4.935                38.93       27.27
## 2463.404282       13.069                23.00       38.33
## 2330.546543        4.119                37.15       25.30
## 2120.844894        7.765                34.00       28.34
## 2201.896273        2.391                32.73       26.82
## 2070.046224        4.823                32.71       29.98
## 2451.756782       10.717                19.10       32.30
## 1198.95178         1.572                32.30       16.21
## 2005.983045        2.132                32.03       21.53
## 2211.286184        1.471                31.99       24.44
## 2320.118864        3.536                31.90       19.19

Load the pre-trained model and validate their performance in external labeled data

data(nir_seed)
load_classifier_df(nir_seed)

## Confusion Matrix and Statistics
## 
##                       
## previsoes              High_quality Intermediate_quality Low_quality
##   High_quality                  100                    1           0
##   Intermediate_quality            0                   94           3
##   Low_quality                     0                    5          97
## 
## Overall Statistics
##                                           
##                Accuracy : 0.97            
##                  95% CI : (0.9438, 0.9862)
##     No Information Rate : 0.3333          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.955           
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: High_quality Class: Intermediate_quality
## Sensitivity                       1.0000                      0.9400
## Specificity                       0.9950                      0.9850
## Pos Pred Value                    0.9901                      0.9691
## Neg Pred Value                    1.0000                      0.9704
## Prevalence                        0.3333                      0.3333
## Detection Rate                    0.3333                      0.3133
## Detection Prevalence              0.3367                      0.3233
## Balanced Accuracy                 0.9975                      0.9625
##                      Class: Low_quality
## Sensitivity                      0.9700
## Specificity                      0.9750
## Pos Pred Value                   0.9510
## Neg Pred Value                   0.9848
## Prevalence                       0.3333
## Detection Rate                   0.3233
## Detection Prevalence             0.3400
## Balanced Accuracy                0.9725

Load the pre-trained model and use on data without labeling

data(nir_seed)
pc<-prediction_df(nir_seed[,-313])
pca_df(pc)

Name		Name	Last commit message	Last commit date
Latest commit History 107 Commits
R		R
README_files/figure-gfm		README_files/figure-gfm
build		build
data		data
logo		logo
man		man
.Rbuildignore		.Rbuildignore
.gitattributes		.gitattributes
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
nira_0.2.0.tar.gz		nira_0.2.0.tar.gz
niradm.Rproj		niradm.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

`nira`: Near-infrared data analysis

Installation and loading

For package update

Main functions:

Examples

Spectra Visualization

View of the exploratory principal component analysis

Classification

Cross-validation

Train results

Test results

PLS-DA Plot

Variable importance

Load the pre-trained model and validate their performance in external labeled data

Load the pre-trained model and use on data without labeling

About

Releases

Packages

Languages

License

andremedeiros94/nira

Folders and files

Latest commit

History

Repository files navigation

nira: Near-infrared data analysis

Installation and loading

For package update

Main functions:

Examples

Spectra Visualization

View of the exploratory principal component analysis

Classification

Cross-validation

Train results

Test results

PLS-DA Plot

Variable importance

Load the pre-trained model and validate their performance in external labeled data

Load the pre-trained model and use on data without labeling

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

`nira`: Near-infrared data analysis

Packages