Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of analyze_vars with specifics for .stat_names #1361

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

Melkiades
Copy link
Contributor

Pull Request

Fixes #1352

@Melkiades Melkiades marked this pull request as draft December 11, 2024 14:14
all obs
—————————————————————
n 4
Mean (SD) 2.5 (1.3)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be NAs

n 5
a 2 (40%)
b 1 (20%)
NA 2 (40%)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lost NA

8 mean_ci (4.57, 5.69) 0 Mean 95% CI
9 mean_sei (4.89, 5.38) 0 Mean -/+ 1xSE
10 mean_sdi (4.35, 5.91) 0 Mean -/+ 1xSD
2 sum 48.7 0 Sum
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this calc changed?

R/analyze_variables.R Outdated Show resolved Hide resolved
@Melkiades Melkiades marked this pull request as ready for review December 16, 2024 16:00
#'
#' @return A `list` with modified elements `x`, `.formats`, `.labels`, and `.indent_mods`.
#' @return A `list` with modified elements `stat_out`, `.formats`, `.labels`, `.levels`, and `.indent_mods`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to fix: ungroup_stats should be removed in favor of get_*_from_stats

Comment on lines +296 to -303
empty_pval <- "pval" %in% names(stat_out) && length(stat_out[["pval"]]) == 0
empty_pval_counts <- "pval_counts" %in% names(stat_out) && length(stat_out[["pval_counts"]]) == 0
stat_out <- unlist(stat_out, recursive = FALSE)

# If p-value is empty it is removed by unlist and needs to be re-added
if (empty_pval) x[["pval"]] <- character()
if (empty_pval_counts) x[["pval_counts"]] <- character()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are if pval is NULL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pval handling to be refactored

n 5
FALSE 1 (20%)
TRUE 2 (40%)
NA 2 (40%)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lost NA


if (na.rm) {
if (na_rm) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

na_rm can be reverted into na.rm if you think it is just creating issues downstream

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have always thought it would be nicer for it to be na_rm but na.rm was the standard so we never changed. I'm not sure if this will cause any issues but if not it would be nice to change, or at least start a deprecation cycle.

Comment on lines +252 to +253
# Compare with reference group
if (isTRUE(compare_with_ref_group)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compare functions have been deprecated and merged in here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a clearer error message/check for the presence of a reference group?

)

y$n_blq <- sum(grepl("BLQ|LTR|<[1-9]|<PCLLOQ", x))
y$n_blq <- list("n_blq" = c("n_blq" = sum(grepl("BLQ|LTR|<[1-9]|<PCLLOQ", x))))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useful to get_stat_names

dots_extra_args <- list(...)

# Check if there are user-defined functions
default_and_custom_stats_list <- .split_std_from_custom_stats(.stats)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is possible to add custom stats

Comment on lines -469 to -475
.N_col, # nolint
.N_row, # nolint
.var = NULL,
.df_row = NULL,
.ref_group = NULL,
.in_ref_col = FALSE,
compare = FALSE,
Copy link
Contributor Author

@Melkiades Melkiades Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these vars are broadcasted from analyze through ...

# Auto format handling
.formats <- apply_auto_formatting(.formats, x_stats, .df_row, .var)
# Get and check statistical names from defaults
.stat_names <- get_stat_names(x_stats, .stat_names_in) # note is x_stats
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core stat_names modification

@@ -140,7 +141,7 @@ afun_riskdiff <- function(df,
stat <- ifelse("count_fraction" %in% names(s_x), "count_fraction", "unique")
if ("flag_variables" %in% names(s_args)) {
var_nms <- s_args$flag_variables
} else if (!is.null(names(s_x[[stat]]))) {
} else if (is.list(s_x[[stat]]) && !is.null(names(s_x[[stat]]))) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now statistics are character vectors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*named numeric vectors

@@ -351,6 +387,72 @@ get_indents_from_stats <- function(stats, indents_in = NULL, row_nms = NULL) {
out
}

# Function to loop over each stat and levels to set correct values
.adjust_stats_desc_by_in_def <- function(levels_per_stats, user_in, tern_defaults) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only to be extra sure that you can change the labels in anyway you want (to do add this for formats and indentation)

@@ -1361,7 +1603,7 @@
----------------------------
row_name formatted_cell indent_mod row_label
1 pval <0.0001 3 pvalue
2 median_ci -0.41 - 1.10 3 Median 95% CI
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why changed?

@shajoezhu shajoezhu requested review from edelarua and BFalquet January 9, 2025 01:11
Copy link
Contributor

@edelarua edelarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!! Just a few comments from me regarding some improvements that could be made to the documentation, but the code all seems good!

Comment on lines +252 to +253
# Compare with reference group
if (isTRUE(compare_with_ref_group)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a clearer error message/check for the presence of a reference group?


if (na.rm) {
if (na_rm) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have always thought it would be nicer for it to be na_rm but na.rm was the standard so we never changed. I'm not sure if this will cause any issues but if not it would be nice to change, or at least start a deprecation cycle.

.var,
...) {
checkmate::assert_flag(na.rm)
s_summary <- function(x, denom, control, ...) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... is still described as "additional parameters passed to s_summary" which I think should be updated with more detail since a lot of the additional parameters were collapsed into .... Maybe move the s_summary methods to their own file since there are a lot of them and their param descriptions no longer align very well with analyze_vars and the help file is getting pretty long? What do you think?

Also, since the description for na.rm was in the argument_convention.R file you should also add one for na_rm there.

.N_col, # nolint
...) {
if (na.rm) x <- x[!is.na(x)]
s_summary.logical <- function(x, denom = c("n", "N_col", "N_row"), ...) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why denom wasn't put into the ... here but was in the other methods?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

return stats in a named list
3 participants