Integer to category #17

Closed
MarkoMekapses opened this issue Dec 27, 2024 · 2 comments

@MarkoMekapses

I'm not sure this is an issue, but as_categorical throws an error when converting integers to categories, like so:
cars = DataFrame(cylinders=[2, 4, 6, 8])
@mutate cars cylinders = as_categorical(cylinders)
This throws:
MethodError: Cannot `convert` an object of type Int32 to an object of type String
The function `convert` exists, but no method is defined for this combination of argument types.
I understand why this is the case: we might want to render an Int32 as octal, or hex, or whatever. But it seems a common enough problem (cf. DataFrame column names telling you to cast to Symbol) that there might already be a proper way to deal with this in TidierCats. If not, I propose that there should be.
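
For reference, here is a minimal sketch of a workaround that does not go through TidierCats at all. It assumes only DataFrames.jl and CategoricalArrays.jl are loaded; the column names cyl_int and cyl_str are made up for illustration. `categorical` from CategoricalArrays accepts an integer vector directly, and broadcasting `string` first gives string-valued levels instead.

```julia
using DataFrames, CategoricalArrays

cars = DataFrame(cylinders = [2, 4, 6, 8])

# Option 1: keep the integers themselves as the category levels.
cars.cyl_int = categorical(cars.cylinders)

# Option 2: convert to strings first, so the levels are strings.
cars.cyl_str = categorical(string.(cars.cylinders))

levels(cars.cyl_int)  # integer levels
levels(cars.cyl_str)  # string levels
```

Which option fits depends on whether the numeric ordering of the levels matters later: string levels sort lexically, so "10" would come before "2".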

@drizk1
Member

drizk1 commented Dec 27, 2024

There is no method for that right now.

Would something as simple as this solve the problem?

function as_categorical(arr::AbstractArray)
    T = eltype(arr)
    if T <: Number
        # keep numeric data as numeric categories
        return CategoricalArray{Union{Missing, T}}(arr)
    else
        # fallback: treat them as strings
        return CategoricalArray(map(x -> ismissing(x) ? missing : string(x), arr))
    end
end

julia> @mutate cars cylinders = as_categorical(cylinders)
4×1 DataFrame
 Row │ cylinders 
     │ Cat…?     
─────┼───────────
   1 │ 2
   2 │ 4
   3 │ 6
   4 │ 8
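
One case the draft above may not cover is a numeric column that already contains missing values: its element type is then Union{Missing, Int64}, which is not a subtype of Number, so the values would fall through to the string branch. Below is a hedged sketch of one way to handle that, using `nonmissingtype` from Base. The name to_categorical_sketch is hypothetical and is not part of TidierCats; this is not necessarily what the package ended up shipping.

```julia
using CategoricalArrays

# Hypothetical helper, named to avoid clashing with TidierCats.as_categorical.
function to_categorical_sketch(arr::AbstractArray)
    # Strip Missing from the element type before testing for numbers.
    T = nonmissingtype(eltype(arr))
    if T <: Number
        # Keep numeric data as numeric categories; missings pass through.
        return categorical(arr)
    else
        # Fallback: treat everything else as strings, preserving missings.
        return categorical(map(x -> ismissing(x) ? missing : string(x), arr))
    end
end

x = to_categorical_sketch([2, missing, 6])  # levels stay integers
y = to_categorical_sketch([:a, :b])         # symbols become string levels
```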

@drizk1

drizk1 commented Jan 9, 2025

@MarkoMekapses This should be fixed in v.2.0 now. Give it a shot and let me know if you run into any issues.

Closing this issue for now, but reopen it or open a new one as things come up.

@drizk1 drizk1 closed this as completed Jan 9, 2025