More issues with non-ascii characters in paths #547

Closed
pdeffebach opened this issue Jan 27, 2025 · 2 comments

@pdeffebach

Hello all,

I am running into a hard-to-reproduce issue regarding vroom and non-ascii characters in paths.

I have a large collection of .csv files from various cities across Brazil, and their paths therefore include many non-ASCII characters. One example is the path

"/home/peterwd/Dropbox/personTripBoston/raw/non_jica_admin_data/brazil/raw_incomes/AC_20171016/AC/Base informaçoes setores2010 universo AC/CSV/Basico_AC.csv"

When mapping read_csv over many of these files, I get the following error:

Error in `map()`:
i In index: 1.
Caused by error:
! '/projectnb/econdept/peterwd/Dropbox/personTripBoston//raw/non_jica_admin_data/brazil/raw_incomes/AC_20171016/AC/Base informa<c3><a7>oes setores2010 universo AC/CSV/Basico_AC.csv' does not exist.
Backtrace:
     x
  1. +-... %>% bind_rows()
  2. +-dplyr::bind_rows(.)
  3. | -rlang::list2(...)
  4. +-purrr::map(...)
  5. | -purrr:::map_("list", .x, .f, ..., .progress = .progress)
  6. |   +-purrr:::with_indexed_errors(...)
  7. |   | -base::withCallingHandlers(...)
  8. |   +-purrr:::call_with_cleanup(...)
  9. |   -global .f(.x[[i]], ...)
 10. |     -readr::read_delim(...)
 11. |       -vroom::vroom(...)
 12. |         -vroom:::vroom_(...)
 13. +-vroom (local) `<fn>`("/projectnb/econdept/peterwd/Dropbox/personTripBoston//raw/non_jica_admin_data/brazil/raw_incomes/AC_20171016/AC/Base informa<c3><a7>oes setores2010 universo AC/CSV/Basico_AC.csv")
 14. | -vroom:::check_path(path)
 15. |   -base::stop(...)
 16. -base::.handleSimpleError(...)
 17.   -purrr (local) h(simpleError(msg, call))
 18.     -cli::cli_abort(...)
 19.       -rlang::abort(...)
Execution halted

However, this error does not happen every time. I am working on a computing cluster running AlmaLinux. When I run the code interactively, it works fine. When I run the code as a simple Rscript command, it works fine. However, when I submit the job via qsub, I get the error above.

My only intuition is that vroom's ability to handle non-ASCII paths depends in some way on the processor the code runs on, since the processor I use interactively (cascadelake) is different from the one the qsub job is sent to. But I haven't been able to debug this yet.

Is this an expected error? Is there some option with vroom that I can use to fix this? See sessionInfo() below.

r$> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: AlmaLinux 8.10 (Cerulean Leopard)

Matrix products: default
BLAS/LAPACK: FlexiBLAS MKLOPENMP;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=C.UTF-8    LC_NUMERIC=C        LC_TIME=C          
 [4] LC_COLLATE=C        LC_MONETARY=C       LC_MESSAGES=C      
 [7] LC_PAPER=C          LC_NAME=C           LC_ADDRESS=C       
[10] LC_TELEPHONE=C      LC_MEASUREMENT=C    LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
 [1] lfe_3.1.0              Matrix_1.7-0          
 [3] kableExtra_1.4.0       knitr_1.49            
 [5] xtable_1.8-4           lehdr_1.1.4           
 [7] gghighlight_0.4.1      marginaleffects_0.24.0
 [9] quantreg_5.99.1        SparseM_1.81          
[11] binsreg_1.1            leaflet_2.2.2         
[13] RColorBrewer_1.1-3     osmdata_0.2.5         
[15] elevatr_0.99.0         geodata_0.6-2         
[17] stargazer_5.2.3        tmap_3.3-4            
[19] ggrepel_0.9.5          terra_1.7-71          
[21] raster_3.6-26          sp_2.1-4              
[23] fixest_0.12.1          haven_2.5.4           
[25] lubridate_1.9.3        forcats_1.0.0         
[27] stringr_1.5.1          dplyr_1.1.4           
[29] purrr_1.0.2            readr_2.1.5           
[31] tidyr_1.3.1            tibble_3.2.1          
[33] ggplot2_3.5.1          tidyverse_2.0.0       
[35] sf_1.0-16             

loaded via a namespace (and not attached):
 [1] DBI_1.2.2               httr2_1.0.1            
 [3] tmaptools_3.2           sandwich_3.1-0         
 [5] rlang_1.1.3             magrittr_2.0.3         
 [7] dreamerr_1.4.0          matrixStats_1.3.0      
 [9] e1071_1.7-14            compiler_4.4.0         
[11] systemfonts_1.0.6       png_0.1-8              
[13] vctrs_0.6.5             pkgconfig_2.0.3        
[15] fastmap_1.1.1           lwgeom_0.2-14          
[17] leafem_0.2.3            utf8_1.2.4             
[19] rmarkdown_2.26          tzdb_0.4.0             
[21] MatrixModels_0.5-3      xfun_0.50              
[23] stringmagic_1.1.2       parallel_4.4.0         
[25] R6_2.5.1                stringi_1.8.3          
[27] stars_0.6-5             numDeriv_2016.8-1.1    
[29] Rcpp_1.0.12             zoo_1.8-12             
[31] base64enc_0.1-3         leaflet.providers_2.0.0
[33] splines_4.4.0           timechange_0.3.0       
[35] tidyselect_1.2.1        rstudioapi_0.16.0      
[37] dichromat_2.0-0.1       abind_1.4-5            
[39] codetools_0.2-20        lattice_0.22-6         
[41] leafsync_0.1.0          withr_3.0.0            
[43] evaluate_0.23           survival_3.6-4         
[45] units_0.8-5             proxy_0.4-27           
[47] xml2_1.3.6              pillar_1.9.0           
[49] KernSmooth_2.23-22      generics_0.1.3         
[51] hms_1.1.3               munsell_0.5.1          
[53] scales_1.3.0            class_7.3-22           
[55] glue_1.7.0              tools_4.4.0            
[57] data.table_1.15.4       XML_3.99-0.16.1        
[59] grid_4.4.0              crosstalk_1.2.1        
[61] colorspace_2.1-0        nlme_3.1-164           
[63] Formula_1.2-5           cli_3.6.2              
[65] rappdirs_0.3.3          fansi_1.0.6            
[67] viridisLite_0.4.2       svglite_2.1.3          
[69] gtable_0.3.5            digest_0.6.35          
[71] progressr_0.14.0        classInt_0.4-10        
[73] htmlwidgets_1.6.4       htmltools_0.5.8.1      
[75] lifecycle_1.0.4         MASS_7.3-60.2     
@jennybc
Member

jennybc commented Jan 28, 2025

There's not going to be some quick vroom fix for this.

But you might want to gain some insight into what's different in your different execution contexts by looking at the output of l10n_info() and Sys.getlocale(). That might reveal something interesting and actionable on your end.
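
As a rough sketch of that comparison, the snippet below gathers the locale state of the current R process so it can be printed once in the interactive session and once inside the qsub job. The helper name locale_report is hypothetical, and the choice of environment variables to inspect (LANG, LC_ALL, LC_CTYPE) is an assumption about what is likely to differ between the two contexts.

```r
# Hypothetical helper: collect locale-related state of the current R process.
locale_report <- function() {
  list(
    # Character-type locale, which governs how non-ASCII paths are encoded
    ctype = Sys.getlocale("LC_CTYPE"),
    # TRUE when R believes it is running in a UTF-8 locale
    utf8 = isTRUE(l10n_info()[["UTF-8"]]),
    # Environment variables that commonly determine the locale (NA if unset)
    env = Sys.getenv(c("LANG", "LC_ALL", "LC_CTYPE"), unset = NA)
  )
}

report <- locale_report()
str(report)
```

Running this in both contexts and comparing the two outputs should show whether the batch job starts in a non-UTF-8 locale.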

@pdeffebach
Author

Thank you for your response.

Yes, the issue ended up being the locale. For some reason, the locale on the interactive node was "UTF-8" (or similar), while the locale on the qsub node was "C". Setting the locale at the top of the script fixed the problem.
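
The exact call used is not shown in this thread. For readers hitting the same problem, a minimal sketch of setting the locale at the top of a script might look like the following. It assumes that "C.UTF-8" (the LC_CTYPE value reported by sessionInfo() in the interactive session) is also installed on the compute node; the helper name ensure_utf8_ctype is hypothetical.

```r
# Hypothetical helper: switch LC_CTYPE to a UTF-8 locale if R is not
# already running in one.
# Assumption: "C.UTF-8" is available on the node; adjust `target` if not.
ensure_utf8_ctype <- function(target = "C.UTF-8") {
  # Nothing to do when the session is already UTF-8
  if (isTRUE(l10n_info()[["UTF-8"]])) {
    return(invisible(TRUE))
  }
  # Sys.setlocale() returns "" (with a warning) when the request fails
  result <- suppressWarnings(Sys.setlocale("LC_CTYPE", target))
  ok <- nzchar(result) && isTRUE(l10n_info()[["UTF-8"]])
  if (!ok) {
    warning("Could not switch LC_CTYPE to '", target, "'")
  }
  invisible(ok)
}

# Call this before any file paths with non-ASCII characters are touched
utf8_ok <- ensure_utf8_ctype()
```

Only LC_CTYPE is changed here, so number formatting, collation, and messages keep whatever locale the job started with.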

So this appears to not be a vroom problem. Thank you!
