I was trying out docxtractr::read_docx on .doc files on Windows 10 with LibreOffice 6.2.5.2 (x64).
It was horribly slow (because of LibreOffice, I guess) unless LibreOffice was already open (started manually, outside R). Once I close LibreOffice and run the same code in R again, it is slow again, though less so than on the first run.
fn <- "rough/messy_files/doc.doc"
library(tictoc)

# LibreOffice never opened since the last PC reboot
tic()
tmp <- docxtractr::read_docx(fn)
toc()
# 285.63 sec elapsed (4.7 min!)

# LibreOffice open
tic()
tmp <- docxtractr::read_docx(fn)
toc()
# 1.1 sec elapsed

# LibreOffice closed after having been open
tic()
tmp <- docxtractr::read_docx(fn)
toc()
# 24.21 sec elapsed
That is OK for a single file, but it is definitely a problem if you have batches of files to read.
I was wondering whether users could be given an alternative way of reading .doc files,
for example by using docx4j, as mentioned in this repository. The system dependency on LibreOffice would then go away, and I believe reading would be smoother as well.
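Until something like that exists, one possible workaround for the batch case might be to convert all the .doc files in a single LibreOffice call and then read the resulting .docx files, so the startup cost is paid once instead of once per file. This is only a rough sketch: `build_convert_args` and `convert_docs` are hypothetical helper names of my own, it assumes `soffice` is on the PATH and accepts the usual headless conversion flags, and I have not measured how much time it actually saves.

```r
# Hypothetical helper: build the argument vector for one LibreOffice call.
# Assumes the standard headless conversion flags of the soffice CLI.
build_convert_args <- function(files, outdir) {
  c("--headless", "--convert-to", "docx",
    "--outdir", shQuote(outdir), shQuote(files))
}

# Hypothetical helper: convert all .doc files in one soffice invocation,
# so LibreOffice is started once rather than once per file.
# Assumes `soffice` is on the PATH; pass the full path to it otherwise.
convert_docs <- function(files, outdir = tempdir(), soffice = "soffice") {
  status <- system2(soffice, build_convert_args(files, outdir))
  if (status != 0) stop("LibreOffice conversion failed with status ", status)
  # Paths where the converted files are expected to be written
  file.path(outdir, paste0(tools::file_path_sans_ext(basename(files)), ".docx"))
}

# Usage (not run):
# doc_files  <- list.files("rough/messy_files", pattern = "\\.doc$", full.names = TRUE)
# docx_files <- convert_docs(doc_files)
# docs       <- lapply(docx_files, docxtractr::read_docx)
```

This still depends on LibreOffice, so it does not replace the request above; it would only work around the per-file startup delay.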
Thanks a lot for such a great package.
Ref #5