add caching #5
Comments
What kind of caching do you have in mind? I guess I could cache the intermediate parser output from BeautifulSoup, but parsing takes relatively little time compared to compiling the regexes. The regexes are compiled only once per execution. AFAICT there is no way to persist the compiled object across multiple runs. Even Instead of caching BeautifulSoup output, I'd prefer switching to a faster parser so I could drop the dependency and it wouldn't be an issue at all.
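For readers following along, here is a minimal sketch of why pickling does not help with the point above, assuming CPython's standard `re` and `pickle` modules. As far as I can tell, a pickled pattern carries only its source string and flags, so loading it means compiling again. The pattern itself is a made-up example, not one from the script.

```python
import pickle
import re

# Hypothetical pattern, used only for illustration.
original = re.compile(r"(?P<word>[a-z]+)\d{2,4}", re.IGNORECASE)

# Serialize the compiled pattern and load it back.
payload = pickle.dumps(original)
restored = pickle.loads(payload)

# The payload contains the pattern source text, not compiled state,
# so unpickling has to go through the regex compiler again.
source_in_payload = original.pattern.encode("utf-8") in payload

print("payload size in bytes:", len(payload))
print("source text stored in payload:", source_in_payload)
print("same pattern and flags after round trip:",
      restored.pattern == original.pattern and restored.flags == original.flags)
```

If that holds for the interpreter in use, a pickle-based cache would save the cost of building the pattern strings but not the cost of `re.compile` itself.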
Hello Marti. I wrote a full rewrite of your script available at I used Pickle to write/read the compiled regexes. Thought that it Christian. Christian Berendt B1 Systems GmbH
I profiled it on my laptop and the total startup time is 2.1s, of which 1.7s is spent in re.compile; the rest is almost all BeautifulSoup. It is significant, but I'd explore the possibility of changing the parser first. Thanks for going through the effort to rewrite it. I'll definitely adopt some things from it, but I'll have to think about whether I want all of it. I have some reservations, for example about logging. If it's a simple command-line app and not a daemon, people probably don't care about timestamps and log levels.
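A rough way to reproduce this kind of measurement is sketched below. The patterns and the `time_compile` helper are hypothetical stand-ins, not taken from the script, and the numbers will differ by machine and pattern set.

```python
import re
import time

# Hypothetical stand-ins for the script's real patterns.
SAMPLE_PATTERNS = [
    r"\b\d{4}-\d{2}-\d{2}\b",
    r"(?i)\b[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}\b",
    r"^(?:GET|POST|PUT|DELETE)\s+\S+\s+HTTP/\d\.\d$",
]


def time_compile(patterns):
    """Return (seconds spent compiling, list of compiled patterns)."""
    # Clear re's internal cache so every pattern is compiled from scratch.
    re.purge()
    start = time.perf_counter()
    compiled = [re.compile(p) for p in patterns]
    elapsed = time.perf_counter() - start
    return elapsed, compiled


elapsed, compiled = time_compile(SAMPLE_PATTERNS)
print(f"compiled {len(compiled)} patterns in {elapsed * 1000:.3f} ms")
```

For a per-function breakdown of a whole run, the standard `cProfile` module (`python -m cProfile -s cumulative script.py`) gives a similar picture without modifying the code.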
That's okay with me. You don't have to take the whole rewrite. Just drop me a line when you're finished and I'll remove the rewrite from Gist. I don't think it's worth implementing caching if it doesn't really improve the loading time.
Cleaning up my open issues, I will close this issue.
Loading the regexes every time doesn't make sense.