Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add caching #5

Closed
berendt opened this issue Apr 30, 2014 · 6 comments
Closed

add caching #5

berendt opened this issue Apr 30, 2014 · 6 comments

Comments

@berendt
Copy link
Contributor

berendt commented Apr 30, 2014

Loading the regs everytime doesn't make sense.

@intgr
Copy link
Owner

intgr commented May 5, 2014

What kind of caching do you have in mind? I guess I could cache the intermediate parser output from BeautifulSoup, but that's a relatively small part compared to the time to compile regexes.

The regexes are compiled only once per execution. AFAICT there is no way to persist the compiled object across multiple runs. Even pickle.dumps() on a regex object has to compile again on load, so it's not a win.

Instead of caching BeautifulSoup output, I'd prefer switching to a faster parser instead so I could lose the dependency and it wouldn't be an issue at all.

@berendt
Copy link
Contributor Author

berendt commented May 5, 2014

Hello Marti.

I wrote a full rewrite of your script available at
https://gist.github.com/berendt/5ae38f2f1d5bd6b883d3 at the moment and
tested it with some OpenStack repositories. Please have a look if you
want to replace the current version of Topy with this rewrite. It has
clean logging an argument parsing and it uses regex instead of re. Using
regex it's possible to compile all regular expressions. With Python2 I
had > 500 failed compilations.

I used Pickle to write/read the compiled Regexs. Thought that it
increased the loading time, but I have not measured it.

Christian.

Christian Berendt
Cloud Computing Solution Architect
Mail: [email protected]

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

@intgr
Copy link
Owner

intgr commented May 5, 2014

I profiled it on my laptop and the total startup time is 2.1s, of which 1.7 is spent in re.compile, the rest is almost all BeautifulSoup. It is significant, but I'd explore the possibility of changing the parser first.

Thanks for going through the effort to rewrite it. I'll definitely adopt some things from it, but I'll have to think whether I want it whole. I have some reservations like with logging. If it's a simple command-line app and not a daemon, people probably don't care about timestamps and log levels.

@berendt
Copy link
Contributor Author

berendt commented May 5, 2014

That's okay for me. You don't have to take the whole rewrite. Just drop me a line if you're finished and I'll drop the rewrite from Gist.

I think that's not worth to have caching implemented if there is not really a benefit in the loading time.

@berendt
Copy link
Contributor Author

berendt commented May 8, 2016

Cleaning up my open issues, I will close this issue.

@intgr
Copy link
Owner

intgr commented May 19, 2016

@berendt Thanks, I have opened a new issue #16 "Speed up ruleset parsing" based on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants