Julia #10

Open: wants to merge 4 commits into base: main


floswald

hi grant!

I made an addition to your really cool workshop. guess what. the julia package! :-)

I had to make some choices though, because I need this material for a class I'm teaching, and I had to make sure they know some basics first. So I added some stuff to the simple sql chapter, referring also to your other work. I compiled this into a new website that I'll share with that class at https://floswald.github.io/duckdb-polars/

the problem is that I really wanted to use your setup for the data download and python packages, so I had to leave your stuff in there. So, what about this plan: I'll teach the class based on that material, then I'll take the website down and we try to integrate it properly into your setup here? It runs so well - quarto I mean. 3 languages in one project. amazing.

anyway, let me know what you think - obviously if you don't like what I put up on that website let me know and I'll take it down immediately. cheers

@grantmcdermott
Owner

Hey Florian,

This is great, thanks for the HU! As you know, I'm a big Julia fan (thanks in part to you)...

Let's figure out how / if we can integrate your additions into my version once you've finished teaching your class. Two things to consider, off the top of my head:

  1. Does Julia have a viable Polars implementation? I'd like to maintain the full equivalence across languages here (that was a primary aim of my workshop, although it's obviously cool to drill down on DuckDB's polyglot support too).
  2. Kinda tangential, but something that bothered me about this workshop was the Julia I/O timings here. I've double checked multiple times and even raised an issue ("Very slow to serialize large DBInterface (DuckDB) query result", JuliaDatabases/DBInterface.jl#48), but no dice. LMK if you have any insights into resolving it, or the heft to convince someone to take a deeper look.

Cheers

@grantmcdermott
Owner

P.S. Your inquiry made me realise that I had forgotten to add a license. I've just remedied that (CC BY 4.0), which will hopefully remove any ambiguity going forward.

@floswald
Author

yes - thanks for the license.
I've also noticed that the julia query takes much longer than the R counterpart. I'll bump your issue on discourse, see what gives.
with regard to the timings plot, did you include the code for those benchmarks somewhere? I thought you had only reported the measured timings; it would be good to be able to have a look at the code.

@floswald
Author

slow timings fixed! the quarto process does not pick up all threads for julia by default. (I'm setting vscode to always run julia with all available threads, so I did not notice in that REPL.) In particular, there is nothing wrong with the final conversion into a dataframe; it is almost instantaneous.
I wonder if that's part of the reason for the slow timings in your slides as well?
