docs: remove example from the README as it was showcasing `as_polars_df()`, make it clearer in vignette that those are convenience functions
etiennebacher · Aug 21, 2024 · c29ea84
1 parent 39a195e
Showing 3 changed files with 89 additions and 213 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -37,101 +37,25 @@ knitr::opts_chunk$set(
 `tidypolars` provides a [`polars`](https://rpolars.github.io/) backend for the
 `tidyverse`. The aim of `tidypolars` is to enable users to keep their existing
 `tidyverse` code while using `polars` in the background to benefit from large
-performance gains.
-
-See the example below and the ["Getting started" vignette](https://tidypolars.etiennebacher.com/articles/tidypolars) for a gentle 
-introduction to `tidypolars`.
-
-
-## Installation
-
-`tidypolars` is built on `polars`, which is not available on CRAN. This means 
-that `tidypolars` also can't be on CRAN. However, you can install it from 
-R-universe.
-
-### Windows or macOS
-
-```{r eval=FALSE}
-install.packages(
-  'tidypolars', 
-  repos = c('https://etiennebacher.r-universe.dev', getOption("repos"))
-)
-```
-
-### Linux
-
-```{r eval=FALSE}
-install.packages(
-  'tidypolars', 
-  repos = c('https://etiennebacher.r-universe.dev/bin/linux/jammy/4.3', getOption("repos"))
-)
-```
-
-
-## Example
-
-Suppose that you already have some code that uses `dplyr`:
-
-```{r}
-library(dplyr, warn.conflicts = FALSE)
-
-iris |> 
-  select(starts_with(c("Sep", "Pet"))) |> 
-  mutate(
-    petal_type = ifelse((Petal.Length / Petal.Width) > 3, "long", "large")
-  ) |> 
-  filter(between(Sepal.Length, 4.5, 5.5)) |> 
-  head()
-```
-
-With `tidypolars`, you can provide a Polars `DataFrame` or `LazyFrame` and keep 
-the exact same code:
-
-```{r}
-library(tidypolars)
-
-iris |> 
-  as_polars_df() |> 
-  select(starts_with(c("Sep", "Pet"))) |> 
-  mutate(
-    petal_type = ifelse((Petal.Length / Petal.Width) > 3, "long", "large")
-  ) |> 
-  filter(between(Sepal.Length, 4.5, 5.5)) |> 
-  head()
-```
-
-If you're used to the `tidyverse` functions and syntax, this will feel much 
-easier to read than the pure `polars` syntax:
-
-```{r}
-library(polars)
-
-# polars syntax
-pl$DataFrame(iris)$
-  select(c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))$
-  with_columns(
-    pl$when(
-      (pl$col("Petal.Length") / pl$col("Petal.Width") > 3)
-    )$then(pl$lit("long"))$
-      otherwise(pl$lit("large"))$
-      alias("petal_type")
-  )$
-  filter(pl$col("Sepal.Length")$is_between(4.5, 5.5))$
-  head(6)
-```
+performance gains. The only thing that needs to change is the way data is
+imported in the R session.
 
 Since most of the work is rewriting `tidyverse` code into `polars` syntax, 
 `tidypolars` and `polars` have very similar performance.
 
 <details>
 <summary>Click to see a small benchmark</summary>
 
-For more serious benchmarks about `polars`, take a look at [DuckDB
-benchmarks](https://duckdblabs.github.io/db-benchmark/).
+The main purpose of this benchmark is to show that `polars` and `tidypolars` are
+close and to give an idea of the performance. For more thorough, representative 
+benchmarks about `polars`, take a look at [DuckDB benchmarks](https://duckdblabs.github.io/db-benchmark/) instead.
 
 ```{r}
 library(collapse, warn.conflicts = FALSE)
+library(dplyr, warn.conflicts = FALSE)
 library(dtplyr)
+library(polars)
+library(tidypolars)
 
 large_iris <- data.table::rbindlist(rep(list(iris), 100000))
 large_iris_pl <- as_polars_lf(large_iris)
@@ -198,6 +122,35 @@ bench::mark(
 </details>
 
 
+See the ["Getting started" vignette](https://tidypolars.etiennebacher.com/articles/tidypolars) 
+for a gentle introduction to `tidypolars`.
+
+
+## Installation
+
+`tidypolars` is built on `polars`, which is not available on CRAN. This means 
+that `tidypolars` also can't be on CRAN. However, you can install it from 
+R-universe.
+
+### Windows or macOS
+
+```{r eval=FALSE}
+install.packages(
+  'tidypolars', 
+  repos = c('https://etiennebacher.r-universe.dev', getOption("repos"))
+)
+```
+
+### Linux
+
+```{r eval=FALSE}
+install.packages(
+  'tidypolars', 
+  repos = c('https://etiennebacher.r-universe.dev/bin/linux/jammy/4.3', getOption("repos"))
+)
+```
+
+
 ## Contributing
 
 Did you find some bugs or some errors in the documentation? Do you want 

diff --git a/README.md b/README.md
@@ -28,120 +28,8 @@ is here:
 `tidypolars` provides a [`polars`](https://rpolars.github.io/) backend
 for the `tidyverse`. The aim of `tidypolars` is to enable users to keep
 their existing `tidyverse` code while using `polars` in the background
-to benefit from large performance gains.
-
-See the example below and the [“Getting started”
-vignette](https://tidypolars.etiennebacher.com/articles/tidypolars) for
-a gentle introduction to `tidypolars`.
-
-## Installation
-
-`tidypolars` is built on `polars`, which is not available on CRAN. This
-means that `tidypolars` also can’t be on CRAN. However, you can install
-it from R-universe.
-
-### Windows or macOS
-
-``` r
-install.packages(
-  'tidypolars', 
-  repos = c('https://etiennebacher.r-universe.dev', getOption("repos"))
-)
-```
-
-### Linux
-
-``` r
-install.packages(
-  'tidypolars', 
-  repos = c('https://etiennebacher.r-universe.dev/bin/linux/jammy/4.3', getOption("repos"))
-)
-```
-
-## Example
-
-Suppose that you already have some code that uses `dplyr`:
-
-``` r
-library(dplyr, warn.conflicts = FALSE)
-
-iris |> 
-  select(starts_with(c("Sep", "Pet"))) |> 
-  mutate(
-    petal_type = ifelse((Petal.Length / Petal.Width) > 3, "long", "large")
-  ) |> 
-  filter(between(Sepal.Length, 4.5, 5.5)) |> 
-  head()
-#>   Sepal.Length Sepal.Width Petal.Length Petal.Width petal_type
-#> 1          5.1         3.5          1.4         0.2       long
-#> 2          4.9         3.0          1.4         0.2       long
-#> 3          4.7         3.2          1.3         0.2       long
-#> 4          4.6         3.1          1.5         0.2       long
-#> 5          5.0         3.6          1.4         0.2       long
-#> 6          5.4         3.9          1.7         0.4       long
-```
-
-With `tidypolars`, you can provide a Polars `DataFrame` or `LazyFrame`
-and keep the exact same code:
-
-``` r
-library(tidypolars)
-
-iris |> 
-  as_polars_df() |> 
-  select(starts_with(c("Sep", "Pet"))) |> 
-  mutate(
-    petal_type = ifelse((Petal.Length / Petal.Width) > 3, "long", "large")
-  ) |> 
-  filter(between(Sepal.Length, 4.5, 5.5)) |> 
-  head()
-#> shape: (6, 5)
-#> ┌──────────────┬─────────────┬──────────────┬─────────────┬────────────┐
-#> │ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ petal_type │
-#> │ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---        │
-#> │ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str        │
-#> ╞══════════════╪═════════════╪══════════════╪═════════════╪════════════╡
-#> │ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ long       │
-#> │ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ long       │
-#> │ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ long       │
-#> │ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ long       │
-#> │ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ long       │
-#> │ 5.4          ┆ 3.9         ┆ 1.7          ┆ 0.4         ┆ long       │
-#> └──────────────┴─────────────┴──────────────┴─────────────┴────────────┘
-```
-
-If you’re used to the `tidyverse` functions and syntax, this will feel
-much easier to read than the pure `polars` syntax:
-
-``` r
-library(polars)
-
-# polars syntax
-pl$DataFrame(iris)$
-  select(c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))$
-  with_columns(
-    pl$when(
-      (pl$col("Petal.Length") / pl$col("Petal.Width") > 3)
-    )$then(pl$lit("long"))$
-      otherwise(pl$lit("large"))$
-      alias("petal_type")
-  )$
-  filter(pl$col("Sepal.Length")$is_between(4.5, 5.5))$
-  head(6)
-#> shape: (6, 5)
-#> ┌──────────────┬─────────────┬──────────────┬─────────────┬────────────┐
-#> │ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ petal_type │
-#> │ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---        │
-#> │ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str        │
-#> ╞══════════════╪═════════════╪══════════════╪═════════════╪════════════╡
-#> │ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ long       │
-#> │ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ long       │
-#> │ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ long       │
-#> │ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ long       │
-#> │ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ long       │
-#> │ 5.4          ┆ 3.9         ┆ 1.7          ┆ 0.4         ┆ long       │
-#> └──────────────┴─────────────┴──────────────┴─────────────┴────────────┘
-```
+to benefit from large performance gains. The only thing that needs to
+change is the way data is imported in the R session.
 
 Since most of the work is rewriting `tidyverse` code into `polars`
 syntax, `tidypolars` and `polars` have very similar performance.
@@ -151,13 +39,18 @@ syntax, `tidypolars` and `polars` have very similar performance.
 Click to see a small benchmark
 </summary>
 
-For more serious benchmarks about `polars`, take a look at [DuckDB
-benchmarks](https://duckdblabs.github.io/db-benchmark/).
+The main purpose of this benchmark is to show that `polars` and
+`tidypolars` are close and to give an idea of the performance. For more
+thorough, representative benchmarks about `polars`, take a look at
+[DuckDB benchmarks](https://duckdblabs.github.io/db-benchmark/) instead.
 
 ``` r
 library(collapse, warn.conflicts = FALSE)
 #> collapse 2.0.15, see ?`collapse-package` or ?`collapse-documentation`
+library(dplyr, warn.conflicts = FALSE)
 library(dtplyr)
+library(polars)
+library(tidypolars)
 
 large_iris <- data.table::rbindlist(rep(list(iris), 100000))
 large_iris_pl <- as_polars_lf(large_iris)
@@ -222,11 +115,11 @@ bench::mark(
 #> # A tibble: 5 × 6
 #>   expression      min   median `itr/sec` mem_alloc `gc/sec`
 #>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
-#> 1 polars     116.67ms 158.61ms     5.20    20.54KB    0.260
-#> 2 tidypolars 144.51ms 184.05ms     5.12   353.94KB    1.53 
-#> 3 dplyr         4.45s    4.79s     0.202    1.79GB    0.450
-#> 4 dtplyr        1.07s    1.18s     0.821    1.72GB    1.66 
-#> 5 collapse   585.85ms 803.39ms     1.26   745.96MB    1.26
+#> 1 polars      142.5ms 173.96ms     4.43     4.51MB    0.222
+#> 2 tidypolars  161.9ms 206.56ms     4.70     1.78MB    2.00 
+#> 3 dplyr          3.8s    4.07s     0.231    1.79GB    0.554
+#> 4 dtplyr      810.6ms       1s     0.999    1.72GB    2.82 
+#> 5 collapse    400.8ms  493.3ms     1.97   745.96MB    1.33
 
 # NOTE: do NOT take the "mem_alloc" results into account.
 # `bench::mark()` doesn't report the accurate memory usage for packages calling
@@ -235,6 +128,34 @@ bench::mark(
 
 </details>
 
+See the [“Getting started”
+vignette](https://tidypolars.etiennebacher.com/articles/tidypolars) for
+a gentle introduction to `tidypolars`.
+
+## Installation
+
+`tidypolars` is built on `polars`, which is not available on CRAN. This
+means that `tidypolars` also can’t be on CRAN. However, you can install
+it from R-universe.
+
+### Windows or macOS
+
+``` r
+install.packages(
+  'tidypolars', 
+  repos = c('https://etiennebacher.r-universe.dev', getOption("repos"))
+)
+```
+
+### Linux
+
+``` r
+install.packages(
+  'tidypolars', 
+  repos = c('https://etiennebacher.r-universe.dev/bin/linux/jammy/4.3', getOption("repos"))
+)
+```
+
 ## Contributing
 
 Did you find some bugs or some errors in the documentation? Do you want

diff --git a/vignettes/tidypolars.Rmd b/vignettes/tidypolars.Rmd
@@ -27,20 +27,22 @@ knitr::opts_chunk$set(
 }
 ```
 
-The first thing to do when using `tidypolars` is to get some data as a Polars
-`DataFrame` or `LazyFrame`. You can read files with the various `read_*_polars()`
-functions (such as `read_parquet_polars()`) to import them as `DataFrame`s, or 
-with `scan_*_polars()` functions (such as `scan_parquet_polars()`) to import them 
-as `LazyFrame`s. There are several functions to import various file formats, 
-such as CSV, Parquet, or JSON.
+Using `tidypolars` requires importing data as Polars `DataFrame`s or
+`LazyFrame`s. You can read files with the [various `read_*_polars()` functions](https://tidypolars.etiennebacher.com/reference/#import-data)
+(such as `read_parquet_polars()`) to import them as `DataFrame`s, or with
+`scan_*_polars()` functions (such as `scan_parquet_polars()`) to import them as
+`LazyFrame`s. There are several functions to import various file formats, such
+as CSV, Parquet, or JSON.
 
 You could also read data with other packages and then convert it with
 `as_polars_df()` (or `as_polars_lf()` if you want to make it a 
 `LazyFrame`). 
 
 <div class="custom_note">
-  <p><b>Note: </b><code>as_polars_df()</code> and <code>as_polars_lf()</code> are 
-  merely convenience functions to quickly convert data to Polars, which is
+  <p><b>Note:</b> in examples or some tutorials, the functions <code>as_polars_df()</code>
+  and <code>as_polars_lf()</code> are sometimes used to convert an existing R 
+  data.frame to a Polars DataFrame or LazyFrame. Those are merely convenience 
+  functions to quickly convert an existing dataset to Polars, which is
   useful for showcase purposes. However, this conversion from R to Polars has
  some cost and it hurts performance. In real-life use cases, be sure to load 
   the data with the <code>read_\*()</code> or the <code>scan_\*()</code> functions