feat: add tab_spanner_delim() method #604
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #604      +/-   ##
==========================================
- Coverage   90.71%   90.70%   -0.01%
==========================================
  Files          46       46
  Lines        5417     5467      +50
==========================================
+ Hits         4914     4959      +45
- Misses        503      508       +5
```
@rich-iannone is going to circle back with a more fleshed-out version of this, but we paired on having an intermediate representation inside this function to make it easier to communicate how column-name splits end up at the various spanner levels:

```python
# case 1: simple ----
split(["span_1.A", "span_1.B.x"])
# produces
{"span_1.A": ["span_1", "A"], "span_1.B.x": ["span_1", "B", "x"]}

# case 2: reversed ----
split(["span_1.A", "span_1.B.x"], reverse=True)
# produces
{"span_1.A": ["A", "span_1"], "span_1.B.x": ["x", "B", "span_1"]}
```
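A minimal sketch of what a `split()` helper producing this intermediate representation could look like, assuming it only needs a delimiter (a hypothetical `delim` parameter defaulting to `"."`) and the `reverse` flag shown above. The function name comes from the comment; the signature and body are illustrative, not the PR's code.

```python
def split(columns: list[str], delim: str = ".", reverse: bool = False) -> dict[str, list[str]]:
    """Hypothetical helper: map each column name to its delimited parts.

    With reverse=True the parts are returned in reverse order,
    matching "case 2" above.
    """
    result = {}
    for col in columns:
        parts = col.split(delim)
        result[col] = parts[::-1] if reverse else parts
    return result


print(split(["span_1.A", "span_1.B.x"]))
print(split(["span_1.A", "span_1.B.x"], reverse=True))
```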
```python
# converted to DataFrame (full example) ----------
from itertools import zip_longest

import polars as pl

# d is the intermediate representation (parts ordered lowest level first)
d = {"span_1.A": ["A", "span_1"], "span_1.B.x": ["x", "B", "span_1"]}

# first entry: the original column names; each following entry: one level,
# padded with None where a name has fewer parts. Each entry becomes a column.
rectangle = [list(d.keys()), *zip_longest(*d.values())]

# first column should be "column_name"
# other columns should be numbered (e.g. span_1, span_2, span_3)
pl.DataFrame(rectangle, orient="col")
```

For example, the above DataFrame looks like this:

```
┌────────────┬──────────┬──────────┬──────────┐
│ column_0   ┆ column_1 ┆ column_2 ┆ column_3 │
│ ---        ┆ ---      ┆ ---      ┆ ---      │
│ str        ┆ str      ┆ str      ┆ str      │
╞════════════╪══════════╪══════════╪══════════╡
│ span_1.A   ┆ A        ┆ span_1   ┆ null     │
│ span_1.B.x ┆ x        ┆ B        ┆ span_1   │
└────────────┴──────────┴──────────┴──────────┘
```

The first column holds the original column names, and the other columns represent the spanner levels (from lowest to highest). It seems that gt loops through the other columns from lowest to highest, and...
I think having the intermediate dictionary of lists, and its conversion to a DataFrame, will end up being useful for communicating gt's underlying spec and for testing it (without having to snapshot whole tables).
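Following the naming suggested in the comments of the example above (a `column_name` column, then numbered `span_*` columns), here is a sketch that builds those named columns as a plain dict. The helper `level_columns()` is hypothetical; keeping it free of polars makes the intermediate representation easy to assert on in tests.

```python
from itertools import zip_longest


def level_columns(d: dict[str, list[str]]) -> dict[str, list]:
    """Hypothetical helper: turn the intermediate representation
    (column name -> parts, lowest level first) into named columns."""
    data = {"column_name": list(d.keys())}
    # zip_longest pads shorter part lists with None (shown as null in polars)
    for i, level in enumerate(zip_longest(*d.values()), start=1):
        data[f"span_{i}"] = list(level)
    return data


d = {"span_1.A": ["A", "span_1"], "span_1.B.x": ["x", "B", "span_1"]}
cols = level_columns(d)
print(cols)
```

Passing the result to `pl.DataFrame(level_columns(d))` should display the same table as above, with `column_name`, `span_1`, `span_2`, and `span_3` as the headers.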
@jrycw Thank you so much for your work on this feature, it is a big one! Regarding the …: in your example, you had:

```python
import polars as pl
from great_tables import GT

data = {
    "province.NL_ZH.pop": [1, 2, 3],
    "province.NL_ZH.gdp": [4, 5, 6],
    "province.NL_NH.pop": [7, 8, 9],
    "province.NL_NH.gdp": [10, 11, 12],
}

gt = GT(pl.DataFrame(data))
gt.tab_spanner_delim()
```

This is a sensible way to structure column names, as we go from large to small elements, from left to right (e.g., …).

```python
data = {
    "pop.NL_ZH.province": [1, 2, 3],
    "gdp.NL_ZH.province": [4, 5, 6],
    "pop.NL_NH.province": [7, 8, 9],
    "gdp.NL_NH.province": [10, 11, 12],
}
```

It's just another way of structuring the elements, but in a different order (going from smallest to largest elements). Rendering that currently with …

But having the option to use …

```python
data = {
    "pop.NL_ZH": [1, 2, 3],
    "gdp.NL_ZH": [4, 5, 6],
    "pop.NL_NH": [7, 8, 9],
    "gdp.NL_NH": [10, 11, 12],
}
```

where it makes sense to the user that …

In terms of implementation, what's done in the R program is just this: …

Hope this helps clarify the intended use of …
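One way to model the reversed ordering discussed in this comment, assuming a `reverse` flag (as in the `split(..., reverse=True)` case earlier in the thread) that simply flips the split parts: names written smallest-to-largest (`pop.NL_ZH.province`) then yield the same top-down hierarchy as names written largest-to-smallest (`province.NL_ZH.pop`). The helper `spanner_path()` is hypothetical and only illustrates that equivalence.

```python
def spanner_path(name: str, delim: str = ".", reverse: bool = False) -> list[str]:
    """Hypothetical helper: return the parts of `name` ordered from the
    top-level spanner down to the column label."""
    parts = name.split(delim)
    # Names written smallest-to-largest are flipped so both styles
    # end up largest-to-smallest.
    return parts[::-1] if reverse else parts


largest_first = ["province.NL_ZH.pop", "province.NL_ZH.gdp"]
smallest_first = ["pop.NL_ZH.province", "gdp.NL_ZH.province"]

forward = [spanner_path(name) for name in largest_first]
backward = [spanner_path(name, reverse=True) for name in smallest_first]
print(forward == backward)
```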
@machow and @rich-iannone, thanks for all the examples and explanations! I've tried to consolidate the logic you described into a class named …. I've also added some tests, but I'm not entirely sure whether my logic is correct; …
Fix issue: #582.

Hello team:

This PR sets up a scaffold for `tab_spanner_delim()`.
Basic Logic:

- Use `str.split` or `str.rsplit` to split column names, corresponding to the `delim=` and `limit=` parameters.
- `level` and `root_col` (original column).
- If `level != 0`, call `tab_spanner()`.
- If `level == 0`, call `cols_label()`.
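The logic in the list above can be sketched as a pure-Python planning step. This is a hypothetical sketch, not the PR's implementation: `spanner_plan()` computes the level-0 column labels (what would go to `cols_label()`) and the spanners at level 1 and up (what would go to `tab_spanner()`), ordered from lowest to highest level. Two assumptions are labeled in the code: levels are counted up from each column's label, and when a `limit` is given the name is split from the right with `str.rsplit`. Spanners are grouped by their full path so that identical labels under different parents stay separate.

```python
from typing import Optional


def spanner_plan(
    columns: list[str], delim: str = ".", limit: Optional[int] = None
) -> tuple[dict[str, str], list[tuple[int, str, list[str]]]]:
    """Hypothetical sketch: split delimited column names into a label plan
    (level 0) and a spanner plan (level >= 1, lowest level first)."""
    labels: dict[str, str] = {}
    groups: dict[tuple[int, tuple[str, ...]], list[str]] = {}
    for col in columns:
        # Assumption: with a limit, split from the right via str.rsplit
        parts = col.rsplit(delim, limit) if limit is not None else col.split(delim)
        labels[col] = parts[-1]  # level 0: would feed cols_label()
        path = parts[:-1]  # spanner labels, highest level first
        for i in range(len(path)):
            level = len(path) - i  # assumption: counted up from the label
            # Key on the full path so equal labels under different parents differ
            groups.setdefault((level, tuple(path[: i + 1])), []).append(col)
    # Sort by level (lowest first), then by path; level >= 1 feeds tab_spanner()
    spanners = [(level, key[-1], cols) for (level, key), cols in sorted(groups.items())]
    return labels, spanners


cols = ["province.NL_ZH.pop", "province.NL_ZH.gdp", "province.NL_NH.pop", "province.NL_NH.gdp"]
labels, spanners = spanner_plan(cols)
for level, label, members in spanners:
    print(level, label, members)
```

With the `province.*` columns from the thread, this yields `NL_NH` and `NL_ZH` at level 1 and `province` at level 2, while every column keeps its last part (`pop` or `gdp`) as its label.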
Example:

Remaining Tasks:

- `reverse=` logic is not implemented; need the team's help to clarify the expected behavior.
- … `{gt}` but need refinement.