Group command
The top-level `Group` instruction groups together content that belongs to a common ancestor element. It is especially useful for extracting information that naturally comes in bundles. For example, you might save the title, post date and content of each news article on the front page of a website.
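To see why grouping by a common ancestor matters, consider extracting titles and dates as two independent lists and pairing them afterwards. The sketch below uses Python's standard `xml.etree.ElementTree` on a small, made-up front page. The markup and class names are hypothetical. As soon as one article lacks a date, the flat lists fall out of step. Querying each article relative to its own element keeps every bundle intact.

```python
import xml.etree.ElementTree as ET

# Hypothetical front page: the second article has no date.
PAGE = """
<html><body>
  <div class="article"><h2>First post</h2><span class="date">2020-01-01</span></div>
  <div class="article"><h2>Second post</h2></div>
  <div class="article"><h2>Third post</h2><span class="date">2020-01-03</span></div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Flat extraction: two independent lists that are zipped together afterwards.
titles = [e.text for e in root.iterfind(".//h2")]
dates = [e.text for e in root.iterfind(".//span[@class='date']")]
flat = list(zip(titles, dates))
# The missing date shifts the pairing: "Second post" gets the third article's
# date, and "Third post" is dropped entirely.

# Grouped extraction: query each article relative to its own element.
grouped = []
for article in root.iterfind(".//div[@class='article']"):
    date = article.find(".//span[@class='date']")
    grouped.append({
        "Title": article.find(".//h2").text,
        "Date": date.text if date is not None else None,
    })

print(flat)
print(grouped)
```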
The `Group` instruction takes two mandatory instructions: `By`, and either `Save` or `Group`.
`By` specifies the XPath of the ancestor element(s) to group by. Each match of the `By` XPath becomes a separate group. By default the output is a list in which each item is one group. The instructions in `Save` are passed on to the save parser. Note that the XPath context for parsing the `Save` contents is the grouped ancestor, not the whole document. You will therefore almost always want to start each XPath inside a `Group` with a dot (`.`) to indicate that it is relative to the grouping context.
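The `By`/`Save` semantics can be approximated in a few lines of Python. This is a rough sketch, not urlsave's implementation. `group` is a hypothetical helper, and the page is a made-up stand-in whose markup is inferred from the XPaths used on this page. Only the first match of each `Save` XPath is kept. Python's `xml.etree.ElementTree` supports only a subset of XPath and rejects absolute paths on an element, so the `By` path is written as `.//tr[...]` rather than `//tr[...]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a table page; the markup is
# inferred from the XPaths used in the examples on this page.
PAGE = """
<table>
  <tr class="row"><td title="name">Bulbasaur</td><td title="number">#001</td></tr>
  <tr class="row"><td title="name">Ivysaur</td><td title="number">#002</td></tr>
</table>
"""


def group(root, by, save):
    """Hypothetical helper mimicking Group: one dict per element matched by `by`."""
    groups = []
    for ancestor in root.iterfind(by):
        # The Save XPaths start with "." so they are evaluated inside this
        # ancestor only, never against the whole document.
        item = {}
        for name, xpath in save.items():
            match = ancestor.find(xpath)
            item[name] = match.text if match is not None else None
        groups.append(item)
    return groups


root = ET.fromstring(PAGE.strip())
rows = group(
    root,
    by=".//tr[@class='row']",
    save={"Name": ".//td[@title='name']", "Number": ".//td[@title='number']"},
)
print(rows)
```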
The example below obtains the name, number and types of each Pokemon on a page. Because a Pokemon can have multiple types, we add `--keep-list` to ensure that `Type` is always a list, even when a Pokemon has only one type.
```
Url: https://iilaurens.github.io/urlsave/pages/pokemons.html
Group:
    By: //tr[@class="row"]
    Save:
        Name: .//td[@title="name"]
        Number: .//td[@title="number"]
        Type: .//td[@title="type"]/span --keep-list
```

```
>> [{'Name': 'Bulbasaur', 'Number': '#001', 'Type': ['Grass', 'Poison']},
>> {'Name': 'Ivysaur', 'Number': '#002', 'Type': ['Grass', 'Poison']},
>> {'Name': 'Venusaur', 'Number': '#003', 'Type': ['Grass', 'Poison']},
>> {'Name': 'Charmander', 'Number': '#004', 'Type': ['Fire']},
>> {'Name': 'Charmeleon', 'Number': '#005', 'Type': ['Fire']},
>> {'Name': 'Charizard', 'Number': '#006', 'Type': ['Fire', 'Flying']},
>> ... ]
```
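The `--keep-list` flag only changes the result when a field matches exactly once. The sketch below assumes, based on the description above, that without the flag a single match is collapsed to a plain string while several matches stay a list. The helper name `collapse` is hypothetical and not part of urlsave.

```python
from typing import List, Union


def collapse(matches: List[str], keep_list: bool = False) -> Union[str, List[str]]:
    """Hypothetical post-processing of the texts matched by one Save XPath.

    Assumption: without --keep-list a single match becomes a bare string;
    with --keep-list the result is always a list.
    """
    if keep_list or len(matches) != 1:
        return list(matches)
    return matches[0]


print(collapse(["Grass", "Poison"]))        # ['Grass', 'Poison']
print(collapse(["Fire"]))                   # Fire
print(collapse(["Fire"], keep_list=True))   # ['Fire']
```

This sketch also returns an empty list when nothing matches. That is a guess, since the behaviour for zero matches is not described here.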
Optionally, you can provide a `Keys` option to the `Group` command. The output then changes from a list to a dictionary whose keys are obtained by evaluating the `Keys` XPath relative to each group.
```
Url: https://iilaurens.github.io/urlsave/pages/pokemons.html
Group:
    By: //tr[@class="row"]
    Keys: .//td[@title="name"]
    Save:
        Number: .//td[@title="number"]
        Type: .//td[@title="type"]/span --keep-list
```

```
>> {'Bulbasaur': {'Number': '#001', 'Type': ['Grass', 'Poison']},
>> 'Ivysaur': {'Number': '#002', 'Type': ['Grass', 'Poison']},
>> 'Venusaur': {'Number': '#003', 'Type': ['Grass', 'Poison']},
>> 'Charmander': {'Number': '#004', 'Type': ['Fire']},
>> 'Charmeleon': {'Number': '#005', 'Type': ['Fire']},
>> 'Charizard': {'Number': '#006', 'Type': ['Fire', 'Flying']},
>> ... }
```
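The `Keys` option can be sketched the same way. The `Keys` XPath is evaluated relative to each group, and its text becomes that group's dictionary key. `group_by_key` and the sample markup are hypothetical and inferred from the XPaths above, not copied from the real page. The single-match collapsing rule is the same assumption as for `--keep-list`. ElementTree's limited XPath again requires the leading dot on the `By` path.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the Pokemon table.
PAGE = """
<table>
  <tr class="row"><td title="name">Bulbasaur</td><td title="number">#001</td>
    <td title="type"><span>Grass</span><span>Poison</span></td></tr>
  <tr class="row"><td title="name">Charmander</td><td title="number">#004</td>
    <td title="type"><span>Fire</span></td></tr>
</table>
"""


def group_by_key(root, by, keys, save, keep_list=()):
    """Hypothetical helper mimicking Group with a Keys option.

    Every XPath in `keys` and `save` is evaluated relative to one `by` match.
    Fields named in `keep_list` always stay lists; other single matches are
    collapsed to a plain string.
    """
    result = {}
    for ancestor in root.iterfind(by):
        # The text of the Keys match becomes the dictionary key for this group.
        key = ancestor.find(keys).text
        fields = {}
        for name, xpath in save.items():
            texts = [e.text for e in ancestor.iterfind(xpath)]
            fields[name] = texts if name in keep_list or len(texts) != 1 else texts[0]
        result[key] = fields
    return result


root = ET.fromstring(PAGE.strip())
pokemon = group_by_key(
    root,
    by=".//tr[@class='row']",
    keys=".//td[@title='name']",
    save={"Number": ".//td[@title='number']", "Type": ".//td[@title='type']/span"},
    keep_list={"Type"},
)
print(pokemon)
```

In this sketch, if two groups produce the same key, the later one silently replaces the earlier one. How urlsave itself handles duplicate keys is not covered here.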