Group command

iiLaurens edited this page Sep 23, 2018 · 10 revisions

The top-level Group instruction groups content that shares a common ancestor element. It is especially useful for extracting information that naturally comes in bundles. An example use case is saving the title, post date and content of each news article on the front page of a website.

Using the 'Group' instruction

The Group instruction takes two mandatory instructions:

  • By
  • Save or Group

By specifies the XPath of the ancestor element(s) to group by. Each match of the By XPath becomes a separate group, and the output is always a list in which each item is one group. The instructions in Save are passed on to the save parser. Note that the XPath context for parsing the Save contents is relative to the grouped ancestor. You will therefore almost always want to start an XPath inside a Group with a dot (.) to indicate that the query is relative to the grouping context defined by By.

The example below shows how to obtain the name, number and types of each Pokemon on a page. Because a Pokemon can have multiple types, we add --keep-list to ensure that Type is always a list (even if a Pokemon has only one type).

Url: https://iilaurens.github.io/urlsave/pages/pokemons.html
Group:
    By: //tr[@class="row"]
    Save:
        Name: .//td[@title="name"]
        Number: .//td[@title="number"]
        Type: .//td[@title="type"]/span --keep-list

>> [{'Name': 'Bulbasaur',  'Number': '#001', 'Type': ['Grass', 'Poison']},
>>  {'Name': 'Ivysaur',    'Number': '#002', 'Type': ['Grass', 'Poison']},
>>  {'Name': 'Venusaur',   'Number': '#003', 'Type': ['Grass', 'Poison']},
>>  {'Name': 'Charmander', 'Number': '#004', 'Type': ['Fire']},
>>  {'Name': 'Charmeleon', 'Number': '#005', 'Type': ['Fire']},
>>  {'Name': 'Charizard',  'Number': '#006', 'Type': ['Fire',  'Flying']},
>>  ... ]
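
The grouping behaviour described above can be approximated in plain Python. The sketch below is not how urlsave is implemented. It uses the standard library's xml.etree.ElementTree on a small, made-up table (the markup, the `group` helper and the `keep_list` boolean are assumptions for illustration) to show why the Save XPaths start with a dot: each one is evaluated against a single matched row rather than against the whole page.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the Pokemon table; not the real page markup.
PAGE = """
<table>
  <tr class="row">
    <td title="name">Bulbasaur</td>
    <td title="number">#001</td>
    <td title="type"><span>Grass</span><span>Poison</span></td>
  </tr>
  <tr class="row">
    <td title="name">Charmander</td>
    <td title="number">#004</td>
    <td title="type"><span>Fire</span></td>
  </tr>
</table>
"""


def group(root, by, save):
    """Return one dict per element matched by `by`.

    `save` maps an output name to a (relative_xpath, keep_list) pair.
    """
    results = []
    for ancestor in root.findall(by):
        item = {}
        for name, (xpath, keep_list) in save.items():
            # The query runs against `ancestor`, so "." means "this row".
            texts = [el.text for el in ancestor.findall(xpath)]
            # Assumption: without keep_list a single match collapses to a scalar.
            item[name] = texts if keep_list or len(texts) != 1 else texts[0]
        results.append(item)
    return results


root = ET.fromstring(PAGE)
pokemons = group(
    root,
    by=".//tr[@class='row']",
    save={
        "Name": (".//td[@title='name']", False),
        "Number": (".//td[@title='number']", False),
        "Type": (".//td[@title='type']/span", True),
    },
)
print(pokemons)
```

Charmander has a single type, yet its Type value is still a one-element list because `keep_list` is set for that field, mirroring the role of --keep-list in the example above.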

Optionally, you can add a Keys option to the Group instruction above. The output then changes from a list to a dictionary whose keys are obtained by evaluating the XPath in the Keys option for each group.

Url: https://iilaurens.github.io/urlsave/pages/pokemons.html
Group:
    By: //tr[@class="row"]
    Keys: .//td[@title="name"]
    Save:
        Number: .//td[@title="number"]
        Type: .//td[@title="type"]/span --keep-list

>> {'Bulbasaur':  {'Number': '#001', 'Type': ['Grass', 'Poison']},
>>  'Ivysaur':    {'Number': '#002', 'Type': ['Grass', 'Poison']},
>>  'Venusaur':   {'Number': '#003', 'Type': ['Grass', 'Poison']},
>>  'Charmander': {'Number': '#004', 'Type': ['Fire']},
>>  'Charmeleon': {'Number': '#005', 'Type': ['Fire']},
>>  'Charizard':  {'Number': '#006', 'Type': ['Fire',  'Flying']},
>>  ... }
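
The effect of Keys can be sketched in plain Python as well. As before, this is an illustration and not urlsave's own code: the markup, the `group_by_key` helper and the `keep_list` boolean are hypothetical. The Keys XPath is evaluated relative to each matched row, and its text becomes the dictionary key for that row's saved values.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the Pokemon table; not the real page markup.
PAGE = """
<table>
  <tr class="row">
    <td title="name">Bulbasaur</td>
    <td title="number">#001</td>
    <td title="type"><span>Grass</span><span>Poison</span></td>
  </tr>
  <tr class="row">
    <td title="name">Charmander</td>
    <td title="number">#004</td>
    <td title="type"><span>Fire</span></td>
  </tr>
</table>
"""


def group_by_key(root, by, keys, save):
    """Return a dict with one entry per element matched by `by`.

    `keys` is an XPath relative to each match; `save` maps an output
    name to a (relative_xpath, keep_list) pair.
    """
    grouped = {}
    for ancestor in root.findall(by):
        key = ancestor.findtext(keys)
        item = {}
        for name, (xpath, keep_list) in save.items():
            texts = [el.text for el in ancestor.findall(xpath)]
            # Assumption: without keep_list a single match collapses to a scalar.
            item[name] = texts if keep_list or len(texts) != 1 else texts[0]
        # Assumption: if two rows yield the same key, the last one wins here.
        grouped[key] = item
    return grouped


root = ET.fromstring(PAGE)
pokemons = group_by_key(
    root,
    by=".//tr[@class='row']",
    keys=".//td[@title='name']",
    save={
        "Number": (".//td[@title='number']", False),
        "Type": (".//td[@title='type']/span", True),
    },
)
print(pokemons)
```

How urlsave itself treats duplicate or missing keys is not covered on this page, so it is worth choosing a Keys XPath that yields exactly one unique value per group.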