Commit

fix tests
CodyInnowhere authored and CodyInnowhere committed Dec 13, 2024
1 parent 6e31171 commit af7d705
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions trafilatura/xml.py
@@ -305,6 +305,11 @@ def process_element(element: _Element, returnlist: List[str], include_formatting
        # this is the text that comes before the first child
        returnlist.append(replace_element_text(element, include_formatting))

    if element.tail and element.tag != 'graphic' and is_in_table_cell(element):
        # Inside a table cell, append the tail right after the element's text. Graphic tails are
        # handled separately, so they are skipped here. Textless elements such as lb are covered too.
        # Outside table cells, the tail is processed at the end.
        returnlist.append(element.tail.strip())

    for child in element:
        process_element(child, returnlist, include_formatting)
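
The ordering described in the comment can be illustrated with a small standalone sketch. It is not trafilatura's implementation: `collect` is a hypothetical, simplified stand-in for `process_element`, it uses the standard library's `xml.etree.ElementTree` rather than lxml, it tracks cell membership with an `in_cell` flag instead of `is_in_table_cell`, and it simply ignores `graphic` tails, which the real code handles separately.

```python
import xml.etree.ElementTree as ET
from typing import List


def collect(element: ET.Element, out: List[str], in_cell: bool = False) -> None:
    """Hypothetical, simplified stand-in for process_element."""
    if element.text:
        # text that comes before the first child
        out.append(element.text)

    tail_emitted = False
    if element.tail and element.tag != "graphic" and in_cell:
        # inside a table cell: emit the stripped tail right after the text,
        # which also covers textless elements such as <lb/>
        out.append(element.tail.strip())
        tail_emitted = True

    for child in element:
        # assumption: descendants of a <cell> element count as "in a table cell"
        collect(child, out, in_cell or element.tag == "cell")

    if element.tail and element.tag != "graphic" and not tail_emitted:
        # outside a table cell: emit the tail after the children
        out.append(element.tail)


row = ET.fromstring("<row><cell>a<lb/>b<hi>c</hi>d</cell></row>")
parts: List[str] = []
collect(row, parts)
print(parts)  # ['a', 'b', 'c', 'd']
```

In this sketch the tail `b` of the textless `lb` element and the tail `d` of `hi` are both emitted as soon as their element is visited, because both sit inside a `cell`.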
