##
## data wrangling
## ==============
## [bookmark](https://nim-lang.org/docs/strscans.html)
##[
TLDR
- for parsing configuration files, see packaging
- constructing a regular expression is expensive; save it to a variable if you can reuse it
- re
  - follows perl 5 (see pcre spec link)
  - is an impure module: requires C's PCRE library to be available at runtime
- pegs is meant to replace re
- scanf can be extended with arbitrary procs for data wrangling
- parseutils
  - provides many declarative wrappers built on while loops
  - prefer this over manually looping through haystacks looking for needles
  - sometimes faster than using the re module
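
a minimal sketch of the two regex-free options above, assuming only the stdlib. the `seps` matcher and the sample inputs are made up for illustration:

```nim
import std/[parseutils, strscans]

# parseutils: each proc returns how many chars it consumed, so the
# caller advances an index instead of hand-writing a while loop
const line = "port:   8080"
var
  key: string
  port: int
var i = line.parseUntil(key, ':')   # consumes "port", stops at ':'
inc i                               # step over the ':'
i += line.skipWhitespace(i)         # skip the padding
doAssert line.parseInt(port, i) == 4
doAssert key == "port" and port == 8080

# strscans: $[...] calls a user-defined matcher (hypothetical `seps`)
# with the input and the current index; it returns the chars consumed
proc seps(input: string; start: int): int =
  while start + result < input.len and input[start + result] in {' ', '='}:
    inc result

var name, value: string
doAssert scanf("lang = nim", "$w$[seps]$w", name, value)
doAssert name == "lang" and value == "nim"
```

neither approach needs PCRE at runtime, which is one reason to reach for them before re.
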
links
- intros
  - [pcre specification](https://perldoc.perl.org/perlre)
  - [EBNF wikipedia](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form)
- examples
  - [nimgrep source code](https://github.com/nim-lang/Nim/blob/devel/tools/nimgrep.nim)
- high impact
  - [peg matching](https://nim-lang.org/docs/pegs.html)
  - [regex pcre wrapper](https://nim-lang.org/docs/re.html)
  - [string scans](https://nim-lang.org/docs/strscans.html)
  - [fusion matching](https://nim-lang.github.io/fusion/src/fusion/matching.html)
  - [parse utils](https://nim-lang.org/docs/parseutils.html)
- niche
  - [perl compatible regex](https://nim-lang.org/docs/pcre.html)
TODOs
-----
- re
  - reStudy ("study the expression"): figure out what this flag actually does
## re
- everything works as expected; only the things I found interesting are listed
- wrapper around the pcre package
- supports up to 20 capturing subpatterns (see MaxSubPatterns under re consts)
- the start param changes where the scan begins, but returned positions are always relative to the start of the input
- findAll and split can be iterated
- =~ is particularly useful: it injects a matches array holding the captures
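
a small sketch of the points above, assuming PCRE is installed. the input strings are made up:

```nim
import std/re

const input = "alpha beta gamma"
let word = re"\w+"                      # compiled once, reused below

# `start` moves the scan, but the result still indexes the whole input
doAssert input.find(word) == 0
doAssert input.find(word, start = 5) == 6

# findAll and split are also available as iterators
var seen: seq[string]
for w in input.findAll(word): seen.add w
doAssert seen == @["alpha", "beta", "gamma"]
for part in input.split(re"\s+"): doAssert part.len > 0

# =~ injects `matches`, which holds the captured groups
if "key = value" =~ re"(\w+)\s*=\s*(\w+)":
  doAssert matches[0] == "key" and matches[1] == "value"
else:
  doAssert false, "pattern should match"
```
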
re metacharacters
-----------------
- the usual suspects
- \ddd octal code ddd or backreference
- \x{hh} character with hex code hh
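
a quick sketch of the two escapes above, assuming the default (non-UTF8) PCRE mode that re"..." uses. the sample strings are made up:

```nim
import std/re

# \1 refers back to capture group 1: find the first doubled letter
doAssert "hello".find(re"(\w)\1") == 2

# with no capture groups, \101 is read as an octal code (0o101 == 65 == 'A')
doAssert "A".contains(re"\101")

# \x{41} is the same character written as a hex code
doAssert "A".contains(re"\x{41}")
```
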
re exceptions
-------------
- RegexError: raised when the re syntax is invalid
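
a minimal sketch of catching it. `isValidPattern` is a hypothetical helper, not part of std/re:

```nim
import std/re

# hypothetical helper: true when `pattern` compiles
proc isValidPattern(pattern: string): bool =
  try:
    discard re(pattern)      # re() raises RegexError on bad syntax
    result = true
  except RegexError:
    result = false

doAssert isValidPattern(r"\d+")
doAssert not isValidPattern("(unclosed")
```

this only matters for patterns built at runtime (user input, config). a literal re"..." with a typo fails the same way, just on the first run.
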
re types
--------
- Regex
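- RegexFlag enum
  - reIgnoreCase: case-insensitive matching
  - reMultiLine: ^ and $ match at newlines
  - reDotAll: . matches anything, including newlines
  - reExtended: ignore whitespace and `#` comments
  - reStudy: study the expression (may be omitted if the expression is used only once)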
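
a short sketch of what each flag changes, assuming PCRE's default newline handling (\n). the inputs are made up:

```nim
import std/re

# flags are passed as a set to re()
doAssert "HELLO".contains(re("hello", {reIgnoreCase}))
doAssert not "HELLO".contains(re"hello")

const text = "one\ntwo"
doAssert text.contains(re("^two$", {reMultiLine}))   # ^ and $ match per line
doAssert not text.contains(re"^two$")

doAssert text.contains(re("one.two", {reDotAll}))    # . also matches \n
doAssert not text.contains(re"one.two")

# rex"..." turns on reExtended: whitespace and # comments are ignored
doAssert "42".match(rex"\d+  # one or more digits")
```
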
re consts
---------
- MaxReBufSize high(cint)
- MaxSubPatterns 20
re procs
--------
- find
- match
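
the difference between the two, as a small sketch with a made-up input:

```nim
import std/re

const s = "say hello"
let hello = re"hello"

doAssert s.find(hello) == 4              # find scans forward for a match
doAssert not s.match(hello)              # match only tries at index 0 ...
doAssert s.match(hello, start = 4)       # ... or at `start` when given
doAssert s.matchLen(hello, start = 4) == 5
doAssert s.find(re"bye") == -1           # -1 means no match
```
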
]##
{.push warning[UnusedImport]:off .}
import std/[sugar, strformat]
echo "############################ re"
# rex"ignore whitespace and # comments"
# transformFile for quick scripting
import std/re
const lost = "lost something in this string, can you help me find it?"
echo fmt"""{lost.contains re"^lost.*\?$"=}"""
echo fmt"""{lost.endsWith re"it\?"=}"""
echo fmt"""{lost.startsWith re"lost"=}"""
echo fmt"""{lost.find re"\s"=}"""
echo fmt"""{lost.findAll re"\s"=}"""
echo fmt"""{lost.findBounds re"thing"=}"""
echo fmt"""{lost.match re"lost"=}"""
echo fmt"""{lost.match re"matches at ^ unless start provided"=}"""
echo fmt"""{lost.matchLen re"lost"=}"""
echo fmt"""{lost.multiReplace [(re"some.*g\s", "my keys ")]=}"""
echo fmt"""{lost.replace re"some.*g\s", "my keys "=}"""
echo fmt"""{lost.replacef re"(some.*g\s)", "$1special " =}"""
echo fmt"""{lost.split re"\s"=}"""
echo fmt"""pass a block to access matches[] {lost =~ re".*find\s\{1\}it"=}"""