Low discrepancy generating vectors and matrices.
We propose standard formats to specify lattice rules, polynomial lattice rules, and digital nets, in simple text files. We want the formats to be simple and relatively compact, with no more than one line per dimension, so they can easily be used for point sets in several thousand dimensions if desired. Ordinary text files with decimal numbers are good enough. They are easy to read by both humans and computers in any language. Other specialized formats (Json or Parquet, for example) can be more compact but then the files are not as easy to read without extra tools.
Joe and Kuo (2008) provide a file for up to 21201 dimensions for Sobol' points, and this seems to be the most widely used set of parameters for RQMC points at this time. They use a fixed and very simple format, which requires no special software to read. We want to provide similar types of files for other types of point sets, for an arbitrarily large number of dimensions. The SSJ simulation library (L’Ecuyer, 2016) can already read some of our proposed formats. Latnet Builder (L’Ecuyer et al., 2022) can (or will) produce output files in these formats. The choice of output format for Latnet Builder can be specified on the command line, using the following keywords:
lattice
A lattice rule: give the modulus and the generating vector.dnet
A digital net: give the generating matrices, one per line.plattice
A polynomial lattice rule: give the polynomial modulus and the generating vector.sobol
Sobol' points (default format), give only the direction numbers.soboljk
Sobol' points, the format used by Joe and Kuo (2008).
The most important formats are the first two, since the point sets covered by the other formats are special cases of digital nets, so they can all be described by the dnet
format. We propose them because they provide alternative representations that are either more compact or commonly used.
All the point sets that we consider have the form
where #
can be used for comments and descriptions; these lines are totally optional and should be just skipped by the program that reads the file. Anything that starts with #
on any given line in the header should also be skipped. All these comments are only for human readers to better see what is in the file, they are not for the computer1. One exception: the first line of the file must be a comment that contains the keyword for the file type; for example dnet
for a digital net. The number of dimensions (number of lines after the header) can be much larger than what we usually need; it suffices to use the number of rows that are needed.
The point sets can be extensible in the number of points
In the proposed formats, the files do not assume a given computer word size (e.g., 32 bits or 64 bits). The format is exactly the same regardless of the word size. Of course, if the file contains integers of more than 32 bits, the corresponding points cannot be generated properly on a 32 bit computer. A comment in the file header can say it.
Some users might prefer input files with no header at all, only the
The following sections describe the proposed text-file formats for the different point sets.
For an ordinary lattice rule of rank 1, we have
where
In a lattice
file, the first line must start with # lattice
. After that, not counting the comment lines, the first line gives the number
One example of a parameter file for an ordinary lattice rule, in lattice
format is given below. In this file, the first line is skipped, only the number 8
is read on the second line, only the number 65536
is read on the second line, etc.
# lattice
# A lattice rule, non-embedded, in 'lattice' format
8 # 8 dimensions
65536 # modulus = n = 65536 points
# coordinates of the generating vector, starting at j=1:
1
19463
17213
5895
14865
31925
30921
26671
A digital net in base
$$(y_{i, j, 1}, \dots, y_{i, j, r})^T = \boldsymbol{C}j \cdot (a{i, 0}, \dots, a_{i, k-1})^T$$
and
The points $\boldsymbol{u}i$ are defined by $\boldsymbol{u}i = (u{i,1},\dots,u{i,s})$. Digital nets are usually in base
The proposed format to specify digital nets is as follows. The first line must start with # dnet
. Then the first four non-comment lines give double
. By looking at
The DigitalNetBase2
in SSJ for
One example of a file for a digital net in dnet
format:
# dnet
# A digital net in base 2, in 'dnet' format
2 # basis b = 2
8 # s = 8 dimensions
10 # k = 10, so n = 2^10 = 1024 points
31 # r = 31 digits
# The columns of gen. matrices C_1, ..., C_s, one matrix per line:
1073741824 536870912 268435456 134217728 ...
2012537125 1382645254 ...
...
This differs from Joe and Kuo (2008), where the
Recall that coordinate u[i,j]
, coordinate C[j,c]
is the integer that represents column
normFactor = 1.0 / (1 << r) # 2^(-r)
coord = 0
for c in range(k):
coord ^= ((i >> c) & 1) * C[j,c]
u[i,j] = coord * normFactor
Polynomial lattice rules are a special type of digital nets with generating matrices of a special form. For a polynomial lattice rule of rank 1 in a prime base
where
This point set has
We must specify the polynomial modulus
As usual, the first line is a comment that tells the type of file. Then the first four non-comment lines give the base plattice
format:
# plattice
# A polynomial lattice rule in base 2, in 'plattice' format
2 # base b = 2
8 # s = 8 dimensions
16 # n = 2^16 = 65536 points
45781 # polynomial modulus
# coordinates of the generating vector, starting at j=1:
1
17213
5895
14865
31925
30921
26671
17213
A polynomial lattice rule in base dnet
format, as for general digital net in base plattice
files and generate the points would usually first convert the polynomials into the corresponding generating matrices. LatNet Builder (L’Ecuyer et al., 2022) is also able to make the conversion and produce a file in the dnet
format, for more convenience and better flexibility, so the user can select the format she/he prefers.
The Sobol' construction provides another special case of digital nets (and sequences), in base 2. They are defined in many places, including Joe and Kuo (2008). For each coordinate
One obvious option for these point sets is to adopt exactly the same format as Joe and Kuo (2008), because it is already used in many places. The only difference is that we now allow comment lines in the file. In the format of Joe and Kuo (2008), only the first line is skipped. In the proposed format, other comment lines can be added at the beginning of the file, e.g., to give the maximum number of dimensions in the file, the criterion and weights that were used, etc. Note that Sobol' sequences have an infinite number of points and an unlimited number of dimensions, although the file will give parameters for a finite number of dimensions.
The other lines of the file specify the primitive polynomials and the initial direction numbers for each dimension
The first number on each line is the dimension
We denote this format for Sobol parameters by the soboljk
keyword. One example of a file in this format is shown below. The first line gives the type of file and the next three lines are comments that must be skipped by the reading program.
# soboljk
# Parameters for Sobol points, in 'soboljk' format
# 8 dimensions
# c_j p_j m_{j,c}
2 1 0 1
3 2 1 1 3
4 3 1 1 3 1
5 3 2 1 1 1
6 4 1 1 1 3 3
7 4 4 1 3 5 13
8 5 2 1 1 5 5 17
The soboljk
format can be simplified as follows. First, removing the first and last "1" in the representation of the primitive polynomials saves a bit of memory, but it also makes thinks slightly more complicated. In the default representations of the primitive polynomials in the code that generates the points, these bits are usually not removed. In SSJ, the first thing we do when reading a file in soboljk
format is to add them back. Also, the primitive polynomials can be in a separate file, since they never change, and only the (initial) direction numbers (those depend on the selected FOM and weights) would be given to specify the Sobol' points. That is, we remove the first three columns of the soboljk
format. The Magic Point Shop (Nuyens, 2020) also produces files that contain only the direction numbers.
One example of a file in this sobol
format:
# sobol
# Parameters m_{j,c} for Sobol points, in 'sobol' format
# 8 dimensions
1 # This is m_{j,c} for the second coordinate
1 3
1 3 1
1 1 1
1 1 3 3
1 3 5 13
1 1 5 5 17
A list of the first few primitive polynomials in base 2 is given here. If we do not remove the first and last 1's in their representations, the first primitive polynomials are: 3, 7, 11, 13, 19, 25, ...
. Their degrees are 1, 2, 3, 3, 4, 4, ...
. This representation is the one used in the code of SSJ, for example. We can have a separate file that gives these polynomials, one per line, exactly as in the first three columns of the soboljk
format. We may also want to remove the first column.
Another, perhaps more convenient, way of storing Sobol' constructions is to just use the general dnet
format, in which the generating matrices are given explicitly. This dnet
format is easier to use. On the other hand, it requires specifying a (maximum) value of sobol
format, one can construct a digital net with an arbitrarily large
When
The idea of proposing a format for storing specific randomizations was suggested by Fred Hickernell. This can be useful for verification purposes, for example.
We can store randomizations in the following file formats:
-
shiftmod1
A (random) shift modulo 1. It corresponds to a single point in$[0,1)^s$ . -
dshift
A digital shift in base$b$ . Also a single point in$[0,1)^s$ , but with$r$ digits in base$b$ . -
nuscramble
A nested uniform scramble in base$b$ . -
lmscramble
A (linear) left matrix scramble in base$b$ .
For a shiftmod1
in
# shiftmod1
# A shift modulo 1, in 'shiftmod1' format
3 # s = 3 dimensions
0.32638741823951621
0.91325392536931693
0.1530364040t106301
For a dshift
with
# dshift
# A digital shift in base 2, in 'dshift' format
2 # b = 2
3 # s = 3
31 # r = 31
2146832861
1084390381
963462828
For a lmscramble
with dnet
format, using one integer for each column. We want them in this format for the fast LMS implementation we have in SSJ, for example. The file will contain dnet
format. Thus, each scrambling matrix is represented by
# lmscramble
# A left matrix scramble in base 2, with 31 digits of resolution.
2 # basis b = 2
8 # s = 8 dimensions
31 # r = 31 digits
# The columns of the lower-triangular r x r scrambling matrices, one matrix per line:
1673741824 906870912 615843556 213427728 ...
2012537125 1012645254 ...
...
For a nuscramble
of the first DigitalNetBase2
of SSJ, we need nuscramble
file format. The first non-comment line contains the base
# nuscramble
# A nested uniform scramble in base 2, with 30 bits of resolution.
2 # basis b = 2
8 # s = 8 dimensions
10 # k = 10, so n = 2^10 = 1024 points
30 # r = 30 digits
# The following s rows contain n = 1024 30 bit integers per row:
1173741824 906870912 615843556 213427728 ...
1012537125 1001975254 ...
...
It is strongly recommend that all file names start with the corresponding keyword, like plattice
for a polynomial lattice rule, sobol
for a Sobol point set, and lmscramble
for a left matrix scramble, for example.
It is also recommended to put enough relevant comments in each file for a knowledgeable human to find what the file is for (type of point set, figure of merit and weights that were used to construct it, range of values of
We also want some unit tests: some specific parameter files together with the correct output that should be observed when generating the points from these files.
- P. Bratley and B. L. Fox. Algorithm 659: Implementing Sobol’s quasirandom sequence generator. ACM Transactions on Mathematical Software, 14(1):88–100, 1988.
- Brent Burley. Practical hash-based Owen scrambling. The Journal of Computer Graphics Techniques, 9(4):1–20, 2020.
- I. Friedel and A. Keller. Fast generation of randomized low-discrepancy point sets. In K.-T. Fang, F. J. Hickernell, and H. Niederreiter, editors, Monte Carlo and Quasi-Monte Carlo Methods 2000, pages 257–273, Berlin, 2002. Springer-Verlag.
- T. Goda and J. Dick. Construction of interlaced scrambled polynomial lattice rules of arbitrary high order. Foundation of Computational Mathematics, 15:1245–1278, 2015.
- S. Joe and F. Y. Kuo. Constructing Sobol sequences with better two-dimensional projections. SIAM Journal on Scientific Computing, 30(5):2635–2654, 2008.
- S. Laine and T. Karras. Stratified sampling for stochastic transparency. Computer Graphics Forum, 30(4):1197–1204, 2011.
- P. L’Ecuyer. SSJ: Stochastic simulation in Java. http://simul.iro.umontreal.ca/ssj/, accessed 9th August 2021, 2016.
- P. L’Ecuyer, P. Marion, M. Godin, and F. Puchhammer. A tool for custom construction of QMC and RQMC point sets. In A. Keller, editor, Monte Carlo and Quasi-Monte Carlo Methods: MCQMC 2020, pages 51–70, Berlin, 2022. Springer. https://arxiv.org/abs/2012.10263.
- D. Nuyens. The magic point shop, 2020. https://people.cs.kuleuven.be/~dirk.nuyens/qmc-generators/.
Footnotes
-
Comments are now allowed only in the header lines, not in the $s$ lines that follow. This makes more sense. ↩
-
The range for which the points were built could be given in the file, but this makes things a bit more complicated for some point sets. For example, for ordinary lattice rules with a prime number of points, this additional info might be confusing for some users. For Sobol points, the range has no limit. ↩
-
Should we be forced to always put the embedding range in the file for the computer to read? My suggestion is not to force the computer to read it, but just put it as a comment for humans, Otherwise, it will force an additional line that gives the base and the range. What would we put for example if $n$ is prime and the rule is not embedded? Should we have a
lattice2
format for embedded rules in base 2? Putting more options makes things more complicated. ↩ -
Another way of storing the NUS is as follows. For each coordinate $j$, each point can be identified by a $k$ bit integer, and the NUS maps each such $k$ to a $r$ bit integer that corresponds to the scrambled coordinate $j$ of this point. So we can simply store this map in an array of size $b^k$ whose entry $i$ contains the corresponding $r$ bit integer. Applying this NUS is then fast and straightforward. ↩
-
Alternative implementations of NUS that use a hashing function in place of a RNG are proposed in Burley (2020) and Laine and Karras (2011). These methods might be faster and the is much less information to store to reproduce a given scramble, but the hashing function must be fixed, known, and reliable. This essentially amount to fixing the RNG and storing only its seed. ↩