2D rendering plans #1453

rcoreilly · 2025-01-25T10:34:29Z

rcoreilly
Jan 25, 2025
Maintainer

This discussion is to organize plans around the 2D rendering infrastructure.

Currently, we're using a lightly modified version of https://github.com/srwiley/rasterx with the https://github.com/srwiley/scanx rasterizer backend, which was faster than the https://pkg.go.dev/golang.org/x/image/vector rasterizer in his testing (reported on the github readme for example).

This renderer is plenty fast for modern desktop systems, but when this CPU-based code runs via WASM on the web, we really start to see significant slowdowns, especially on mobile web platforms. Thus, one good strategy would be to directly leverage WebGPU, which we already have good infrastructure for, to do 2D rasterization, to avoid the CPU -> WASM slowdown. The https://github.com/linebender/vello framework in particular, written in Rust, provides very fast WebGPU-specific 2D rasterization, and was our nominal plan (by making a Go port and re-using their .wgsl shaders). However, in starting to investigate it, there are many issues that will be summarized in a subsequent sub-post here, and it is clear that this is a major undertaking with some significant drawbacks at this time.

We also looked at https://github.com/tdewolff/canvas which is all in Go and has lots of amazing features. Overall the code looks very compatible with our styles and paint packages, to the point where much of the actual code there is essentially redundant with that. The novel bits are mostly in the backends and in some of the more advanced text formatting functionality. The native Go render-to-an-image backend is none other than https://pkg.go.dev/golang.org/x/image/vector, so it is unlikely to represent a speedup relative to what we're currently using.

However, it also has an html canvas backend, which suggests a different solution to the web rendering problem: just leverage the browser's own canvas engine (skia or whatever) directly, instead of doing everything in Go. This is not as fancy as WebGPU but is almost certainly a much easier and more well-supported path. It is unclear when WebGPU on the iPhone safari browser will actually be usable, but chrome on Android works now (as of very recently -- and maybe not on various older devices?)

That would just leave lower-performance mobile native devices with a sub-standard 2D rendering system (iPhone, Android) -- they get the WebGPU Drawer compositing benefits, but not the basic 2D rendering.

It is somewhat difficult to have comparable test cases across different rasterizing backends, and much likely depends on the details of the content being rendered, but a first priority is to try to get some benchmarks comparing our current paint setup with canvas on various realistic rendering cases (e.g., the equivalent of scrolling text and images, and some data-heavy plots, and some basic SVG images).

Then we probably want to see if we can easily get the html canvas backend working, and see how that works (can hopefully use canvas to test that with our benchmarks first).

Eventually we may want to do the vello port, but probably not now.

The other major interacting topic here is text formatting on the way to rendering. Our current paint package has an impl that works OK but lacks key international font rendering and layout support. https://github.com/go-text is used by Fyne and Gio for their text layout, and it has a direct Go translation of the "industry standard" https://github.com/harfbuzz/harfbuzz C library for cross-language support. Canvas also has its own reasonable support for this, and its framework is more compatible overall on first inspection with our paint impl, so it might be easier to use for us. It also has latex line breaking and other support for latex (though not the native go version from star-tex?) There are some further layout things still pending: tdewolff/canvas#74 -- need to compare more directly with go-text.

It is not 100% clear, but this wiki page: https://github.com/tdewolff/canvas/wiki/Planning suggests that perhaps there are further optimizations that need to happen on the text rendering -- we'll presumably find this out with our benchmarks. Our code uses rendered glyph images that get cached (which takes up memory!) but is presumably faster and worth it.

See also #1056 and #568 for our current issues on these things.

rcoreilly · 2025-01-25T11:09:19Z

rcoreilly
Jan 25, 2025
Maintainer Author

Vello summary

After a first-pass reading through the code, here's my summary of how https://github.com/linebender/vello works, and what the issues are for us.

vello_encoding/path.rs turns the standard LineTo QuadTo etc path drawing primitives into a GPU-optimized binary encoding that ends up as a scene buffer in many of the GPU shaders, e.g., var<storage> scene: array<u32>; This entire encoding sub-package would need to be re-implemented in Go, using our styles and paint function interface. Not a huge thing, but some significant effort to get the binary encoding right for all the different path types.
vello/render.rs has the key rendering code, with sequential recording.dispatch( calls that launch the WebGPU shaders (all in vello_shaders). Each shader does a step of transformation from the initial path encoding to actual rendered pixels on a texture, with an initial coarse pass that does a lot of the core structural logic, followed by an (optional?) fine pass that maybe is mostly about antialiasing? The ability to put all of this "coarse" processing on the GPU is the main advantage of vello vs. other rasterizers, which do more of that on the CPU. The secret sauce here is something called a 'Monoid' which basically allows bounding boxes to be computed efficiently in parallel, as far as I can tell. The rasterization involves splitting things into tiles and accumulating the bits that go into each tile, after all the stacking and clipping etc.
It was surprising that each of these dispatch calls is sequentially dependent on the CPU, with each being launched with a different subset of buffers: that sync between CPU and GPU is a major point of slowdown, and the heterogeneous mix of buffers for each shader runs counter to the basic organization of our gpu system, which is most efficient with a single overall buffer organization, which optimizes pure-GPU computation.
Here's the issue that explains what is going on: "Strategy for robust dynamic memory, readback, and async" (linebender/vello#366) -- dynamic sizing! They are actively working on this issue, but the bottom line is: the solution to this and several other important current issues will likely come with significant code changes to the shaders etc (along with significant performance improvements).
Aside from fixing that issue, the other biggie is glyph rendering (i.e., text rendering): "Plan for glyph rendering" (linebender/vello#204). This is really the most important problem domain for us: text scrolling is slow on the mobile web platform. They are going to try to figure out how to do caching of rendered glyph images, which I think we already do in our impl. This is tricky on GPU because of significant limits on texture availability within a given render pass. However, in our gpu.Drawer, I found that if you can load up all the textures in advance, calling a separate render pass for each image is not too bad. And atlases and layered stacks of images can make this more efficient, but do require more management overhead.
In the absence of the glyph caching, they render each letter every time from the shape info, and they don't handle the slower hinting logic that makes the fonts look better, especially at smaller sizes or lower resolutions. This can also be slower on slower GPU hardware. And would require significant interfacing with Go-based font parsing to get the shapes up to the GPU.
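
To make the glyph caching idea above concrete, here is a minimal CPU-side sketch of a cache of rendered glyph masks. It is only an illustration: the key fields (`face`, `id`, `ppem`) and all type names are assumptions, not what paint or vello actually use, and the "rasterizer" is a stand-in that just allocates a blank mask.

```go
package main

import "fmt"

// glyphKey identifies one rendered glyph image. The fields are illustrative
// assumptions, not the keys used by paint or vello.
type glyphKey struct {
	face string // hypothetical font face name
	id   uint16 // glyph ID within the face
	ppem int    // pixel size the glyph was rasterized at
}

// glyphCache memoizes rasterized glyph masks so that each glyph is only
// rasterized once per (face, id, size) combination.
type glyphCache struct {
	masks  map[glyphKey][]byte
	render func(k glyphKey) []byte // the slow rasterization step
	misses int                     // number of times render was called
}

func newGlyphCache(render func(glyphKey) []byte) *glyphCache {
	return &glyphCache{masks: map[glyphKey][]byte{}, render: render}
}

// get returns the cached mask, rasterizing only on the first request.
func (c *glyphCache) get(k glyphKey) []byte {
	if m, ok := c.masks[k]; ok {
		return m
	}
	c.misses++
	m := c.render(k)
	c.masks[k] = m
	return m
}

func main() {
	cache := newGlyphCache(func(k glyphKey) []byte {
		// Stand-in for real rasterization: a ppem x ppem alpha mask.
		return make([]byte, k.ppem*k.ppem)
	})
	// Five glyphs are requested, but one ID repeats.
	for _, id := range []uint16{72, 101, 108, 108, 111} {
		cache.get(glyphKey{face: "demo", id: id, ppem: 16})
	}
	fmt.Println("glyphs drawn:", 5, "rasterized:", cache.misses)
}
```

The memory cost mentioned earlier shows up here as the size of the `masks` map; a real cache would presumably need an eviction policy, which this sketch omits.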

So, again, vello has some significant further development needed, and thus a Go port would then require significant updating in the future, and it may not render very good looking and / or fast text right now.

Overall, the real strength of vello is in rendering complex graphical images with lots of overlap etc. This is not really our main use-case. Blitting font glyphs is our main use case. We can probably improve our CPU performance on that, in much less total time than doing the vello port.

We'll continue to monitor it, and hopefully at some point it will be useful and super cool to get all this working in Go, but not right now.

2 replies

rcoreilly Jan 25, 2025
Maintainer Author

It is also important to note that doing some kind of wrapper around vello itself, instead of a full Go port, is not useful because it would not work on the web: no cgo on the web! And this also presumes a C wrapper around vello which does not exist yet either! If we do the port, then we have pure Go calling WebGPU shaders through javascript via the webgpu javascript bindings: very nice.

kkoreilly Jan 25, 2025
Maintainer

By the way, we could theoretically link to C/Rust on the web using emscripten, but that would be a lot of work and would require a non-standard build system since Go doesn't automatically support it. (The big advantage of that would be that wgpu has a WebGL backup backend implemented, so we could get 3D and hardware accelerated rendering and compute on basically all browsers immediately).

rcoreilly · 2025-01-25T20:44:01Z

rcoreilly
Jan 25, 2025
Maintainer Author

Canvas issues

float32 vs. float64

There are some "impedance mismatches" between canvas and paint, the main one being the use of float64 in canvas vs float32 in our code. Here are some relevant considerations around this:

WebGPU is exclusively float32 based, and vello accordingly uses a float32 point representation, although their cpu general data structure library kurbo uses float64.
All 3D graphics libraries use float32 because GPUs use float32 in general. Hence, xyz will always remain on float32. Our math32 library (adapted from g3n) provides extensive support for both 2D and 3D graphics math. It would be awkward to intermix float64 here.
The "gold standard" 2D vector library https://github.com/google/skia uses float32
But another widely used one, https://gitlab.freedesktop.org/cairo/cairo uses float64

On balance, I would be very reluctant to move away from float32.

Renderer interface

Here's the renderer interface in canvas, which is then the target api for all backends:

canvas.go:
// Renderer is an interface that renderers implement. It defines the size of the target (in mm) and functions to render paths, text objects and images.
type Renderer interface {
	Size() (float64, float64)
	RenderPath(path *Path, style Style, m Matrix)
	RenderText(text *Text, m Matrix)
	RenderImage(img image.Image, m Matrix)
}

The rasterization api all goes through Path, in path.go:

// Path defines a vector path in 2D using a series of commands (MoveTo, LineTo, QuadTo, CubeTo, ArcTo and Close). Each command consists of a number of float64 values (depending on the command) that fully define the action. The first value is the command itself (as a float64). The last two values is the end point position of the pen after the action (x,y). QuadTo defined one control point (x,y) in between, CubeTo defines two control points, and ArcTo defines (rx,ry,phi,large+sweep) i.e. the radius in x and y, its rotation (in radians) and the large and sweep booleans in one float64.
// Only valid commands are appended, so that LineTo has a non-zero length, QuadTo's and CubeTo's control point(s) don't (both) overlap with the start and end point, and ArcTo has non-zero radii and has non-zero length. For ArcTo we also make sure the angle is in the range [0, 2*PI) and we scale the radii up if they appear too small to fit the arc.
type Path struct {
	d []float64
	// TODO: optimization: cache bounds and path len until changes (clearCache()), set bounds directly for predefined shapes
	// TODO: cache index last MoveTo, cache if path is settled?
}
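
As a rough illustration of the flat encoding described in that comment (command token first, end point in the last two values), here is a minimal sketch. It follows only the quoted comment; the token values, the lowercase `path` type, and the `endPoints` helper are hypothetical, and the real canvas layout may differ in its details.

```go
package main

import "fmt"

// Hypothetical command tokens; the actual values used by canvas are not
// shown in the excerpt above.
const (
	moveTo float64 = iota
	lineTo
	quadTo
)

// cmdLen returns how many float64 values a command occupies,
// including the token itself. It returns 0 for an unknown token.
func cmdLen(cmd float64) int {
	switch cmd {
	case moveTo, lineTo:
		return 3 // token + end point
	case quadTo:
		return 5 // token + control point + end point
	}
	return 0
}

// path stores all commands in one flat slice, as in the struct above.
type path struct{ d []float64 }

func (p *path) MoveTo(x, y float64)         { p.d = append(p.d, moveTo, x, y) }
func (p *path) LineTo(x, y float64)         { p.d = append(p.d, lineTo, x, y) }
func (p *path) QuadTo(cx, cy, x, y float64) { p.d = append(p.d, quadTo, cx, cy, x, y) }

// endPoints walks the flat slice and collects the pen position after
// each command, i.e., the last two values of every command.
func (p *path) endPoints() [][2]float64 {
	var pts [][2]float64
	for i := 0; i < len(p.d); {
		n := cmdLen(p.d[i])
		if n == 0 || i+n > len(p.d) {
			break // malformed data
		}
		pts = append(pts, [2]float64{p.d[i+n-2], p.d[i+n-1]})
		i += n
	}
	return pts
}

func main() {
	p := &path{}
	p.MoveTo(0, 0)
	p.LineTo(10, 0)
	p.QuadTo(15, 5, 10, 10)
	fmt.Println(len(p.d), p.endPoints())
}
```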

The srwiley/rasterx version of Path, which defines our current backend api, looks like this (paint/raster/geom.go), using the fixed.Point26_6 representation in literal pixel coordinates after all the transforms etc have been applied:

// PathCommand is the type for the path command token
type PathCommand fixed.Int26_6 //enums:enum -no-extend

// Human readable path command constants
const (
	PathMoveTo PathCommand = iota
	PathLineTo
	PathQuadTo
	PathCubicTo
	PathClose
)

// A Path starts with a PathCommand value followed by zero to three fixed
// int points.
type Path []fixed.Int26_6
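
For readers unfamiliar with the 26.6 format, here is a small sketch of how a float32 pixel coordinate might be converted to and from it. The `Int26_6` type is redefined locally (mirroring the layout of fixed.Int26_6 in golang.org/x/image/math/fixed) purely to keep the example dependency-free; `toFixed` and `toFloat` are hypothetical helper names, and round-to-nearest is an assumption about the desired rounding.

```go
package main

import (
	"fmt"
	"math"
)

// Int26_6 mirrors the layout of fixed.Int26_6: a signed 32-bit value
// with 26 integer bits and 6 fractional bits (units of 1/64 pixel).
type Int26_6 int32

// toFixed rounds a float32 pixel coordinate to the nearest 1/64 pixel.
func toFixed(x float32) Int26_6 {
	return Int26_6(math.Round(float64(x) * 64))
}

// toFloat converts a 26.6 value back to float32.
func toFloat(x Int26_6) float32 {
	return float32(x) / 64
}

func main() {
	for _, v := range []float32{1.5, 10.26, -0.01} {
		f := toFixed(v)
		fmt.Printf("%v -> %d -> %v\n", v, f, toFloat(f))
	}
}
```

The round trip loses anything finer than 1/64 of a pixel, which is one reason the conversion belongs at the very end, after all transforms have been applied in floating point.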

And the more abstract Raster interface looks like this:

// Raster is the interface for rasterizer types. It extends the [Adder]
// interface to include LineF and JoinF functions.
type Raster interface {
	Adder
	LineF(b fixed.Point26_6)
	JoinF()
}

// Adder is the interface for types that can accumulate path commands
type Adder interface {
	// Start starts a new curve at the given point.
	Start(a fixed.Point26_6)

	// Line adds a line segment to the path
	Line(b fixed.Point26_6)

	// QuadBezier adds a quadratic bezier curve to the path
	QuadBezier(b, c fixed.Point26_6)

	// CubeBezier adds a cubic bezier curve to the path
	CubeBezier(b, c, d fixed.Point26_6)

	// Closes the path to the start point if closeLoop is true
	Stop(closeLoop bool)
}

Note that ArcTo is processed upstream.
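
As a sketch of what processing ArcTo upstream can look like, the following approximates a circular arc with a single cubic Bezier using the standard 4/3 * tan(sweep/4) control-point distance. This is a textbook construction, not necessarily what paint or canvas does; it handles only circular (not elliptical) arcs, and the `point`, `arcToCubic`, and `cubicAt` names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

type point struct{ X, Y float64 }

// arcToCubic approximates the circular arc centered at c with radius r,
// running from angle a0 to a1 (radians), by one cubic Bezier segment.
// It is accurate only for sweeps up to about 90 degrees; longer arcs
// should be split into several segments first.
func arcToCubic(c point, r, a0, a1 float64) (p0, p1, p2, p3 point) {
	k := 4.0 / 3.0 * math.Tan((a1-a0)/4)
	s0, c0 := math.Sincos(a0)
	s1, c1 := math.Sincos(a1)
	p0 = point{c.X + r*c0, c.Y + r*s0}
	p3 = point{c.X + r*c1, c.Y + r*s1}
	// Control points lie along the tangents at the two end points.
	p1 = point{p0.X - k*r*s0, p0.Y + k*r*c0}
	p2 = point{p3.X + k*r*s1, p3.Y - k*r*c1}
	return
}

// cubicAt evaluates the cubic Bezier at parameter t in [0, 1].
func cubicAt(p0, p1, p2, p3 point, t float64) point {
	u := 1 - t
	a, b, c, d := u*u*u, 3*u*u*t, 3*u*t*t, t*t*t
	return point{
		a*p0.X + b*p1.X + c*p2.X + d*p3.X,
		a*p0.Y + b*p1.Y + c*p2.Y + d*p3.Y,
	}
}

func main() {
	p0, p1, p2, p3 := arcToCubic(point{0, 0}, 1, 0, math.Pi/2)
	mid := cubicAt(p0, p1, p2, p3, 0.5)
	// Distance of the curve midpoint from the center; ideally equal to the radius.
	fmt.Printf("midpoint radius: %.6f\n", math.Hypot(mid.X, mid.Y))
}
```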

For reference, in Skia, the core/SkPath.h is the relevant type, and it adds a Conic path element beyond what is standard in the others.

class SK_API SkPath {
public:
    /**
     *  Create a new path with the specified segments.
     *
     *  The points and weights arrays are read in order, based on the sequence of verbs.
     *
     *  Move    1 point
     *  Line    1 point
     *  Quad    2 points
     *  Conic   2 points and 1 weight
     *  Cubic   3 points
     *  Close   0 points
...

Styling

Every framework has its own way of defining the stroke styles etc. Canvas is similar to our styles, but of course there are differences, which we need to examine.

In our copy of srwiley/rasterx, the styling is implemented with explicit function calls, instead of an overall Style struct as in canvas, e.g.,:

// SetStroke set the parameters for stroking a line. width is the width of the line, miterlimit is the miter cutoff
// value for miter, arc, miterclip and arcClip joinModes. CapL and CapT are the capping functions for leading and trailing
// line ends. If one is nil, the other function is used at both ends. If both are nil, both ends are ButtCapped.
// gp is the gap function that determines how a gap on the convex side of two joining lines is filled. jm is the JoinMode
// for curve segments.
func (r *Stroker) SetStroke(width, miterLimit fixed.Int26_6, capL, capT CapFunc, gp GapFunc, jm JoinMode) {

Images and text

These are not handled at all in srwiley/rasterx, and we just handle them separately in our paint system. A key question, if we ever want to be able to run on GPU in a seamless way, and also for html canvas, is how to represent an image in a way that is generic across a GPU texture vs. Go *image.RGBA etc. And the text needs to be something more like the rich text rep that is in canvas.

Bottom line

We should probably make an interface like that in canvas, and then have glue for each backend, etc. Can convert float32 to float64 etc to leverage the canvas api, but it almost certainly will not be the high-performance pathway. In any case, with our canvas bridge interface, we could directly leverage the other fancy backends like PDF, and probably we'd want to write our own more direct version of the html canvas backend for web.
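
A minimal sketch of what such glue could look like for the float32 to float64 conversion. Everything here is hypothetical: `backend64` is a drastically simplified stand-in for a float64-based API like the canvas Renderer, and `bridge` is not an existing type. It mainly shows where the per-coordinate conversion cost comes from.

```go
package main

import "fmt"

// backend64 is a stand-in for a float64-based backend API such as the
// canvas Renderer; its single method is a hypothetical simplification.
type backend64 interface {
	RenderPath(coords []float64)
}

// bridge adapts float32 path data from a paint-style frontend to a
// float64 backend.
type bridge struct {
	dst backend64
	buf []float64 // reused between calls to avoid per-path allocation
}

// RenderPath32 widens each coordinate and forwards the path.
// The backend must not retain the slice, since it is reused.
func (b *bridge) RenderPath32(coords []float32) {
	b.buf = b.buf[:0]
	for _, c := range coords {
		b.buf = append(b.buf, float64(c))
	}
	b.dst.RenderPath(b.buf)
}

// sumBackend is a trivial backend that just records what it received.
type sumBackend struct {
	n   int
	sum float64
}

func (s *sumBackend) RenderPath(coords []float64) {
	for _, c := range coords {
		s.n++
		s.sum += c
	}
}

func main() {
	be := &sumBackend{}
	br := &bridge{dst: be}
	br.RenderPath32([]float32{0, 0, 10, 0, 10, 10})
	fmt.Println(be.n, be.sum)
}
```

Every path crosses the bridge with one extra pass over its coordinates, which is why this route is unlikely to be the high-performance pathway.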

3 replies

kkoreilly Jan 25, 2025
Maintainer

It seems like we should fold canvas into Cogent Core, only taking the novel parts, instead of maintaining a compatibility layer that has to constantly convert between types.

rcoreilly Jan 25, 2025
Maintainer Author

And miss out on (or have to manually apply) upstream updates on canvas?

rcoreilly Jan 26, 2025
Maintainer Author

Some further points about the Raster interface:

For optimal GPU usage down the road, it would be best to have a single high-level Scene representation as in vello, which captures everything being rendered in one big stream. The more that can be processed in one parallel blob, the better. This is also true for html canvas, which improves significantly with batching.
Scene is just a big list of path, text and image elements.
There is some memory overhead involved in constructing such a scene vs. just sending it off right away, but we can minimize that by using an efficient vello-like representation, e.g., compressing the styles down into the path structure (presumably just use vello encoding directly so we're ready for that in the future?).
The canvas Renderer api is missing anything about the overall config except the size in mm -- presumably all that happens in backend-specific ways? what about other state? carrying over existing pixel state and just updating one section (as we often do) vs. clearing and starting over? Unclear how all this happens.
Presumably need more state in the renderer, particularly in terms of the ClipPath? Our paint.State representation has this clip stack, and I've never quite dug into exactly how it works, but it is pretty critical (and at least somewhat broken in our code!). We also have a bounds stack which is key for restricting rendering to a given region. Ok it looks like this is all managed in the canvas Path using Clip ops -- tdewolff has done some significant work on this aspect of rendering and it might be worth investigating further: https://github.com/tdewolff/canvas/wiki/Boolean-operations -- in path_intersection.go I'm guessing this all happens in a pre-processing step prior to sending to the renderer, so that keeps the renderer side simpler (but removes some opportunities for GPU optimization -- I think vello does this stuff too).
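
To illustrate the "one big stream" idea from the points above, here is a minimal sketch of a scene that records items and hands them to a backend in a single batch. All names (`scene`, `item`, `itemKind`, `flush`) are hypothetical, and the per-item encoding is deliberately left unspecified rather than guessing at the vello format.

```go
package main

import "fmt"

// itemKind tags each scene element. All names here are hypothetical.
type itemKind int

const (
	pathItem itemKind = iota
	textItem
	imageItem
)

// item is one recorded drawing operation.
type item struct {
	kind itemKind
	data []float32 // encoded coordinates; the encoding is left unspecified
}

// scene accumulates items so a backend can process them in one batch.
type scene struct{ items []item }

func (s *scene) add(k itemKind, data ...float32) {
	s.items = append(s.items, item{kind: k, data: data})
}

// flush hands the whole list to render in a single call and then resets
// the scene, keeping the allocated capacity for the next frame.
func (s *scene) flush(render func([]item)) {
	render(s.items)
	s.items = s.items[:0]
}

func main() {
	s := &scene{}
	s.add(pathItem, 0, 0, 10, 10)
	s.add(textItem, 5, 5)
	s.add(imageItem, 0, 0)
	calls := 0
	s.flush(func(batch []item) {
		calls++
		fmt.Println("backend calls:", calls, "items in batch:", len(batch))
	})
}
```

The memory overhead mentioned above is the `items` slice itself; reusing its capacity across frames is one cheap way to keep that cost bounded.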

Canvas does have the Context:

// Context maintains the state for the current path, path style, and view transformation matrix.
type Context struct {
	Renderer

	path *Path
	ContextState
	stack []ContextState
}

A further key question is when the transforms get applied. Canvas sends them down to the renderer, but we pre-handle all that in paint, sending only raw pixel coords. It would be good to understand if svg / pdf work better by sending the transforms in (probably) vs. pre-transforming.
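
The two options can be sketched side by side. This is only an illustration under assumptions: `matrix2` and its field names are a local stand-in for a 2D affine matrix, `preTransform` shows the bake-into-pixel-coordinates approach, and `svgTransform` shows the alternative of leaving coordinates untouched and emitting the matrix in SVG's matrix(a b c d e f) form.

```go
package main

import "fmt"

// matrix2 is a 2D affine transform:
// x' = XX*x + XY*y + X0, y' = YX*x + YY*y + Y0.
type matrix2 struct{ XX, YX, XY, YY, X0, Y0 float32 }

func (m matrix2) apply(x, y float32) (float32, float32) {
	return m.XX*x + m.XY*y + m.X0, m.YX*x + m.YY*y + m.Y0
}

// preTransform bakes the transform into the coordinates (x, y pairs),
// which is what a pixel-coordinate backend expects.
func preTransform(m matrix2, pts []float32) []float32 {
	out := make([]float32, len(pts))
	for i := 0; i+1 < len(pts); i += 2 {
		out[i], out[i+1] = m.apply(pts[i], pts[i+1])
	}
	return out
}

// svgTransform shows the alternative: leave coordinates alone and emit
// the matrix, here in the SVG "matrix(a b c d e f)" attribute form.
func svgTransform(m matrix2) string {
	return fmt.Sprintf("matrix(%g %g %g %g %g %g)", m.XX, m.YX, m.XY, m.YY, m.X0, m.Y0)
}

func main() {
	m := matrix2{XX: 2, YY: 2, X0: 10, Y0: 20} // scale by 2, then translate
	fmt.Println(preTransform(m, []float32{1, 1, 3, 4}))
	fmt.Println(svgTransform(m))
}
```

With the second form, a vector backend keeps the original user-space coordinates in its output, which is presumably why sending transforms down suits svg / pdf better.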

rcoreilly · 2025-01-25T21:27:01Z

rcoreilly
Jan 25, 2025
Maintainer Author

Text representation

Here's the Text elements from canvas:

// Text holds the representation of a text object.
type Text struct {
	lines []line
	fonts map[*Font]bool
	WritingMode
	TextOrientation
	Width, Height float64
	Text          string
	Overflows     bool // true if lines stick out of the box
}

type line struct {
	y     float64
	spans []TextSpan
}

// TextSpan is a span of text.
type TextSpan struct {
	X         float64
	Width     float64
	Face      *FontFace
	Text      string
	Glyphs    []text.Glyph
	Direction text.Direction
	Rotation  text.Rotation
	Level     int

	Objects []TextSpanObject
}

// TextSpanObject is an object that can be used within a text span. It is a wrapper around Canvas and can thus draw anything to be mixed with text, such as images (emoticons) or paths (symbols).
type TextSpanObject struct {
	*Canvas
	X, Y          float64
	Width, Height float64
	VAlign        VerticalAlign
}

// Glyph is a shaped glyph for the given font and font size. It specified the glyph ID, the cluster ID, its X and Y advance and offset in font units, and its representation as text.
type Glyph struct {
	SFNT *font.SFNT
	Size float64
	Script
	Vertical bool // is false for Latin/Mongolian/etc in a vertical layout

	ID       uint16
	Cluster  uint32
	XAdvance int32
	YAdvance int32
	XOffset  int32
	YOffset  int32
	Text     rune
}

// Script is the script.
type Script uint32 // this is a wrapper on go-text/typesetting/language

Aha, and now I see that the text package in canvas directly uses the go-text typesetting framework and that is where the harfbuzz and fribidi come from. So there isn't really a choice to be made here: the only choice is how to package that stuff, and overall the canvas packaging for the overall Text rep is similar to ours, so it will be a helpful reference if nothing else.

Here's ours for reference:

// Text contains one or more Span elements, typically with each
// representing a separate line of text (but they can be anything).
type Text struct {
	Spans []Span

	// bounding box for the rendered text.  use Size() method to get the size.
	BBox math32.Box2

	// fontheight computed in last Layout
	FontHeight float32

	// lineheight computed in last Layout
	LineHeight float32

	// whether has had overflow in rendering
	HasOverflow bool

	// where relevant, this is the (default, dominant) text direction for the span
	Dir styles.TextDirections

	// hyperlinks within rendered text
	Links []TextLink
}

// Span contains fully explicit data needed for rendering a span of text
// as a slice of runes, with rune and Rune elements in one-to-one
// correspondence (but any nil values will use prior non-nil value -- first
// rune must have all non-nil). Text can be oriented in any direction -- the
// only constraint is that it starts from a single starting position.
// Typically only text within a span will obey kerning.  In standard
// Text context, each span is one line of text -- should not have new
// lines within the span itself.  In SVG special cases (e.g., TextPath), it
// can be anything.  It is NOT synonymous with the HTML <span> tag, as many
// styling applications of that tag can be accommodated within a larger
// span-as-line.  The first Rune RelPos for LR text should be at X=0
// (LastPos = 0 for RL) -- i.e., relpos positions are minimal for given span.
type Span struct {

	// text as runes
	Text []rune

	// render info for each rune in one-to-one correspondence
	Render []Rune

	// position for start of text relative to an absolute coordinate that is provided at the time of rendering.
	// This typically includes the baseline offset to align all rune rendering there.
	// Individual rune RelPos are added to this plus the render-time offset to get the final position.
	RelPos math32.Vector2

	// rune position for further edge of last rune.
	// For standard flat strings this is the overall length of the string.
	// Used for size / layout computations: you do not add RelPos to this,
	// as it is in same Text relative coordinates
	LastPos math32.Vector2

	// where relevant, this is the (default, dominant) text direction for the span
	Dir styles.TextDirections

	// mask of decorations that have been set on this span -- optimizes rendering passes
	HasDeco styles.TextDecorations
}

// Rune contains fully explicit data needed for rendering a single rune
// -- Face and Color can be nil after first element, in which case the last
// non-nil is used -- likely slightly more efficient to avoid setting all
// those pointers -- float32 values used to support better accuracy when
// transforming points
type Rune struct {

	// fully specified font rendering info, includes fully computed font size.
	// This is exactly what will be drawn, with no further transforms.
	// If nil, previous one is retained.
	Face font.Face `json:"-"`

	// Color is the color to draw characters in.
	// If nil, previous one is retained.
	Color image.Image `json:"-"`

	// background color to fill background of color, for highlighting,
	// <mark> tag, etc.  Unlike Face, Color, this must be non-nil for every case
	// that uses it, as nil is also used for default transparent background.
	Background image.Image `json:"-"`

	// additional decoration to apply: underline, strike-through, etc.
	// Also used for encoding a few special layout hints to pass info
	// from styling tags to separate layout algorithms (e.g., <P> vs <BR>)
	Deco styles.TextDecorations

	// relative position from start of Text for the lower-left baseline
	// rendering position of the font character
	RelPos math32.Vector2

	// size of the rune itself, exclusive of spacing that might surround it
	Size math32.Vector2

	// rotation in radians for this character, relative to its lower-left
	// baseline rendering position
	RotRad float32

	// scaling of the X dimension, in case of non-uniform scaling, 0 = no separate scaling
	ScaleX float32
}

1 reply

rcoreilly Feb 1, 2025
Maintainer Author

Here are the Gio structs. Gio adds a visualOrder slice that allows mapping from a standardized LR visual order to the input source order, which may get rendered RL. This is important for text selection logic, which does not appear to be supported in canvas.

Gio does not appear to support vertical text. It does look like canvas supports vertical layout, and it also supports text decoration (underline etc), which Gio does not appear to support. It is unclear what kind of underlying representation of rich text is used, but we definitely need to separate the html part of it from the rich text aspect of things. Canvas also supports full Knuth-level line breaking, which is amazing, but also looks very complicated. It does not look like we could use the canvas code directly, because it has its own distinct font representation, doesn't have the key selection encoding, and uses float64 instead of float32. Also, text rendering actually needs to happen in fixed point, though it probably makes sense to postpone that conversion as long as possible, for other backend representations.

We also handle SVG text rendering, which adds a rotation and path components. ugh.

// document holds a collection of shaped lines and alignment information for
// those lines.
type document struct {
	lines     []line
	alignment Alignment
	// alignWidth is the width used when aligning text.
	alignWidth      int
	unreadRuneCount int
}

// A line contains the measurements of a line of text.
type line struct {
	// runs contains sequences of shaped glyphs with common attributes. The order
	// of runs is logical, meaning that the first run will contain the glyphs
	// corresponding to the first runes of data in the original text.
	runs []runLayout
	// visualOrder is a slice of indices into Runs that describes the visual positions
	// of each run of text. Iterating this slice and accessing Runs at each
	// of the values stored in this slice traverses the runs in proper visual
	// order from left to right.
	visualOrder []int
	// width is the width of the line.
	width fixed.Int26_6
	// ascent is the height above the baseline.
	ascent fixed.Int26_6
	// descent is the height below the baseline, including
	// the line gap.
	descent fixed.Int26_6
	// lineHeight captures the gap that should exist between the baseline of this
	// line and the previous (if any).
	lineHeight fixed.Int26_6
	// direction is the dominant direction of the line. This direction will be
	// used to align the text content of the line, but may not match the actual
	// direction of the runs of text within the line (such as an RTL sentence
	// within an LTR paragraph).
	direction system.TextDirection
	// runeCount is the number of text runes represented by this line's runs.
	runeCount int

	yOffset int
}

type runLayout struct {
	// VisualPosition describes the relative position of this run of text within
	// its line. It should be a valid index into the containing line's VisualOrder
	// slice.
	VisualPosition int
	// X is the visual offset of the dot for the first glyph in this run
	// relative to the beginning of the line.
	X fixed.Int26_6
	// Glyphs are the actual font characters for the text. They are ordered
	// from left to right regardless of the text direction of the underlying
	// text.
	Glyphs []glyph
	// Runes describes the position of the text data this layout represents
	// within the containing text.Line.
	Runes Range
	// Advance is the sum of the advances of all clusters in the Layout.
	Advance fixed.Int26_6
	// PPEM is the pixels-per-em scale used to shape this run.
	PPEM fixed.Int26_6
	// Direction is the layout direction of the glyphs.
	Direction system.TextDirection
	// face is the font face that the ID of each Glyph in the Layout refers to.
	face *font.Face
	// truncator indicates that this run is a text truncator standing in for remaining
	// text.
	truncator bool
}

// glyph contains the metadata needed to render a glyph.
type glyph struct {
	// id is this glyph's identifier within the font it was shaped with.
	id GlyphID
	// clusterIndex is the identifier for the text shaping cluster that
	// this glyph is part of.
	clusterIndex int
	// glyphCount is the number of glyphs in the same cluster as this glyph.
	glyphCount int
	// runeCount is the quantity of runes in the source text that this glyph
	// corresponds to.
	runeCount int
	// xAdvance and yAdvance describe the distance the dot moves when
	// laying out the glyph on the X or Y axis.
	xAdvance, yAdvance fixed.Int26_6
	// xOffset and yOffset describe offsets from the dot that should be
	// applied when rendering the glyph.
	xOffset, yOffset fixed.Int26_6
	// bounds describes the visual bounding box of the glyph relative to
	// its dot.
	bounds fixed.Rectangle26_6
}

// Range describes the position and quantity of a range of text elements
// within a larger slice. The unit is usually runes of unicode data or
// glyphs of shaped font data.
type Range struct {
	// Count describes the number of items represented by the Range.
	Count int
	// Offset describes the start position of the represented
	// items within a larger list.
	Offset int
}
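
To show how the visualOrder slice in the structs above gets used, here is a reduced sketch based only on the doc comments: runs are stored in logical order, and visualOrder lists run indices in left-to-right display order. The `run` and `line` types are stripped-down stand-ins, and `visual` and `logicalIndexAt` are hypothetical helpers, not Gio functions.

```go
package main

import "fmt"

// run is a reduced stand-in for Gio's runLayout; only the text is kept.
type run struct{ text string }

// line holds runs in logical (source) order plus the visual ordering.
type line struct {
	runs        []run
	visualOrder []int // visualOrder[i] is the index into runs shown at visual slot i
}

// visual returns the run texts in left-to-right display order.
func (l line) visual() []string {
	out := make([]string, 0, len(l.visualOrder))
	for _, idx := range l.visualOrder {
		out = append(out, l.runs[idx].text)
	}
	return out
}

// logicalIndexAt maps a visual slot (e.g., from a mouse hit test) back to
// the logical run index, which is what selection logic needs.
func (l line) logicalIndexAt(slot int) int {
	return l.visualOrder[slot]
}

func main() {
	// An RTL line with an embedded LTR run: the logical order is
	// R1, "abc", R2, but it displays left to right as R2, "abc", R1.
	l := line{
		runs:        []run{{"R1"}, {"abc"}, {"R2"}},
		visualOrder: []int{2, 1, 0},
	}
	fmt.Println(l.visual(), l.logicalIndexAt(0))
}
```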

rcoreilly · 2025-01-31T00:56:41Z

rcoreilly
Jan 31, 2025
Maintainer Author

Benchmarking initial results

renderbench_test.go

BenchmarkTable `go test -bench BenchmarkTable -run none -tags update`

3.8 ms -- new rasterx with canvas arc paths directly converted to cubeto
4 ms -- old main rasterx prior to newpaint
4.4 ms -- new rasterx with canvas arc paths
5.6 ms -- new rasterx using ScanFT rasterizer
5.8 ms -- new rasterx with old quad arc paths
7.4 ms -- new canvas raster with rasterx scan
3.6 s (~1000x worse!) -- new canvas raster with image/vector raster with arc paths
3.6 s -- rasterx using ScannerGV which uses image/vector rasterizer -- convergent replication with above result.

BenchmarkForm `go test -bench BenchmarkForm -run none -tags update`

6.8 ms -- new rasterx with canvas arc paths directly converted to cubeto
7 ms -- old main rasterx prior to newpaint
7.6 ms -- new rasterx with canvas arc paths
9.2 ms -- new rasterx using ScanFT rasterizer
12.8 ms -- new canvas raster with rasterx scan
13 ms -- new rasterx with old quad arc paths
29 s -- (seconds!) new canvas raster with image/vector with arc paths

Profiling

go test -run TestProfileForm -tags update -- runs 200x updates with targeted profile around various key bits of code.

Rasterx

rasterx.(*Renderer).RenderPath-rasterx-fill                 Total:  414.30 ms	Avg:  0.00	N:155400	Pct: 31.21
rasterx.(*Renderer).RenderPath-rasterx-stroke               Total:  375.04 ms	Avg:  0.00	N:155400	Pct: 28.25
rasterx.(*Renderer).Fill-rasterx-draw                       Total:  264.32 ms	Avg:  0.00	N: 66000	Pct: 19.91
rasterx.(*Renderer).RenderPath-rasterx-replace-arcs         Total:  165.84 ms	Avg:  0.00	N:155400	Pct: 12.49
rasterx.(*Renderer).Stroke-rasterx-draw                     Total:   94.41 ms	Avg:  0.00	N: 70000	Pct:  7.11
rasterx.(*Renderer).RenderPath-rasterx-path                 Total:   13.48 ms	Avg:  0.00	N:155400	Pct:  1.02

These are not mutually exclusive. Stroke is very fast! Replace arcs is pretty slow so we could try to optimize that.
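
A note on reading the Pct column: the figures above are consistent with Pct being each timer's share of the summed totals (not of wall-clock time), which is why nested timers inflate the denominator. The sketch below recomputes the columns from the Total and N values in the report; the `entry` type and `pct` function are hypothetical, and the interpretation of Pct is an inference from the numbers rather than something confirmed from the profiler code.

```go
package main

import "fmt"

// entry is one named timer; fields mirror the columns in the report above.
type entry struct {
	name    string
	totalMS float64
	n       int
}

// pct returns each entry's share of the summed totals, in percent.
func pct(es []entry) []float64 {
	sum := 0.0
	for _, e := range es {
		sum += e.totalMS
	}
	out := make([]float64, len(es))
	if sum == 0 {
		return out
	}
	for i, e := range es {
		out[i] = 100 * e.totalMS / sum
	}
	return out
}

func main() {
	es := []entry{
		{"rasterx-fill", 414.30, 155400},
		{"rasterx-stroke", 375.04, 155400},
		{"Fill-rasterx-draw", 264.32, 66000},
		{"rasterx-replace-arcs", 165.84, 155400},
		{"Stroke-rasterx-draw", 94.41, 70000},
		{"rasterx-path", 13.48, 155400},
	}
	for i, p := range pct(es) {
		avg := es[i].totalMS / float64(es[i].n)
		fmt.Printf("%-22s avg %.4f ms  pct %.2f\n", es[i].name, avg, p)
	}
}
```

Printing the average with four decimals also shows why the Avg column reads 0.00 in the report: every per-call average is well under 0.01 ms.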

UPDATE: benchmark results above now include case where ArcTo directly generates corresponding CubeTo commands instead of using the ReplaceArcs function, which has a lot of memory and processing overhead. This is now the fastest case overall!

rasterx-fill includes Fill-rasterx-draw, which is the scan draw component. The two -draw components add up to ~360 ms, but the full fill and stroke total ~775 ms, some of which is extra stroke stuff.

ScanGV and ScanFT

The rasterx system provides two other rasterizers using a consistent interface: ScanGV is a wrapper around image/vector, and it is just as slow as Canvas using that rasterizer (500-1000x slower!), and ScanFT is a wrapper around the freetype rasterizer. ScanFT is a bit slower than scanx and does not directly support gradients, so there is no point in using it. Commit 87b8f77 has this code, which is now removed.

Canvas rasterizer

This uses path-based operations to create the stroke by offsetting the path to create a filled region that is then sent to the filler, whereas the rasterx stroker does who knows what but it is MUCH faster -- the stroke step here is 1204 msec! That pretty much accounts for the overall performance difference, as one would expect.

Using rasterx scanx scanner: ~600ms total of scanner time is for both fill and stroke, which is reasonable given ~775ms total for rasterx renderer.

rasterizer.(*Renderer).RenderPath-canvas-stroker            Total: 1204.29 ms	Avg:  0.02	N: 70000	Pct: 65.27
rasterizer.(*Renderer).ToRasterizerScan-canvas-scan         Total:  597.69 ms	Avg:  0.00	N:136000	Pct: 32.39
rasterizer.(*Renderer).RenderPath-canvas-transform          Total:   43.06 ms	Avg:  0.00	N: 66000	Pct:  2.33

Using image/vector rasterizer (running just 1 iteration!): ras-draw is the vector rasterizer -- incredibly slow!

rasterizer.(*Renderer).RenderPath-canvas-stroke-ras-draw    Total: 5258.34 ms	Avg: 15.02	N:   350	Pct: 51.42
rasterizer.(*Renderer).RenderPath-canvas-fill-ras-draw      Total: 4957.38 ms	Avg: 15.02	N:   330	Pct: 48.48
rasterizer.(*Renderer).RenderPath-canvas-stroker            Total:    8.46 ms	Avg:  0.02	N:   350	Pct:  0.08
rasterizer.ToRasterizer-canvas-to-rasterizer                Total:    1.19 ms	Avg:  0.00	N:   680	Pct:  0.01
rasterizer.(*Renderer).RenderPath-canvas-transform          Total:    0.31 ms	Avg:  0.00	N:   330	Pct:  0.00

1 reply

rcoreilly Mar 3, 2025
Maintainer Author

`composer.Composer` render pipeline in a separate goroutine, and glyph caching (3/3/2025)

Note: the numbers here are not comparable to the above because we now need to benchmark the full renderWindow() call whereas before we were only doing Scene.RenderWidget(). Re-ran from the above profiling branch to get comparable numbers.

BenchmarkForm

7.3ms for UseGlyphCache = true -- significant speedup vs. the false case below.
7.7ms for the previous rasterbench case reported above, re-run for the full renderWindow() call: comparable to the UseGlyphCache = true case because it uses our prior bitmap-based font rendering pipeline. So relative to that, we're a bit faster now!
9.8ms for UseGlyphCache = false: notably worse without glyph caching relative to the two baselines above.

ProfileForm

UseGlyphCache = true:

core.(*renderWindow).renderAsync-Compose                    Total: 1340.38 ms	Avg:  6.70	N:   200	Pct: 78.15
core.(*Scene).doUpdate-render                               Total:  231.31 ms	Avg:  1.16	N:   200	Pct: 13.49
rasterx.(*Renderer).RenderText-RenderText                   Total:  143.49 ms	Avg:  0.00	N: 34600	Pct:  8.37

UseGlyphCache = false:

core.(*renderWindow).renderAsync-Compose                    Total: 1831.50 ms	Avg:  9.16	N:   200	Pct: 68.07
rasterx.(*Renderer).RenderText-RenderText                   Total:  627.42 ms	Avg:  0.02	N: 34600	Pct: 23.32
core.(*Scene).doUpdate-render                               Total:  231.65 ms	Avg:  1.16	N:   200	Pct:  8.61

So 143 / 627 = ~23% of the time = ~4.4x speedup in RenderText for glyph caching.
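
As a rough sketch of why caching helps so much: the expensive rasterization step runs once per distinct glyph, and every later use is a map lookup. Everything below is an assumption for illustration (the `GlyphKey` fields, the `GlyphCache` type, and the `fakeRaster` stand-in); the actual cache behind UseGlyphCache may key and store things differently.

```go
package main

import (
	"fmt"
	"image"
)

// GlyphKey identifies one rasterized glyph. The fields are hypothetical;
// a real key would likely also include things like a subpixel offset.
type GlyphKey struct {
	Font  string // font face identifier
	Glyph uint32 // glyph ID within the face
	Size  int    // size in 1/64 pixel units
}

// GlyphCache maps keys to previously rasterized alpha masks.
type GlyphCache struct {
	masks        map[GlyphKey]*image.Alpha
	Hits, Misses int
}

func NewGlyphCache() *GlyphCache {
	return &GlyphCache{masks: map[GlyphKey]*image.Alpha{}}
}

// Get returns the cached mask for k, calling raster only on a miss.
func (c *GlyphCache) Get(k GlyphKey, raster func(GlyphKey) *image.Alpha) *image.Alpha {
	if m, ok := c.masks[k]; ok {
		c.Hits++
		return m
	}
	c.Misses++
	m := raster(k)
	c.masks[k] = m
	return m
}

// fakeRaster stands in for the real (slow) outline rasterizer.
func fakeRaster(k GlyphKey) *image.Alpha {
	return image.NewAlpha(image.Rect(0, 0, 8, 8))
}

func main() {
	c := NewGlyphCache()
	for _, g := range []uint32{10, 11, 10, 10, 11} {
		c.Get(GlyphKey{Font: "sans", Glyph: g, Size: 16 * 64}, fakeRaster)
	}
	fmt.Println("hits:", c.Hits, "misses:", c.Misses)
}
```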

Also note that the Scene.doUpdate call is 230ms -- that is now done in parallel with the rendering in a separate goroutine, so that is a not-insignificant savings that does not show up in the above numbers, because the only way to get reliable profile / benchmark results was to disable the "go" call in renderWindow.renderAsync.

rcoreilly · 2025-01-31T09:34:36Z

rcoreilly
Jan 31, 2025
Maintainer Author

Progress update

Per #1457 PR, at this point the new API is in place, with a full render.Render "scene" that has all of the path, text, and image painter actions that were generated since the previous RenderDone call. This is sent to the renderer to actually render everything: should be sufficient info in there for all the different backends hopefully. There is a very small performance difference for this version relative to the "render everything immediately" previous version, and it has many potential advantages for html canvas and future GPU based rendering.
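
A minimal sketch of the record-then-render idea, to make the shape concrete. All of the names here (`Item`, `Painter`, `Renderer`, `countRenderer`) are hypothetical stand-ins and not the actual render package API: painter calls append items to a list, RenderDone hands off everything recorded since the previous call, and any backend only needs to consume that list.

```go
package main

import "fmt"

// Item is one recorded painter action (hypothetical stand-in).
type Item interface{ Kind() string }

type PathItem struct{ Path []float32 }
type TextItem struct{ Text string }
type ImageItem struct{ W, H int }

func (PathItem) Kind() string  { return "path" }
func (TextItem) Kind() string  { return "text" }
func (ImageItem) Kind() string { return "image" }

// Renderer is a backend that consumes a complete list of items.
type Renderer interface{ Render(items []Item) }

// Painter records items instead of drawing them immediately.
type Painter struct{ items []Item }

// Add appends one recorded action to the current list.
func (p *Painter) Add(it Item) { p.items = append(p.items, it) }

// RenderDone returns everything recorded since the previous call
// and starts a fresh list.
func (p *Painter) RenderDone() []Item {
	out := p.items
	p.items = nil
	return out
}

// countRenderer is a trivial backend that tallies items by kind.
type countRenderer struct{ counts map[string]int }

func (c *countRenderer) Render(items []Item) {
	for _, it := range items {
		c.counts[it.Kind()]++
	}
}

func main() {
	p := &Painter{}
	p.Add(PathItem{Path: []float32{0, 0, 10, 10}})
	p.Add(TextItem{Text: "hello"})
	p.Add(ImageItem{W: 32, H: 32})

	var r Renderer = &countRenderer{counts: map[string]int{}}
	r.Render(p.RenderDone())
	fmt.Println(r.(*countRenderer).counts)
}
```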

Interestingly, we gained basically nothing from tdewolff/canvas in terms of its rendering code, which is notably slower than the amazing rasterx per the above benchmarks and profiling, but at least it did provide a solid Path framework that is much richer than what I previously coded for the svg/path element. This Path representation is essential for all the backends, including future GPU, so it is a great piece of infrastructure.

0 replies

rcoreilly · 2025-02-01T10:38:45Z

rcoreilly
Feb 1, 2025
Maintainer Author

Text plans

Fonts

gio has better font handling than canvas -- look there for examples.
need an intermediate rich text representation -- none of the existing have anything reasonable that I could find.
go-text/fontscan SetQuery is the key step for selecting the overall parameters for a font (also SetScript for language / script), and then ResolveFace takes a rune and returns a *font.Face. So you setup the query and then it pings with each rune as it goes in doing the shaping.
A rich text input format thus contains Span elements that have a shared style, which parameterizes the Query. Instead of representing this in original HTML all the time, it would be good to boil it down into a compact Font Style tag element that encodes the style within the space of a rune or something like that, so that we could just have []rune as the basic type, like Path is []float32.
I didn't find anything defined in the existing unicode tables, but there are private use areas (https://en.wikipedia.org/wiki/Private_Use_Areas) that can be co-opted. There are 6,400 code points in U+E000...U+F8FF, and plenty more in the higher planes, but we could probably use the BMP private area and easily encode all the bits we need. So basically you just insert one of these style runes at the start of each span in the input sequence, and then that is decoded into a query, and defines the "run" length for shaping.
Need a good solution for mapping from the original source rune inputs into the style-tagged version though, one that is quickly updatable as text changes. In the current texteditor/text encoding, there is the full html markup thing that is not very efficient. Maybe just storing as explicit spans is best? Just [][]rune, where each []rune starts with the style tag, and it is easy to index iteratively? Probably that is best. Use join / split kinds of things to get back to the original source etc.
The family is encoded in terms of a set of font categories (serif, sans, mono, script, etc) that then have a full textual definition in the overall font scanner -- i.e. the category id is a lookup into the full string that is used to set the query. We get this from an overall preferences setting or some other specific context that defines which family string to use for each category.
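
To make the style-rune idea concrete, here is a minimal sketch that packs a few style properties into a single rune in the BMP private use area. The bit layout, the `Style` fields, and the function names are all assumptions for illustration. It uses 10 bits, i.e. 1,024 of the 6,400 available code points, and it does not deal with source text that already contains private-use characters.

```go
package main

import "fmt"

const (
	puaFirst  = 0xE000 // start of the BMP Private Use Area
	styleBits = 10     // bits used by the hypothetical layout below
	puaLast   = puaFirst + 1<<styleBits - 1
)

// Style is a hypothetical compact font style for one span.
type Style struct {
	Bold, Italic, Underline bool
	Family                  uint8 // category index 0..7 (serif, sans, mono, ...)
	Size                    uint8 // index 0..15 into some table of sizes
}

// Rune packs the style into a single private-use rune.
func (s Style) Rune() rune {
	var v rune
	if s.Bold {
		v |= 1
	}
	if s.Italic {
		v |= 1 << 1
	}
	if s.Underline {
		v |= 1 << 2
	}
	v |= rune(s.Family&7) << 3
	v |= rune(s.Size&15) << 6
	return puaFirst + v
}

// StyleFromRune decodes a style rune; ok is false for any other rune.
func StyleFromRune(r rune) (s Style, ok bool) {
	if r < puaFirst || r > puaLast {
		return Style{}, false
	}
	v := r - puaFirst
	return Style{
		Bold:      v&1 != 0,
		Italic:    v&2 != 0,
		Underline: v&4 != 0,
		Family:    uint8(v >> 3 & 7),
		Size:      uint8(v >> 6 & 15),
	}, true
}

func main() {
	s := Style{Bold: true, Family: 2, Size: 5}
	r := s.Rune()
	back, ok := StyleFromRune(r)
	fmt.Printf("%U %+v %v\n", r, back, ok)
}
```

The decoded Family index would then be looked up in the preferences to get the full family string used to set the font query, as described in the last point above.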

Layout from fonts

gio text/gotext.go has shaperImpl that manages all the shaping stuff, with the fontscan.FontMap providing the ResolveFace call that is abstracted in shaping via the Fontmap interface.

```go
// shaperImpl implements the shaping and line-wrapping of opentype fonts.
type shaperImpl struct {
	// Fields for tracking fonts/faces.
	fontMap      *fontscan.FontMap
	faces        []*font.Face
	faceToIndex  map[*font.Font]int
	faceMeta     []giofont.Font
	defaultFaces []string
	logger       interface {
		Printf(format string, args ...any)
	}
	parser parser

	// Shaping and wrapping state.
	shaper        shaping.HarfbuzzShaper
	wrapper       shaping.LineWrapper
	bidiParagraph bidi.Paragraph

	// Scratch buffers used to avoid re-allocating slices during routine internal
	// shaping operations.
	splitScratch1, splitScratch2 []shaping.Input
	outScratchBuf                []shaping.Output
	scratchRunes                 []rune

	// bitmapGlyphCache caches extracted bitmap glyph images.
	bitmapGlyphCache bitmapCache
}
```

3 replies

kkoreilly Feb 1, 2025
Maintainer

From an end user standpoint, supporting some kind of functional API for rich text would be great, and I think that using special unicode characters seems somewhat tenuous.

rcoreilly Feb 2, 2025
Maintainer Author

Text organization

Here are the different levels of text representations, what properties they have, and how we might want to organize them. I created a new overall top-level text parent package to manage these all in one place.

Sources:

string, []byte, []rune -- basic Go level representations of source text, which can include \n \r line breaks, all manner of unicode characters, and require a language and script context to properly interpret.
HTML or other rich text formats (e.g., PDF, RTF, even .docx etc), which can include local text styling (bold, underline, font size, font family, etc), links, and more complex, larger-scale elements including paragraphs, images, tables, etc.

Levels:

Spans or Runs: this is the smallest chunk of text above the individual runes, where all the runes share the same font, language, script etc characteristics. This is the level at which harfbuzz operates, transforming Input spans into Output runs.
Lines: for line-based uses (e.g., texteditor), spans can be organized (strictly) into lines. This imposes strict LTR, RTL horizontal ordering, and greatly simplifies the layout process. Only text is relevant.
Text: for otherwise unconstrained text rendering, you can have horizontal or vertical text that requires a potentially complex layout process. go-text includes a segmenter for finding unicode-based units where line breaks might occur, and a shaping.LineWrapper that manages the basic line wrapping process using the unicode segments. canvas includes a RichText representation that supports Donald Knuth's linebreaking algorithm, which is used in LaTeX, and generally produces very nice looking results. This RichText also supports any arbitrary graphical element, so you get full layout of images along with text etc.

Uses:

texteditor.Editor, planned Terminal: just need pure text, line-oriented results. This is the easy path and we don't need to discuss further. Can use our new rich text span element instead of managing html for the highlighting / markup rendering.
core.Text, core.TextField: pure text (no images) but ideally supports full arbitrary text layout. The overall layout engine is the core widget layout system, optimized for GUI-level layout, and in general there are challenges to integrating the text layout with this GUI layout, due to bidirectional constraints (text shape changes based on how much area it has, and how much area it has influences the overall widget layout). Knuth's algorithm explicitly handles the interdependencies through a dynamic programming approach.
svg.Text: similar to core.Text but also requires arbitrary rotation and scaling parameters in the output, in addition to arbitrary x,y locations per-glyph that can be transformed overall.
htmlcore and content: ideally would support LaTeX quality full rich text layout that includes images, "div" level grouping structures, tables, etc. One possible idea here is to support two different layout paradigms: Markdown directly parsed with the LaTeX layout algorithm, for content, with optimized support for the PDF backend and pagination there, and separately an html layout frame that directly implements html layout algorithms.
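
As a small illustration of the dynamic programming flavor of line breaking mentioned above, here is a sketch of a much simpler "minimum raggedness" breaker. It is not the Knuth-Plass algorithm (there are no boxes, glue, penalties, or hyphenation) and not what canvas implements; it only shows how choosing breaks globally can beat greedy filling. Widths are measured in runes, and `BreakLines` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// BreakLines splits words into lines of at most width runes, choosing breaks
// that minimize the sum of squared trailing space over all lines but the last.
// A single word longer than width gets a line of its own.
func BreakLines(words []string, width int) [][]string {
	n := len(words)
	const inf = int(^uint(0) >> 1)
	cost := make([]int, n+1) // cost[i]: best total cost for words[i:]
	next := make([]int, n+1) // next[i]: start of the line following the one that begins at i
	for i := n - 1; i >= 0; i-- {
		cost[i] = inf
		lineLen := -1 // cancels the separating space counted for the first word
		for j := i; j < n; j++ {
			lineLen += 1 + len([]rune(words[j]))
			if lineLen > width && j > i {
				break
			}
			c := 0
			if j < n-1 { // trailing space on the last line is free
				slack := width - lineLen
				if slack < 0 {
					slack = 0 // overlong single word
				}
				c = slack * slack
			}
			if c+cost[j+1] < cost[i] {
				cost[i] = c + cost[j+1]
				next[i] = j + 1
			}
		}
	}
	var lines [][]string
	for i := 0; i < n; i = next[i] {
		lines = append(lines, words[i:next[i]])
	}
	return lines
}

func main() {
	for _, l := range BreakLines(strings.Fields("aaa bb cc ddddd"), 6) {
		fmt.Printf("%q\n", strings.Join(l, " "))
	}
}
```

With a width of 6, greedy filling would put "aaa bb" on the first line and leave "cc" alone on the second; the sketch instead breaks after "aaa" so that "bb cc" nearly fills its line, for a lower total cost.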

Organization:

text/rich: the rich.Spans is a [][]rune type that encodes the local font-level styling properties (bold, underline, etc) for the basic chunks of text input. This is the basic engine for basic harfbuzz shaping and all text rendering, and produces a corresponding text/ptext ptext.Runs output that mirrors the Spans input and handles the basic machinery of text rendering. This is the replacement for the ptext.Text, Span and Rune elements that we have now.
text/lines: manages Spans and Runs for line-oriented uses (texteditor, terminal). Need to move parse/lexer/Pos into lines, along with probably some of the other stuff from lexer, and move parser/tokens into text/tokens as it is needed to be our fully general token library for all markup. Probably just move parse under text too?
text/text: is the general unconstrained text layout framework: do we make the most general-purpose LaTeX layout system with arbitrary textobject elements as in canvas? Is this just the core.Text guy? textobjects are just wrappers around render.Render items -- need an interface that gives the size of the elements, and how much detail does the layout algorithm need? Need to be able to put any Widget elements. This is all a bit up in the air. In the meantime, we can start with a basic go-text based text-only layout system that will get core.Text functionality working.
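
Here is a minimal sketch of what a [][]rune spans type could look like, assuming each span begins with one style tag rune followed by its text. The `Spans` type and its methods are hypothetical and only illustrate the join and index mapping back to the plain source text; they are not the actual text/rich API.

```go
package main

import "fmt"

// Spans holds styled text: the first rune of each span is a style tag,
// and the remaining runes are that span's text. Hypothetical sketch.
type Spans [][]rune

// Add appends a span with the given style tag and text.
func (sp *Spans) Add(style rune, text string) {
	*sp = append(*sp, append([]rune{style}, []rune(text)...))
}

// Join returns the plain source text with the style tags removed.
func (sp Spans) Join() []rune {
	var out []rune
	for _, s := range sp {
		if len(s) > 1 {
			out = append(out, s[1:]...)
		}
	}
	return out
}

// Index maps a position in the plain source text to a span index and an
// offset within that span's text, by walking the spans in order.
// ok is false if pos is out of range.
func (sp Spans) Index(pos int) (span, off int, ok bool) {
	if pos < 0 {
		return 0, 0, false
	}
	for i, s := range sp {
		n := len(s) - 1 // text length, excluding the style tag
		if n < 0 {
			n = 0
		}
		if pos < n {
			return i, pos, true
		}
		pos -= n
	}
	return 0, 0, false
}

func main() {
	var sp Spans
	sp.Add(0xE000, "plain ")
	sp.Add(0xE001, "bold")
	fmt.Println(string(sp.Join()))
	fmt.Println(sp.Index(7))
}
```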

rcoreilly Feb 3, 2025
Maintainer Author

Good link for complex international layout issues: https://www.w3.org/International/articles/vertical-text/

rcoreilly · 2025-02-03T06:57:36Z

rcoreilly
Feb 3, 2025
Maintainer Author

Can now do the actual render in a separate thread!

@kkoreilly just pointed out that once we create the full render.Render representation in the Paint loop, we can safely ship that off to a separate goroutine to actually do all the rendering! This should result in a significant improvement in speed, with minimal additional complexity or locking issues!
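
A minimal sketch of the handoff, under the assumption that the paint loop never touches a frame again after sending it, so no locking is needed on the frame itself. `Frame` and `StartRenderer` are hypothetical names, and the string items stand in for the real render items.

```go
package main

import (
	"fmt"
	"sync"
)

// Frame is a snapshot of the render items for one paint pass.
// The sender must not modify it after sending.
type Frame struct {
	ID    int
	Items []string // stand-in for the real render items
}

// StartRenderer launches a goroutine that calls render for each frame sent
// on the returned channel. Close the channel and wait on the WaitGroup to
// shut it down cleanly.
func StartRenderer(render func(Frame)) (chan<- Frame, *sync.WaitGroup) {
	frames := make(chan Frame, 1) // small buffer: painting can run one frame ahead
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for f := range frames {
			render(f)
		}
	}()
	return frames, &wg
}

func main() {
	frames, wg := StartRenderer(func(f Frame) {
		fmt.Println("rendered frame", f.ID, "with", len(f.Items), "items")
	})
	for i := 0; i < 3; i++ {
		// The paint loop builds a fresh list each pass and gives it away.
		frames <- Frame{ID: i, Items: []string{"path", "text"}}
	}
	close(frames)
	wg.Wait()
}
```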

0 replies
