Unexpected snapshot errors when snapshotting with lots of newlines #126

nedtwigg · 2022-12-10T03:38:38Z

Thanks so much for the library! I'm a big fan of snapshot testing, I've been baffled that java doesn't have one, and I think the high-level design of yours is fantastic! When I first tried it I got stuck for a while. Here was the behavior I was seeing

tests were passing locally
failing on CI with this error

  au.com.origin.snapshots.exceptions.LogGithubIssueException: Corrupt Snapshot (REGEX matches = 0): possibly due to manual editing or our REGEX failing
  Possible Solutions
  1. Ensure you have not accidentally manually edited the snapshot file!
  2. Compare the snapshot with GIT history
  
  *** This exception should never be thrown ***

I dug around and found this eventually

java-snapshot-testing/java-snapshot-testing-core/src/main/java/au/com/origin/snapshots/serializers/ToStringSnapshotSerializer.java

Lines 28 to 32 in 5330f07

    
           if (it.contains(SnapshotFile.SPLIT_STRING)) { 
        
             log.warn( 
        
                 "Found 3 consecutive lines in your snapshot \\n\\n\\n. This sequence is reserved as the snapshot separator - replacing with \\n.\\n.\\n"); 
        
             return it.replaceAll(SnapshotFile.SPLIT_STRING, "\n.\n.\n"); 
        
           }

And it so happens that I was snapshotting input that started and ended with lots of newlines. So I tried adding a trim() before I passed my data to you and that fixed it.

IMO, it's a big deal to mangle the snapshot data - that's the most sacred part of a snapshot library! I think it's really important to pick a lossless encoding function. trim() works well enough for me for now so I'm not going to take the time to contribute a PR, but I can offer up this little hunk of code that I've used in several projects for handling problems of this sort (lossless roundtrip encoding of text in text). Feel free to use it or take a different approach :)

/**
 * If your escape policy is "'123", it means this:
 *
 * ```
 * abc->abc
 * 123->'1'2'3
 * I won't->I won''t
 * ```
 */
class PerCharacterEscaper
/**
 * The first character in the string will be uses as the escape character, and all characters will
 * be escaped.
 */
private constructor(
		private val escapeCodePoint: Int,
		private val escapedCodePoints: IntArray,
		private val escapedByCodePoints: IntArray
) : Converter<String, String>() {
	fun needsEscaping(input: String): Boolean {
		return firstOffsetNeedingEscape(input) != -1
	}

	private fun firstOffsetNeedingEscape(input: String): Int {
		val length = input.length
		var firstOffsetNeedingEscape = -1
		var offset = 0
		outer@ while (offset < length) {
			val codepoint = input.codePointAt(offset)
			for (escaped in escapedCodePoints) {
				if (codepoint == escaped) {
					firstOffsetNeedingEscape = offset
					break@outer
				}
			}
			offset += Character.charCount(codepoint)
		}
		return firstOffsetNeedingEscape
	}

	fun escapeCodePoint(): Int {
		return escapeCodePoint
	}

	override fun doForward(input: String): String {
		val noEscapes = firstOffsetNeedingEscape(input)
		return if (noEscapes == -1) {
			input
		} else {
			val length = input.length
			val needsEscapes = length - noEscapes
			val builder = StringBuilder(noEscapes + 4 + needsEscapes * 5 / 4)
			builder.append(input, 0, noEscapes)
			var offset = noEscapes
			while (offset < length) {
				val codepoint = input.codePointAt(offset)
				offset += Character.charCount(codepoint)
				val idx = Ints.indexOf(escapedCodePoints, codepoint)
				if (idx == -1) {
					builder.appendCodePoint(codepoint)
				} else {
					builder.appendCodePoint(escapeCodePoint)
					builder.appendCodePoint(escapedByCodePoints[idx])
				}
			}
			builder.toString()
		}
	}

	private fun firstOffsetNeedingUnescape(input: String): Int {
		val length = input.length
		var firstOffsetNeedingEscape = -1
		var offset = 0
		while (offset < length) {
			val codepoint = input.codePointAt(offset)
			if (codepoint == escapeCodePoint) {
				firstOffsetNeedingEscape = offset
				break
			}
			offset += Character.charCount(codepoint)
		}
		return firstOffsetNeedingEscape
	}

	override fun doBackward(input: String): String {
		val noEscapes = firstOffsetNeedingUnescape(input)
		return if (noEscapes == -1) {
			input
		} else {
			val length = input.length
			val needsEscapes = length - noEscapes
			val builder = StringBuilder(noEscapes + 4 + needsEscapes * 5 / 4)
			builder.append(input, 0, noEscapes)
			var offset = noEscapes
			while (offset < length) {
				var codepoint = input.codePointAt(offset)
				offset += Character.charCount(codepoint)
				// if we need to escape something, escape it
				if (codepoint == escapeCodePoint) {
					if (offset < length) {
						codepoint = input.codePointAt(offset)
						val idx = Ints.indexOf(escapedByCodePoints, codepoint)
						if (idx != -1) {
							codepoint = escapedCodePoints[idx]
						}
						offset += Character.charCount(codepoint)
					} else {
						throw IllegalArgumentException(
								"Escape character '" +
										String(intArrayOf(escapeCodePoint), 0, 1) +
										"' can't be the last character in a string.")
					}
				}
				// we didn't escape it, append it raw
				builder.appendCodePoint(codepoint)
			}
			builder.toString()
		}
	}

	companion object {
		/**
		 * If your escape policy is "'123", it means this:
		 *
		 * ```
		 * abc->abc
		 * 123->'1'2'3
		 * I won't->I won''t
		 * ```
		 */
		@JvmStatic
		fun selfEscape(escapePolicy: String): PerCharacterEscaper {
			val escapedCodePoints = escapePolicy.codePoints().toArray()
			val escapeCodePoint = escapedCodePoints[0]
			return PerCharacterEscaper(escapeCodePoint, escapedCodePoints, escapedCodePoints)
		}

		/**
		 * If your escape policy is "'a1b2c3d", it means this:
		 *
		 * ```
		 * abc->abc
		 * 123->'b'c'd
		 * I won't->I won'at
		 * ```
		 */
		@JvmStatic
		fun specifiedEscape(escapePolicy: String): PerCharacterEscaper {
			val codePoints = escapePolicy.codePoints().toArray()
			Preconditions.checkArgument(codePoints.size % 2 == 0)
			val escapeCodePoint = codePoints[0]
			val escapedCodePoints = IntArray(codePoints.size / 2)
			val escapedByCodePoints = IntArray(codePoints.size / 2)
			for (i in escapedCodePoints.indices) {
				escapedCodePoints[i] = codePoints[2 * i]
				escapedByCodePoints[i] = codePoints[2 * i + 1]
			}
			return PerCharacterEscaper(escapeCodePoint, escapedCodePoints, escapedByCodePoints)
		}
	}
}

The text was updated successfully, but these errors were encountered:

nedtwigg · 2022-12-10T03:39:52Z

If you're curious for a test case, this is the commit that took me from failing to passing

equodev/equo-ide@aa20e03

jackmatt2 · 2022-12-10T10:53:08Z

Thanks. I believe this is a bug introduced via the 4.X release. Will look into it.

nedtwigg · 2022-12-10T15:19:13Z

Before I figured out the trim() workaround I tried going back to 3.4.0 just in case, but that was broken for me too.

equodev/equo-ide@073ed11

jackmatt2 · 2022-12-10T21:05:33Z

@nedtwigg are you developing on Windows and your CI is running Linux?

I feel this might be a line ending issue. Git checkout as \r\n (windows) and checkin as \n given it only fails on CI. Git does this for Windows machines.

nedtwigg · 2022-12-10T21:11:28Z

Nope, dev on mac, CI on Linux.

jackmatt2 · 2022-12-10T22:09:15Z

I can replicate this issue locally. It is indeed related to leading/trailing newline characters. It's due to the toStringSerializer appending the[\n and \n] to the start and end of the string. Those extra new line characters are not accounted for in the string replacement for three consecutive lines \n\n\n. So when the snapshot it split via the REGEX it starts splitting the actual snapshot which results in this error.

jackmatt2 · 2022-12-10T22:39:12Z

Fixed in 4.0.2

jackmatt2 · 2022-12-10T22:45:21Z

Regarding the lossless implementation - can you open a "Discussion" about it from the "Discussions" tab. I'm not too crash hot on the manipulation of the snapshot data either. I'm not sure how the above function will fix the issue because we still need an escape character (hence snapshot text will be modified)

jackmatt2 added the bug Something isn't working label Dec 10, 2022

jackmatt2 added a commit that referenced this issue Dec 10, 2022

#patch: Fixes #126 leading/trailing whitespace causing snapshot failures

31f510e

jackmatt2 closed this as completed in c477afd Dec 10, 2022

jackmatt2 mentioned this issue Dec 11, 2022

Perfect roundtrips on plaintext snapshots #130

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected snapshot errors when snapshotting with lots of newlines #126

Unexpected snapshot errors when snapshotting with lots of newlines #126

nedtwigg commented Dec 10, 2022

nedtwigg commented Dec 10, 2022

jackmatt2 commented Dec 10, 2022

nedtwigg commented Dec 10, 2022

jackmatt2 commented Dec 10, 2022

nedtwigg commented Dec 10, 2022

jackmatt2 commented Dec 10, 2022 •

edited

Loading

jackmatt2 commented Dec 10, 2022

jackmatt2 commented Dec 10, 2022

Unexpected snapshot errors when snapshotting with lots of newlines #126

Unexpected snapshot errors when snapshotting with lots of newlines #126

Comments

nedtwigg commented Dec 10, 2022

nedtwigg commented Dec 10, 2022

jackmatt2 commented Dec 10, 2022

nedtwigg commented Dec 10, 2022

jackmatt2 commented Dec 10, 2022

nedtwigg commented Dec 10, 2022

jackmatt2 commented Dec 10, 2022 • edited Loading

jackmatt2 commented Dec 10, 2022

jackmatt2 commented Dec 10, 2022

jackmatt2 commented Dec 10, 2022 •

edited

Loading