Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected snapshot errors when snapshotting with lots of newlines #126

Closed
nedtwigg opened this issue Dec 10, 2022 · 8 comments
Closed

Unexpected snapshot errors when snapshotting with lots of newlines #126

nedtwigg opened this issue Dec 10, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@nedtwigg
Copy link
Contributor

Thanks so much for the library! I'm a big fan of snapshot testing, I've been baffled that java doesn't have one, and I think the high-level design of yours is fantastic! When I first tried it I got stuck for a while. Here was the behavior I was seeing

  • tests were passing locally
  • failing on CI with this error
  au.com.origin.snapshots.exceptions.LogGithubIssueException: Corrupt Snapshot (REGEX matches = 0): possibly due to manual editing or our REGEX failing
  Possible Solutions
  1. Ensure you have not accidentally manually edited the snapshot file!
  2. Compare the snapshot with GIT history
  
  *** This exception should never be thrown ***

I dug around and found this eventually

if (it.contains(SnapshotFile.SPLIT_STRING)) {
log.warn(
"Found 3 consecutive lines in your snapshot \\n\\n\\n. This sequence is reserved as the snapshot separator - replacing with \\n.\\n.\\n");
return it.replaceAll(SnapshotFile.SPLIT_STRING, "\n.\n.\n");
}

And it so happens that I was snapshotting input that started and ended with lots of newlines. So I tried adding a trim() before I passed my data to you and that fixed it.

IMO, it's a big deal to mangle the snapshot data - that's the most sacred part of a snapshot library! I think it's really important to pick a lossless encoding function. trim() works well enough for me for now so I'm not going to take the time to contribute a PR, but I can offer up this little hunk of code that I've used in several projects for handling problems of this sort (lossless roundtrip encoding of text in text). Feel free to use it or take a different approach :)

/**
 * If your escape policy is "'123", it means this:
 *
 * ```
 * abc->abc
 * 123->'1'2'3
 * I won't->I won''t
 * ```
 */
class PerCharacterEscaper
/**
 * The first character in the string will be uses as the escape character, and all characters will
 * be escaped.
 */
private constructor(
		private val escapeCodePoint: Int,
		private val escapedCodePoints: IntArray,
		private val escapedByCodePoints: IntArray
) : Converter<String, String>() {
	fun needsEscaping(input: String): Boolean {
		return firstOffsetNeedingEscape(input) != -1
	}

	private fun firstOffsetNeedingEscape(input: String): Int {
		val length = input.length
		var firstOffsetNeedingEscape = -1
		var offset = 0
		outer@ while (offset < length) {
			val codepoint = input.codePointAt(offset)
			for (escaped in escapedCodePoints) {
				if (codepoint == escaped) {
					firstOffsetNeedingEscape = offset
					break@outer
				}
			}
			offset += Character.charCount(codepoint)
		}
		return firstOffsetNeedingEscape
	}

	fun escapeCodePoint(): Int {
		return escapeCodePoint
	}

	override fun doForward(input: String): String {
		val noEscapes = firstOffsetNeedingEscape(input)
		return if (noEscapes == -1) {
			input
		} else {
			val length = input.length
			val needsEscapes = length - noEscapes
			val builder = StringBuilder(noEscapes + 4 + needsEscapes * 5 / 4)
			builder.append(input, 0, noEscapes)
			var offset = noEscapes
			while (offset < length) {
				val codepoint = input.codePointAt(offset)
				offset += Character.charCount(codepoint)
				val idx = Ints.indexOf(escapedCodePoints, codepoint)
				if (idx == -1) {
					builder.appendCodePoint(codepoint)
				} else {
					builder.appendCodePoint(escapeCodePoint)
					builder.appendCodePoint(escapedByCodePoints[idx])
				}
			}
			builder.toString()
		}
	}

	private fun firstOffsetNeedingUnescape(input: String): Int {
		val length = input.length
		var firstOffsetNeedingEscape = -1
		var offset = 0
		while (offset < length) {
			val codepoint = input.codePointAt(offset)
			if (codepoint == escapeCodePoint) {
				firstOffsetNeedingEscape = offset
				break
			}
			offset += Character.charCount(codepoint)
		}
		return firstOffsetNeedingEscape
	}

	override fun doBackward(input: String): String {
		val noEscapes = firstOffsetNeedingUnescape(input)
		return if (noEscapes == -1) {
			input
		} else {
			val length = input.length
			val needsEscapes = length - noEscapes
			val builder = StringBuilder(noEscapes + 4 + needsEscapes * 5 / 4)
			builder.append(input, 0, noEscapes)
			var offset = noEscapes
			while (offset < length) {
				var codepoint = input.codePointAt(offset)
				offset += Character.charCount(codepoint)
				// if we need to escape something, escape it
				if (codepoint == escapeCodePoint) {
					if (offset < length) {
						codepoint = input.codePointAt(offset)
						val idx = Ints.indexOf(escapedByCodePoints, codepoint)
						if (idx != -1) {
							codepoint = escapedCodePoints[idx]
						}
						offset += Character.charCount(codepoint)
					} else {
						throw IllegalArgumentException(
								"Escape character '" +
										String(intArrayOf(escapeCodePoint), 0, 1) +
										"' can't be the last character in a string.")
					}
				}
				// we didn't escape it, append it raw
				builder.appendCodePoint(codepoint)
			}
			builder.toString()
		}
	}

	companion object {
		/**
		 * If your escape policy is "'123", it means this:
		 *
		 * ```
		 * abc->abc
		 * 123->'1'2'3
		 * I won't->I won''t
		 * ```
		 */
		@JvmStatic
		fun selfEscape(escapePolicy: String): PerCharacterEscaper {
			val escapedCodePoints = escapePolicy.codePoints().toArray()
			val escapeCodePoint = escapedCodePoints[0]
			return PerCharacterEscaper(escapeCodePoint, escapedCodePoints, escapedCodePoints)
		}

		/**
		 * If your escape policy is "'a1b2c3d", it means this:
		 *
		 * ```
		 * abc->abc
		 * 123->'b'c'd
		 * I won't->I won'at
		 * ```
		 */
		@JvmStatic
		fun specifiedEscape(escapePolicy: String): PerCharacterEscaper {
			val codePoints = escapePolicy.codePoints().toArray()
			Preconditions.checkArgument(codePoints.size % 2 == 0)
			val escapeCodePoint = codePoints[0]
			val escapedCodePoints = IntArray(codePoints.size / 2)
			val escapedByCodePoints = IntArray(codePoints.size / 2)
			for (i in escapedCodePoints.indices) {
				escapedCodePoints[i] = codePoints[2 * i]
				escapedByCodePoints[i] = codePoints[2 * i + 1]
			}
			return PerCharacterEscaper(escapeCodePoint, escapedCodePoints, escapedByCodePoints)
		}
	}
}
@nedtwigg
Copy link
Contributor Author

If you're curious for a test case, this is the commit that took me from failing to passing

equodev/equo-ide@aa20e03

@jackmatt2 jackmatt2 added the bug Something isn't working label Dec 10, 2022
@jackmatt2
Copy link
Member

Thanks. I believe this is a bug introduced via the 4.X release. Will look into it.

@nedtwigg
Copy link
Contributor Author

Before I figured out the trim() workaround I tried going back to 3.4.0 just in case, but that was broken for me too.

equodev/equo-ide@073ed11

@jackmatt2
Copy link
Member

@nedtwigg are you developing on Windows and your CI is running Linux?

I feel this might be a line ending issue. Git checkout as \r\n (windows) and checkin as \n given it only fails on CI. Git does this for Windows machines.

@nedtwigg
Copy link
Contributor Author

Nope, dev on mac, CI on Linux.

@jackmatt2
Copy link
Member

jackmatt2 commented Dec 10, 2022

I can replicate this issue locally. It is indeed related to leading/trailing newline characters. It's due to the toStringSerializer appending the[\n and \n] to the start and end of the string. Those extra new line characters are not accounted for in the string replacement for three consecutive lines \n\n\n. So when the snapshot it split via the REGEX it starts splitting the actual snapshot which results in this error.

@jackmatt2
Copy link
Member

Fixed in 4.0.2

@jackmatt2
Copy link
Member

Regarding the lossless implementation - can you open a "Discussion" about it from the "Discussions" tab. I'm not too crash hot on the manipulation of the snapshot data either. I'm not sure how the above function will fix the issue because we still need an escape character (hence snapshot text will be modified)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants