Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: 🎸 Use snapshots or OCR for E2E tests when comparing terminal #2639

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ZedLi
Copy link
Collaborator

@ZedLi ZedLi commented Dec 21, 2024

Description

When reviewing this PR, please take a look at both commits and test both. The goal of this PR is to resolve this TODO as sometimes we want to interact with our integrated terminal. The main problem is that the terminal is built using a canvas element which makes it difficult to test with playwright as it's basically an image with no DOM elements to interact with. The particular problem we're trying to solve in this PR is knowing when the desktop client properly finishes making the SSH connection and is "done" when using the integrated terminal.

The first commit adds a golden screenshot to compare to what we expect a properly connected SSH session to look like. This of course can be quite brittle due to many factors (size of window, if the EC2 instance has new updates, timestamps, etc) but allowing a little bit of difference seems to work still. It's hard to say how brittle it is as the target machines can change so I'm curious if it works well for others. I had to use poll as it turns out the documentation is a bit misleading and toHaveScreenshot doesn't actually wait until they match and just compares screenshots after they have been stabilized, which is usually before the SSH connection finishes.

The second commit adds tesseract.js and takes a different approach by using an OCR package to interpret the screenshot of the canvas element that holds our terminal. I use a phrase that I expect to see every time we connect to an EC2 instance and have it keep checking to see if it appears. The main downside is the setup of the package adds a decent amount of time (~5s on my mac but I set it up so it only runs once per worker and should be reused) and a few more seconds to actually recognize the image in the test. In theory this should be less brittle but may also be unnecessary as it's decently slower.

How to Test

Check out each commit and try running yarn desktop or yarn desktop:dev

Checklist

  • [ ] I have added before and after screenshots for UI changes
  • [ ] I have added JSON response output for API changes
  • [ ] I have added steps to reproduce and test for bug fixes in the description
  • I have commented on my code, particularly in hard-to-understand areas
  • [ ] My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works

@ZedLi ZedLi self-assigned this Dec 21, 2024
@ZedLi ZedLi requested a review from a team as a code owner December 21, 2024 00:29
Copy link

vercel bot commented Dec 21, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
boundary-ui ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 21, 2025 11:57pm
boundary-ui-desktop ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 21, 2025 11:57pm

Copy link
Collaborator

@moduli moduli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some points of consideration

  • Terminal output may look different if using docker-based infrastructure vs. AWS-based infrastructure. Not sure if there are any test cases that need docker.
  • If using snapshots, it's possible that some people's personal shells may affect terminal output

@ZedLi
Copy link
Collaborator Author

ZedLi commented Dec 26, 2024

Terminal output may look different if using docker-based infrastructure vs. AWS-based infrastructure. Not sure if there are any test cases that need docker.

I haven't used docker for anything yet in DC tests but my assumption was we would only be testing against targets that were spun up somewhere which I assumed would be EC2 instances when using SSH targets. What do you generally connect to for the docker based tests, localhost?

If using snapshots, it's possible that some people's personal shells may affect terminal output

Yup, the assumption was that the output from logging into an EC2 instance would scroll past it, which may not be true in all cases.

@moduli
Copy link
Collaborator

moduli commented Jan 6, 2025

Terminal output may look different if using docker-based infrastructure vs. AWS-based infrastructure. Not sure if there are any test cases that need docker.

I haven't used docker for anything yet in DC tests but my assumption was we would only be testing against targets that were spun up somewhere which I assumed would be EC2 instances when using SSH targets. What do you generally connect to for the docker based tests, localhost?

If using snapshots, it's possible that some people's personal shells may affect terminal output

Yup, the assumption was that the output from logging into an EC2 instance would scroll past it, which may not be true in all cases.

On docker-based tests, we spin up a separate container to use as a target (https://hub.docker.com/r/linuxserver/openssh-server).

@laurenolivia
Copy link
Collaborator

@ZedLi interesting PR. Upon reading the PR description I'm wondering if a short video or screenshots would help review this one? I'm curious about the discovery that led to this work, is there a jira ticket for more context?

@ZedLi
Copy link
Collaborator Author

ZedLi commented Jan 7, 2025

@ZedLi interesting PR. Upon reading the PR description I'm wondering if a short video or screenshots would help review this one? I'm curious about the discovery that led to this work, is there a jira ticket for more context?

I'm not sure video or screenshots would help as it's just the test running and waiting properly so it looks the same between them. The context is really just solving the issue in this TODO, I'll add a little bit more context in the description though!

@laurenolivia
Copy link
Collaborator

@ZedLi Thanks for adding context to the description.

RE: I had to use poll as it turns out the documentation is a bit misleading...

Can you link the docs you are referencing?

@@ -7,7 +7,13 @@ import { mergeTests } from '@playwright/test';
import { electronTest } from './electronTest.js';
import { authenticateTest } from './authenticateTest.js';
import { test as envTest } from '../../global-setup.js';
import { tesseractTest } from './tesseractTest.js';

// The last item in the list will override any previous ones in a fixture name collision
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting work.

RE: override any previous ones

To clarify this commented line, any variables that could possibly exist in a test suite (e.g. envTest), but also exist in tesseractTest(being last) will be overridden?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so if there's any use variables from these fixtures that have the same name, the one that appears later in this list will end up taking priority and override any previous ones.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks. I wonder if that comment could be a bit clearer? Wdyt?

Copy link
Collaborator Author

@ZedLi ZedLi Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think it should say? This was more of a general comment to remind people how merging works if for some reason you are overriding

Copy link
Collaborator

@laurenolivia laurenolivia Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read a bit of the playwright docs to get more context, thanks for sharing those. I'd prob comment something like:

// Please be aware that defining like fixtures across different test envs 
// in Playwright can cause naming collisions
// To solve for this, any like declarations that exist in the last test in this list item
// will override all declarations, if those like declarations exist in previous test envs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking suggestion, though. Good work 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this comment to be a bit more descriptive to explain the merging behavior

@ZedLi
Copy link
Collaborator Author

ZedLi commented Jan 8, 2025

Can you link the docs you are referencing?

I was just referencing the playwright docs for toHaveScreenshot where they mention:

This function will wait until two consecutive page screenshots yield the same result, and then compare the last screenshot with the expectation.

which is not quite accurate.

calcaide
calcaide previously approved these changes Jan 8, 2025
Copy link
Collaborator

@calcaide calcaide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work!!!

I do personally don't like to rely on screenshots for testing, but as you already mention and explain, there isn't much of an option for this specific case.

await use(worker);
await worker.terminate();
},
{ scope: 'worker' },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this scope being used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the doc for worker scopes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants