Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to Jazzy/Rolling with F2C v1.2.1 #82

Merged
merged 25 commits into from
Dec 2, 2024

Conversation

adivardi
Copy link
Contributor

@adivardi adivardi commented Nov 21, 2024

This PR updates the code to build and run on Jazzy/Rolling, but still with Fields2Cover v1.2.1.
It requires F2C with branch v1.2.1-devel` to work (https://github.com/Fields2Cover/Fields2Cover/tree/v1.2.1-devel)

The biggest change is the new BehaviorTree version, plus a few fixes for compiler errors.
I also updated the coverage_demo_launch.py and row_coverage_demo_launch to work with the new Gazebo and with nav2_minimal_tb3_sim.

It also updates the CI workflows to base on Ubuntu Noble and ROS Jazzy.

Also moved the initial robot pose and polygon to (0,0). In GZ Ignition, tb3 publishes its ground truth pose as odometry with respect to its start position. Even when spawning the model at (5,5), the map->base_link transform is 0 because the robot did not drive. This causes navigation to fail as the robot pose is too far from the start of the path.

Installation instructions: There is no need to manually build F2C (as in #58 ). Simply cloning it into the workspace with the correct branch and building everything with colcon works.

In GZ Ignition, tb3 publishes its ground truth pose as odometry with respect to its start position. Even when spawning the model at (5,5), the map->base_link transform is 0 because the robot did not drive.
This causes navigation to fail as the robot pose is too far from the start of the path.
@SteveMacenski
Copy link
Member

Also moved the initial robot pose and polygon to (0,0). In GZ Ignition, tb3 publishes its ground truth pose as odometry with respect to its start position. Even when spawning the model at (5,5), the map->base_link transform is 0 because the robot did not drive. This causes navigation to fail as the robot pose is too far from the start of the path.

But the pose is spawned at 5,5 - so it should be fine? I guess its not clear to me why this would need to have been changed given the demo's worked to-date

TODO: update row_ launch as well

Will wait for that :-)

@SteveMacenski
Copy link
Member

Update https://github.com/open-navigation/opennav_coverage/blob/main/.github/workflows/test.yml#L31 to v4 to get CI to run (also, check the linter)

@adivardi
Copy link
Contributor Author

But the pose is spawned at 5,5 - so it should be fine? I guess its not clear to me why this would need to have been changed given the demo's worked to-date

I think that in Gazebo Classic the odom plugin was behaving somewhat differently. With the current Gazebo, at least for me without this change, the robot is spawned at (5,5), but on rviz is it shown at (0,0). map, odom and base_link are all at (0,0).

So missing:

@SteveMacenski
Copy link
Member

(add installation instructions?)

I don't think you should need to manually go into the workspace to build F2C, it should work with colcon build in you workspace with the nav2 coverage servers -- that's how I developed this at least.

I'm just looking for you to put in explicitly that the project branch supports v1.2.1, since this has come up a few times as the maintain made major changes to v2 which we've been struggling to reliably run our tests over

Copied from ros-navigation/navigation2
and add (5,5) to map->odom tf as it is the transform from world/map frame to the robot's initial pose
@adivardi
Copy link
Contributor Author

I don't think you should need to manually go into the workspace to build F2C, it should work with colcon build in you workspace with the nav2 coverage servers -- that's how I developed this at least.

Oh you are right. I think I never tried it since it didn't build directly on rolling and then I was following the #58 instructions. I added a note on that.

Copy link
Member

@SteveMacenski SteveMacenski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some weird compilation errors in CI. Maybe we need to update to BT.CPP in Jazzy (so upgrade our CI in L19 of test.yml github workflow to be jazzy instead of iron? Check the CI job, but either way moving up to jazzy would be good since iron is EOL (and Jazzy wasn't around when we developed this CI).

Do the tests and such work for you locally?

Minus this quirk, I approve of the rest :-)

@adivardi
Copy link
Contributor Author

I support Jazzy as the main base. It would also be great to have this release to Jazzy.

I changed to Jazzy and the F2C branch, now it build fine but 2 tests are failing.

Locally I have 1 test fails with timeout:

    Start 8: test_compute_coverage_path

8: Test command: /usr/bin/python3 "-u" "/opt/ros/jazzy/share/ament_cmake_test/cmake/run_test.py" "/opt/enway/ros2_ws/build/opennav_coverage_bt/test_results/opennav_coverage_bt/test_compute_coverage_path.gtest.xml" "--package-name" "opennav_coverage_bt" "--output-file" "/opt/enway/ros2_ws/build/opennav_coverage_bt/ament_cmake_gtest/test_compute_coverage_path.txt" "--command" "/opt/enway/ros2_ws/build/opennav_coverage_bt/test/test_compute_coverage_path" "--gtest_output=xml:/opt/enway/ros2_ws/build/opennav_coverage_bt/test_results/opennav_coverage_bt/test_compute_coverage_path.gtest.xml"
8: Working Directory: /opt/enway/ros2_ws/build/opennav_coverage_bt/test
8: Test timeout computed to be: 60
8: -- run_test.py: invoking following command in '/opt/enway/ros2_ws/build/opennav_coverage_bt/test':
8:  - /opt/enway/ros2_ws/build/opennav_coverage_bt/test/test_compute_coverage_path --gtest_output=xml:/opt/enway/ros2_ws/build/opennav_coverage_bt/test_results/opennav_coverage_bt/test_compute_coverage_path.gtest.xml
8: [==========] Running 1 test from 1 test suite.
8: [----------] Global test environment set-up.
8: [----------] 1 test from ComputeCoveragePathActionTestFixture
8: [ RUN      ] ComputeCoveragePathActionTestFixture.test_tick
8: [       OK ] ComputeCoveragePathActionTestFixture.test_tick (1 ms)
8: [----------] 1 test from ComputeCoveragePathActionTestFixture (2 ms total)
8: 
8: [----------] Global test environment tear-down
8: [==========] 1 test from 1 test suite ran. (4 ms total)
8: [  PASSED  ] 1 test.
8/8 Test #8: test_compute_coverage_path .......***Timeout  60.02 sec

I don't understand where the coverage_server is launched for the test, and how come the test xml in here does not set any polygons.

@adivardi adivardi marked this pull request as ready for review November 26, 2024 12:40
@SteveMacenski
Copy link
Member

SteveMacenski commented Nov 26, 2024

now it build fine but 2 tests are failing.

Its actually 1 test testServerTransactions, the output is just repeated twice for 2 different reasons. This looks like the issue

ROR] [1732619322.500486728] [my_node.rclcpp_action]: unknown goal response, ignoring...

Or possibly the

  [INFO] [1732619322.524461239] [coverage_server]: Generating coverage path in EPSG:4258 frame for zone with 13 outer nodes and 0 inner polygons.       -- run_test.py: return code -11

Whereas its segfaulting somewhere mid-execution with this update (possibly due to F2C issues). What if we use v1.2.1 without the -devel? I've recently been in a company office that they tried that out and worked fine. The unknown goal response could be unrelated just due to the test structures (or not)

Testing that now...

[Update] Didn't compile since F2C doesn't pass some of our build warnings that are escalated to errors. If you remove those build flags, does it pass then? I can't do that for you in this PR unfortunately since you didn't touch a line near it to test

.github/deps.repos Outdated Show resolved Hide resolved
@adivardi
Copy link
Contributor Author

adivardi commented Nov 27, 2024

v1.2.1 doesn't build on my laptop because of this errors. The v1.2.1-devel just fixes those, so it would be a bit strange if that was causing the issue, but can try that.

Edit: F2C itself is not building. I would need to set -Wno-error=pessimizing-move & -Wno-error=deprecated-declarations to it directly. I tried adding it as colcon defaults to this repo test, but I still getting the same error.

Edit 2: I tried replicating the test_server as a demo node in opennav_coverage_demo, and I get

[component_container_isolated-8] [INFO] [1732708425.611401273] [coverage_server]: Generating coverage path in EPSG:4258 frame for zone with 13 outer nodes and 0 inner polygons.
[component_container_isolated-8] [ERROR] [1732708425.611734145] [coverage_server]: Internal Fields2Cover error: Geometry does not contain point 0
[component_container_isolated-8] [WARN] [1732708425.611806227] [coverage_server]: [compute_coverage_path] [ActionServer] Aborting handle.

Following this thread I found: Fields2Cover/Fields2Cover#73 (comment)
It seems you have had similar issues with the EPSG:4258 polygons before.
If I use cartesian_test_field.xml instead of test_field.xml it works as a demo node. let's see on CI

@SteveMacenski
Copy link
Member

Seems to mostly turn over - though perhaps the file replacement makes tinyxml unhappy for some reason?

  [0.127s] WARNING:colcon.colcon_test_result.test_result.xunit:Skipping 'build/fields2cover/_deps/tinyxml2-src/resources/empty.xml': no element found: line 1, column 0
  [0.127s] WARNING:colcon.colcon_test_result.test_result.xunit:Skipping 'build/fields2cover/_deps/tinyxml2-src/resources/xmltest-4636783552757760.xml': not well-formed (invalid token): line 1, column 0
  [0.128s] WARNING:colcon.colcon_test_result.test_result.xunit:Skipping 'build/fields2cover/_deps/tinyxml2-src/resources/xmltest-5330.xml': syntax error: line 1, column 0
  [0.128s] WARNING:colcon.colcon_test_result.test_result.xunit:Skipping 'build/fields2cover/_deps/tinyxml2-src/resources/xmltest-5662204197076992.xml': not well-formed (invalid token): line 1, column 2
  [0.128s] WARNING:colcon.colcon_test_result.test_result.xunit:Skipping 'build/fields2cover/_deps/tinyxml2-src/resources/xmltest-5720541257269248.xml': not well-formed (invalid token): line 1, column 8
  [0.128s] WARNING:colcon.colcon_test_result.test_result.xunit:Skipping 'build/fields2cover/_deps/tinyxml2-src/resources/out/compact.xml': no element found: line 1, column 0
  [0.132s] WARNING:colcon.colcon_test_result.test_result.xunit:Skipping 'build/fields2cover/_deps/tinyxml2-src/resources/out/test7.xml': syntax error: line 3, column 0

@adivardi
Copy link
Contributor Author

adivardi commented Nov 28, 2024

So the error code -11 is only gone now that I switched away from the EPSG:4258 coordinates file.
Maybe it still is still similar to Fields2Cover/Fields2Cover#73 (comment)

However the test in opennav_coverage_bt is still failing. I can't find errors that really look related
I will look into the tinxml warning

@SteveMacenski
Copy link
Member

Its failing with:

      /__w/opennav_coverage/opennav_coverage/ros_ws/src/iphd6oteara/opennav_coverage/opennav_coverage/test/test_server.cpp:119: Failure
       Expected equality of these values:
        result.code
          Which is: 1-byte object <06>
        rclcpp_action::ResultCode::SUCCEEDED
          Which is: 1-byte object <04>

I dont' think this is a timeout, but a problematic response code due to a failure. We see a log right before it that points to the issue at hand:

  Error: ROR] [1732792205.749924797] [coverage_server]: Invalid GML File or Coordinates: Coordinate system not recognized

@adivardi
Copy link
Contributor Author

adivardi commented Dec 2, 2024

OK, at this point the tests are passing, except test_compute_coverage_path.cpp which I disabled.

  • Using cartesian_test_field.xml gave the error Invalid GML File or Coordinates: Coordinate system not recognized because the server was configured with cartesian_frame_ = true;. Setting it to true fixed the error (this is this run)
  • switching back to test_field.xml passes now (this run). I am not exactly sure why it was failing before...

Without the shutdown: The tests are passing in `RUN_ALL_TESTS()`, the test itself is failing with timeout. Most probably because `server_thread.join()` hangs forever.
Copy link
Contributor Author

@adivardi adivardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, the tests are passing :)

The last test was a timeout in test_compute_coverage_path, due to the server_thread hanging forever when trying to rejoin. un-commenting the rclcpp::shutdown() fixed that

Copy link
Member

@SteveMacenski SteveMacenski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks so much for the hard work and patience to figure out the CI issues -- Merging now 🥳

@SteveMacenski SteveMacenski merged commit 1942505 into open-navigation:main Dec 2, 2024
7 checks passed
@adivardi
Copy link
Contributor Author

adivardi commented Dec 3, 2024

@SteveMacenski Thanks!
Could you release this on Jazzy? right now there is no ros-jazzy-opennav-coverage

@adivardi adivardi deleted the av/jazzy branch December 3, 2024 09:09
@SteveMacenski
Copy link
Member

Unfortunately due to the API instability in F2C and the binaries not necessarily working (perhaps fixed now?), I cannot release this to the build farm without it breaking regularly :( This is the main reason its still in opennav_coverage instead of integrated into Nav2 directly (i.e. nav2_coverage).

@adivardi
Copy link
Contributor Author

adivardi commented Dec 3, 2024

But it is released to iron and humble, is there a difference? or are those also breaking regularly?

Edit: or is it not? I was sure I saw apt packages before but I can't find them at the moment

@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 3, 2024

It is not released in any distribution https://build.ros2.org/search/?q=opennav_coverage&Jenkins-Crumb=c6c6671c175be7303334d165e6229c1395275a64b34e47dfb074fbb571abfe7d. I don't believe it has been ever -- I did try though at one point but the API stability / F2C binaries kept causing problems.

I could probably do it if we did a package vendorization of F2C where we pin it to a specific git hash on v1.2.1 so that new modifications don't break the build -- that is an option

Oh, do you mean a Jazzy branch? I can do that :-)

@adivardi
Copy link
Contributor Author

adivardi commented Dec 4, 2024

I guess I misremembered

For now we have to fix F2C to the v1.2.1-devel branch anyway, at least until #53 get some progress. So I think that's fine and probably a good practice anyway.
I think it would be great to not be forced to build opennav_coverage from source, though understand if this is not the highest as it builds well with colcon.

@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 4, 2024

For now we can't release binaries since things aren't necessarily stable, but we could instead vendorize (Example: https://github.com/SteveMacenski/spatio_temporal_voxel_layer) F2C at a static git commit hash so that new changes don't break v1.2.1 for our needs. That is not the preferred solution, but technically possible. We could have a vendorized F2C package in this repo that then the opennav_coverage relies on (instead of F2C directly) and I could actually release that which puts the control over updates on us in this repo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants