Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APE26: Removing data storage (representations) from coordinate frames #112

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

jeffjennings
Copy link

@jeffjennings jeffjennings commented Nov 4, 2024

Coordinate frames in astropy.coordinates currently store metadata used to construct the frame (i.e. for transforming between frames) and may also store coordinate data itself. This duplicates functionality with SkyCoord, which acts as a container for both coordinate data and reference frame information. We propose to change the frame classes such that they only store metadata and never coordinate data. This would make the implementation more modular, remove ambiguity for users from having nearly duplicate functionality with slightly different APIs, and better satisfy the principle of Separation of Concerns.

Authors of this APE (alphabetical): Jeff Jennings @jeffjennings, Adrian Price-Whelan @adrn, Nathaniel Starkman @nstarman, Marten van Kerkwijk @mhvk

@adrn
Copy link
Member

adrn commented Nov 4, 2024

Pinging @eerovaher @eteq @StuartLittlefair @Cadair @ayshih @taldcroft as other "stakeholders" who may want to take a look at this and review!

Copy link
Member

@eerovaher eerovaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am convinced that there should be exactly one type for coordinate data. This would prevent users from getting confused about which type they should be using and simplifies things for both astropy and downstream developers because we'll only have to support one type of coordinate data as input for our functions.

This leaves two possibilities: the single type is a class, which is what this APE proposes, or the single type is a Protocol, which is what Implement BaseCoordinateFrame.frame property (astropy/astropy#16356) (that the APE mentions) proposed. Implementing a Protocol would be backwards-compatible and wouldn't cause disruption for downstream packages. In every other regard this APE is a better idea.

@StuartLittlefair
Copy link

Yes, yes and a thousand times yes.

The devil will be in the implementation details but this is a thoroughly good idea.

@adrn
Copy link
Member

adrn commented Nov 26, 2024

Does anyone else have thoughts before we ask the @astropy/coordinators to review this? cc @eteq @Cadair @ayshih @taldcroft!

@eteq
Copy link
Member

eteq commented Dec 2, 2024

I have been traveling the last several weeks and haven't had a chance to review this in detail, but I will try to do so this week. I am fairly sure I will have Opinions ™️ (but hopefully positive ones?)

@adrn
Copy link
Member

adrn commented Dec 5, 2024

@eteq OK cool. It'd be good to get your feedback ASAP, because @jeffjennings is starting to work through some of the changes we discuss here (in draft stages, but still, it is somewhat in progress).

@Cadair
Copy link
Member

Cadair commented Dec 19, 2024

I have two main thoughts about some effects from this change, I think they can both be managed, but probably worth having a think about.

  1. The asdf serialisation is going to substantially different from now, managing that transition could be complex. - @braingram
  2. Many SunPy frames have an observer attribute which is a realised frame holding the position of the observer. This could obviously become a SkyCoord, but I worry that adds some more complexity, and will for a start make the repr even more obnoxious than it is currently.

@braingram
Copy link

1. The asdf serialisation is going to substantially different from now, managing that transition could be complex. - @braingram

Thanks for pinging me about this change. If I'm following the end result will be that coordinate frames will no longer have data.

Reading old files with data when coordinate frames no longer support data

At that point when a user opens an ASDF file with a serialized coordinate frame (which might contain data) the converter that handles the deserialization could see that the file contains data and generate a SkyCoord instead (possibly with a warning that the file should be updated).

Writing files when coordinate frames no longer support data

This should be fine. Technically we could keep the same schema and just never write data (it's not required) but it might be a good opportunity to include some other schema cleanup (at the expense of some schema churn since many schemas reference the base coordinate frame schema).

Implementation details

The details about how this is implemented may impact ASDF serialization. Specifically the first item under implementation:

  1. Splitting the frame classes into two hierarchies: ones with and without data.

The converter(s) would need to be updated to support these new classes. This isn't a problem but is work that wouldn't be needed if instead the existing classes were updated (deprecating and then removing the data feature).

  1. Deprecating the legacy with-data frame classes.

At this point asdf-astropy could be updated to not use the legacy classes.

Copy link
Member

@eteq eteq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I've left several specific inline comments. I also have several bigger picture concerns. Note that I am not dead-set against this - I get the motivation for this change and can see how it might be an improvement. But I want to be sure we address these concerns before going down this road:

  • This is a major change in communicating the purpose of SkyCoord. Previously (i.e. in APE5) SkyCoord is highlighted as a "convenience class" - primarily this is 1) it has a much smarter and easier to use initializer, and 2) it carries the frame attributes along with the data, regardless of whether that frame attribute is in the current frame. This makes subtly different behavior for certain transformations depending on whether you give the to frame the same frame attributes or not. I don't really understand how this proposal will change that - i.e., if there are no frame classes with data, how do you say "do the transform that only uses the frame attributes in the to-frame, but doesn't carry them over from the from-frame"? I can thinks of ways to do this, but this was one of the prime reasons for data and frames being tied together in the first place, to keep that case simpler and not dependent on SkyCoord.
  • We should get someone in Sunpy look at this before proceeding. hopefully this won't be a big change for them since I think mostly they define their own frames and transforms, which should be backwards compatible with these changes. But it may be they use the low-level classes directly more often and hence this would be a huge negative impact on them.
  • Relatedly, I'm concerned about the number of examples out in the wild that use the low-level classes directly. There are various use cases where we have encouraged people to use the low-level classes with data because they are much cheaper for initialization. If we are going to remove that option they will be impacted whenever the removal finally happens.
  • I think it's absolutely critical we highlight the change in philosophy from what APE5 said. This sub-package is sufficiently complex that documentation (either in APEs or otherwise) are critical to helping people understand the big-picture. This might be as simple as just making sure https://docs.astropy.org/en/stable/coordinates/index.html#astropy-coordinates-overview and related documentation spots are up-to-date, but still, that should be explicitly in the implementation plan IMHO since it's a non-trivial effort to update all the docs for this.

:type: Standard Track
:status: Discussion

Abstract
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly speaking, this is missing any mention of APE5 and how this APE modifies it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - I've added text to give the context of APE 5, and to motivate what we propose changing and why, in 46a6d82.

APE26.rst Outdated
Comment on lines 38 to 39
The above reveals a problematic oddity with the coordinate frame implementation – an
insufficient `separation of concerns <https://en.wikipedia.org/wiki/Separation_of_concerns>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the wording is a bit misleading here. This was not a "problematic oddity" but rather a conscious choice. See APE5. It's reasonable to say we have now learned enough to change this, but it wasn't just by accident, it was a concious design choice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point - I've reworded the text to reflect this in 46a6d82.

APE26.rst Outdated
different usage modes (with and without data), each exhibiting different behavior. For
instance, some methods such as ``separation`` work fine with frames with data, while
doing this on coordinate frames without data results in an error. This kind of
multi-modal structure is considered an anti-pattern because it forces any code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as with above I think "is considered an anti-pattern" is a bit strong. It makes code working with frame classes be in one place but explicitly have to check for the presence or absence of data. Which for someone trying to understand the code might be a benefit.

I think that this is really a tradeoff between two use cases:

  1. Someone trying to understand coordinates-related functionality (not just astropy maintainers - this includes e.g. sunpy folks)
  2. An astropy maintainer who is really groking the layout of the package, how this stuff all works

I think this change benefits 2 over 1. Which doesn't mean we shouldn't do it, but it leaves out the consequence that it makes the bigger picture harder to understand. If we want to go ahead despite that, well, we can, but we need to recognize that choice is being made and be explicit about it here.

(I do, however, agree with the points immediately below that the end-user is the more important thing - this is more like "new maintainer" vs "old/more knowledgable maintainer")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Erik - I've reworked the text to reflect this in 46a6d82 (including explicitly noting the design choice we propose that departs from APE 5, as you suggested).

APE26.rst Outdated
Comment on lines 86 to 87
overlap leads to confusion where beginner users end up creating ``BaseCoordinateFrame``
objects such as ``ICRS``, when the docs are clear that these are for more advanced users
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is an end-user gain: it's a feature not a bug that SkyCoord and ICRS both can be instantiated with data. SkyCoord is supposed to be a convenience class, with ICRS the simpler underlying layer - that's the whole principle here. I don't really understand how the proposed change makes things any clearer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how this is a feature, but I think the improved user clarity is motivated earlier on in the text - currently, a user can initialize an ICRS object or a SkyCoord object, which "represent the same thing, but return different objects with similar but not identical APIs". The proposed change gives only way to have an object hold both data and frame information - SkyCoord - preventing user confusion about subtle differences between SkyCoord and a with-data frame like the current ICRS (including which one to use, and why their API differs in perhaps unexpected ways).

APE26.rst Outdated
Comment on lines 158 to 159
2. Switching ``SkyCoord`` to use the data-less frame classes, and enabling automatic
conversion of the with-data frames into ``SkyCoord`` objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused how this doesn't create a huge mess: does this mean all of the coordinate conversion process during the transition phase is duplicated between the "with data" classes and the "new SkyCoord" classes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There wouldn't be duplication because the coordination conversion machinery would be in the new (data-less) frame classes, and the legacy (with-data) frame classes would subclass the new (data-less) classes.

jeffjennings and others added 2 commits January 29, 2025 10:27
@jeffjennings
Copy link
Author

jeffjennings commented Feb 7, 2025

Hi @ayshih and @Cadair, we're starting to finalize this APE and are wondering if you anticipate any major concerns with respect to SunPy (as well as any suggestions you have, e.g. for addressing Stuart's point 2 above). The first phase of the APE wouldn't introduce any breaking changes, the second phase (timeline flexible) would add deprecation warnings, and the final phase (several major astropy versions away) would require changes to SunPy. Thanks!

Integrate text on APE 5 and departures from it
@pllim
Copy link
Member

pllim commented Feb 27, 2025

Thanks! Any thoughts on performance impact?

@nstarman
Copy link
Member

There should be no performance impact on SkyCoord. Small performance improvement on byte representations e.g. file schemas related to the new frame classes since they don't carry data. Memory improvements for the new Coordinate class since it doesn't do cacheing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants