User-supplied credential callback #234

Draft: wants to merge 19 commits into main

kylebarron
Member

@kylebarron kylebarron commented Feb 6, 2025

There are myriad ways to handle credentials for each of these stores, and I don't want to implement every last one of them. Luckily, object_store allows for external credential providers, so we can let users implement their own fully custom authentication in Python!

This is a proof of concept that I've tested with both a synchronous and an asynchronous credential provider! Here are a couple of examples:

def credential_provider() -> S3Credential:
    session = boto3.Session(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key, aws_session_token=aws_session_token)
    credentials = session.get_credentials().get_frozen_credentials()
    return {
        "access_key_id": credentials.access_key,
        "secret_access_key": credentials.secret_key,
        "token": credentials.token,
        "timeout": datetime.now() + timedelta(days=1000)
    }

store = S3Store("ds-wheels", credential_provider=credential_provider)
test = obs.list(store).collect()

async def credential_provider() -> S3Credential:
    session = boto3.Session(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key, aws_session_token=aws_session_token)
    credentials = session.get_credentials().get_frozen_credentials()
    return {
        "access_key_id": credentials.access_key,
        "secret_access_key": credentials.secret_key,
        "token": credentials.token,
        "timeout": datetime.now() + timedelta(days=1000)
    }

store = S3Store("ds-wheels", credential_provider=credential_provider)
await obs.list(store).collect_async()

Notes:

  • Only try to refresh credentials (i.e. call the Python callback) when necessary. This includes handling the timeout datetime correctly.
  • Add credential_provider to obstore.store.from_url
  • Require a timezone to be set on the expiry time
  • Passing an async callback will hang if you use sync obstore APIs. You need to use async obstore APIs if you pass in an async callback. Is there a way for us to validate and error if the user passes in an async callback to a sync function? So we don't hang?
  • Add examples for at least S3, maybe AWS STS, and maybe Azure (Planetary Computer?)
  • Use something like upstream's TokenCache
  • Update pickle "advanced" doc for whether this will work in pickle. It's ok for this not to work in pickle for now.
  • Expose from obstore.store.google.auth instead of obstore.google.auth?
  • Move all auth implementations to obstore.auth, which is a module we can keep expanding: obstore.auth.earthdata, obstore.auth.google, obstore.auth.boto3, and so on.
  • Create abstract base classes for auth callback?
    • Thinking of not having base classes, to reinforce that protocols, not inheritance, are the intended usage. Type checkers can then enforce correctness of the input.
    • These base classes should have some way to "declare" which store they're compatible with, as a simple validation when a class is passed into the provider callback
  • Update docs for "simple" and "complex" auth providers. Simple providers are just function callbacks; complex providers are class-based.
  • Naming: AuthProvider vs CredentialProvider? Shorter might be better.
  • Ability to pass config down from the credential provider class, for cases like boto3, which infers a default region.
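
To make a few of these notes concrete (refresh only when needed, require a timezone on the expiry, and fail fast on an async callback in a sync context), here is a rough pure-Python sketch. It is not how this PR implements anything; the real logic would live on the Rust side. `CachedCredentialProvider` and `min_ttl` are hypothetical names, and the `"timeout"` key just follows the examples above.

```python
import inspect
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


class CachedCredentialProvider:
    """Hypothetical wrapper for illustration; not part of obstore's API."""

    def __init__(
        self,
        provider: Callable[[], dict],
        min_ttl: timedelta = timedelta(minutes=5),
    ) -> None:
        # A coroutine function cannot be driven from sync code, so raise
        # early instead of hanging later.
        if inspect.iscoroutinefunction(provider):
            raise TypeError("async credential provider passed to a sync wrapper")
        self._provider = provider
        self._min_ttl = min_ttl
        self._cached: Optional[dict] = None

    def __call__(self) -> dict:
        now = datetime.now(timezone.utc)
        # Reuse the cached credential while it has more than min_ttl left.
        if self._cached is not None and self._cached["timeout"] - now > self._min_ttl:
            return self._cached

        cred = self._provider()
        expiry = cred["timeout"]
        # Naive datetimes are ambiguous, so insist on a timezone-aware expiry.
        if expiry.tzinfo is None or expiry.utcoffset() is None:
            raise ValueError("expiry time must be timezone-aware")
        self._cached = cred
        return cred
```

One caveat: `inspect.iscoroutinefunction` returns `False` for a class instance whose `__call__` is async, so class-based providers would need a separate check.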

Closes #232, closes #269

@kylebarron kylebarron added this to the 0.5.0 milestone Feb 7, 2025
@kylebarron
Member Author

NASA earthdata also has a time limit on its S3 credentials. cc @chuckwondo maybe we can figure out a good example there (although those credentials are only valid for same-region requests IIRC)

@chuckwondo

> NASA earthdata also has a time limit on its S3 credentials. cc @chuckwondo maybe we can figure out a good example there (although those credentials are only valid for same-region requests IIRC)

Sure, let me pull from something I experimented with a while back.

@kylebarron
Member Author

kylebarron commented Feb 14, 2025

@chuckwondo in 4d45b54 (#234) I added an example of a custom AWS credential provider for NASA Earthdata (which automatically refreshes credentials each hour, 5 minutes before they expire), drawing from @abarciauskas-bgse's code here, which I think comes from here.

I created #271, where we can discuss Earthdata specifically in more detail.
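
For readers skimming the thread, one way to get the "refresh 5 minutes before expiry" behavior is for a class-based provider to report an expiry slightly earlier than the real one. This is only a sketch of the idea and not necessarily what 4d45b54 does; `EarlyExpiryProvider`, `fetch`, and the shape of its return value are all hypothetical, and the `"timeout"` key follows the examples in the PR description.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional, Tuple

# How long before the real expiration the credential should be treated as stale.
REFRESH_MARGIN = timedelta(minutes=5)

# Hypothetical return shape: (access_key_id, secret_access_key, token, expiration)
FetchResult = Tuple[str, str, Optional[str], datetime]


class EarlyExpiryProvider:
    """Hypothetical class-based provider; `fetch` stands in for the real request."""

    def __init__(self, fetch: Callable[[], FetchResult]) -> None:
        self._fetch = fetch

    def __call__(self) -> dict:
        access_key_id, secret_access_key, token, expiration = self._fetch()
        return {
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
            "token": token,
            # Report an earlier expiry so the caller refreshes ahead of time.
            "timeout": expiration - REFRESH_MARGIN,
        }
```

Injecting `fetch` keeps the HTTP details out of the provider, which also makes it easy to test without network access.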

Comment on lines +114 to +117
"""Request updated credentials."""
resp = self.session.get(CREDENTIALS_API, allow_redirects=True, timeout=15)
auth_resp = self.session.get(resp.url, allow_redirects=True, timeout=15)
creds = auth_resp.json()

I don't think we should be using this as an example for 2 reasons:

  1. The session is never closed
  2. It is insecure because you're leaking EDL creds to the CREDENTIALS_API. The EDL creds should be supplied only to the URS (Earthdata Login) URL.

Although your original implementation is more verbose, it is secure and does not leave dangling resources. (There's no strong motivation for using a session, since auth occurs no more than once an hour, or more likely only once in total for most use cases.)

Member Author

The original implementation didn't work, see #271 (comment). I'm not sure why it didn't work; the current implementation was suggested by Aimee in #271, and appears to be what the earthaccess Python package is using.

Using a session seemed to make it easier to automatically manage cookies between the two requests, but it's not a requirement for me. In the async case I add a close() method to the credential provider and document that it should be called after all obstore use has finished.

Perhaps this isn't a great choice for an example, and we should find something simpler for the docs.
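
On the unclosed-session point raised in the review, a minimal sketch of one option is to scope the session to a `with` block so it is closed even if a request raises. `fetch_json` and `session_factory` are hypothetical names; `session_factory` could be `requests.Session`, which supports the context-manager protocol. This only addresses resource cleanup, not the question of which URL should receive the EDL credentials.

```python
from typing import Any, Callable


def fetch_json(session_factory: Callable[[], Any], url: str, timeout: float = 15) -> Any:
    """Fetch `url` and decode JSON, closing the session on the way out."""
    # The `with` block closes the session on success and on error alike.
    with session_factory() as session:
        resp = session.get(url, allow_redirects=True, timeout=timeout)
        return resp.json()
```

Since credentials are requested at most about once an hour, opening a fresh session per refresh should cost very little.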

Successfully merging this pull request may close these issues.

Custom google auth via google-auth
optional auth callback for each store