Example of persistence with a DB or KV store #98

Open
viperfx opened this issue Sep 24, 2019 · 4 comments

@viperfx

viperfx commented Sep 24, 2019

Hi there,

It would be great to see an example implementation of how to modify the tokenCache to store in a simple DB/Cache system such as Redis.

    def add(self, event, **kwargs):
        super(SerializableTokenCache, self).add(event, **kwargs)
        self.has_state_changed = True

    def modify(self, credential_type, old_entry, new_key_value_pairs=None):
        super(SerializableTokenCache, self).modify(
            credential_type, old_entry, new_key_value_pairs)
        self.has_state_changed = True

    def deserialize(self, state):
        # type: (Optional[str]) -> None
        """Deserialize the cache from a state previously obtained by serialize()"""
        with self._lock:
            self._cache = json.loads(state) if state else {}
            self.has_state_changed = False  # reset

    def serialize(self):
        # type: () -> str
        """Serialize the current cache state into a string."""
        with self._lock:
            self.has_state_changed = False
            return json.dumps(self._cache, indent=4)

For a DB such as Redis:

  • What methods are most important to modify?
  • What is the structure of the cache? What is the key/value to store?

I also have a couple of other high-level questions:

  • Is there a cache value for each account? Does it make sense to store it in a place related to that user?
  • Is the cache value one big encrypted string that needs to be stored in a cache system such as Redis and has no direct relation to an account?

Thanks

@rayluo
Collaborator

rayluo commented Sep 27, 2019

@viperfx Thank you for all these excellent questions!

I also have a couple of other high-level questions:

  • Is there a cache value for each account? Does it make sense to store it in a place related to that user?

Short answer: Yes, the MSAL cache system internally maintains the relationship between tokens and accounts. But that "account" concept is probably different from what you think of as a "user", so you may not really need or want to split the tokens into a per-account data structure or storage.

Long answer:

MSAL and its token cache were optimized for public clients, such as a mobile app running on one end user's device. The implications are:

  • The total number of tokens in one cache would be small, probably a few dozen or even fewer.
  • The cache still separates tokens by account. Here, "account" refers to the different identities belonging to the same end user, such as their guest account in a different tenant. Incidentally, the get_accounts() API is designed for a front-end app to render a drop-down list from which that end user selects one of their own accounts.

Therefore, the MSAL Python token cache stores all tokens as a list of JSON objects, in memory. During cache look-up, MSAL Python filters tokens by account.

Such a setup works well for public client apps, such as the Azure CLI (az). But it won't scale if you are building a web app. We therefore recommend a "one cache per user" pattern: you, as the app developer, still treat the current instance of the MSAL cache as an opaque blob, and you store one such blob per real user. One way to do that is to maintain one token cache instance (and one MSAL instance) per session. We demonstrate that in a newly published web app sample here.
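
A minimal sketch of that one-cache-per-user pattern, assuming the cache object exposes serialize(), deserialize(), and has_state_changed the way the SerializableTokenCache excerpt quoted earlier in this thread does. The session mapping, the "token_cache" key, and the helper names load_cache/save_cache are hypothetical, and _StandInCache exists only so the snippet runs without MSAL installed.

```python
import json


class _StandInCache:
    """Hypothetical stand-in that mimics the serialize()/deserialize()
    surface of a serializable token cache, so this sketch runs on its own."""

    def __init__(self):
        self._cache = {}
        self.has_state_changed = False

    def add(self, key, value):
        # Any mutation marks the cache as dirty.
        self._cache[key] = value
        self.has_state_changed = True

    def deserialize(self, state):
        # An empty or missing state yields an empty cache.
        self._cache = json.loads(state) if state else {}
        self.has_state_changed = False

    def serialize(self):
        self.has_state_changed = False
        return json.dumps(self._cache)


def load_cache(session, cache_factory=_StandInCache):
    """Build a fresh cache and restore this user's blob, if one was saved."""
    cache = cache_factory()
    cache.deserialize(session.get("token_cache"))
    return cache


def save_cache(session, cache):
    """Write the blob back only when the cache reports a change."""
    if cache.has_state_changed:
        session["token_cache"] = cache.serialize()
```

With MSAL itself, the idea would be to pass the library's serializable cache class as cache_factory, hand the returned cache to the client application when constructing it, and call save_cache() after each token acquisition.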

  • Is the cache value one big encrypted string that needs to be stored in a cache system such as Redis and has no direct relation to an account?

The MSAL cache value is one big blob with a specific internal structure that the MSAL token cache logic relies on. It is not encrypted, but we provide basically only serialize() and deserialize() as the public API, so you are not expected to peek into it.

The one-cache-per-user approach we used in the sample above can be configured (via Flask-Session) to use Redis, Memcached, or MongoDB as the actual storage system.

For a DB such as Redis:

  • What methods are most important to modify?
  • What is the structure of the cache? What is the key/value to store?

I guess now you do not need to look into the MSAL token cache internals, do you?
If you really want to refactor the cache data structure, you probably need to refactor this entire file.

@viperfx
Author

viperfx commented Sep 27, 2019

Thanks for the answer. Let me explain my use cases further so that, hopefully, it will make sense why a one-cache-per-user approach is really needed. I currently have two immediate use cases that we have already prototyped and that are working with this library, but as you said, cache storage as one blob will not scale.

The use cases are the following:

  • Request User.Read to do SSO
  • Request Calendar.Read to sync vacations/away schedule

The first one, used for sign-in, can work with session-based storage. For the calendar, however, we will be requesting a delegated token and hope to refresh it to sync the calendar without user input.

Let's say I have a long-running app, and multiple users are signing into the app, getting tokens, and affecting the cache. I would prefer to have the cache value stored in a table or field related to the user, or to have a Redis key derived from the user ID whose value is the cache. That way, when I am about to refresh the token, for example, I fetch the cache for only that user based on their key.

Would you follow the approach in the example given my use case? Would appreciate your input.

@rayluo
Collaborator

rayluo commented Sep 27, 2019

@viperfx
Yes, I can understand your scenario. That kind of change is not in our current token cache design. We will revisit this at a later time to see whether we can retrofit it into the library itself. For now, you could grab the access_token, refresh_token, and id_token_claims returned by MSAL Python, store them in your own DB, and go from there.
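
A rough sketch of that workaround, using the standard-library sqlite3 module as a stand-in for "your own DB". The table name user_tokens, its columns, and the helper functions are all hypothetical, and the result dict is assumed to carry the access_token, refresh_token, and id_token_claims keys named above.

```python
import json
import sqlite3


def init_db(conn):
    """Create the (hypothetical) per-user token table if it is missing."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS user_tokens (
               user_id TEXT PRIMARY KEY,
               access_token TEXT,
               refresh_token TEXT,
               id_token_claims TEXT
           )"""
    )


def store_tokens(conn, user_id, result):
    """Insert or overwrite the token fields taken from a result dict."""
    conn.execute(
        "INSERT OR REPLACE INTO user_tokens VALUES (?, ?, ?, ?)",
        (
            user_id,
            result.get("access_token"),
            result.get("refresh_token"),
            # Claims are a dict, so they are stored as a JSON string.
            json.dumps(result.get("id_token_claims", {})),
        ),
    )
    conn.commit()


def fetch_tokens(conn, user_id):
    """Return the stored tokens for one user, or None if there are none."""
    row = conn.execute(
        "SELECT access_token, refresh_token, id_token_claims "
        "FROM user_tokens WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        return None
    return {
        "access_token": row[0],
        "refresh_token": row[1],
        "id_token_claims": json.loads(row[2]),
    }
```

Keying the table on your application's own user ID means a background sync job can look up one user's refresh token without touching anyone else's. Since refresh tokens are long-lived secrets, a real deployment would likely want them encrypted at rest.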

@arnoldknott

For a DB such as Redis:

  • What methods are most important to modify?

I had success implementing the distributed cache in Redis like this

  • What is the structure of the cache? What is the key/value to store?

The __init__() method of TokenCache generates the keys here in self.key_makers. The _add method shows the structure of the values in the cache.

As @rayluo pointed out, there is no need to worry about the data structure in the cache. MSAL takes care of this through its API. The relevant client class accepts a cache when it is instantiated. This cache can take your personal storage preference into account, for example by prefixing all keys with "msal:" in the get_location() method like this:

    def get_location(self):
        """Returns the location in the cache"""
        location = f"msal:{self.user_account['homeAccountId']}"
        return location

Similar data modifications can be applied in the save() and load() methods.
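
As an illustration only, here is one way those methods might fit around get_location(). This is not the linked implementation: the PrefixedCacheStore class, the load_state()/save_state() names, and the DictClient stand-in are hypothetical. The client is assumed to offer get(key) and set(key, value), which the redis-py Redis client does.

```python
class PrefixedCacheStore:
    """Hypothetical helper that stores one serialized cache blob per account
    under a prefixed key in a key-value store."""

    def __init__(self, client, user_account, prefix="msal:"):
        self.client = client              # anything with get()/set()
        self.user_account = user_account  # dict holding 'homeAccountId'
        self.prefix = prefix

    def get_location(self):
        """Returns the location in the cache"""
        return f"{self.prefix}{self.user_account['homeAccountId']}"

    def load_state(self):
        """Fetch the serialized blob, or None when nothing is stored yet."""
        state = self.client.get(self.get_location())
        # Some clients (redis-py by default) return bytes rather than str.
        if isinstance(state, bytes):
            state = state.decode("utf-8")
        return state

    def save_state(self, state):
        """Store the serialized blob under this account's key."""
        self.client.set(self.get_location(), state)


class DictClient:
    """In-memory stand-in for a key-value client, for local experiments."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
```

A caller would feed load_state() into the cache's deserialize() before acquiring tokens, and pass serialize() to save_state() afterwards, ideally only when has_state_changed is set.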
