Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ACL SETUSER Thread Handling #989

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

kevinmichaelbowersox
Copy link
Member

@kevinmichaelbowersox kevinmichaelbowersox commented Feb 1, 2025

This pull requests fixes #988. Please see the detailed discussion provided in the issue for a deeper discussion of the problem this PR addresses.

Quick Problem Overview

The ACL SETUSER command may produce inconsistent results when multiple clients/threads attempt to concurrently modify the same user. Please find test cases in this PR that demonstrate the concurrency issues this PR addresses.

Solution Overview

This PR addresses the issue by implementing the following changes:

  • Establishing a table of lock objects in the AccessControlList that can be used for taking granular level locks on a per user basis. This locking strategy should be used sparingly, possibly only for modifications to users via ACL SETUSER.
  • In the ACL SETUSER implementation, a user level lock is taken prior to a read, copy, write sequence performed in the critical section of the lock. This is necessary to prevent one client from reading the user object from the AccessControlList while another client is in the process of modifying it during this sequence. A copy constructor was added to User to protect against external modification while performing this sequence.
  • An observer pattern is implemented for subscribers interested in being notified of when ACL modifications are performed on the AccessControlList. Sessions subscribe to these notifications at creation time, giving them awareness of changes to the ACL for the user in the current session and the ability to refresh the ACL for the respective user. With this capability, ACL modifications are still applied in real time when users are modified.
  • A flag was added to the session to signal staleness of the ACL. When the flag is raised, the session will refresh the ACL for the user before the next command execution.

Testing and Verification

  • Added unit tests that repro the situation prior to making changes and verified existence of the issue.
  • Performed local testing manually to verify expected results used in test cases.
  • Verified new and existing tests passed once fix was implemented.

@kevinmichaelbowersox kevinmichaelbowersox marked this pull request as ready for review February 1, 2025 21:41
/// Registers a subscriber to receive notifications when modifications are performed to the <see cref="AccessControlList"/>.
/// </summary>
/// <param name="subscriber">The <see cref="IAccessControlListSubscriber"/> to register.</param>
internal void Subscribe(IAccessControlListSubscriber subscriber)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any way subscriptions are removed - won't this extend the lifetime of all RespServerSessions indefinitely?

Copy link
Member Author

@kevinmichaelbowersox kevinmichaelbowersox Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a fix that unsubscribes the session from the ACL notifications when the session is killed.

        /// Subsequent calls will return false.
        /// </summary>
        public bool TryKill()
        {
            if (!networkSender.TryClose())
            {
                return false;
            }

            if (_authenticator is GarnetACLAuthenticator aclAuthenticator)
            {
                aclAuthenticator.GetAccessControlList().Unsubscribe(this);
            }

            return true;
        }

/// <param name="user">The created or updated <see cref="User"/> that triggered the notification.</param>
private void NotifySubscribers(User user)
{
foreach (IAccessControlListSubscriber subscriber in _subscribers.Values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is a race here:

  1. A new RespServerSession is created, it runs through this.AuthenticateUser (L236) and the thread is paused before subscription runs
  2. Another thread executes a ACL SETUSER, triggering this NotifySubscribers method
  3. The thread running in #1 resumes, and registers itself

Net effect, that new RespServerSession ends up with the old user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a fix that checks the authenticated user is current prior to finishing instantiation of the session.

            // Ensure user was not updated during initialization.
            if (_authenticator is GarnetACLAuthenticator aclAuthenticator && _user != aclAuthenticator.GetUser())
            {
                this.RefreshUser();
            }

libs/server/ACL/IAccessControlListSubscriber.cs Outdated Show resolved Hide resolved
libs/server/Resp/AdminCommands.cs Outdated Show resolved Hide resolved
libs/server/Resp/RespServerSession.cs Outdated Show resolved Hide resolved
@@ -438,6 +469,11 @@ private void ProcessMessages()
// Check ACL permissions for the command
if (cmd != RespCommand.INVALID)
{
if (_isUserAclStale && cmd != RespCommand.AUTH)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very hot code, we'll want to see benchmarks to measure the impact.

I'd be more comfortable if this was in a failure path or something, rather than at the very start of the command dispatch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is essential to the solution because it forces the refresh when the user's acl has been updated. We'll need to check the benchmarks to see if the approach is viable. I was hoping the conditional evaluation of the simple boolean would not be a drain on performance.

If the benchmarks do not perform well, the entire design will need reconsidered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main Commit Sha(a63ad80) BDN Results:

Method Params Mean Error StdDev Allocated
Set ACL 14.20 μs 0.052 μs 0.049 μs -
SetEx ACL 19.98 μs 0.035 μs 0.031 μs -
SetNx ACL 20.45 μs 0.055 μs 0.052 μs -
SetXx ACL 23.73 μs 0.101 μs 0.085 μs -
GetFound ACL 15.55 μs 0.029 μs 0.026 μs -
GetNotFound ACL 10.95 μs 0.016 μs 0.015 μs -
Increment ACL 21.16 μs 0.044 μs 0.039 μs -
Decrement ACL 21.12 μs 0.064 μs 0.053 μs -
IncrementBy ACL 25.56 μs 0.032 μs 0.028 μs -
DecrementBy ACL 27.57 μs 0.096 μs 0.085 μs -
Set AOF 19.96 μs 0.064 μs 0.057 μs -
SetEx AOF 25.55 μs 0.089 μs 0.079 μs -
SetNx AOF 27.04 μs 0.138 μs 0.129 μs -
SetXx AOF 28.60 μs 0.292 μs 0.273 μs -
GetFound AOF 15.62 μs 0.028 μs 0.025 μs -
GetNotFound AOF 10.80 μs 0.024 μs 0.022 μs -
Increment AOF 27.29 μs 0.120 μs 0.107 μs -
Decrement AOF 27.98 μs 0.053 μs 0.049 μs -
IncrementBy AOF 31.82 μs 0.121 μs 0.107 μs -
DecrementBy AOF 32.33 μs 0.150 μs 0.133 μs -
Set None 14.11 μs 0.018 μs 0.015 μs -
SetEx None 19.96 μs 0.118 μs 0.110 μs -
SetNx None 20.70 μs 0.038 μs 0.035 μs -
SetXx None 23.47 μs 0.039 μs 0.033 μs -
GetFound None 15.64 μs 0.041 μs 0.034 μs -
GetNotFound None 11.03 μs 0.023 μs 0.019 μs -
Increment None 20.76 μs 0.027 μs 0.023 μs -
Decrement None 21.87 μs 0.037 μs 0.033 μs -
IncrementBy None 27.58 μs 0.180 μs 0.160 μs -
DecrementBy None 27.71 μs 0.075 μs 0.066 μs -

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR Commit SHA(4ca6db5) BDN Results:

Method Params Mean Error StdDev Allocated
Set ACL 15.13 us 0.103 us 0.091 us -
SetEx ACL 20.49 us 0.064 us 0.060 us -
SetNx ACL 21.70 us 0.072 us 0.060 us -
SetXx ACL 23.91 us 0.058 us 0.052 us -
GetFound ACL 15.58 us 0.105 us 0.087 us -
GetNotFound ACL 10.92 us 0.016 us 0.014 us -
Increment ACL 21.11 us 0.049 us 0.043 us -
Decrement ACL 21.81 us 0.057 us 0.050 us -
IncrementBy ACL 26.21 us 0.114 us 0.101 us -
DecrementBy ACL 26.17 us 0.063 us 0.059 us -
Set AOF 19.36 us 0.077 us 0.068 us -
SetEx AOF 25.41 us 0.063 us 0.056 us -
SetNx AOF 27.82 us 0.081 us 0.072 us -
SetXx AOF 27.71 us 0.086 us 0.076 us -
GetFound AOF 15.55 us 0.029 us 0.026 us -
GetNotFound AOF 11.46 us 0.010 us 0.009 us -
Increment AOF 26.35 us 0.036 us 0.032 us -
Decrement AOF 27.86 us 0.043 us 0.040 us -
IncrementBy AOF 32.92 us 0.109 us 0.097 us -
DecrementBy AOF 33.06 us 0.090 us 0.085 us -
Set None 14.89 us 0.019 us 0.017 us -
SetEx None 20.96 us 0.080 us 0.071 us -
SetNx None 20.12 us 0.039 us 0.037 us -
SetXx None 23.88 us 0.045 us 0.040 us -
GetFound None 15.77 us 0.026 us 0.022 us -
GetNotFound None 10.59 us 0.020 us 0.018 us -
Increment None 22.16 us 0.206 us 0.183 us -
Decrement None 22.51 us 0.115 us 0.096 us -
IncrementBy None 26.93 us 0.138 us 0.122 us -

/// <summary>
/// Dictionary containing stable lock objects for each user in the ACL.
/// </summary>
ConcurrentDictionary<string, object> _userLockObjects = new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get why we're introducing lock objects - we can just lock on the User if it's already exists?

For new users things could be phrased in terms of GetOrAdd().

Copy link
Member Author

@kevinmichaelbowersox kevinmichaelbowersox Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered using the user as the lock object, however we are replacing the user object in the access control list's dictionary when ACL SETUSER is called. This makes it a bad candidate for a lock object because we would introduce additional threading issues (different threads would lock on different objects for the user). We need a stable object for each thread to obtain the user level lock on.

@badrishc
Copy link
Contributor

badrishc commented Feb 4, 2025

Minor nit: can you fix the title of this PR to be a bit descriptive? Thanks.

@kevinmichaelbowersox kevinmichaelbowersox changed the title Users/kbowersox/aclsetuser threads Improve ACL SETUSER Thread Handling Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve ACL SETUSER Thread Handling
3 participants