Adding EFA device-level metrics #277
Open
+223
−33
Description:
This PR adds the Elastic Network Interface (ENI) ID as a dimension to EFA (Elastic Fabric Adapter) metrics. This enhancement improves the observability of EFA devices by linking them to their corresponding ENI IDs in AWS, making it easier to correlate EFA metrics with network interface resources.
The dimension is named ElasticNetworkInterfaceId.
Implementation Details
The ENI ID is derived through the following process:
For each EFA device, we read its IPv6 GID (Global Identifier) from sysfs and extract the MAC address from it:
/sys/class/infiniband/<device>/ports/<port>/gids/0
Using the AWS EC2 metadata service, we map the MAC address to its corresponding ENI ID.
The ENI ID is then added as a dimension to all EFA metrics, giving the dimension sets listed under Example Metric Output.
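For reviewers who want to sanity-check the derivation, here is a minimal Python sketch of the process described above. It is not the code in this PR. It assumes the lower 64 bits of the GID are a modified EUI-64 interface identifier (RFC 4291), which the description does not state, and the function names `read_gid`, `mac_from_gid`, and `eni_id_for_mac` are hypothetical. The metadata lookup uses the IMDSv2 token flow and the `network/interfaces/macs/<mac>/interface-id` path.

```python
import urllib.request
from pathlib import Path

IMDS = "http://169.254.169.254/latest"


def read_gid(device: str, port: str) -> str:
    """Read GID 0 for an EFA device port from sysfs."""
    path = Path(f"/sys/class/infiniband/{device}/ports/{port}/gids/0")
    return path.read_text().strip()


def mac_from_gid(gid: str) -> str:
    """Recover a MAC address from a fully expanded IPv6 GID.

    ASSUMPTION: the last 64 bits are a modified EUI-64 identifier, i.e. the
    MAC with ff:fe inserted in the middle and the universal/local bit flipped.
    """
    groups = gid.strip().split(":")
    if len(groups) != 8:
        raise ValueError(f"expected 8 colon-separated groups, got {gid!r}")
    iid = bytes.fromhex("".join(g.zfill(4) for g in groups[4:]))
    if iid[3:5] != b"\xff\xfe":
        raise ValueError(f"GID is not EUI-64 derived: {gid!r}")
    # Undo the universal/local bit flip and drop the ff:fe filler bytes.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)


def eni_id_for_mac(mac: str, timeout: float = 2.0) -> str:
    """Map a MAC address to its ENI ID via the EC2 metadata service (IMDSv2)."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/network/interfaces/macs/{mac}/interface-id",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()
```

On an EC2 instance, `eni_id_for_mac(mac_from_gid(read_gid(device, port)))` would chain the three steps; only `mac_from_gid` can be exercised off-instance.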
Example Metric Output
ClusterName, ContainerName, ElasticNetworkInterfaceId, FullPodName, Namespace, PodName: for container level metrics
ClusterName, ElasticNetworkInterfaceId, FullPodName, Namespace, PodName: for pod level metrics
ClusterName, ElasticNetworkInterfaceId, InstanceId, NodeName: for node level metrics
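As a rough illustration of how the three dimension sets above would appear in an Embedded Metric Format record, the sketch below builds one record per metric level. It is an assumption-laden example, not output from this PR: the helper `build_emf_record`, the `ContainerInsights` CloudWatch namespace default, and any metric name passed in are placeholders chosen for illustration. Only the dimension names come from the lists above.

```python
import time

# Dimension sets per metric level, copied from the lists above.
DIMENSION_SETS = {
    "container": ["ClusterName", "ContainerName", "ElasticNetworkInterfaceId",
                  "FullPodName", "Namespace", "PodName"],
    "pod": ["ClusterName", "ElasticNetworkInterfaceId", "FullPodName",
            "Namespace", "PodName"],
    "node": ["ClusterName", "ElasticNetworkInterfaceId", "InstanceId",
             "NodeName"],
}


def build_emf_record(level, metric_name, value, dimension_values,
                     namespace="ContainerInsights"):
    """Build an EMF-shaped dict for one metric (hypothetical helper).

    `namespace` is the CloudWatch namespace, which is distinct from the
    Kubernetes `Namespace` dimension stored at the top level of the record.
    """
    keys = list(DIMENSION_SETS[level])
    missing = [k for k in keys if k not in dimension_values]
    if missing:
        raise ValueError(f"missing dimension values: {missing}")
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [keys],
                "Metrics": [{"Name": metric_name}],
            }],
        },
        metric_name: value,
    }
    # Dimension values are top-level members referenced by name.
    for key in keys:
        record[key] = dimension_values[key]
    return record
```

Raising on a missing dimension value is a deliberate choice in this sketch: a record that silently omits ElasticNetworkInterfaceId would land in a different dimension set and be hard to spot later.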
Testing:
EMF Output