Improved determination of what resource the job ran on #2094

Open
rynge opened this issue Feb 20, 2025 · 2 comments

@rynge
Member

rynge commented Feb 20, 2025

We should improve a Pegasus job's ability to figure out what resource it is running on. This is a two-step process:

  1. The job should determine where it is running. This used to be easier based on hostnames and environments, but now that jobs can run in VMs and containers, we probably have to call an external service in those cases. For example, a job running in a k8s container with hostname avl43fkz4 does not have enough information locally. It could call a new service, which would return the resolved domain name of the public address the call came from (let's say sdsc.edu).
  2. The site would be stored in the Stampede database and in some cases forwarded to the Pegasus friends database.
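
A minimal sketch of step 1 on the job side, to make the idea concrete. Everything named here is an assumption for illustration: the service does not exist yet, so `SITE_SERVICE_URL`, `query_site_service`, and the "hostname contains a dot" heuristic are hypothetical, and the response is assumed to be a plain-text domain name.

```python
import socket
import urllib.request

# Hypothetical endpoint: the issue does not name the service or its URL.
SITE_SERVICE_URL = "https://example.invalid/whereami"


def looks_informative(hostname: str) -> bool:
    """Assumed heuristic: a hostname with a dot carries some domain information."""
    return "." in hostname.strip(".")


def query_site_service(url: str = SITE_SERVICE_URL, timeout: float = 5.0) -> str:
    """Ask the (hypothetical) service which domain our public address resolves to."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def determine_site(hostname=None, lookup=query_site_service) -> str:
    """Use the local hostname when it is informative, else fall back to the service."""
    if hostname is None:
        hostname = socket.getfqdn()
    if looks_informative(hostname):
        return hostname
    try:
        return lookup()
    except OSError:
        # URLError and socket timeouts are OSError subclasses; never fail the job.
        return "unknown"
```

The `lookup` parameter is only there so the fallback can be exercised without a network, for example `determine_site("avl43fkz4", lookup=lambda: "sdsc.edu")`. Normalizing a full hostname down to a site name is left out of this sketch.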

The team discussed what the site name should be, and at least initially agreed that the top-level domain name of the site (the registered domain, not the full hostname) would be the most useful. For example, we would see entries like sdsc.edu and aws.com. There is an open question whether we should have something more granular (like which resource at SDSC), and whether it is even possible to determine that.

@rynge rynge added this to the 5.1.0 milestone Feb 20, 2025
@rynge rynge self-assigned this Feb 20, 2025
@vahi
Member

vahi commented Feb 21, 2025

Two avenues where this information is of interest:

  1. Via the Grafana dashboard. Then it probably has to be piped via the job composite events.
  2. The Stampede database. For a workflow running on OSG, this would give a breakdown of which OSG sites the jobs ran on.
  3. The metrics server is not an option, since it is not aware of the jobs.

For 5.1.0 we could do 1).

At the service end, the public IP is looked up against DNS. Initially this would be a one-to-one lookup: a single hostname lookup against DNS for each public IP.

The hostnames will be normalized to an institution name, so full hostnames will get mapped to a site name.
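
A rough sketch of what the service side could look like, assuming the normalization is simply "keep the last two DNS labels". That rule is an assumption, not something decided in this issue, and it is known to be too naive for suffixes such as ac.uk. The function names `normalize_hostname` and `site_for_ip` are hypothetical.

```python
import socket


def normalize_hostname(hostname: str) -> str:
    """Map a full hostname to a site name (assumed rule: last two labels)."""
    labels = [p for p in hostname.lower().strip(".").split(".") if p]
    if len(labels) < 2:
        return "unknown"
    return ".".join(labels[-2:])


def site_for_ip(ip: str, resolver=socket.gethostbyaddr) -> str:
    """Reverse-resolve one public IP and normalize the resulting hostname."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses.
        return "unknown"
    return normalize_hostname(hostname)
```

With a made-up hostname, `normalize_hostname("login01.expanse.sdsc.edu")` gives `sdsc.edu`, and a bare container hostname like `avl43fkz4` gives `unknown`. The `resolver` argument lets the lookup be stubbed out in tests.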

@vahi
Member

vahi commented Mar 7, 2025

Need to make sure this reporting is turned off when the user disables metrics reporting by turning pegasus.metrics off.

Also need to update the documentation to reflect this new reporting: https://pegasus.isi.edu/documentation/reference-guide/funding-citing-usage-stats.html#usage-statistics-collection
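
For the opt-out, something along these lines might be enough. This is only a sketch: it assumes the setting arrives as a `pegasus.metrics` entry in a plain dict of properties, that reporting is on by default, and that the listed strings count as "off". None of that is confirmed here, and `site_reporting_enabled` is a hypothetical helper name.

```python
# Assumed spellings of "off"; the real set of accepted values may differ.
FALSE_VALUES = {"false", "0", "no", "off"}


def site_reporting_enabled(properties: dict) -> bool:
    """Return False when the user has turned pegasus.metrics off."""
    # Assumption: a missing key means metrics (and site reporting) stay enabled.
    value = str(properties.get("pegasus.metrics", "true")).strip().lower()
    return value not in FALSE_VALUES
```

The job would check this once before calling the site lookup service and skip the call entirely when it returns False.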
