[Refactor][Gitlab] Single account collection for on-prem instance #8283
Labels
component/plugins
This issue or PR relates to plugins
improvement
type/refactor
This issue is to refactor existing code
What and why to refactor
The GitLab plugin collects account information for users (account_collector.go). For GitLab.com this is done on a per-project basis and runs for each GitLab repository collection.
For on-premise instances, there is a test that allows the plugin to use the global /users API endpoint. However, this results in duplicated operations for each repository (data scope) in the project. For example, for a DevLake project with 20 data scopes, the account information is gathered, extracted and converted 20 times, and 19 of those runs repeat the same data.
User collection on a large user base (7000 users) takes 3 min 30 s for collection, extraction and conversion per stage.
Describe the solution you'd like
Ideally, account collection for on-premise instances should be a single GitLab stage that is added to the pipeline. Alternatively, it could be added to the first collection stage as a subtask and left out of the other stages; however, this makes that collection less visible.
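A minimal sketch of the first option, to illustrate the intended plan shape. The `Stage` type, `buildPlan`, `countSubtask` and all subtask names are hypothetical simplifications for this issue, not DevLake's actual plan types or the plugin's real subtask identifiers.

```go
package main

import "fmt"

// Stage is a hypothetical, simplified stand-in for a pipeline stage:
// a scope label plus the subtasks to run for it.
type Stage struct {
	Scope    string
	Subtasks []string
}

// buildPlan returns one stage per data scope. For on-premise instances the
// account subtasks are hoisted into a single dedicated stage at the front;
// otherwise they are appended to every scope's stage (current behaviour).
func buildPlan(scopes []string, onPrem bool) []Stage {
	// Subtask names are placeholders, not the plugin's real identifiers.
	accountTasks := []string{"collectAccounts", "extractAccounts", "convertAccounts"}
	repoTasks := []string{"collectIssues", "collectMergeRequests"}

	var plan []Stage
	if onPrem {
		plan = append(plan, Stage{Scope: "(instance)", Subtasks: accountTasks})
	}
	for _, s := range scopes {
		tasks := append([]string{}, repoTasks...) // copy so stages do not share a slice
		if !onPrem {
			tasks = append(tasks, accountTasks...)
		}
		plan = append(plan, Stage{Scope: s, Subtasks: tasks})
	}
	return plan
}

// countSubtask reports how many times a subtask appears across the plan.
func countSubtask(plan []Stage, name string) int {
	n := 0
	for _, st := range plan {
		for _, t := range st.Subtasks {
			if t == name {
				n++
			}
		}
	}
	return n
}

func main() {
	scopes := []string{"repo-a", "repo-b", "repo-c"}
	fmt.Println(countSubtask(buildPlan(scopes, false), "collectAccounts")) // once per scope
	fmt.Println(countSubtask(buildPlan(scopes, true), "collectAccounts"))  // once per plan
}
```

Keeping the account work in its own stage, rather than folding it into the first repository stage, keeps it visible in the pipeline view, which is the concern raised above about the subtask alternative.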