Adding two or more SCs in separate KERs should not sometimes cause them to not register each other. #551

Open
bnouwt opened this issue Oct 31, 2024 · 1 comment

Comments

@bnouwt
Collaborator

bnouwt commented Oct 31, 2024

In the TDI-500 docker compose project, a race condition seemed to occur: three knowledge mappers started up at the same time and registered their smart connectors (in separate KERs) simultaneously. This sometimes caused them to miss each other's registration, and the situation did not automatically correct itself over time.

We think this is caused by a timing issue: at startup, SC A asks which other SCs are already in the network and gets no response from SC B, because SC B is not yet fully started. This should not be a problem, because every SC should notify all others of its existence by using a Post KI. However, when SC B posts this notification, SC A is not yet ready to receive the message, so it does not register SC B that way either. As a result, SC A will never know that SC B exists.
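To make the suspected sequence concrete, here is a minimal toy model of it in Python. It is only a sketch of the reasoning above, not the actual Knowledge Engine code: `ToySC`, `ask_who_is_there`, `announce` and the `ready` flag are hypothetical names, and the model assumes that a query or notification sent to an SC that is not yet ready is simply lost.

```python
from dataclasses import dataclass, field


@dataclass
class ToySC:
    """Toy stand-in for a smart connector (hypothetical, not the real KE class)."""
    name: str
    ready: bool = False  # True once the SC can answer queries and receive messages
    known: set = field(default_factory=set)

    def ask_who_is_there(self, others):
        # Startup query: only SCs that are already ready give an answer.
        for other in others:
            if other.ready:
                self.known.add(other.name)

    def announce(self, others):
        # Post-style notification: only delivered to SCs that are ready to receive.
        for other in others:
            if other.ready:
                other.known.add(self.name)


def simultaneous_start():
    """Both SCs start at (almost) the same time, as suspected in this issue."""
    a, b = ToySC("A"), ToySC("B")
    a.ask_who_is_there([b])  # B is not ready yet, so A gets no answer
    b.ask_who_is_there([a])  # A is not ready either
    b.ready = True
    b.announce([a])          # A is not ready to receive: notification is lost
    a.ready = True
    a.announce([b])          # B is ready, so B learns about A
    return a.known, b.known


def sequential_start():
    """SC B only starts after SC A is fully up."""
    a, b = ToySC("A"), ToySC("B")
    a.ask_who_is_there([b])  # nobody else is there yet
    a.ready = True
    a.announce([b])          # B has not started; it will ask later
    b.ask_who_is_there([a])  # A is ready and answers
    b.ready = True
    b.announce([a])          # A is ready and receives the notification
    return a.known, b.known
```

In this model `simultaneous_start()` leaves SC A with an empty set of known SCs while SC B knows about A, which matches the observed one-sided registration. `sequential_start()` ends with both SCs knowing each other.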

There have been issues with this before that were partly fixed, but apparently not fully. @kadevgraaf-tno can attach a workaround to this issue until we have fixed the underlying SC startup problem.

@kadevgraaf-tno
Contributor

kadevgraaf-tno commented Oct 31, 2024

I think I have a workaround: use Docker healthcheck and depends_on to start the knowledge mappers sequentially:

services:
  kd:
    image: ghcr.io/tno/knowledge-engine/knowledge-directory:1.2.3
 
  service1-km:
    build: ./service1-km
    environment:
      - SERVICE1_CLIENT_ID
      - SERVICE1_CLIENT_SECRET
      - SERVICE1_REFRESH_TOKEN
    restart: always
    healthcheck:
        test: ["CMD", "curl", "-f", "http://service1-sc:8280/rest/sc"]
        interval: 10s
        timeout: 5s
        retries: 5      
 
  service1-sc:
    image: ghcr.io/tno/knowledge-engine/smart-connector:1.2.4
    environment:
      HOSTNAME: service1-sc
      PORT: 8081 # The (knowledge engine internal) port that is used for inter-runtime communication
      KE_RUNTIME_EXPOSED_URL: http://service1-sc:8081/
      KE_RUNTIME_PORT: 8081
      KD_URL: http://kd:8282/
    restart: always          
 
  service2-km:
    build: ./service2-km
    environment:
      - TS_EMAIL
      - TS_PASSWORD
      - INTERFACE_ID
    restart: always
    healthcheck:
        test: ["CMD", "curl", "-f", "http://service2-sc:8280/rest/sc"]
        interval: 10s
        timeout: 5s
        retries: 5        
    depends_on:
      service1-km:
        condition: service_healthy
 
  service2-sc:
    image: ghcr.io/tno/knowledge-engine/smart-connector:1.2.4
    environment:
      HOSTNAME: service2-sc
      PORT: 8081 # The (knowledge engine internal) port that is used for inter-runtime communication
      KE_RUNTIME_EXPOSED_URL: http://service2-sc:8081/
      KE_RUNTIME_PORT: 8081
      KD_URL: http://kd:8282/
    restart: always
    depends_on:
      service1-km:
        condition: service_healthy      
 
  service3-km:
    build: ./service3-km
    environment:
      - SERVICE3_ACCESS_TOKEN
    restart: always
    healthcheck:
        test: ["CMD", "curl", "-f", "http://service3-sc:8280/rest/sc"]
        interval: 10s
        timeout: 5s
        retries: 5        
    depends_on:
      service2-km:
        condition: service_healthy
       
 
  service3-sc:
    image: ghcr.io/tno/knowledge-engine/smart-connector:1.2.4
    environment:
      HOSTNAME: service3-sc
      PORT: 8081 # The (knowledge engine internal) port that is used for inter-runtime communication
      KE_RUNTIME_EXPOSED_URL: http://service3-sc:8081/
      KE_RUNTIME_PORT: 8081
      KD_URL: http://kd:8282/
    restart: always
    depends_on:
      service2-km:
        condition: service_healthy      
     
  service4-km:
    build: ./service4-km
    environment:
      - SERVICE4_CLIENT_ID
      - SERVICE4_SECRET
      - SERVICE4_REFRESH_TOKEN
      - SERVICE4_SUBSCRIPTION_KEY
    restart: always
    depends_on:
      service3-km:
        condition: service_healthy
    healthcheck:
        test: ["CMD", "curl", "-f", "http://service4-sc:8280/rest/sc"]
        interval: 10s
        timeout: 5s
        retries: 5          
 
  service4-sc:
    image: ghcr.io/tno/knowledge-engine/smart-connector:1.2.4
    environment:
      HOSTNAME: service4-sc
      PORT: 8081 # The (knowledge engine internal) port that is used for inter-runtime communication
      KE_RUNTIME_EXPOSED_URL: http://service4-sc:8081/
      KE_RUNTIME_PORT: 8081
      KD_URL: http://kd:8282/
    restart: always
    depends_on:
      service3-km:
        condition: service_healthy
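
Where changing the compose file is not an option, a similar wait could be done inside a knowledge mapper before it registers anything. The following is a rough Python sketch that mirrors the healthcheck settings above (5 retries, 10s interval, 5s timeout). The function names `http_ok` and `wait_until_ready` are made up for illustration, and it assumes that a successful GET on the `/rest/sc` URL used in the healthchecks means the smart connector is ready.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if a GET on url succeeds with a 2xx status (roughly `curl -f`)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def wait_until_ready(url, retries=5, interval=10.0, probe=http_ok, sleep=time.sleep):
    """Probe url up to `retries` times, sleeping `interval` seconds between tries.

    `probe` and `sleep` are injectable so the loop can be tested without a network.
    Returns True as soon as a probe succeeds, False if all attempts fail.
    """
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

A knowledge mapper could, for example, call `wait_until_ready("http://service1-sc:8280/rest/sc")` at startup and only continue with its registration when that returns `True`. This only delays the mapper itself; it does not fix the underlying SC startup issue.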
