Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nightly test failure with @pki/master copr repo #4710

Closed
amore17 opened this issue Apr 2, 2024 · 16 comments
Closed

Nightly test failure with @pki/master copr repo #4710

amore17 opened this issue Apr 2, 2024 · 16 comments

Comments

@amore17
Copy link

amore17 commented Apr 2, 2024

FreeIPA nightly tests is failing with the copr repo @pki/master. The test is installing IPA server

See PR #freeipa-pr-ci2/freeipa#3540, with the following report and logs:

E           subprocess.CalledProcessError: Command '['ipa-server-install', '-n', 'ipa.test', '-r', 'IPA.TEST', '-p', 'Secret.123', '-a', 'Secret.123', '--domain-level=1', '--dirsrv-config-file', '/ipatests/ipatests_dse.ldif', '-U', '--setup-dns', '--forwarder', '192.168.121.1', '--auto-reverse']' returned non-zero exit status 1.

pytest_ipa/integration/host.py:202: CalledProcessError

---------------------------- Captured stderr setup -----------------------------
ipa: ERROR: stderr: Checking DNS domain ipa.test., please wait ...
Reverse record for IP address 192.168.121.144 already exists
Synchronizing time
No SRV records of NTP servers found and no NTP server or pool address was provided.
Attempting to sync time with chronyc.
Process chronyc waitsync failed to sync time!
Unable to sync time with chrony server, assuming the time is in sync. Please check that 123 UDP port is opened, and any time server is on network.
Validate installation settings ...
Create file system structures ...
Perform SELinux labeling ...
Create database backend: dc=ipa,dc=test ...
Perform post-installation tasks ...
Certificate issuance failed (CA_REJECTED: Server at "https://master.ipa.test:8443/ca/agent/ca//profileProcess" replied: 1: Server Internal Error)
The ipa-server-install command failed. See /var/log/ipaserver-install.log for more information

@flo-renaud
Copy link

The test is failing in the step requesting a RA certificate. ipa-server-install is using certmonger to get the RA cert (see ipaserver-install.log):

2024-04-01T21:25:45Z DEBUG   [18/31]: requesting RA certificate from CA
2024-04-01T21:25:45Z DEBUG Starting external process
2024-04-01T21:25:45Z DEBUG args=['/usr/bin/openssl', 'pkcs7', '-inform', 'DER', '-print_certs', '-out', '/var/lib/ipa/tmpigo40rl2']
2024-04-01T21:25:45Z DEBUG Process finished, return code=0
2024-04-01T21:25:45Z DEBUG stdout=
2024-04-01T21:25:45Z DEBUG stderr=
2024-04-01T21:25:45Z DEBUG Starting external process
2024-04-01T21:25:45Z DEBUG args=['/usr/bin/openssl', 'pkcs12', '-nokeys', '-clcerts', '-in', '/root/ca-agent.p12', '-out', '/var/lib/ipa/tmp16fahv2i', '-passin', 'file:/tmp/tmpj53_zdw3']
2024-04-01T21:25:45Z DEBUG Process finished, return code=0
2024-04-01T21:25:45Z DEBUG stdout=
2024-04-01T21:25:45Z DEBUG stderr=
2024-04-01T21:25:45Z DEBUG Starting external process
2024-04-01T21:25:45Z DEBUG args=['/usr/bin/openssl', 'pkcs12', '-nocerts', '-in', '/root/ca-agent.p12', '-out', '/var/lib/ipa/tmpa34r7c6p', '-passin', 'file:/tmp/tmpf2owpkas', '-nodes']
2024-04-01T21:25:45Z DEBUG Process finished, return code=0
2024-04-01T21:25:45Z DEBUG stdout=
2024-04-01T21:25:45Z DEBUG stderr=
2024-04-01T21:26:05Z DEBUG certmonger request is in state 'GENERATING_KEY_PAIR'
2024-04-01T21:26:05Z DEBUG certmonger request is in state 'SUBMITTING'
2024-04-01T21:26:06Z DEBUG certmonger request is in state 'CA_REJECTED'
2024-04-01T21:26:06Z DEBUG Cert request 20240401212605 failed: CA_REJECTED (Server at "https://master.ipa.test:8443/ca/agent/ca//profileProcess" replied: 1: Server Internal Error)
2024-04-01T21:26:06Z DEBUG Giving up on cert request 20240401212605
2024-04-01T21:26:06Z DEBUG Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/ipaserver/install/service.py", line 686, in start_creation
    run_step(full_msg, method)
  File "/usr/lib/python3.12/site-packages/ipaserver/install/service.py", line 672, in run_step
    method()
  File "/usr/lib/python3.12/site-packages/ipaserver/install/cainstance.py", line 923, in __request_ra_certificate
    reqId = certmonger.request_and_wait_for_cert(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ipalib/install/certmonger.py", line 414, in request_and_wait_for_cert
    raise RuntimeError(
RuntimeError: Certificate issuance failed (CA_REJECTED: Server at "https://master.ipa.test:8443/ca/agent/ca//profileProcess" replied: 1: Server Internal Error)

PKI correctly processes the cert request, adds the cert request to the LDAP database (from /var/log/pki/pki-tomcat/ca/debug.2024-04-01.log):

2024-04-01 21:26:05 [http-nio-8080-exec-11] INFO: CertProcessor: Submitting certificate request to caSubsystemCert profile
2024-04-01 21:26:05 [http-nio-8080-exec-11] INFO: LDAPSession: Adding cn=7,ou=ca, ou=requests,o=ipaca

but when the cert is issued, PKI tries to write the cert to the LDAP database with an already existing serial number:

2024-04-01 21:26:06 [https-jsse-jss-nio-8443-exec-4] INFO: CAEnrollProfile: Processing enrollment request 0x7
2024-04-01 21:26:06 [https-jsse-jss-nio-8443-exec-4] INFO: CAService: Signing cert 0x1
2024-04-01 21:26:06 [https-jsse-jss-nio-8443-exec-4] INFO: CASigningUnit: Getting algorithm context for SHA256withRSA RSASignatureWithSHA256Digest
2024-04-01 21:26:06 [https-jsse-jss-nio-8443-exec-4] INFO: CASigningUnit: Signing ...
2024-04-01 21:26:06 [https-jsse-jss-nio-8443-exec-4] INFO: CAService: Storing cert 0x1
2024-04-01 21:26:06 [https-jsse-jss-nio-8443-exec-4] INFO: LDAPSession: Adding cn=1,ou=certificateRepository, ou=ca,o=ipaca
2024-04-01 21:26:06 [https-jsse-jss-nio-8443-exec-4] SEVERE: Could not store certificate serial number 0x1
Unable to add certificate record: Unable to add LDAP entry: Already exists
	at com.netscape.cmscore.dbs.CertificateRepository.addCertificateRecord(CertificateRepository.java:744)

@flo-renaud
Copy link

@edewata could you have a look at this regression in the nightly builds?

@edewata
Copy link
Contributor

edewata commented Apr 8, 2024

@rcritten @tbordaz I think this is the issue I mentioned to you a few days ago. In upstream PKI CI we're only observing this issue in IPA tests which use sequential serial numbers (instead of RSNv3) which still depends on VLV. Was there any VLV-related change in DS recently that might have caused this?

@amore17 @flo-renaud Is it possible to test IPA with RSNv3 enabled by default to isolate the issue?

@fmarco76 We might need to prioritize VLV removal in this area.

@tbordaz
Copy link

tbordaz commented Apr 9, 2024

@edewata , if I am correct the test ran 389-Directory/2.4.5 that is using VLV/BDB. I know that during the transition VLV/LMDB, @progier discovered VLV bug(s) that have been fixed only in 3.0.

@edewata
Copy link
Contributor

edewata commented Apr 9, 2024

@tbordaz Thanks for the info.
@amore17 @flo-renaud @rcritten Which DS version was used in the above test? Is it possible to test with the latest DS?

@flo-renaud
Copy link

@edewata you can get the list of packages from the report url: http://freeipa-org-pr-ci.s3-website.eu-central-1.amazonaws.com/jobs/c570a07c-f06c-11ee-b706-fa163e8e2127/installed_packages/installed_packages_master.log.gz
the test is using 389-ds-base-2.4.5-1.fc39.x86_64

@edewata
Copy link
Contributor

edewata commented Apr 10, 2024

@flo-renaud Thanks. In that case I would suggest switching to DS 3.0 as @tbordaz suggested.

From PKI side it might be possible to change the sequential serial numbers to use paged results instead of VLV, but the performance might downgrade significantly. @fmarco76 might have better idea about this.

@fmarco76
Copy link
Member

@flo-renaud @edewata if update DS is possible and it resolves the problem I think it will be faster.

@progier389
Copy link

FYI: According to git log and git blame, There are no changes in vlv code in 2.x branches for the last two years
(last impacted version was 2.2)

That is true that while fixing the vlv issues on lmdb, I noticed a suspicious thing about vlv that could impact online import /bulk import (and maybe reindex) if there are vlv with ONELEVEL search scope (as it is the case with freeipa) but this issue exists for decades, so why are we seeing it now ?)

@edewata
Copy link
Contributor

edewata commented Apr 18, 2024

Just FYI, I checked PKI CI execution history, here's the last commit where IPA installation was still successful (although there was a subsequent test failure):
043ab92
https://github.com/dogtagpki/pki/actions/runs/8605380653/job/23581813228#step:6:268

Then here is the next commit where IPA installation failure was first observed:
a54130a
https://github.com/dogtagpki/pki/actions/runs/8607022021/job/23586823996#step:6:149

As you can see, the changes in PKI do not seem to be related at all to IPA, KRA, pkispawn, VLV, simple paged results, or sequential serial numbers. So I'm still thinking that the issue was caused by changes in another package that was pushed to Fedora 39 around the beginning of April.

@edewata
Copy link
Contributor

edewata commented May 13, 2024

FYI, recently @fmarco76 managed to drop VLV dependency from the code that we use to access the cert records in DS:
#4735

With this change at least IPA installation was able to complete successfully:
https://github.com/dogtagpki/pki/actions/runs/9067072889/job/24918746702

There are still some occasional test failures, but they happen after installation so they seem to be unrelated and will be investigated separately.

@amore17 @flo-renaud Please verify IPA installation with the latest PKI COPR build. Thanks!

@flo-renaud
Copy link

@edewata on our last run (with dogtag-pki-base-11.6.0-0.1.alpha1.20240510195204UTC.078a33ca.fc39.noarch) we now see 19 failing tests (link) instead of 54 the previous week. There is definitely an improvement but we still see some installation issues.

  • installation with an externally signed CA cert (looks like pkispawn is expecting a filename but is provided with the cert content instead)
  • installation with a self signed CA (issuance of the RA cert fails with Error 56 connecting to https://master.ipa.test:8443/ca/agent/ca//profileReview: Failure when receiving data from the peer). Seems to happen in a scenario install / uninstall / install

@edewata
Copy link
Contributor

edewata commented May 14, 2024

Could we close this ticket and create new tickets for those issues? We will need more specific steps to reproduce and logs too.

@flo-renaud
Copy link

For the externally signed CA I opened this ticket: #4745
I am still trying to reproduce the other issue with Error 56, will create a ticket when I have a reproducer.

In the meantime I agree, we can close this ticket as the problem described here does not happen any more.

@flo-renaud
Copy link

For the install / uninstall / re-install scneario: ticket #4746

@edewata
Copy link
Contributor

edewata commented May 15, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants