Performance regression in `cudaq.observe` and spin_operator creation #2437

bebora · 2024-11-28T16:56:52Z

Required prerequisites

Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
Make sure you've read the documentation. Your issue may be addressed there.
Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
If possible, make a PR with a failing test to give us a starting point to work on!

Describe the bug

Creating a SpinOperator and observing the expectation value of a kernel with a given operator took a performance hit from v0.8.0 to v0.9.0.

I'm experimenting on a GH200 and getting the following slowdowns with a 20-qubit kernel and a Hamiltonian created from 3000 random SpinOperators:
SpinOperator creation: 22x
cudaq.observe duration: 46x

Steps to reproduce the bug

Create the file obs.py with the following content:

import random
import time

import numpy as np

import cudaq


def random_pauli_word(qubit_count: int) -> str:
    ops = "IXYZ"
    return "".join(random.choices(ops, k=qubit_count))


def random_hamiltonian(qubit_count: int, terms: int) -> dict[str, float]:
    res = {}
    for _ in range(terms):
        res[random_pauli_word(qubit_count)] = random.uniform(-0.1, 0.1)
    return res


def create_spin_operator(
    h: dict[str, float]
):  # -> cudaq.mlir._mlir_libs._quakeDialects.cudaq_runtime.SpinOperator
    res = 0
    for pauli_word, coeff in h.items():
        term = cudaq.SpinOperator.from_word(pauli_word)
        res += term * coeff

    return res


@cudaq.kernel
def ansatz(
    qubit_count: int,
):
    qc = cudaq.qvector(qubit_count)
    param = 0

    for i in range(qubit_count):
        ry(param, qc[i])
        param += 1


def observe_ansatz(qubit_count: int) -> tuple[float, float]:
    random.seed(0xE4)

    random_h = random_hamiltonian(qubit_count, 3000)
    operator_start = time.time()
    random_spin_operator = create_spin_operator(random_h)
    operator_time = time.time() - operator_start

    observe_start = time.time()
    value = cudaq.observe(ansatz, random_spin_operator, qubit_count).expectation()
    observe_time = time.time() - observe_start

    return operator_time, observe_time


if __name__ == "__main__":
    op_time_acc = []
    obs_time_acc = []
    for _ in range(10):
        op_time, obs_time = observe_ansatz(20)
        op_time_acc.append(op_time)
        obs_time_acc.append(obs_time)
    print(f"SpinOperator creation: {np.mean(op_time_acc)}")
    print(f"cudaq.observe duration: {np.mean(obs_time_acc)}")

I am getting the following times (in seconds, averaged over 10 runs) on a GH200 machine:
Version: 0.8.0

$ python obs.py
SpinOperator creation: 0.010332965850830078
cudaq.observe duration: 0.042797398567199704

Version: cu12-0.9.0

$ python obs.py
SpinOperator creation: 0.22831881046295166
cudaq.observe duration: 1.961019015312195

cu12-latest, cu11-latest, and cu12-0.9.0 have comparable running times.
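
As a quick sanity check, the slowdown factors quoted in the description can be recomputed from the timings above. This is only arithmetic on the reported numbers (rounded to six decimals here); the helper name `slowdown` is illustrative and not part of CUDA-Q.

```python
# Timings in seconds, copied (rounded) from the runs reported above.
v080 = {"creation": 0.010333, "observe": 0.042797}
v090 = {"creation": 0.228319, "observe": 1.961019}


def slowdown(old: float, new: float) -> float:
    """Return how many times slower `new` is compared to `old`."""
    return new / old


creation_factor = slowdown(v080["creation"], v090["creation"])
observe_factor = slowdown(v080["observe"], v090["observe"])

print(f"SpinOperator creation: {creation_factor:.0f}x")
print(f"cudaq.observe duration: {observe_factor:.0f}x")
```

The rounded factors match the 22x and 46x figures given in the bug description.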

Expected behavior

I expect the latest version to have comparable running times to the previous stable version (v0.8.0).

Is this a regression? If it is, put the last known working version (or commit) here.

Performance regression from v0.8.0

Environment

CUDA-Q version: cu11-0.9.0, cu12-0.9.0, and cu12-latest
Python version: 3.10.12
Operating system: Docker container

Suggestions

No response


boschmitt · 2024-11-28T22:57:57Z

For those looking into this: I can reproduce the issue. The problem seems to lie in the new SpinOperator and its lazy evaluation capabilities. Using the old one shows no performance regression.

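Using the following create_spin_operator function, which calls the old SpinOperator from the runtime bindings directly, I get the timings listed after it: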

def create_spin_operator(
    h: dict[str, float]
):  # -> cudaq.mlir._mlir_libs._quakeDialects.cudaq_runtime.SpinOperator
    res = 0
    for pauli_word, coeff in h.items():
        term = cudaq.mlir._mlir_libs._quakeDialects.cudaq_runtime.SpinOperator.from_word(pauli_word)
        res += term * coeff

    return res

# Version 0.8.0
SpinOperator creation: 0.007181382179260254
cudaq.observe duration: 0.03874294757843018

# Version 0.9.0-cu12 (new SpinOperator)
SpinOperator creation: 0.18058569431304933
cudaq.observe duration: 1.308878755569458

# Version 0.9.0-cu12 (old SpinOperator)
SpinOperator creation: 0.007892823219299317
cudaq.observe duration: 0.03526647090911865
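
To compare the two construction paths side by side in a single process, a small timing harness along these lines might help. It is a minimal sketch, not something taken from CUDA-Q: `time_call` and `compare` are hypothetical helper names, and the two workloads are stand-ins so that the block runs without cudaq installed.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 10) -> dict[str, float]:
    """Run `fn` `repeats` times; return mean and median wall time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples.append(time.perf_counter() - start)
    return {"mean": statistics.mean(samples), "median": statistics.median(samples)}


def compare(
    builders: dict[str, Callable[[], object]], repeats: int = 10
) -> dict[str, dict[str, float]]:
    """Time every named callable with the same number of repeats."""
    return {name: time_call(fn, repeats) for name, fn in builders.items()}


# Stand-in workloads (assumption: replace with the real operator builders).
def cheap() -> int:
    return sum(range(1_000))


def costly() -> int:
    return sum(range(100_000))


results = compare({"old SpinOperator": cheap, "new SpinOperator": costly}, repeats=5)
for name, stats in results.items():
    print(f"{name}: mean={stats['mean']:.6f}s median={stats['median']:.6f}s")
```

In the real comparison, each callable would wrap one variant of `create_spin_operator`, for example `lambda: create_spin_operator(random_h)` with the dictionary produced by `random_hamiltonian`. Reporting the median next to the mean makes a single slow first run less likely to skew the result.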

1tnguyen added the performance and python-lang (anything related to the Python CUDA Quantum language implementation) labels on Nov 28, 2024
