Add e2e tests for the processing engine #26017

Open
jdstrand opened this issue Feb 13, 2025 · 2 comments

jdstrand commented Feb 13, 2025

The processing engine should have end-to-end tests that run in CI to ensure it is working properly. To start, I suggest:

  1. running influxdb3 --help to ensure that libpython can be found
  2. for each trigger type, running influxdb3 serve --plugin-dir ... followed by influxdb3 test <trigger type> trigger-simple.py to ensure that Python scripts can run at all
  3. adding an import that uses compiled code to the default venv, then running influxdb3 serve --plugin-dir ... followed by influxdb3 test schedule_plugin ... trigger-with-imports.py, where trigger-with-imports.py uses the import to check that compiled extension code (wheels) works
  4. ensuring the plugin-example script from https://docs.influxdata.com/influxdb3/core/get-started/#example-python-plugin-for-wal-flush (et al) continues to work.
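
As a rough illustration of how steps 2 and 3 might be driven from CI, here is a minimal Python sketch. It assumes the `influxdb3 test <trigger type> -d <database> <plugin>` invocation and the JSON result shape (`log_lines`, `errors`) shown in the notes further down this thread. The helper names (`build_test_command`, `check_test_result`, `run_plugin_test`) are hypothetical and not part of InfluxDB.

```python
import json
import subprocess


def build_test_command(binary, trigger_type, database, plugin, extra_args=()):
    """Return the argv for `<binary> test <trigger type> -d <db> [extra] <plugin>`."""
    return [binary, "test", trigger_type, "-d", database, *extra_args, plugin]


def check_test_result(stdout):
    """Parse the JSON printed by the test command; fail if it reports errors."""
    result = json.loads(stdout)
    if result.get("errors"):
        raise AssertionError("plugin reported errors: %s" % result["errors"])
    return result.get("log_lines", [])


def run_plugin_test(binary, trigger_type, database, plugin, extra_args=()):
    """Run one plugin test against an already running server.

    Assumption: the server was started separately with --plugin-dir pointing
    at the directory that contains `plugin`.
    """
    cmd = build_test_command(binary, trigger_type, database, plugin, extra_args)
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return check_test_result(proc.stdout)
```

Keeping the command construction and the result check separate from the `subprocess` call means both can be unit tested without a running server.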
@jdstrand jdstrand added the v3 label Feb 13, 2025
@jdstrand

For inspiration, here are two scripts that I used for testing #25969:

$ cat /path/to/plugin/dir/simple.py
import importlib.util
import sys
def process_scheduled_call(influxdb3_local, call_time, args=None):
    influxdb3_local.info("sys.prefix = %s" % sys.prefix)
    influxdb3_local.info("sys.path = %s" % sys.path)
    spec = importlib.util.find_spec("requests")
    if spec is not None:
        influxdb3_local.info("'requests' location = %s" % spec.origin)
    else:
        influxdb3_local.info("'requests' location = not found")
$ cat /path/to/plugin/dir/venv-import.py
import importlib.util
import requests
import sys
def process_scheduled_call(influxdb3_local, call_time, args=None):
    influxdb3_local.info("sys.prefix = %s" % sys.prefix)
    influxdb3_local.info("sys.path = %s" % sys.path)
    spec = importlib.util.find_spec("requests")
    if spec is not None:
        influxdb3_local.info("'requests' location = %s" % spec.origin)
    else:
        influxdb3_local.info("'requests' location = not found")

    influxdb3_local.info("requests = %s" % requests.__version__)
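
Both scripts repeat the same module lookup, so a test suite could factor it into a helper and dry-run the plugin outside the server. The following is only a sketch: `module_location` and `FakeLocal` are hypothetical names, and `FakeLocal` is a stand-in that mimics only the `info()` call used above, not the real `influxdb3_local` object.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would load from, or 'not found'."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else "not found"


class FakeLocal:
    """Hypothetical stand-in for influxdb3_local that records info() messages."""

    def __init__(self):
        self.lines = []

    def info(self, message):
        # Assumption: the "INFO: " prefix mirrors how log lines appear in test output.
        self.lines.append("INFO: " + message)


def process_scheduled_call(influxdb3_local, call_time, args=None):
    influxdb3_local.info("sys.prefix = %s" % sys.prefix)
    influxdb3_local.info("'requests' location = %s" % module_location("requests"))


if __name__ == "__main__":
    local = FakeLocal()
    process_scheduled_call(local, call_time=None)
    print("\n".join(local.lines))
```

A dry run like this only checks the script's own logic against the local interpreter; it says nothing about the interpreter and venv that the server embeds, which is what the real end-to-end test has to cover.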


jdstrand commented Feb 14, 2025

For more information, here are my raw notes for testing on Linux, OSX and Windows:

# setup - non-docker
$ rm -rf ~/tmp/influxdb3-pe/data/ ~/tmp/influxdb3-pe/install  # if re-testing
$ mkdir -p ~/tmp/influxdb3-pe/install ~/tmp/influxdb3-pe/data/plugins
$ tar -C ~/tmp/influxdb3-pe/install --strip-components=1 -zxf ~/Nextcloud/Work/Security/reviews/processing-engine/testing/ci-2025-02-13/influxdb3-3.0.0+snapshot-8daccb7e_linux_amd64.tar.gz
$ ~/tmp/influxdb3-pe/install/influxdb3 --help   # should show help

# until one is created for us and we enter it correctly
# NOT NEEDED, export PATH instead: linux/osx
#$ ~/tmp/influxdb3-pe/install/python/bin/python3 -m venv ~/tmp/influxdb3-pe/data/plugins/.venv
#source ~/tmp/influxdb3-pe/data/plugins/.venv/bin/activate
# windows
$ influxdb3-pe\install\python\python.exe -m venv influxdb3-pe\data\plugins\.venv
# eventually this:
# $ influxdb3-pe\data\plugins\.venv\Scripts\activate
# for now this ():
$ influxdb3-pe\data\plugins\.venv\Scripts\activate
(venv)$ python -m pip install requests
(venv)$ deactivate
$ set PYTHONPATH=Z:\influxdb3-pe\data\plugins\.venv\Lib\site-packages

# setup - docker (adjust '1500' as necessary)
$ mkdir -p ~/tmp/influxdb3-pe/data/plugins && sudo chgrp -R 1500 ~/tmp/influxdb3-pe/data && chmod -R 775 ~/tmp/influxdb3-pe/data

# server
# linux/osx (PATH is temporary; will be fixed soon)
$ export PATH=~/tmp/influxdb3-pe/install/python/bin:$PATH ; ~/tmp/influxdb3-pe/install/influxdb3 serve --node-id=local01 --object-store=file --data-dir ~/tmp/influxdb3-pe/data --plugin-dir ~/tmp/influxdb3-pe/data/plugins
# windows
$ influxdb3-pe\install\influxdb3.exe serve --node-id=local01 --object-store=file --data-dir influxdb3-pe\data --plugin-dir influxdb3-pe\data\plugins
# docker (PATH is temporary; will be fixed soon)
$ docker run --rm -it -e PATH=/usr/lib/influxdb3/python/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin -v $HOME/tmp/influxdb3-pe/data:/data -p 8181:8181 quay.io/influxdb/influxdb3-core:latest serve --node-id node0 --object-store file --data-dir /data --plugin-dir /data/plugins

# client - create database
$ ~/tmp/influxdb3-pe/install/influxdb3 create database mydb01  # once

# client - write a point (this point is needed for
# 'test wal_plugin ... test-args.py', below)
$ ~/tmp/influxdb3-pe/install/influxdb3 write -d mydb01 "system,cpu=cpu-total,host=foo usage_guest=0.0,usage_guest_nice=0.0,usage_idle=98.8476953907837,usage_iowait=0.10020040080161435,usage_irq=0.0,usage_nice=0.0,usage_softirq=0.0,usage_steal=0.05010020040080273,usage_system=0.6513026052104621,usage_user=0.3507014028056636"

# client - query a point
$ ~/tmp/influxdb3-pe/install/influxdb3 q -d mydb01 "SELECT * FROM system WHERE host = 'foo'"
+-----------+------+-------------------------------+-------------+------------------+------------------+---------------------+-----------+------------+---------------+---------------------+--------------------+--------------------+
| cpu       | host | time                          | usage_guest | usage_guest_nice | usage_idle       | usage_iowait        | usage_irq | usage_nice | usage_softirq | usage_steal         | usage_system       | usage_user         |
+-----------+------+-------------------------------+-------------+------------------+------------------+---------------------+-----------+------------+---------------+---------------------+--------------------+--------------------+
| cpu-total | foo  | 2025-02-14T14:11:58.383533974 | 0.0         | 0.0              | 98.8476953907837 | 0.10020040080161435 | 0.0       | 0.0        | 0.0           | 0.05010020040080273 | 0.6513026052104621 | 0.3507014028056636 |
+-----------+------+-------------------------------+-------------+------------------+------------------+---------------------+-----------+------------+---------------+---------------------+--------------------+--------------------+


# client - testme.py (python works at all)
$ cat > ~/tmp/influxdb3-pe/data/plugins/testme.py <<EOM
def process_scheduled_call(influxdb3_local, call_time, args=None):
    influxdb3_local.info("HERE")
    influxdb3_local.info("done")
EOM
$ ~/tmp/influxdb3-pe/install/influxdb3 test schedule_plugin -d mydb01 testme.py
{
  "trigger_time": "2025-02-14T13:54:11Z",
  "log_lines": [
    "INFO: HERE",
    "INFO: done"
  ],
  "database_writes": {},
  "errors": []
}

# client - simple.py (show sys.prefix and sys.path and see if requests is
# present)
$ cat > ~/tmp/influxdb3-pe/data/plugins/simple.py <<EOM
import importlib.util
import sys
def process_scheduled_call(influxdb3_local, call_time, args=None):
    influxdb3_local.info("sys.prefix = %s" % sys.prefix)
    influxdb3_local.info("sys.path = %s" % sys.path)
    spec = importlib.util.find_spec("requests")
    if spec is not None:
        influxdb3_local.info("'requests' location = %s" % spec.origin)
    else:
        influxdb3_local.info("'requests' location = not found")
EOM
$ ~/tmp/influxdb3-pe/install/influxdb3 test schedule_plugin -d mydb01 simple.py
{
  "trigger_time": "2025-02-14T13:55:36Z",
  "log_lines": [
    "INFO: sys.prefix = /home/jamie/tmp/influxdb3-pe/venv",
    "INFO: sys.path = ['/home/jamie/tmp/influxdb3-pe/venv/lib/python3.11/site-packages', '/home/jamie/tmp/influxdb3-pe/install/python/lib/python311.zip', '/home/jamie/tmp/influxdb3-pe/install/python/lib/python3.11', '/home/jamie/tmp/influxdb3-pe/install/python/lib/python3.11/lib-dynload']",
    "INFO: 'requests' location = not found"
  ],
  "database_writes": {},
  "errors": []
}

# client - install 'requests'
$ ~/tmp/influxdb3-pe/install/influxdb3 install package requests   # once

# client - simple.py again (see requests is present)
$ ~/tmp/influxdb3-pe/install/influxdb3 test schedule_plugin -d mydb01 simple.py
{
  "trigger_time": "2025-02-14T13:56:21Z",
  "log_lines": [
    "INFO: sys.prefix = /home/jamie/tmp/influxdb3-pe/venv",
    "INFO: sys.path = ['/home/jamie/tmp/influxdb3-pe/venv/lib/python3.11/site-packages', '/home/jamie/tmp/influxdb3-pe/install/python/lib/python311.zip', '/home/jamie/tmp/influxdb3-pe/install/python/lib/python3.11', '/home/jamie/tmp/influxdb3-pe/install/python/lib/python3.11/lib-dynload']",
    "INFO: 'requests' location = /home/jamie/tmp/influxdb3-pe/venv/lib/python3.11/site-packages/requests/__init__.py"
  ],
  "database_writes": {},
  "errors": []
}

# client - venv-import.py (wheels work)
$ cat > ~/tmp/influxdb3-pe/data/plugins/venv-import.py <<EOM
import importlib.util
import requests
import sys
def process_scheduled_call(influxdb3_local, call_time, args=None):
    influxdb3_local.info("sys.prefix = %s" % sys.prefix)
    influxdb3_local.info("sys.path = %s" % sys.path)
    spec = importlib.util.find_spec("requests")
    if spec is not None:
        influxdb3_local.info("'requests' location = %s" % spec.origin)
    else:
        influxdb3_local.info("'requests' location = not found")

    influxdb3_local.info("requests = %s" % requests.__version__)
EOM
$ ~/tmp/influxdb3-pe/install/influxdb3 test schedule_plugin -d mydb01 venv-import.py
{
  "trigger_time": "2025-02-14T13:57:47Z",
  "log_lines": [
    "INFO: sys.prefix = /home/jamie/tmp/influxdb3-pe/venv",
    "INFO: sys.path = ['/home/jamie/tmp/influxdb3-pe/venv/lib/python3.11/site-packages', '/home/jamie/tmp/influxdb3-pe/install/python/lib/python311.zip', '/home/jamie/tmp/influxdb3-pe/install/python/lib/python3.11', '/home/jamie/tmp/influxdb3-pe/install/python/lib/python3.11/lib-dynload']",
    "INFO: 'requests' location = /home/jamie/tmp/influxdb3-pe/venv/lib/python3.11/site-packages/requests/__init__.py",
    "INFO: requests = 2.32.3"
  ],
  "database_writes": {},
  "errors": []
}

# client - wal_plugin args work (requires that the 'system' table exists and
# has a tag with host=foo; see above). IMPORTANT: the backslash before '$host'
# in the parameterized query is for 'cat' in the shell and not the python script
$ cat > ~/tmp/influxdb3-pe/data/plugins/test-args.py <<EOM
def process_writes(influxdb3_local, table_batches, args=None):
    influxdb3_local.info("args: %s" % args)

    if not args or "host" not in args:
        influxdb3_local.info("Could not find 'host' in args")
    else:
        # direct query
        query_result = influxdb3_local.query("SELECT * FROM system WHERE host = 'foo'")
        influxdb3_local.info("direct query result: " + str(query_result))

        query_params = {"host": args["host"]}
        # https://influxdata.slack.com/archives/C084G9LR2HL/p1739550127057669
        # discusses how quoting has changed (previously needed '\$host')
        query_result = influxdb3_local.query("SELECT * FROM system WHERE host = \$host", query_params)
        influxdb3_local.info("parameterized query result: " + str(query_result))

    influxdb3_local.info("done")
EOM
$ ~/tmp/influxdb3-pe/install/influxdb3 test wal_plugin -d mydb01 --lp="system,cpu=cpu-total,host=foo usage_guest=0.0,usage_guest_nice=0.0,usage_idle=90.0,usage_iowait=0.3,usage_irq=0.0,usage_nice=0.0,usage_softirq=0.0,usage_steal=0.07,usage_system=0.7,usage_user=0.4" --input-arguments="host=foo" test-args.py
{
  "log_lines": [
    "INFO: args: {'host': 'foo'}",
    "INFO: direct query result: [{'cpu': 'cpu-total', 'host': 'foo', 'time': 1739551301406761605, 'usage_guest': 0.0, 'usage_guest_nice': 0.0, 'usage_idle': 98.8476953907837, 'usage_iowait': 0.10020040080161435, 'usage_irq': 0.0, 'usage_nice': 0.0, 'usage_softirq': 0.0, 'usage_steal': 0.05010020040080273, 'usage_system': 0.6513026052104621, 'usage_user': 0.3507014028056636}]",
    "INFO: parameterized query result: [{'cpu': 'cpu-total', 'host': 'foo', 'time': 1739551301406761605, 'usage_guest': 0.0, 'usage_guest_nice': 0.0, 'usage_idle': 98.8476953907837, 'usage_iowait': 0.10020040080161435, 'usage_irq': 0.0, 'usage_nice': 0.0, 'usage_softirq': 0.0, 'usage_steal': 0.05010020040080273, 'usage_system': 0.6513026052104621, 'usage_user': 0.3507014028056636}]",
    "INFO: done"
  ],
  "database_writes": {
    "mydb01": []
  },
  "errors": []
}
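
To turn these manual runs into CI assertions, each run's JSON output could be compared against the log lines it is expected to contain. This is a minimal sketch assuming the result shape shown above (`log_lines`, `errors`). `missing_expectations` is a hypothetical helper, and `SAMPLE` is a shortened copy of the wal_plugin output above.

```python
import json


def missing_expectations(stdout, expected_substrings):
    """Return the expected substrings that appear in no log line of a result."""
    result = json.loads(stdout)
    if result["errors"]:
        raise AssertionError("plugin reported errors: %s" % result["errors"])
    log_lines = result["log_lines"]
    return [
        want
        for want in expected_substrings
        if not any(want in line for line in log_lines)
    ]


# Shortened copy of the wal_plugin result from the notes above.
SAMPLE = """
{
  "log_lines": [
    "INFO: args: {'host': 'foo'}",
    "INFO: done"
  ],
  "database_writes": {"mydb01": []},
  "errors": []
}
"""

if __name__ == "__main__":
    # An empty list means every expected substring was found in the log lines.
    print(missing_expectations(SAMPLE, ["args: {'host': 'foo'}", "done"]))
```

Matching on substrings rather than whole lines keeps the checks independent of machine-specific values such as `sys.prefix`, timestamps, and install paths.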
