age_graph _wrap_query() method does not handle combining queries that use operators such as UNION and EXCEPT #29429

Open
5 tasks done
zhaohuizh opened this issue Jan 26, 2025 · 1 comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

zhaohuizh commented Jan 26, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Call graph.query(cypher_query, params=params) with a combined Cypher query, i.e. one that contains UNION.

Error Message and Stack Trace (if applicable)

  File "/home/abc/workspaces/mem0/mem0/memory/main.py", line 124, in add
    graph_result = future2.result()
                   ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abc/workspaces/mem0/mem0/memory/main.py", line 254, in _add_to_graph
    added_entities = self.graph.add(data, filters)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abc/workspaces/mem0/mem0/memory/graph_memory.py", line 71, in add
    search_output = self._search_graph_db(node_list=list(entity_type_map.keys()), filters=filters)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abc/workspaces/mem0/mem0/memory/graph_memory.py", line 287, in _search_graph_db
    ans = self.graph.query(cypher_query, params=params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abc/venv/lib/python3.12/site-packages/langchain_community/graphs/age_graph.py", line 630, in query
    raise AGEQueryException(
langchain_community.graphs.age_graph.AGEQueryException: {'message': "Error executing graph query: \n            MATCH (n)\n            WHERE n.properties->'embedding' IS NOT NULL \n            WITH n, \n                 reduce(dot = 0.0, i IN range(0, array_length(n.properties->'embedding', 1) - 1) |\n                     dot + (n.properties->'embedding'->i)::FLOAT * $n_embedding[i]) AS dot_product,\n                 sqrt(reduce(l2 = 0.0, i IN range(0, array_length(n.properties->'embedding', 1) - 1) |\n                     l2 + ((n.properties->'embedding'->i)::FLOAT)^2)) AS n_magnitude,\n                 sqrt(reduce(l2 = 0.0, i IN range(0, array_length($n_embedding, 1) - 1) |\n                     l2 + ($n_embedding[i])^2)) AS query_magnitude\n            WITH n, \n                 round(dot_product / (n_magnitude * query_magnitude), 4) AS similarity\n            WHERE similarity >= $threshold\n            MATCH (n)-[r]->(m)\n            RETURN n.properties->'name' AS source, \n                   id(n) AS source_id, \n                   type(r) AS relationship, \n                   id(r) AS relation_id, \n                   m.properties->'name' AS destination, \n                   id(m) AS destination_id, \n                   similarity\n            UNION\n            MATCH (n)\n            WHERE n.properties->'embedding' IS NOT NULL \n            WITH n, \n                 reduce(dot = 0.0, i IN range(0, array_length(n.properties->'embedding', 1) - 1) |\n                     dot + (n.properties->'embedding'->i)::FLOAT * $n_embedding[i]) AS dot_product,\n                 sqrt(reduce(l2 = 0.0, i IN range(0, array_length(n.properties->'embedding', 1) - 1) |\n                     l2 + ((n.properties->'embedding'->i)::FLOAT)^2)) AS n_magnitude,\n                 sqrt(reduce(l2 = 0.0, i IN range(0, array_length($n_embedding, 1) - 1) |\n                     l2 + ($n_embedding[i])^2)) AS query_magnitude\n            WITH n, \n                 round(dot_product / 
(n_magnitude * query_magnitude), 4) AS similarity\n            WHERE similarity >= $threshold\n            MATCH (m)-[r]->(n)\n            RETURN m.properties->'name' AS source, \n                   id(m) AS source_id, \n                   type(r) AS relationship, \n                   id(r) AS relation_id, \n                   n.properties->'name' AS destination, \n                   id(n) AS destination_id, \n                   similarity\n            ORDER BY similarity DESC\n            LIMIT $limit; \n            ", 'detail': 'syntax error at or near "->"\nLINE 47: ...tination agtype, destination_id agtype, properties->\'embeddi...\n  

Description

In age_graph.py, the _wrap_query() method converts a Cypher query into an AGE-compatible query. It finds the RETURN keyword and treats everything after it as the list of returned fields.

However, a combining query that uses an operator such as UNION or EXCEPT can contain multiple RETURN clauses. The following code incorrectly treats everything after the first RETURN keyword, including the subsequent subqueries, as the return fields.

        # pgsql template
        template = """SELECT {projection} FROM ag_catalog.cypher('{graph_name}', $$
            {query}
        $$) AS ({fields});"""

        # if there are any returned fields they must be added to the pgsql query
----->  return_match = re.search(r'\breturn\b(?![^"]*")', query, re.IGNORECASE)
        if return_match:
            # Extract the part of the query after the RETURN keyword
            return_clause = query[return_match.end() :]

            # parse return statement to identify returned fields
            fields = (
                return_clause.lower()
                .split("distinct")[-1]
                .split("order by")[0]
                .split("skip")[0]
                .split("limit")[0]
                .split(",")
            )

            # raise exception if RETURN * is found as we can't resolve the fields
            if "*" in [x.strip() for x in fields]:
                raise ValueError(
                    "AGE graph does not support 'RETURN *'"
                    + " statements in Cypher queries"
                )

            # get pgsql formatted field names
            fields = [
                AGEGraph._get_col_name(field, idx) for idx, field in enumerate(fields)
            ]

            # build resulting pgsql relation
            fields_str = ", ".join(
                [
                    field.split(".")[-1] + " agtype"
                    for field in fields
                    if field.split(".")[-1]
                ]
            )
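
To make the failure mode concrete, here is a minimal standalone sketch that mimics only the field-splitting steps quoted above. The function name `fields_after_first_return` and the shortened UNION query are assumptions made for illustration; the sketch does not call `AGEGraph` or connect to a database.

```python
import re


def fields_after_first_return(query: str) -> list[str]:
    """Hypothetical helper that mirrors the quoted parsing steps."""
    # Same pattern as in the quoted code: it finds only the FIRST RETURN.
    return_match = re.search(r'\breturn\b(?![^"]*")', query, re.IGNORECASE)
    if not return_match:
        return []
    # Everything after the first RETURN, including any later subqueries.
    return_clause = query[return_match.end():]
    fields = (
        return_clause.lower()
        .split("distinct")[-1]
        .split("order by")[0]
        .split("skip")[0]
        .split("limit")[0]
        .split(",")
    )
    return [field.strip() for field in fields]


# Shortened stand-in for the query in the stack trace above.
query = """
MATCH (n)-[r]->(m) RETURN n.name AS source, m.name AS destination
UNION
MATCH (m)-[r]->(n) RETURN m.name AS source, n.name AS destination
"""

leaked = fields_after_first_return(query)
for field in leaked:
    print(repr(field))
```

The second entry in `leaked` contains the `union` keyword and the whole second `MATCH ... RETURN` subquery, and a third "field" comes from the second RETURN clause. Text leaking into the column definitions in this way is consistent with the syntax error reported above.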

System Info

System Information
------------------
> OS:  Linux
> OS Version:  #52-Ubuntu SMP PREEMPT_DYNAMIC Thu Dec  5 13:32:09 UTC 2024
> Python Version:  3.12.3 (main, Nov  6 2024, 18:32:19) [GCC 13.2.0]

Package Information
-------------------
> langchain_core: 0.3.29
> langchain: 0.3.14
> langchain_community: 0.3.14
> langsmith: 0.2.10
> langchain_text_splitters: 0.3.4

Optional packages not installed
-------------------------------
> langserve

Other Dependencies
------------------
> aiohttp: 3.11.11
> async-timeout: Installed. No version info available.
> dataclasses-json: 0.6.7
> httpx: 0.28.1
> httpx-sse: 0.4.0
> jsonpatch: 1.33
> langsmith-pyo3: Installed. No version info available.
> numpy: 2.0.2
> orjson: 3.10.13
> packaging: 24.2
> pydantic: 2.10.4
> pydantic-settings: 2.7.1
> PyYAML: 6.0.2
> requests: 2.32.3
> requests-toolbelt: 1.0.0
> SQLAlchemy: 2.0.36
> tenacity: 9.0.0
> typing-extensions: 4.12.2
> zstandard: Installed. No version info available.
@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jan 26, 2025
rawathemant246 added a commit to rawathemant246/langchain that referenced this issue Jan 30, 2025
✅Added testcases to validate tests/unit_tests/graphs/test_age_graph.py
Contributor

rawathemant246 commented Jan 30, 2025

@zhaohuizh I fixed the issue: the _wrap_query method now handles UNION and EXCEPT operators in the Cypher query. I also added test cases that cover both simple and complex queries.

ccurme pushed a commit that referenced this issue Feb 2, 2025
## Description:

This PR addresses issue #29429 by fixing the _wrap_query method in
langchain_community/graphs/age_graph.py. The method now correctly
handles Cypher queries with UNION and EXCEPT operators, ensuring that
the fields in the SQL query are ordered as they appear in the Cypher
query. Additionally, the method now properly handles cases where RETURN
* is not supported.
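
One way to approach this kind of fix is sketched below. This is not the code merged in the PR; it is a minimal illustration that assumes the column list can be derived from the first subquery alone, since every branch of a UNION or EXCEPT must return the same columns. The names `return_fields`, `_SET_OPERATORS`, and the `column_{idx}` fallback are hypothetical.

```python
import re

# Hypothetical patterns for this sketch only.
_SET_OPERATORS = re.compile(r"\b(?:union(?:\s+all)?|except)\b", re.IGNORECASE)
_RETURN = re.compile(r'\breturn\b(?![^"]*")', re.IGNORECASE)
_MODIFIERS = re.compile(r"\b(?:order\s+by|skip|limit)\b", re.IGNORECASE)


def return_fields(query: str) -> list[str]:
    """Return output column names taken from the first subquery only."""
    # Keep only the text before the first UNION / EXCEPT operator.
    first_subquery = _SET_OPERATORS.split(query, maxsplit=1)[0]
    match = _RETURN.search(first_subquery)
    if not match:
        return []
    clause = first_subquery[match.end():]
    # Drop trailing ORDER BY / SKIP / LIMIT and a leading DISTINCT.
    clause = _MODIFIERS.split(clause, maxsplit=1)[0]
    clause = re.sub(r"^\s*distinct\b", "", clause, flags=re.IGNORECASE)

    names = []
    # Naive comma split: expressions such as round(x, 4) would need a parser.
    for idx, expr in enumerate(clause.split(",")):
        expr = expr.strip()
        if expr == "*":
            raise ValueError("'RETURN *' cannot be resolved to column names")
        parts = re.split(r"\s+as\s+", expr, flags=re.IGNORECASE)
        # Use the alias when present, else a placeholder column name.
        names.append(parts[-1].strip() if len(parts) > 1 else f"column_{idx}")
    return names


union_query = (
    "MATCH (n)-[r]->(m) RETURN n.name AS source, id(n) AS source_id "
    "UNION "
    "MATCH (m)-[r]->(n) RETURN m.name AS source, id(m) AS source_id "
    "ORDER BY source LIMIT 5"
)
print(return_fields(union_query))
```

The sketch ignores operators or keywords that appear inside string literals and splits return expressions on every comma, so it would need a real tokenizer before being used on arbitrary queries.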

### Issue: #29429

### Dependencies: None


### Add tests and docs:

Added unit tests in tests/unit_tests/graphs/test_age_graph.py to
validate the changes.
No new integrations were added, so no example notebook is necessary.

### Lint and test:

Ran make format, make lint, and make test to ensure code quality and
functionality.