Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect Handling of Escaped Quotes in Cookie Values #6890

Open
Konano opened this issue Feb 11, 2025 · 2 comments · May be fixed by #6891
Open

Incorrect Handling of Escaped Quotes in Cookie Values #6890

Konano opened this issue Feb 11, 2025 · 2 comments · May be fixed by #6891

Comments

@Konano
Copy link

Konano commented Feb 11, 2025

Expected Result

Legitimate escaped quotes (e.g., \") in cookie values should be preserved. For example:
Input value "159\\"687" (actual string: 159\"687) should remain unchanged.

Actual Result

Requests incorrectly replaces escaped quotes with an empty string, causing "159\\"687" to become "159687" (string becomes 159687), which corrupts valid values.

Reproduction Steps

import requests
from requests.cookies import create_cookie

# Create a cookie with escaped quotes
cookie = create_cookie(
    name="test_cookie",
    value='"159\\"687"',  # Actual stored value should be 159\"687
    domain="example.com"
)

# Test using a session
with requests.Session() as s:
    s.cookies.set_cookie(cookie)
    retrieved = s.cookies.get("test_cookie")
    print(f"Expected: 159\\\"687 | Actual: {retrieved.value}")  # Actual output: 159687

Issue Analysis

The code at src/requests/cookies.py#L349-L356 has the following problem:

# Problematic code snippet
if (
    hasattr(cookie.value, "startswith")
    and cookie.value.startswith('"')
    and cookie.value.endswith('"')
):
    cookie.value = cookie.value.replace('\\"', "")  # Incorrectly removes all escaped quotes

This logic makes incorrect assumptions about cookie value sanitization. While RFC 6265 specifies that cookie values shouldn't contain escaped characters (through its cookie-value definition), many real-world implementations:

  1. Allow backslash-escaped quotes in cookie values for historical compatibility
  2. Expect clients to preserve such values verbatim for proper server-side parsing
  3. Use these patterns in legitimate scenarios (e.g., JSON fragments in cookies)

By forcibly stripping escaped quotes, Requests breaks values that:

  • Were explicitly escaped by servers
  • Contain valid escaped sequences from non-standard implementations
  • Include quote characters in structured data formats

Suggested Fix

Remove this non-standard cleanup logic entirely.

@Konano
Copy link
Author

Konano commented Feb 12, 2025

I have noticed the historical discussions on #1440, #3759, and #5459, and have written a supplementary analysis.


Supplemental Analysis: Cookie Value Handling in Requests

1. RFC 6265 vs. Real-World Practices

RFC 6265 defines cookie-value as:

cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )  
cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %xD-7E  

Notably:

  • The RFC explicitly prohibits double quotes (%x22) within cookie values, even when wrapped in outer quotes.
  • It does not specify any escape mechanism for quotes within quoted values.

However, many real-world servers (often unintentionally) generate cookie values like "{\"bar\":\"baz\"}" by serializing structured data (e.g., JSON). While technically non-compliant, these values are widely used in practice.

Requests relies on http.cookiejar.CookieJar from the Python standard library, which does not reject such RFC-noncompliant values. To align with upstream behavior and ensure compatibility, Requests should likewise preserve these values verbatim instead of altering them.


2. Current Implementation Actively Corrupts Data

Requests’ existing logic (introduced ~12 years ago) strips escaped quotes (\\") from cookie values under the assumption that they are RFC violations. For example:

  • Input: "{\"user\": \"admin\"}" (intended value: {"user": "admin"})
  • Output after stripping: "{user: admin}" (invalid JSON, breaks parsing)

This behavior violates Postel’s Law (“Be conservative in what you send, liberal in what you accept”) and introduces silent data corruption in critical scenarios:

  • Authentication/Session Tokens: Altered values (e.g., OAuth/JWT tokens with escaped quotes) cause authorization failures.
  • Structured Data: JSON/XML fragments become unparseable, crashing downstream logic.

3. Industry Standards Prioritize Faithful Transmission

Major HTTP clients (browsers, curl, Postman, etc.) preserve escaped quotes in cookie values to avoid data corruption. Examples:

  • Browsers: If a server sets Set-Cookie: test="foo\"bar", browsers retain foo\"bar and transmit it verbatim.
  • curl/CLI Tools: No automatic sanitization of escaped quotes occurs unless explicitly requested.

By deviating from this norm, Requests introduces unique failure modes, forcing developers to implement brittle workarounds (e.g., double-escaping). This undermines interoperability and contradicts user expectations.


4. Proposed Resolution

While backward compatibility is important, data integrity is non-negotiable. We propose:

  • Remove the quote-stripping logic entirely.
  • Align with Python’s http.cookiejar and industry practices.

Why This Cannot Wait

The argument that “users rely on this behavior” conflates legacy inertia with intentional adoption. Most users impacted by this bug:

  • Are unaware their data is being altered, attributing failures to server-side issues.
  • Have no valid reason to depend on quote stripping—this behavior is a workaround, not a feature.

By contrast, the harm is concrete and escalating:

  • Debugging efforts waste hours due to silent data corruption.
  • Production systems fail unpredictably (e.g., token mismatches).
  • Technical debt compounds as developers adopt fragile fixes.

Final Appeal
“Maintaining 12-year-old technical debt should not justify forcing users into silent data loss. By addressing this proactively, Requests can uphold its role as a reliable, standards-aware library while respecting real-world needs.”

@Konano
Copy link
Author

Konano commented Feb 12, 2025

I have noticed the historical discussions on #1440, #3759, and #5459, and have written a supplementary analysis.

cc @Lukasa @sigmavirus24 @nateprewitt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant