Shrink the PBS uids cookie #1985
Ran an experiment with an online tool to see whether the uids cookie would be helped by protobuf. In short, I found we'd be better off with compressed JSON. Here's what I did.

Baseline JSON

The contents of the JSON uids file with 20 IDs:
This is 1441 bytes gzipped and base64 encoded. Note that I took care to minimize repeated strings so as not to favor gzip. When I used the same uid or bidder values, gzip did much better. :-)

Protobuf

Came up with this protobuf definition:
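The JSON+GZIP+Base64 baseline above can be reproduced with the Go standard library. This is a sketch only; the `uidEntry` fields and sample values below are illustrative stand-ins for the real 20-ID payload, not the actual test data:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// uidEntry approximates one bidder record in the uids cookie;
// the real struct lives in usersync/cookie.go.
type uidEntry struct {
	UID     string `json:"uid"`
	Expires string `json:"expires"`
}

// encodeUIDs marshals the uid map to JSON, gzips it, and base64url-encodes
// the result — the "compressed JSON" variant measured above.
func encodeUIDs(uids map[string]uidEntry) (string, error) {
	raw, err := json.Marshal(uids)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.URLEncoding.EncodeToString(buf.Bytes()), nil
}

func main() {
	// Hypothetical sample payload; the real experiment used 20 distinct IDs.
	uids := map[string]uidEntry{
		"bidderA": {UID: "f25068b6-e047-41e2-8a1c-2a1b3c4d5e6f", Expires: "2019-05-12T10:00:00Z"},
		"bidderB": {UID: "8263110874608291", Expires: "2019-05-13T11:30:00Z"},
	}
	raw, _ := json.Marshal(uids)
	encoded, _ := encodeUIDs(uids)
	fmt.Printf("json=%d bytes, gzip+base64=%d bytes\n", len(raw), len(encoded))
}
```

Note that for tiny payloads the gzip header overhead can make the encoded form larger than the raw JSON; the savings reported above apply to the full 20-ID payload.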
Used the tool at https://www.sisik.eu/proto to create a binary representation of the JSON above, and it came out to be 1522 bytes.
Discussed in PBS committee. @SyntaxNode has a hypothesis that protobuf would have the better CPU-performance profile, so even though the JSON approach saves a few bytes, he's planning to run an experiment to get some data.
I initially thought the 1522 bytes for Protobuf looked great against the 1441 bytes for JSON+GZIP, but I realized in my testing the Protobuf result is not Base64 encoded. When we add that in, we get a less exciting 2028 bytes. Using the same Baseline JSON model @bretg shared in his earlier comment, I've performed a comparison with several different formats. The first, JSON+Base64, is basically what we do today as a point of reference (but using the relative timestamps).

Write / Encoding
Read / Decoding
Protobuf+Base64 is the most CPU-efficient option, and while its output is 33% smaller than what we use today, that size compares unfavorably to the other formats. If we are optimizing for both speed and size equally, this is the best option. However, if we want to optimize more for size and a little less for speed, I propose the Protobuf+Brotli-Level-0+Base64 format, which is just ~8% larger than GZIP while 400%+ faster in my write benchmarks and 30% faster in my read benchmarks. I experimented with using GZIP and Brotli for just the UIDs, but the overhead of both compression algorithms actually increased the final size. Similarly, I tested Protobuf+Snappy+Base64, but the Snappy compression increased the size a bit over plain Protobuf+Base64. Are there any other compression libraries you'd like to see me add to this benchmark comparison? Any suggestions must have both Go and Java libraries.
Versioning

We need to represent the version outside of the encoded payload to determine which decoding approach to use. This approach needs to be backwards compatible with the current JSON+Base64 encoded format. We'll need to use a separator character not present in the Base64 URL character set, of which I think the period "." is a good choice. This is the same choice made by JWT tokens and TCF2 consent strings. Example:
We should be fine for a long time with just 1 character for the version followed by 1 character for the separator, but in the future we could extend the number of characters preceding the separator if need be. The algorithm for version detection would be:
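As a sketch of that detection algorithm (function and variable names here are my own, not from the proposal): scan for the "." separator, treat any leading digits as the version, and fall back to version 1 for legacy cookies, which are pure Base64 and cannot contain a period.

```go
package main

import (
	"fmt"
	"strings"
)

// detectVersion splits a cookie value into (version, payload).
// A digit run followed by "." marks a versioned format; anything else
// (including today's JSON+Base64 values, whose alphabet has no ".")
// is treated as legacy version 1.
func detectVersion(cookie string) (version int, payload string) {
	sep := strings.Index(cookie, ".")
	if sep <= 0 {
		return 1, cookie // no separator: legacy format
	}
	v := 0
	for _, c := range cookie[:sep] {
		if c < '0' || c > '9' {
			return 1, cookie // non-numeric prefix: not a version marker
		}
		v = v*10 + int(c-'0')
	}
	return v, cookie[sep+1:]
}

func main() {
	for _, c := range []string{"2.abc123", "eyJ0ZW1wVUlEcyI6e319"} {
		v, payload := detectVersion(c)
		fmt.Printf("version=%d payload=%q\n", v, payload)
	}
}
```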
Protobuf is great, but first the cookie "uid" values should be split based on data type. Since protobuf's actual record size depends on the data type, longs would be much smaller than the same data written as a string. Also, the bidder string name could be moved to a table keyed by a uint32 ID. If needed, adapters would have to declare whether they use string or long field data (it could also be done automatically based on content).
That's a good observation. There are several possible data types we could detect and optimize. You mentioned long in your example; there are also uuid, base64-encoded bytes, and hex strings. My hope is that the compression layer on top of the protobuf binary encoding will solve for these storage inefficiencies without needing to add structural complexity. Let's test it. I'll use the protobuf structure you provided. I replaced 25% of the entries in the Baseline JSON example with long values, which seems to be a slightly generous distribution based on real-world examples. The runtime complexity of both is close enough that I'll only list the sizes.
Without compression there is a 4% size reduction. With compression, there is a 4% size increase. It seems the compression algorithm fares a bit better with string numerics than with binary encoded numerics. This might not hold true for other potential optimized types though.
Yes, I agree that would provide a size savings. I'm going to test it with the following protobuf definition:
Without compression there is an 11% size reduction. With compression, there is a 9% size reduction. I'd like opinions on whether this is worth the complexity of maintaining a list of bidder ids. This cannot be as simple as an alphabetical index, since we need to account for added bidders, removed bidders, and different bidders between PBS-Go, PBS-Java, and forks.
Discussed in PBS committee. We're leaning towards the 'Protobuf+StringBidders+Brotli0-Base64' solution, with external version number.
Java results are at https://github.com/snahornyi/uids-java-tests. The team suggests using 'Protobuf+StringBidders+Brotli0**+**Base64' -- i.e. brotli and base64. @SyntaxNode - please confirm that your table above meant to include Base64... i.e. was the minus sign a typo?
Confirmed. That is a typo. I'll fix it in my other comment.
There's a detail here I don't think we've ironed out: how the UID itself is represented. I don't like the idea of having to configure the data type of each adapter's ID. Looking at the list of IDs in my current cookie, the pattern I see is that most are strings, and the ones that are ints are generally shorter (~20 chars rather than ~40). I suppose we could manage this in the /setuid code that creates the values:
It's easy enough on the read side to deal with this, but we'd have to agree to prefer one over the other in case somehow both of them get set.
@bretg Please review the conversation in this issue. The data type specific UID storage was discussed, explored, and ultimately rejected. We will be storing all UID values as strings. |
This is the updated protobuf definition that I am proposing. I've followed the best practices from the protobuf style guide. cookie2.proto
I've added the go package option required for the code generator. Java would need to add their own Java options as well. Alternatively, we could provide the options directly to the code generator, but the best practices from Google on the matter are to include them in the file. I'm ok with making it as easy as possible for us to generate code for use in both PBS implementations. This produces the following generated code structs for Go:
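The generated structs referenced above did not survive in this thread. As a rough, hypothetical illustration only (field names, tags, and internals will differ from real protoc output for the actual cookie2.proto), protoc-generated Go code for a message mapping bidder names to UID records is shaped roughly like:

```go
package main

import "fmt"

// UIDEntry is a hand-written approximation of a protoc-generated struct
// holding one bidder's uid and a relative expiration. Hypothetical sketch;
// not the actual generated code.
type UIDEntry struct {
	UID     string `protobuf:"bytes,1,opt,name=uid"`
	Expires int64  `protobuf:"varint,2,opt,name=expires"`
}

// Cookie approximates the top-level generated message: a map from
// bidder name to UID entry.
type Cookie struct {
	UIDs map[string]*UIDEntry `protobuf:"bytes,1,rep,name=uids"`
}

func main() {
	c := Cookie{UIDs: map[string]*UIDEntry{
		"bidderA": {UID: "8263110874608291", Expires: 604800},
	}}
	fmt.Printf("bidders=%d uid=%s\n", len(c.UIDs), c.UIDs["bidderA"].UID)
}
```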
Thanks @SyntaxNode - just to make sure we're on the same page on the /cookie_sync endpoint expectations... when it receives a cookie, it's going to have to first see if it's base64 encoded JSON. If not, then it passes the body through the brotli and protobuf decoders, right? |
That's not what I had in mind. I proposed using a version prefix to make it easier and quicker for us to determine the correct code path to use for decoding. In short, if it starts with "2." then remove those characters and decode the rest as a base64-encoded, brotli-compressed protobuf message. Else, consider it version 1 and error if there is a decoding problem. I think the progression of ideas in this issue thread has muddled the intended proposal. I'll create a separate Google doc with the full proposed spec on Monday.
I have a proposal to move cookies to an external cache like Redis. The problem with base64-encoded cookies is that they can exceed the maximum allowed header size of 4-8K (default in nginx), and nginx will drop the request. The nginx buffer size has been increased, but as far as I understand, this is a temporary solution because if a user wants to sync with 30 bidders, the cookie size will become very large. |
Interesting idea @linux019 - so you're proposing bidder IDs are stored in PBC, and PBS sets that cache ID in its cookie so it can be used to look them up in PBC? That could work, but would be quite costly: PBC volume would go way up, and so would latency on the auction as it waited for the cache results. I would not want the /cookie_sync endpoint to attempt a write to PBC unless it knew that 3PC were supported. That would mean either a handshake or examining the UA details.
This is from the production log; nginx was dropping these requests. It was fixed by increasing the nginx buffer size.
If cookie |
A lot of time has passed. It likely makes sense to use zstd over brotli nowadays (+40% faster at the same compression ratio).
This is an interesting idea, but a very large change that will incur significant extra cost and headache for a PBS host company for unknown benefit. I'll open a separate issue to talk through the details as a community, but honestly I would not expect the core team to consider this very high priority. You should assume that it will be something the community will need to contribute.
My take is that I don't think this protobuf thing is going to happen. It's too much work for the added benefit. I propose we close this particular issue as "won't do". If we're going to address the too-many-bidder-ids problem, I think #4080 is the main solution. This distributed key-value DB seems like a reasonable alternate solution.
There are so many server-side adapters now that the PBS uids cookie has grown so large that it's starting to affect which other cookie values the host company domain can receive. For example, my uids cookie is 3900 bytes (!).
We need to address this.
Here are some values from my cookie:
In total, I have 32 bidder entries in my cookie, for an average of about 121 bytes per bidder encoded.
(See below for a major update to the original proposal.)
Current Structure
The expires value is used to drop the value from the cookie so /cookie_sync will get an updated ID from that bidder.

Structure of the current cookie:
Background on the current structure
Here's the comments from the code: (usersync/cookie.go)
// "Legacy" cookies had UIDs without expiration dates, and recognized "0" as a legitimate UID for audienceNetwork.
// "Current" cookies always include UIDs with expiration dates, and never allow "0" for audienceNetwork.
//
// This Unmarshal method interprets both data formats, and does some conversions on legacy data to make it current.
// If you're seeing this message after March 2018, it's safe to assume that all the legacy cookies have been
// updated and remove the legacy logic.
Possible new structure
Here's a straw proposal based on using a relative timestamp rather than absolute: