This sample application demonstrates how to improve product search with Learning to Rank (LTR).
Blog post series:
- Improving Product Search with Learning to Rank - part one introduces the dataset used in this sample application and several baseline ranking models.
- Improving Product Search with Learning to Rank - part two demonstrates how to train neural methods for search ranking. The neural training routine is found in Learning to rank with Transformer models.
- Improving Product Search with Learning to Rank - part three shows how to train GBDT methods for search ranking. The model also uses neural signals as features. See notebooks:
This work uses the largest product relevance dataset released by Amazon:
We introduce the “Shopping Queries Data Set”, a large dataset of difficult search queries, released with the aim of fostering research in the area of semantic matching of queries and products. For each query, the dataset provides a list of up to 40 potentially relevant results, together with ESCI relevance judgements (Exact, Substitute, Complement, Irrelevant) indicating the relevance of the product to the query. Each query-product pair is accompanied by additional information. The dataset is multilingual, as it contains queries in English, Japanese, and Spanish.
The dataset is available at amazon-science/esci-data and is released under the Apache 2.0 license.
The following is a quick start recipe for this application. It requires:
- Docker Desktop installed and running. 6 GB available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting.
- Alternatively, deploy using Vespa Cloud
- Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
- Architecture: x86_64 or arm64
- Homebrew to install Vespa CLI, or download a Vespa CLI release from GitHub releases.
- zstd: brew install zstd
- Either python3 with pyvespa, pyarrow, and pandas installed, or uv
Validate the Docker resource settings; at least 6 GB of memory should be available:
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
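The same check can be scripted. The following is a minimal sketch, assuming `docker info` prints a line of the form `Total Memory: 7.667GiB` and that the 6 GB recommendation can be compared against the GiB value Docker reports. The function names are hypothetical and not part of this application.

```python
import re

# Conversion factors from the binary unit suffixes to GiB.
# Assumption: `docker info` reports memory with one of these suffixes.
_TO_GIB = {"KiB": 1 / (1024 ** 2), "MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}


def total_memory_gib(docker_info_output: str) -> float:
    """Extract the 'Total Memory' value from `docker info` output, in GiB."""
    match = re.search(r"Total Memory:\s*([\d.]+)\s*([KMGT]iB)", docker_info_output)
    if match is None:
        raise ValueError("no 'Total Memory' line found")
    value, unit = match.groups()
    return float(value) * _TO_GIB[unit]


def has_enough_memory(docker_info_output: str, minimum_gib: float = 6.0) -> bool:
    """Return True if Docker has at least minimum_gib of memory available."""
    return total_memory_gib(docker_info_output) >= minimum_gib
```

To use it, pass the captured output of `docker info`, for example from `subprocess.run(["docker", "info"], capture_output=True, text=True).stdout`.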
Install Vespa CLI:
$ brew install vespa-cli
For local deployment using the Docker image:
$ vespa config set target local
Pull and start the Vespa Docker container image:
$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
    --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
    vespaengine/vespa
Verify that the configuration service (deploy API) is ready:
$ vespa status deploy --wait 300
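Conceptually, waiting for the deploy API is a poll-until-ready loop with a timeout. The sketch below illustrates that pattern in Python; it is not how the Vespa CLI is implemented. The health URL on port 19071 is an assumption, and `deploy_api_is_up` and `wait_until` are hypothetical helper names.

```python
import http.client
import time
import urllib.request
from typing import Callable


def deploy_api_is_up(url: str = "http://localhost:19071/state/v1/health") -> bool:
    """Return True if the (assumed) health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP errors all mean "not ready yet".
        return False


def wait_until(probe: Callable[[], bool], timeout_s: float, interval_s: float = 1.0) -> bool:
    """Call probe repeatedly until it returns True or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until(deploy_api_is_up, timeout_s=300)` mirrors the intent of `--wait 300`: it returns `True` as soon as the probe succeeds and `False` if the timeout expires first.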
Download this sample application:
$ vespa clone commerce-product-ranking my-app && cd my-app
Download the cross-encoder model:
$ curl -L -o application/models/title_ranker.onnx \
    https://data.vespa-cloud.com/sample-apps-data/title_ranker.onnx
See scripts/export-bi-encoder.py and scripts/export-cross-encoder.py for how to export models from PyTorch to ONNX format.
Deploy the application:
$ vespa deploy --wait 600 application
If the above fails, check the logs:
$ docker logs vespa
It is possible to deploy this app to Vespa Cloud.
Running the system test is optional; it indexes two documents and runs a query test:
$ (cd application; vespa test tests/system-test/feed-and-search-test.json)
Feed the pre-processed sample product data for 16 products:
$ zstdcat sample-data/sample-products.jsonl.zstd | vespa feed -
Evaluate the semantic-title rank profile using the evaluation script (scripts/evaluate.py).
Install requirements:
$ pip3 install pandas pyarrow "pyvespa>=0.53.0"
With the dependencies installed, we can evaluate the ranking model using the evaluation script:
$ python3 scripts/evaluate.py --endpoint http://localhost \
    --example_file sample-data/test-sample.parquet \
    --ranking semantic-title \
    --qrel_file https://data.vespa-cloud.com/sample-apps-data/test.qrels > results.txt
evaluate.py runs all the queries in the test split using the --ranking <rank-profile> and prints the NDCG score (and search time statistics).
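The `--qrel_file` argument points at the relevance judgements used for scoring. The layout of `test.qrels` is not described here, so the following sketch assumes the common TREC qrels layout, with one `<query id> <iteration> <document id> <label>` entry per line. `parse_qrels` is a hypothetical helper, and the labels in the example are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Map query id -> {document id -> relevance label}.

    Assumes TREC-style lines: <query id> <iteration> <document id> <label>.
    """
    judgements: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        query_id, _iteration, doc_id, label = line.split()
        judgements[query_id][doc_id] = int(label)
    return dict(judgements)


# Hypothetical labels, not taken from test.qrels.
example = parse_qrels(["535 0 B08PB9TTKT 4", "535 0 B00B4PJC9K 1"])
```

The nested dictionary makes it cheap to look up the label of each returned product when scoring a ranking for a given query.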
Note that the evaluation script uses custom NDCG label gains:
- Label 1 is Irrelevant with 0 gain
- Label 2 is Complement with 0.01 gain
- Label 3 is Substitute with 0.1 gain
- Label 4 is Exact with 1 gain
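To make the effect of these gains concrete, here is a minimal NDCG sketch for a single query. It assumes the gains are applied linearly with the usual log2 position discount, and that the ideal ranking is formed from the labels of the returned products only. The actual evaluation script may differ, for example by building the ideal ranking from all judged products.

```python
import math
from typing import Sequence

# Gain per numeric label, as listed above.
LABEL_GAINS = {1: 0.0, 2: 0.01, 3: 0.1, 4: 1.0}


def dcg(labels: Sequence[int]) -> float:
    """Discounted cumulative gain of labels given in ranked order."""
    return sum(
        LABEL_GAINS[label] / math.log2(position + 2)
        for position, label in enumerate(labels)
    )


def ndcg(ranked_labels: Sequence[int]) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    # Gains grow with the label, so sorting by label yields the ideal order.
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0
```

Because label 4 carries a gain ten times that of label 3 and a hundred times that of label 2, the score is dominated by where the label 4 products appear in the ranking.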
$ cat results.txt
Example ranking produced by Vespa using the semantic-title rank-profile for query 535:
B08PB9TTKT 1 0.4638
B00B4PJC9K 2 0.4314
B0051GN8JI 3 0.4199
B084TV3C1B 4 0.4177
B08NVQ8MZX 5 0.4175
B00DHUA9VA 6 0.4155
B08SHMLP5S 7 0.4151
B08VSJGP1N 8 0.4147
B08QGZMCYQ 9 0.4110
B0007KPRIS 10 0.4073
B08VJ66CNL 11 0.4040
B000J1HDWI 12 0.4035
B0007KPS3C 13 0.3977
B0072LFB68 14 0.3933
B01M0SFMIH 15 0.3920
B0742BZXC2 16 0.3778
This particular product ranking for the query produces an NDCG score of 0.7046.
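For further processing, the ranking above can be read into structured records. This is a sketch that assumes each entry is a whitespace-separated product id, rank, and score, as in the example ranking; the real results.txt may contain additional columns. `RankedHit` and `parse_ranking` are hypothetical names.

```python
from typing import List, NamedTuple


class RankedHit(NamedTuple):
    product_id: str
    rank: int
    score: float


def parse_ranking(text: str) -> List[RankedHit]:
    """Parse '<product id> <rank> <score>' triples separated by whitespace."""
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected product id, rank, and score for every entry")
    return [
        RankedHit(tokens[i], int(tokens[i + 1]), float(tokens[i + 2]))
        for i in range(0, len(tokens), 3)
    ]


# The first three entries of the example ranking for query 535.
hits = parse_ranking("B08PB9TTKT 1 0.4638\nB00B4PJC9K 2 0.4314\nB0051GN8JI 3 0.4199")
```

Splitting on any whitespace means the parser accepts one entry per line as well as several entries on the same line.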
Note that the sample-data/test-sample.parquet file only contains one query. To get the overall score, one must compute the NDCG score of every query in the test split and report the average NDCG score.
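The averaging step is a plain arithmetic mean over the per-query scores, so every query counts equally regardless of how many products were judged for it. A minimal sketch, with `mean_ndcg` as a hypothetical helper name:

```python
from statistics import fmean
from typing import Mapping


def mean_ndcg(ndcg_by_query: Mapping[str, float]) -> float:
    """Average the per-query NDCG scores, weighting each query equally."""
    if not ndcg_by_query:
        raise ValueError("no queries to average")
    return fmean(ndcg_by_query.values())
```

The input maps a query id to the NDCG score computed for that query; how queries without any relevant product are scored is left to the caller.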
We can also try another ranking model:
$ python3 scripts/evaluate.py \
    --endpoint http://localhost \
    --example_file sample-data/test-sample.parquet \
    --ranking cross-title \
    --qrel_file https://data.vespa-cloud.com/sample-apps-data/test.qrels
For this query, the cross-title model produces an NDCG score of 0.8208, which is better than the semantic-title model.
Shut down and remove the container:
$ docker rm -f vespa
Download a pre-processed feed file with all (1,215,854) products:
$ curl -L -o product-search-products.jsonl.zstd \
    https://data.vespa-cloud.com/sample-apps-data/product-search-products.jsonl.zstd
Feed the products. This step is resource intensive, because the semantic embedding model encodes the product title and description into the dense embedding vector space:
$ zstdcat product-search-products.jsonl.zstd | vespa feed -
Evaluate the hybrid baseline rank profile using the evaluation script (scripts/evaluate.py).
$ python3 scripts/evaluate.py \
    --endpoint http://localhost \
    --example_file "https://github.com/amazon-science/esci-data/blob/main/shopping_queries_dataset/shopping_queries_dataset_examples.parquet?raw=true" \
    --ranking semantic-title \
    --qrel_file https://data.vespa-cloud.com/sample-apps-data/test.qrels
For Vespa Cloud deployments, pass the data-plane certificate and private key:
$ python3 scripts/evaluate.py \
    --endpoint https://productsearch.samples.aws-us-east-1c.perf.z.vespa-app.cloud \
    --example_file "https://github.com/amazon-science/esci-data/blob/main/shopping_queries_dataset/shopping_queries_dataset_examples.parquet?raw=true" \
    --ranking semantic-title \
    --cert <path-to-data-plane-cert.pem> \
    --key <path-to-data-plane-private-key.pem> \
    --qrel_file https://data.vespa-cloud.com/sample-apps-data/test.qrels