CSCE 482 ScriptSearch 🔍

For transcript-api, all the packages and libraries required can be found in transcript_api/requirements.txt. This code is meant to run on a google cloud function but can be emulated on a local machine via the following method. First create a python virtual environment within the transcript api folder using the following command:

python3 -m venv venv

Then activate the virtual environment with the following command:

source ./venv/bin/activate

Then install all the dependencies within the virtual environment by running the following command:

./venv/bin/python3 -m pip install -r requirements.txt

Then start a local functions framework server by running the following command:

functions-framework --target=transcript_api --port 8080

Then to interact with the backend use several curl commands:

curl -X POST -H "Content-Type: application/json" -d "{"url": "https://www.youtube.com/watch?v=jNQXAC9IVRw"}" http://localhost:8080
curl -X POST -H "Content-Type: application/json" -d "{"query": "dynamic programming"}" http://localhost:8080
curl -X POST -H "Content-Type: application/json" -d "{"query": "dynamic programming", "channel_id": "UC_mYaQAE6-71rjSN6CeCA-g"}" http://localhost:8080
curl -X POST -H "Content-Type: application/json" -d "{"query": "dynamic programming", "video_ids": '["0baCL9wwJTA","RRP_GFSGrDA","gOFrQs_13mQ","dXBapNjnab4","StSGz5dx52s"]'}" http://localhost:8080

These curls emulate the four possible interaction states: populating the database, searching with no filter, searching with a channel filter, and searching with a playlist filter. Feel free to change the various arguments to search a filter differently.

Within the functions folder there are 4 core functions integral to the scraping pipeline. This code is also meant to run in a cloud environment in a google cloud function, but should be able to be spun up rather simply. The end user needs to navigate to the function directory they're interested in and run cmd/main.go, which will spin up a server instance. Then, they'll be able to make a cURL request to the localhost:8080 url using some payload.

It's important to note that all of the functions are event-driven by pub/sub, which means that the user must base64 encode the data. This may be cumbersome to do within the command line, and may be easier to write a script for it such as below:

import requests
import json
import base64

# The URL to send the POST request to
url = "http://localhost:8080"

# The headers for the POST request
headers = {
    "Content-Type": "application/cloudevents+json"
}

# The data you want to send, originally as an array of objects
data_array = [
    "KBZlN0izeiY"
]

# Convert the data array to a JSON string
json_string = json.dumps(data_array)

# Base64 encode the JSON string
encoded_data = base64.b64encode(json_string.encode()).decode()

# Create the payload with the encoded data
payload = {
    "specversion": "1.0",
    "type": "example.com.cloud.event",
    "source": "https://example.com/cloudevents/pull",
    "subject": "123",
    "id": "A234-1234-1234",
    "time": "2018-04-05T17:31:00Z",
    "data": {
        "Message": {
            "Data": encoded_data
        }
    }
}

# Sending the POST request
response = requests.post(url, headers=headers, data=json.dumps(payload))

# Printing the response from the server
print(response.status_code)
print(response.text)

Name		Name	Last commit message	Last commit date
Latest commit History 430 Commits
.github/workflows		.github/workflows
functions		functions
transcript_api		transcript_api
.gcloudignore		.gcloudignore
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CSCE 482 ScriptSearch 🔍

About

Releases 5

Contributors 5

Languages

License

Script-Search/backend

Folders and files

Latest commit

History

Repository files navigation

CSCE 482 ScriptSearch 🔍

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 5

Contributors 5

Languages