Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* add frontend into docker-compose
* RAG_VERSION now defaults to git commit hash
* add CHANGELOG ci
* add more spaces to avoid merge conflict
* make changelog manual_dispatch, fix docker CI

---------

Signed-off-by: Jack Luar <[email protected]>
Showing 9 changed files with 141 additions and 11 deletions.
@@ -0,0 +1,36 @@
if __name__ == "__main__":
    with open("../../CHANGELOG.md") as f:
        temp = f.readlines()

    # divide the file into four categories
    commit, name, date, msg = [], [], [], []

    # regex is <commit>\t<name>\t<date>\t<msg>
    for line in temp:
        line = line.split('\t')
        commit.append(line[0])
        name.append(line[1])
        date.append(line[2])
        msg.append(line[3])

    # first detect the number of unique year-month combo
    date_year_month = [x[:7] for x in date]
    unique = sorted(list(set(date_year_month)), reverse=True)

    # based on this write from the reverse order
    final_lines = []

    for year_month in unique:
        # first write the header
        final_lines.append(f"# {year_month}\n\n")

        # loop through and stop when the year_month is lesser
        for idx in range(len(date)):
            if date[idx][:7] == year_month:
                l = f"- {commit[idx]} {name[idx]} {date[idx]} {msg[idx]}"
                final_lines.append(l)
        final_lines.append("\n")

    with open('../../CHANGELOG.md', 'w') as f:
        for l in final_lines:
            f.write(l)
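The script above groups tab-separated changelog lines under one header per year-month. As a minimal sketch of the same idea (not the code in this commit), the grouping could be done in a single pass with a dictionary. The function name `group_by_month` is hypothetical, and the sketch assumes each line has the form `<commit>\t<name>\t<date>\t<msg>` with a date that starts with `YYYY-MM`. Unlike the script, it skips blank or malformed lines rather than raising `IndexError`.

```python
from collections import defaultdict


def group_by_month(lines):
    """Group tab-separated changelog lines under '# YYYY-MM' headers.

    Hypothetical helper: assumes every valid line looks like
    <commit>\t<name>\t<date>\t<msg> and that <date> starts with YYYY-MM.
    """
    groups = defaultdict(list)
    for raw in lines:
        # maxsplit=3 keeps any tab characters inside the message intact.
        parts = raw.rstrip("\n").split("\t", 3)
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        commit, name, date, msg = parts
        groups[date[:7]].append(f"- {commit} {name} {date} {msg}\n")

    out = []
    # Newest month first; YYYY-MM prefixes sort correctly as plain strings.
    for year_month in sorted(groups, reverse=True):
        out.append(f"# {year_month}\n\n")
        out.extend(groups[year_month])
        out.append("\n")
    return out
```

The returned list can be written back with `f.writelines(...)`, mirroring the final loop of the script.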
@@ -0,0 +1,41 @@
name: Update Changelog

on:
  workflow_dispatch:

jobs:
  updateChangeLog:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Create Changelog file
        run: |
          make changelog
      - name: Create Pull Request
        id: cpr
        uses: peter-evans/create-pull-request@v6
        with:
          token: ${{ secrets.GH_PAT }}
          commit-message: Update report
          committer: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
          author: ${{ github.actor }} <${{ github.actor_id }}+${{ github.actor }}@users.noreply.github.com>
          signoff: true
          base: master
          branch: update-chglog
          delete-branch: true
          title: Changelog Update
          body: |
            - Auto-generated by [create-pull-request][1]
            [1]: https://github.com/peter-evans/create-pull-request
          labels: |
            docs
          assignees: luarss
          reviewers: luarss
@@ -0,0 +1,18 @@
FROM python:3.12.3-slim

WORKDIR /ORAssistant-frontend

COPY ./requirements.txt /ORAssistant-frontend/requirements.txt
COPY ./requirements-test.txt /ORAssistant-frontend/requirements-test.txt
COPY ./pyproject.toml /ORAssistant-frontend/pyproject.toml

RUN pip install --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt && \
    pip install --no-cache-dir -r requirements-test.txt && \
    pip install --no-cache-dir -e .

COPY streamlit_app.py .
COPY ./utils ./utils
COPY ./assets ./assets

CMD ["streamlit", "run", "streamlit_app.py"]
7090704
===================================
==> Dataset: EDA Corpus
==> Running tests for agent-retriever
/home/luarss/actions-runner/_work/ORAssistant/ORAssistant/evaluation/.venv/lib/python3.12/site-packages/deepeval/__init__.py:49: UserWarning: You are using deepeval version 1.4.9, however version 1.5.0 is available. You should consider upgrading via the "pip install --upgrade deepeval" command.
warnings.warn(
Fetching 2 files: 100%|██████████| 2/2 [00:00<00:00, 8.56it/s]
Evaluating: 100%|██████████| 100/100 [18:49<00:00, 11.29s/it]
✨ You're running DeepEval's latest Contextual Precision Metric! (using
gemini-1.5-pro-002, strict=False, async_mode=True)...
✨ You're running DeepEval's latest Contextual Recall Metric! (using
gemini-1.5-pro-002, strict=False, async_mode=True)...
✨ You're running DeepEval's latest Hallucination Metric! (using
gemini-1.5-pro-002, strict=False, async_mode=True)...
Evaluating 100 test case(s) in parallel: |██████████|100% (100/100) [Time Taken: 00:35, 2.82test case/s]
✓ Tests finished 🎉! Run 'deepeval login' to save and analyze evaluation results
on Confident AI.
‼️ Friendly reminder 😇: You can also run evaluations with ALL of deepeval's
metrics directly on Confident AI instead.
Average Metric Scores:
Contextual Precision 0.7364126984126984
Contextual Recall 0.8805555555555555
Hallucination 0.5281958874458874
Metric Passrates:
Contextual Precision 0.71
Contextual Recall 0.82
Hallucination 0.57
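
The two tables above report different quantities: the average is a mean of per-test-case scores, while the pass rate is the fraction of test cases marked as passing. That is why the two numbers for a metric need not match. A minimal sketch of that aggregation follows. It assumes each test case yields a `(score, passed)` pair per metric. The `summarize` function and the input layout are hypothetical, not the evaluation script used in this run or deepeval's API.

```python
from statistics import mean


def summarize(results):
    """Return {metric: (average score, pass rate)}.

    Hypothetical layout: `results` maps a metric name to a list of
    (score, passed) pairs, one pair per test case.
    """
    summary = {}
    for metric, cases in results.items():
        scores = [score for score, _ in cases]
        passes = [passed for _, passed in cases]
        # Pass rate counts True flags; how a score maps to pass/fail
        # (threshold, direction) is decided upstream, not here.
        summary[metric] = (mean(scores), sum(passes) / len(passes))
    return summary
```

Keeping the pass/fail decision out of the aggregation means a metric where lower scores are better can reuse the same summary code.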