Attachment server #399

Open
jpfleischer opened this issue Jan 9, 2025 · 17 comments · Fixed by #400

@jpfleischer

The documentation states

  The benefits of using "Archive" over "Export" and "Dump" are:

  • it is well documented (see slackdump help chunk);
  • it can be converted to other formats, including the native Slack Export

However, upon using archive, it generates a folder of json.gz files.
It is not possible to use a folder as input to the convert command.

How can we make archive output a native Slack export? I am trying to use Slackord, which works with the native Slack export format, but I cannot produce that format here.

@rusq
Owner

rusq commented Jan 9, 2025

Hey @jpfleischer, thanks for asking.

  1. It is possible to convert an archive to Slack Export format, but not to Dump format (not yet).

Here's an example on my toy workspace.
First, archive:

$ slackdump archive
2025/01/09 22:13:59 INFO stream result=<CHM82GF99>
2025/01/09 22:14:00 INFO stream result=<CHY5HUESG>
2025/01/09 22:14:00 INFO stream result=<CHYLGDP0D>
2025/01/09 22:14:01 INFO stream result=<C011D885FP0>
2025/01/09 22:14:03 INFO stream result=<C045TUGSSTW>
2025/01/09 22:14:04 INFO stream result=<C04BJATRQRL>
2025/01/09 22:14:05 INFO stream result=<C07V963QS7K>
2025/01/09 22:14:05 INFO stream result=<Thread[C07V963QS7K:1730798743.474859]>
2025/01/09 22:14:06 INFO stream result=<D03MW5QR8R3>
2025/01/09 22:14:07 INFO stream result=<D034LJA178B>
2025/01/09 22:14:08 INFO stream result=<D015RNCFNRG>
2025/01/09 22:14:09 INFO stream result=<DNC8P5L69>
2025/01/09 22:14:10 INFO stream result=<DL98HT3QA>
2025/01/09 22:14:11 INFO stream result=<DHYNUJ00Y>
2025/01/09 22:14:11 INFO stream result=<Thread[DHYNUJ00Y:1710145284.728069]>
2025/01/09 22:14:12 INFO stream result=<Thread[DHYNUJ00Y:1710144976.814909]>
2025/01/09 22:14:12 INFO stream result=<Thread[DHYNUJ00Y:1665917454.731419]>
2025/01/09 22:14:12 INFO stream result=<DHMAB25DY>
2025/01/09 22:14:12 INFO stream result=<Thread[DHMAB25DY:1710063528.879959]>
2025/01/09 22:14:12 INFO Recorded workspace data filename=slackdump_20250109_221358 took=14.31128438s

Next, convert. With no flags specified, it converts to export by default:

$ slackdump convert slackdump_20250109_221358
2025/01/09 22:14:28 INFO converting input_format=chunk source=slackdump_20250109_221358 output_format=export output=slackdump_20250109_221428.zip
2025/01/09 22:14:28 WARN skipping file=F047E154GDN error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F046MB9M29K error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06P7HCJF7B error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06PU0ZAN2Z error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06PU0ZJR9T error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06QKMU57SL error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06PZC3LB1A error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06QKNJJ2F2 error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06PWUTV3B4 error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 INFO completed took=396.510631ms

  2. You can use slackdump view to view any of the generated formats, for example to view the archive that was just generated:

$ slackdump view slackdump_20250109_221358
2025/01/09 22:19:18 INFO listening on addr=localhost:8080
<...>

@rusq
Owner

rusq commented Jan 9, 2025

In order to generate an export for Slackord, just run slackdump convert <dir name>

  It is not possible to use a folder as input to the convert command

Where did you encounter that limitation?

@jpfleischer
Author

I see, thank you for your prompt response.
I notice that in your pasted terminal output, you do not run slackdump view on the newly generated zip file but rather on the folder from before. When I do that, I can see all the channels listed on the left-hand side, but clicking on any of the channels or direct messages gives empty output on the right.

Here is what happens when I try to run it on the zip file. Clicking on any of the channels listed does nothing:

$ ./slackdump view slackdump_20250109_112936.zip
2025/01/09 11:32:22 INFO listening on addr=localhost:8080
2025/01/09 11:32:22 "GET http://localhost:8080/ HTTP/1.1" from 127.0.0.1:6264 - 200 9248B in 1.6589ms
2025/01/09 11:32:34 ERROR AllMessages in=channelHandler channel=CQKALJEJE error="AllMessages: walk: file does not exist: general"
2025/01/09 11:32:34 "GET http://localhost:8080/archives/CQKALJEJE HTTP/1.1" from 127.0.0.1:6264 - 500 48B in 3.2734ms
2025/01/09 11:33:27 ERROR AllMessages in=channelHandler channel=CQY47JGG0 error="AllMessages: walk: file does not exist: random"
2025/01/09 11:33:27 "GET http://localhost:8080/archives/CQY47JGG0 HTTP/1.1" from 127.0.0.1:6264 - 500 47B in 0s
2025/01/09 11:33:29 ERROR AllMessages in=channelHandler channel=D01AT204YAJ error="AllMessages: walk: file does not exist: D01AT204YAJ"
2025/01/09 11:33:29 "GET http://localhost:8080/archives/D01AT204YAJ HTTP/1.1" from 127.0.0.1:6264 - 500 52B in 0s
2025/01/09 11:33:30 ERROR AllMessages in=channelHandler channel=D01AT204YAJ error="AllMessages: walk: file does not exist: D01AT204YAJ"

Here is what happens when I run it on the folder: clicking a channel changes the header, but no messages are shown.

image

$ ./slackdump view slackdump_20250109_044827/
2025/01/09 11:34:19 INFO listening on addr=localhost:8080
2025/01/09 11:34:20 "GET http://localhost:8080/ HTTP/1.1" from 127.0.0.1:6274 - 200 9235B in 1.0603ms
2025/01/09 11:34:26 "GET http://localhost:8080/archives/C0187EFDFFY HTTP/1.1" from 127.0.0.1:6274 - 200 909B in 1.4098ms
2025/01/09 11:34:39 "GET http://localhost:8080/archives/C01B5TWD6TS HTTP/1.1" from 127.0.0.1:6274 - 200 915B in 1.0809ms

@rusq
Owner

rusq commented Jan 9, 2025

Interesting. Would you mind running this against the archive folder? It lists the contents of the folder, down to the file attachment ID level, along with their sizes:

find slackdump_20250109_044827 -depth 1 -exec ls -lgo {} +  > chunk_contents.txt

Note that the output is redirected to chunk_contents.txt.

And then for the export zip (lists all files except attachment names):

unzip -l slackdump_20250109_221428.zip | grep -v '__uploads\/F[0-9A-Z]\+\/.\+$' > archive_contents.txt

Could you then upload both files?

If there are sensitive channel names that you'd rather not share in a public issue, you can GPG-encrypt the output with my public key like this:

find slackdump_20250109_044827 -depth 1 -exec ls -lgo {} +  | slackdump tools encrypt > chunk_contents.gpg
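
On a system without find, ls or unzip, roughly the same two listings could be produced with a short Python script. This is only a sketch: the function names are made up for illustration, the attachment pattern is copied from the grep expression above, and the output format is not identical to what ls -lgo prints.

```python
import re
import zipfile
from pathlib import Path

# Same idea as the grep pattern above: paths of the form __uploads/<file ID>/<name>.
ATTACHMENT_RE = re.compile(r"__uploads/F[0-9A-Z]+/.+$")


def _size(path):
    """File size in bytes; directories are reported as 0 (their size is platform-dependent)."""
    return path.stat().st_size if path.is_file() else 0


def list_archive(archive_dir):
    """List top-level entries and one level below each directory, as (name, size) pairs."""
    rows = []
    for entry in sorted(Path(archive_dir).iterdir()):
        rows.append((entry.name, _size(entry)))
        if entry.is_dir():
            # One level down only, so attachment file names are never listed.
            for child in sorted(entry.iterdir()):
                rows.append((f"{entry.name}/{child.name}", _size(child)))
    return rows


def list_zip_without_attachments(zip_path):
    """List (name, size) pairs for zip members, skipping the attachment files themselves."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(
            (info.filename, info.file_size)
            for info in zf.infolist()
            if not ATTACHMENT_RE.search(info.filename)
        )
```

Writing each (name, size) pair on its own line of chunk_contents.txt and archive_contents.txt would give files comparable to the ones requested above.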

@rusq
Owner

rusq commented Jan 9, 2025

Also: did you specify a time range when archiving?

If you have jq installed, could you also run the following command to give me an idea of the number of chunks of each type in the #general channel (hopefully there are some):

gzcat CQKALJEJE.json.gz | jq '.t' | awk '{count[$1]++}END{for(t in count)print t,count[t]}' > counts.txt

and upload counts.txt as well?
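
If gzcat, jq or awk are not available, the same count can be sketched in Python. The only thing taken from the command above is that each JSON object in the file carries a "t" field; the function name is hypothetical, and the sketch assumes the file is a sequence of JSON objects that fits in memory.

```python
import gzip
import json
from collections import Counter


def count_chunk_types(path):
    """Count how often each value of the "t" field occurs in a .json.gz chunk file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        text = fh.read()

    decoder = json.JSONDecoder()
    counts = Counter()
    pos, end = 0, len(text)
    while True:
        # Skip whitespace (including newlines) between consecutive JSON objects.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode parses one JSON value and returns the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        counts[obj.get("t")] += 1
    return counts
```

Calling count_chunk_types("CQKALJEJE.json.gz") and printing each key with its count gives the same information as counts.txt.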

@jpfleischer
Author

jpfleischer commented Jan 9, 2025

As a note, I deleted the exported files and reran the archive to see if it was a fluke (the issue still persists), which is why the folder name is different.
chunk_contents.txt
archive_contents.txt

(Why is there only one channel? That's the Slackbot channel, but I want the others too.)

I did not specify a time range since I want absolutely everything.
counts.txt

P.S. I am on Windows 10.

@rusq
Owner

rusq commented Jan 10, 2025

Hey, thanks for posting the files. I'm looking at the #general channel counts, and it seems that it doesn't contain any messages.
There are only 3 chunks in total: (1) type 0, channel messages, but judging by the size of the json.gz it is most likely a terminal empty slice; (2) type 5, which is channel information; and (3) type 7, which is channel users.

Could you confirm: if you open this channel (#general) in the Slack client, are there any messages that are not hidden by Slack's 90-day paywall?

Out of all chunks, it seems that only the conversation ID DQY47J7DW has messages.

@jpfleischer
Author

I see. The messages are all hidden by the paywall. Is this tool not able to circumvent that? I can see the messages are there, but they are just blurry.

@rusq
Owner

rusq commented Jan 10, 2025

The only way around the paywall is to pay them. The API doesn't allow access to the data behind the paywall either.

If you really need that data, I suggest the following masterplan:

  1. deactivate all users to minimise the sub cost
  2. then get a Pro subscription
  3. download everything, and
  4. cancel the sub.

For example, this is the price for my toy workspace (1 active user):
image

@rusq
Owner

rusq commented Jan 10, 2025

I guess you already have it figured out with Discord, but here's the tip of the day - did you know that you can have your own Slack with blackjack and hookers: https://github.com/mattermost/mattermost

@rusq rusq closed this as completed Jan 10, 2025
@jpfleischer
Author

jpfleischer commented Jan 10, 2025

This is great, thanks for your patience. I paid the subscription fee, but some files are failing with errors like this:

 INFO Recorded workspace data filename=slackdump_20250110_001853 took=53m10.9868448s
2025/01/10 01:14:48 ERROR WithRetry maxAttempts=3 error="download to \"__uploads\\\\somestring\\\\20200913_213541.mp4\" failed, [src=https://files.slack.com/files-pri/someOtherString-somestring/download/20200913_213541.mp4]: unexpected EOF" attempt=1

Is there any way to retry only those that failed?

@rusq rusq reopened this Jan 10, 2025
@rusq
Owner

rusq commented Jan 10, 2025

  1. I see that this happened after the main archive processor had finished. Did it happen only after that, or were there EOF errors before that as well?
  2. If only a handful of files failed, you could run slackdump search files "filename", which would try to redownload them.
  3. It looks like Slack terminates the connection. Could you try to locate this file and check whether it exists and plays in the Slack client? The easiest way would be to use Slack's file search and search by file name.

@rusq
Owner

rusq commented Jan 10, 2025

Hey @jpfleischer, I submitted #400 to address this type of error and to add the ability to manually run slackdump against the archive to redownload any missing files. See the slackdump tools redownload <archive_directory> command, and run slackdump tools help redownload for a detailed description (which you can read in the PR as well :) ).

Once it is merged into master, you can use it if you compile from source. I'll include it in v3.0.3, but before I release, I'd like to add the long-awaited channel canvas support.

@rusq rusq added this to the v3.0.3 milestone Jan 10, 2025
@rusq rusq closed this as completed in #400 Jan 10, 2025
@rusq
Owner

rusq commented Jan 11, 2025

@jpfleischer did it work for you?

@jpfleischer
Author

jpfleischer commented Jan 11, 2025

Hi rusq, I did end up using the search command for the two files that failed, and it worked.
Since my end goal was to use Slackord after using the convert command to get the native Slack format, I did try to upload everything to a Discord server.

Maybe my problem is becoming less of a slackdump problem and more of a slackdump-Slackord collaboration problem. While every message was indeed uploaded, the attachments that were sent were not the actual attachments (such as an MP3 file). Instead, each one had the name of the actual attachment (example.mp3) but was actually an HTML file pointing to a slack.com site.

The HTML looks like output.txt
Maybe there is a way to fix this in the slackdump code?
otherwise, I made a slackord issue at thomasloupe/Slackord#109
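
One way to check whether a folder of attachments contains such HTML pages instead of real media is to look at the first bytes of each file. The following is a rough sketch with hypothetical function names; it assumes a placeholder page starts with a doctype or html tag, and it can be pointed at the __uploads folder or at wherever the importer saved its files.

```python
from pathlib import Path

# Lower-cased prefixes that typically open an HTML document.
HTML_MARKERS = (b"<!doctype html", b"<html")


def looks_like_html(path, probe_bytes=512):
    """Return True if the file content starts like an HTML page."""
    with open(path, "rb") as fh:
        # Drop a UTF-8 BOM and leading whitespace before comparing.
        head = fh.read(probe_bytes).lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    return head.startswith(HTML_MARKERS)


def find_html_placeholders(uploads_dir):
    """Return files that are HTML inside although their extension says otherwise."""
    suspects = []
    for path in sorted(Path(uploads_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in {".html", ".htm"}:
            continue  # genuinely HTML attachments are fine
        if looks_like_html(path):
            suspects.append(path)
    return suspects
```

An empty result for the local __uploads folder would suggest that the local copies are intact and that the HTML pages come from the import step.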

Nonetheless, you have been an immense help in accomplishing my end goal of saving everything, because the attachments are still available on my local computer, and my direct messages as well. Thank you for being very responsive and for writing code in response to the errors I identified.

@rusq
Owner

rusq commented Jan 11, 2025

Thanks for your feedback, I'm glad that it worked.

For attachments, you could try this solution: #371 (comment), which was built by @codeallthethingz to import a Slackdump-generated export into Slack. It updates the file links within the Slack Export archive and starts up a proxy to serve the files at the request of the target system, which is probably what the newly reborn Slackord expects.
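
The general idea of rewriting file links could be sketched as follows. This is not the code of the linked solution, only an illustration under several assumptions: the export is extracted to a directory with one folder per channel containing JSON arrays of messages, attachments live under __uploads/<file ID>/<file name>, file objects use the Slack fields url_private and url_private_download, and the base URL is a placeholder. The function names are hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import quote

# Slack file-object fields that normally point at files.slack.com.
LINK_FIELDS = ("url_private", "url_private_download")


def rewrite_file_links(messages, base_url):
    """Point file links in a list of message dicts at base_url; return the number of links changed."""
    changed = 0
    base = base_url.rstrip("/")
    for msg in messages:
        for f in msg.get("files") or []:
            if "id" not in f or "name" not in f:
                continue
            # Assumed layout of the export: __uploads/<file ID>/<file name>.
            local = f"{base}/__uploads/{quote(f['id'])}/{quote(f['name'])}"
            for field in LINK_FIELDS:
                if field in f:
                    f[field] = local
                    changed += 1
    return changed


def rewrite_export(export_dir, base_url):
    """Rewrite links in every <channel>/<date>.json file of an extracted export, in place."""
    total = 0
    for path in sorted(Path(export_dir).glob("*/*.json")):
        if not path.is_file():
            continue
        messages = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(messages, list):
            continue
        n = rewrite_file_links(messages, base_url)
        if n:
            path.write_text(json.dumps(messages, ensure_ascii=False, indent=2), encoding="utf-8")
            total += n
    return total
```

After something like rewrite_export("export_dir", "http://localhost:8080"), the extracted directory could be served with python -m http.server 8080 --directory export_dir so that the importing tool fetches attachments from the local machine instead of slack.com. It is best to work on a copy of the export, since the files are modified in place.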

@rusq
Owner

rusq commented Jan 17, 2025

I'll reopen this. It looks like many would benefit if this were a built-in feature.

@rusq rusq reopened this Jan 17, 2025
@rusq rusq changed the title How to convert from archive to native Slack export format Attachment server Jan 17, 2025
@rusq rusq modified the milestones: v3.0.3, v3.1 Jan 17, 2025
@rusq rusq added the enhancement New feature or request label Jan 24, 2025