DOC: Microsoft Word on Windows unable to open MHT output, treats it as unknown HTML file #799

Open
ronaldtse opened this issue Jul 11, 2023 · 7 comments
Labels
help wanted Extra attention is needed

Comments

@ronaldtse
Contributor

As reported (and demonstrated) by @stuartgalt in https://github.com/metanorma/iso-10303-2/issues/316, Microsoft Word on Windows is unable to recognize the Metanorma Word output (an MHT file) at all, regardless of the file extension used (.doc or .mht).

The same file opens perfectly using Microsoft Word on Mac.

On loading the file, Word shows a modal dialog asking the user to select the file type, offering around ten options such as "Single Page HTML File" (the only one that made sense).

When selecting "Single Page HTML File", Word will take a long time to process the document, just like when loading a normal HTML file. Once done, it seems that the "Word content" is newly formatted and the embedded Word styles are all gone.

This is a huge problem for Windows users. I'm not sure how we can address this using https://github.com/metanorma/html2doc .

@ronaldtse ronaldtse added the help wanted Extra attention is needed label Jul 11, 2023
@opoudjis
Contributor

opoudjis commented Jul 16, 2023

The only difference between .DOC out of the factory and MHT is that the DOC is zipped. The user who reported it's "just an HTML file" is wrong about that: it is a MIME package. I will need to investigate; I am not convinced the issue has been diagnosed correctly. We have dozens of Windows users, and we have not heard this before.
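
One way to check the "MIME package, not plain HTML" point independently of Word is to parse the file with a MIME parser and list its parts. The following is a minimal sketch in Python using only the standard library. The function name `describe_mht`, the sample bytes, and the boundary string are all hypothetical and are not taken from html2doc or from the reported file. It assumes that a MIME package here means a `multipart/related` message, which is the usual MHT layout.

```python
from email import message_from_bytes


def describe_mht(data: bytes) -> dict:
    """Summarise the MIME structure of a candidate MHT file (hypothetical helper)."""
    msg = message_from_bytes(data)
    # Collect (content type, Content-Location) for every leaf part.
    parts = [
        (part.get_content_type(), part.get("Content-Location"))
        for part in msg.walk()
        if not part.is_multipart()
    ]
    return {
        "has_mime_version": msg.get("MIME-Version") is not None,
        "content_type": msg.get_content_type(),
        # Assumption: an MHT package is a multipart/related MIME message.
        "is_mime_package": msg.is_multipart()
        and msg.get_content_type() == "multipart/related",
        "parts": parts,
    }


# Hypothetical two-part sample, only for illustration; real output will differ.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="----=_NextPart_demo"\r\n'
    b"\r\n"
    b"------=_NextPart_demo\r\n"
    b"Content-Location: file:///C:/demo/a.htm\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"\r\n"
    b"<html><body><p>Hello</p></body></html>\r\n"
    b"------=_NextPart_demo\r\n"
    b"Content-Location: file:///C:/demo/a_files/filelist.xml\r\n"
    b'Content-Type: text/xml; charset="utf-8"\r\n'
    b"\r\n"
    b"<xml/>\r\n"
    b"------=_NextPart_demo--\r\n"
)

if __name__ == "__main__":
    info = describe_mht(SAMPLE)
    print(info["content_type"], info["is_mime_package"])
    for content_type, location in info["parts"]:
        print(content_type, location)
```

To inspect a real file, pass its bytes instead of `SAMPLE`, for example `describe_mht(Path("output.doc").read_bytes())` with a hypothetical file name. This only shows how the file is structured; it says nothing about how a particular Word build will treat it.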

@ronaldtse
Contributor Author

@opoudjis no, I saw with my own eyes the entire flow on Windows Word. The file cannot be recognized by Word as a Word file.

@ronaldtse
Contributor Author

It might not be an ecosystem-wide issue but it is certainly an issue with Windows Word.

@opoudjis
Contributor

I have just generated a document and opened it on Windows 10, Word for Office 365 MSO, Version 2307, Build 16.0.16626.20086.

This makes this a bug I cannot address, because I cannot replicate it. On opening the document (on a PC that has never opened a Metanorma document before), Word did say "Do you want to make Word your default Web browser", which means it is seeing MHT as HTML, but all styling is preserved.

I require from @stuartgalt details of his Office setup, and confirmation that he's getting the same behaviour on the attached document, before we can do anything further about this.

a.doc.zip

We also need to be asking our users how widespread an issue this is.

@ronaldtse
Contributor Author

@opoudjis when you said:

did say "Do you want to make Word your default Web browser", which means it is seeing MHT as HTML, but all styling is preserved.

Can you confirm that the Styles pane in Word preserved all styling?

Your experience actually validates the ticket: Word had to convert the MHT into a Word document first; it could not open the MHT file directly.

If you’re using a small document, you won’t see the effects. Try using a document like ISO 10303-2, then you can clearly see the two stages:

  • Taking a long time to load the MHT
  • Taking a long time to convert the HTML into Word

On a Mac, by contrast, the file opened right away without spending this much time.

@ronaldtse
Contributor Author

Might I suggest a root cause for this issue: I believe MHT functionality has been stripped away from Windows Word, which is why it requires a conversion from HTML.

@opoudjis opoudjis removed their assignment Aug 16, 2023
@opoudjis
Contributor

I am unassigning myself. I have been very clear that I will NEVER work on porting Metanorma to OOXML: I simply do not have the capacity; and like Sébastien Sauvage said a decade ago,

I don’t have time to read a 7500 pages specification no-one is capable of implementing - not even Microsoft !

(I'll add that the 1500 pages specific to Word are insufficient—Word CSS style commands are not fully documented.)

There are SDKs out there, but Metanorma repeatedly pushes Word formatting to its limits, so simple SDKs are not going to cut it: whoever would port Metanorma to OOXML needs to know OOXML backwards.


2 participants