Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MatChar stores encoding, but it's inaccessible to the user #21

Closed
nedtwigg opened this issue Nov 30, 2018 · 4 comments
Closed

MatChar stores encoding, but it's inaccessible to the user #21

nedtwigg opened this issue Nov 30, 2018 · 4 comments
Labels
enhancement New feature or request

Comments

@nedtwigg
Copy link
Contributor

MatChar has a configurable encoding field:

https://github.com/HebiRobotics/MatFileIO/blob/6b7cc2661a20523c3cbc3918977bc50d6e1127dd/src/main/java/us/hebi/matlab/mat/format/MatChar.java#L46-L51

But it's inaccesible from the public Char API:

https://github.com/HebiRobotics/MatFileIO/blob/6b7cc2661a20523c3cbc3918977bc50d6e1127dd/src/main/java/us/hebi/matlab/mat/types/Char.java

Do MAT-files really support multiple encodings, or can we hardcode UTF8? MatFileRw seems to only support UTF8.

https://github.com/diffplug/matfilerw/blob/2a4965c7eaf7f25c2153ec06404fa9b457c2098a/src/main/java/com/jmatio/types/MLChar.java

Adding degrees of freedom for future backward-compatibility seems tricky. Not a big deal though :)

@ennerf
Copy link
Collaborator

ennerf commented Nov 30, 2018

All of the Mat* are package private, but the custom encoding is accessible via Mat5.newChar(rows, cols, encoding). The default is UTF8.

The official Mat-File Format (page 15) added UTF16 and UTF32 encoding sometime a few years ago. UTF32 doesn't seem to actually be readable by MATLAB though.

MatFileRW implements reading UTF16/32, but it may not offer an option to serialize it.

https://github.com/diffplug/matfilerw/blob/1a73e4635eb02afa1db32ec94277945d93e41f7f/src/main/java/com/jmatio/io/MatFileReader.java#L1378

@ennerf
Copy link
Collaborator

ennerf commented Nov 30, 2018

Sorry, I think I may have misread the issue. Making the encoding be accessible via e.g. Char::getEncoding() seems reasonable.

The CharEncoding class should move the types package then, and the serialization should be separated out. So far I treated that more like an implementation detail that most people are unlikely to need.

@ennerf ennerf added the enhancement New feature or request label Nov 30, 2018
@nedtwigg
Copy link
Contributor Author

Roger. I'll submit a PR after #22 gets merged :)

@ennerf
Copy link
Collaborator

ennerf commented Dec 7, 2018

Resolved with #26

@ennerf ennerf closed this as completed Dec 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants