

I need to read Microsoft Word .docx documents.
The documents may contain tables that I need to ingest, and I want to capture the highlight color of the text.

So far, the only plausible option I have found is:
http://www.felix-johnson.com/docx4j.html

I would like to start from the current state of the art.

Thanks for your help!

2 Answers

No, docx-utils only addresses generating docx documents. I need to read docx files.

I've used the Apache POI library directly via Java interop and that worked fine for everything I needed. https://poi.apache.org/

I built some wrappers around the small parts of it that I used, to make it more palatable for my needs.
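For anyone curious about what POI is reading underneath: a .docx file is a ZIP archive whose main body is the XML part `word/document.xml`. Highlighted text shows up as a run (`w:r`) whose properties contain `w:highlight` with a `w:val` color name, and tables are `w:tbl`/`w:tr`/`w:tc` elements. The following is a minimal JDK-only Java sketch, not POI and not the wrappers mentioned above; the class `DocxReader` and its methods are hypothetical names. It assumes highlighting is direct formatting from Word's highlighter (not character shading or highlight inherited from styles) and that tables are not nested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads runs and tables straight from the OOXML, without POI.
public class DocxReader {
    // Namespace of the WordprocessingML elements (w:p, w:r, w:t, w:tbl, ...).
    public static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** One text run and its highlight color (null when the run is not highlighted). */
    public record Run(String text, String highlight) {}

    /** A .docx is a ZIP archive; the main body lives in word/document.xml. */
    public static Document readDocumentXml(Path docx) throws Exception {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docx);
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            // Refuse DOCTYPEs so an untrusted file cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            try (InputStream in = zip.getInputStream(entry)) {
                return factory.newDocumentBuilder().parse(in);
            }
        }
    }

    /** Concatenated text of all w:t elements below the given element. */
    static String text(Element scope) {
        StringBuilder sb = new StringBuilder();
        NodeList ts = scope.getElementsByTagNameNS(W, "t");
        for (int i = 0; i < ts.getLength(); i++) {
            sb.append(ts.item(i).getTextContent());
        }
        return sb.toString();
    }

    /** Every run (w:r) in document order, with the w:val of its w:highlight, if any. */
    public static List<Run> runs(Document doc) {
        List<Run> out = new ArrayList<>();
        NodeList rs = doc.getElementsByTagNameNS(W, "r");
        for (int i = 0; i < rs.getLength(); i++) {
            Element run = (Element) rs.item(i);
            NodeList hs = run.getElementsByTagNameNS(W, "highlight");
            String color = hs.getLength() == 0 ? null : ((Element) hs.item(0)).getAttributeNS(W, "val");
            out.add(new Run(text(run), color));
        }
        return out;
    }

    /** Each table (w:tbl) as rows (w:tr) of cell (w:tc) text; assumes no nested tables. */
    public static List<List<List<String>>> tables(Document doc) {
        List<List<List<String>>> tables = new ArrayList<>();
        NodeList tbls = doc.getElementsByTagNameNS(W, "tbl");
        for (int t = 0; t < tbls.getLength(); t++) {
            List<List<String>> rows = new ArrayList<>();
            NodeList trs = ((Element) tbls.item(t)).getElementsByTagNameNS(W, "tr");
            for (int r = 0; r < trs.getLength(); r++) {
                List<String> cells = new ArrayList<>();
                NodeList tcs = ((Element) trs.item(r)).getElementsByTagNameNS(W, "tc");
                for (int c = 0; c < tcs.getLength(); c++) {
                    cells.add(text((Element) tcs.item(c)));
                }
                rows.add(cells);
            }
            tables.add(rows);
        }
        return tables;
    }
}
```

Because it only uses `java.util.zip` and the JDK's built-in DOM parser, a class like this can be called from Clojure through ordinary interop; for real-world documents (styles, shading, headers, numbering), POI remains the more complete choice.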

Thanks! Would you mind sharing the wrapper code you built? I would like to learn Java interop from an example.
Sorry, I don't know offhand where any of that is; it was just some yak shaving from a long time ago.
POI (for Excel spreadsheets anyway) is quite manageable with interop.

POI has so much surface area that a wrapper might not be of much value.

Another option is to convert the documents to RTF (I think LibreOffice can do that from the command line) and then bungle your way through reading the RTF.
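If you try the LibreOffice route, the conversion can also be driven from the JVM. This Java sketch assumes the `soffice` executable is on the PATH and uses its `--headless`, `--convert-to` and `--outdir` options; the class `DocxToRtf` and its methods are hypothetical names, and parsing the resulting RTF is left out.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: shells out to LibreOffice to turn a .docx into .rtf.
public class DocxToRtf {
    /** Command line for a headless LibreOffice conversion; assumes "soffice" is on the PATH. */
    public static List<String> command(Path docx, Path outDir) {
        return List.of("soffice", "--headless", "--convert-to", "rtf",
                "--outdir", outDir.toString(), docx.toString());
    }

    /** LibreOffice names the output after the input file, with the extension swapped for .rtf. */
    public static Path outputPath(Path docx, Path outDir) {
        String base = docx.getFileName().toString().replaceFirst("\\.[^.]*$", "");
        return outDir.resolve(base + ".rtf");
    }

    /** Runs the conversion, failing on a non-zero exit code, and returns the .rtf path. */
    public static Path convert(Path docx, Path outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(docx, outDir))
                .inheritIO() // show soffice's own output on this console
                .start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("soffice exited with code " + exit);
        }
        return outputPath(docx, outDir);
    }
}
```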

A third option: a Word macro that writes an EDN file :-)
...