docx4j
JAXB-based Java library for Word docx, PowerPoint pptx, and Excel xlsx files
I just followed approach No. 2 in the VariableReplace example from docx4j 2.8.1, and all it does is remove the variable markers ${}.
The steps I followed:
- Opened Word 2013 and typed ${variable} as plain text
- Saved it somewhere
- Read it in my Java program and built my HashMap with .put("variable", "TEST");
- Copied the rest of the code from the example above
- Saved the document
I'd expect only 'TEST', but the output document contains just 'variable' without the markers.
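To pin down the expected result independently of docx4j, here is a minimal plain-Java sketch of `${key}` substitution, assuming (as in the `.put("variable", "TEST")` call above) that map keys are given without the `${}` markers. `PlaceholderDemo.substitute` is a hypothetical helper, not docx4j's implementation; it works on a plain string and leaves unknown keys untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing the intended ${key} -> value semantics.
// It is NOT docx4j code; it operates on a plain string.
final class PlaceholderDemo {
    // Matches ${name}; group 1 is the key without the ${ } markers.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String text, Map<String, String> mappings) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = mappings.get(m.group(1));
            // Unknown keys are left as they were, markers included.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the mapping `variable -> TEST`, both the markers and the key are replaced, so `${variable}` becomes `TEST`.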
Source: (StackOverflow)
I'm trying to traverse a Word document and save all the images it contains. I uploaded the sample Word document to the online demo and noticed that the images are listed as:
/word/media/image1.png rId5 image/png
/word/media/image2.png rId5 image/png
/word/media/image3.jpg rId5 image/jpeg
How can I programmatically save these images while traversing the document?
Currently I get all the text from the document like this:
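import java.util.List;
import org.docx4j.TraversalUtil;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.Body;
import org.docx4j.wml.Document;

WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(filePath));
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
Document wmlDocumentEl = (org.docx4j.wml.Document) documentPart.getJaxbElement();
Body body = wmlDocumentEl.getBody();
DocumentTraverser traverser = new DocumentTraverser();
// Walk the body content; apply() is called for each node visited
new TraversalUtil(body.getContent(), traverser);

class DocumentTraverser extends TraversalUtil.CallbackImpl {
    @Override
    public List<Object> apply(Object o) {
        if (o instanceof org.docx4j.wml.Text) {
            // ....
        }
        return null;
    }
}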
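Because a .docx file is a ZIP package, one alternative (swapped in here, not docx4j's own API) is to read the `word/media/` entries directly with `java.util.zip`. A minimal sketch, assuming the images live under `word/media/` as in the demo listing above; `DocxMediaExtractor` is a hypothetical name. Within docx4j, the same files should correspond to image parts (subclasses of `BinaryPartAbstractImage`) in the package's parts map.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: copies every word/media/* entry of a .docx into outDir.
final class DocxMediaExtractor {
    static List<Path> extract(InputStream docx, Path outDir) throws IOException {
        List<Path> written = new ArrayList<>();
        Files.createDirectories(outDir);
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith("word/media/")) {
                    continue; // skip XML parts and anything outside word/media/
                }
                // Keep only the last path segment, e.g. image1.png
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                Path target = outDir.resolve(fileName);
                // Copies the current entry only: the stream reports EOF at entry end
                Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                written.add(target);
            }
        }
        return written;
    }
}
```

Usage: `DocxMediaExtractor.extract(new FileInputStream(filePath), Paths.get("images"))` would write `image1.png`, `image2.png` and `image3.jpg` from the listing above.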
Source: (StackOverflow)
From the tutorials I have seen, I learned how to add text when generating a docx file. But every time I add a line of text, I notice there is always a gap between the first line and the second, just like hitting the Enter key twice. I know the main cause is that every time I add a line of text I use a paragraph, and each paragraph is separated from the previous one by spacing.
This is how I add text:
ObjectFactory factory;
factory = Context.getWmlObjectFactory();
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
P spc = factory.createP();
R rspc = factory.createR();
rspc.getContent().add(wordMLPackage.getMainDocumentPart().createParagraphOfText("sample"));
spc.getContent().add(rspc);
java.io.InputStream is = new java.io.FileInputStream(file);
wordMLPackage.getMainDocumentPart().addObject(spc);
This code runs successfully and produces the right output. But when I add another paragraph or more text, I want it to appear directly under the first line. Is there any way to add a simple line of text without using a paragraph? Thanks in advance.
EDIT: I've also tried adding a plain org.docx4j.wml.Text like this:
Text newtext = factory.createText();
newtext.setValue("sample new text");
wordMLPackage.getMainDocumentPart().addObject(newtext);
The program runs, but when I open the generated docx file, Word reports that there is a problem with the contents.
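Two details explain the behaviour above: `createParagraphOfText` returns a whole `P`, so the first snippet nests a paragraph inside a run, and a bare `Text` added straight to the body is not valid block-level content, which is why Word complains. In WordprocessingML a line break inside one paragraph is a `<w:br/>` within a run; docx4j models it as `Br`, created with `factory.createBr()`. The sketch below only builds the target XML as a string to show its shape; `LineBreakXml` is a hypothetical helper and assumes the lines need no XML escaping.

```java
import java.util.List;

// Hypothetical helper: builds ONE <w:p> whose lines are separated by <w:br/>,
// i.e. soft line breaks inside a single run instead of separate paragraphs.
// Assumes the input lines contain no characters that need XML escaping.
final class LineBreakXml {
    static String paragraph(List<String> lines) {
        StringBuilder sb = new StringBuilder("<w:p><w:r>");
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                sb.append("<w:br/>"); // line break, no new paragraph
            }
            sb.append("<w:t xml:space=\"preserve\">").append(lines.get(i)).append("</w:t>");
        }
        return sb.append("</w:r></w:p>").toString();
    }
}
```

In docx4j terms this corresponds roughly to one `R` whose content holds a `Text`, a `Br` and another `Text`, with that `R` added to a single `P`.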
Source: (StackOverflow)
Can we convert Microsoft Office documents (doc, docx, ppt, pptx, xls, xlsx, etc.) into an HTML string on Android?
I need to show Office documents in my app. I have searched and found docx4j, Apache POI and http://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/ for converting files to HTML. This approach works fine on the desktop, but on Android I get "Unable to convert in Dalvik format error 1", which may be due to using too many jars in my Android project.
I want to know whether there is a single way to convert Office documents to HTML on Android.
Sorry for my English.
EDIT
I am now able to convert .doc to HTML using Apache POI.
Here is the method:
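import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import android.webkit.WebView;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;

// Method of an Activity whose layout contains a WebView with id R.id.webview
public void showsimpleWord() {
    File file = new File("/sdcard/test.doc");
    HWPFDocumentCore wordDocument;
    try {
        wordDocument = WordToHtmlUtils.loadDoc(new FileInputStream(file));
    } catch (Exception e1) {
        e1.printStackTrace();
        return; // nothing to convert if the .doc could not be loaded
    }
    try {
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        wordToHtmlConverter.processDocument(wordDocument);
        org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        out.close();
        // Decode with the charset the serializer wrote, not the platform default
        String result = new String(out.toByteArray(), "UTF-8");
        System.out.println(result);
        ((WebView) findViewById(R.id.webview)).loadData(result, "text/html", "utf-8");
    } catch (Exception e) {
        e.printStackTrace();
    }
}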
Now I'm looking for a way to handle the other formats.
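The serialization half of the method above (HTML DOM to string) uses only JDK classes, so it can be factored into a helper that is independent of POI and reused for other converters. A minimal sketch; `HtmlSerializer` is a hypothetical name. Writing into a `StringWriter` avoids the byte round trip, and with it the platform-default charset of `new String(bytes)`.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper: serializes an HTML DOM (e.g. from WordToHtmlConverter)
// into a String with the same output properties as the method above.
final class HtmlSerializer {
    static String toHtml(Document htmlDocument) throws TransformerException {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        StringWriter writer = new StringWriter(); // characters, no byte decoding step
        serializer.transform(new DOMSource(htmlDocument), new StreamResult(writer));
        return writer.toString();
    }
}
```

In the method above, `String result = HtmlSerializer.toHtml(wordToHtmlConverter.getDocument());` would replace the `ByteArrayOutputStream` block.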
Source: (StackOverflow)
I have a Word (.docx) file, and with a Java program using docx4j I need to put an image at a certain place in the document. Can anyone please help?
I'm trying the following code:
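final String XPATH = "//w:t";
String image_Path = "D:\\Temp\\ex.png";
String template_Path = "D:\\Temp\\example.docx";
// "package" is a reserved word in Java, so the variable needs another name;
// load the existing template instead of creating an empty package.
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(template_Path));
List<Object> texts = wordMLPackage.getMainDocumentPart().getJAXBNodesViaXPath(XPATH, true);
ObjectFactory factory = new ObjectFactory();
// A Drawing must contain an Inline built from an image part, not a file path string.
byte[] bytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(image_Path));
BinaryPartAbstractImage imagePart = BinaryPartAbstractImage.createImagePart(wordMLPackage, bytes);
int id = 1;
for (Object obj : texts) {
    Text text = (Text) XmlUtils.unwrap(obj);
    P paragraph = factory.createP();
    R run = factory.createR();
    paragraph.getContent().add(run);
    Drawing drawing = factory.createDrawing();
    run.getContent().add(drawing);
    Inline inline = imagePart.createImageInline("ex.png", "image", id++, id++, false);
    drawing.getAnchorOrInline().add(inline);
    // As written, this appends the new paragraph at the end of the body, not next to "text"
    wordMLPackage.getMainDocumentPart().addObject(paragraph);
}
wordMLPackage.save(new java.io.File("D:\\Temp\\example.docx"));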
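To put the image at a specific place rather than at the end of the body, a common approach is a placeholder word in the template whose paragraph is then replaced by the image paragraph. A plain-JDK sketch of the lookup step over `word/document.xml`, assuming the placeholder text (for example a hypothetical `${image}`) sits inside a single `w:t`; `PlaceholderLocator` is a hypothetical helper, and the namespace URI is the standard WordprocessingML main namespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: returns the first <w:p> whose <w:t> contains the
// placeholder, or null. Assumes the placeholder is not split across runs.
final class PlaceholderLocator {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Element findParagraph(String documentXml, String placeholder) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for the w: prefix in XPath
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return W_NS.equals(uri) ? "w" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                return W_NS.equals(uri) ? Collections.singletonList("w").iterator()
                                        : Collections.<String>emptyIterator();
            }
        });
        NodeList texts = (NodeList) xpath.evaluate("//w:t", doc, XPathConstants.NODESET);
        for (int i = 0; i < texts.getLength(); i++) {
            Node t = texts.item(i);
            if (t.getTextContent().contains(placeholder)) {
                // Nearest enclosing paragraph of the matching text node
                return (Element) xpath.evaluate("ancestor::w:p[1]", t, XPathConstants.NODE);
            }
        }
        return null;
    }
}
```

In docx4j the equivalent step would be to locate the `P` holding the placeholder and put the `Drawing` run there instead of calling `addObject`.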
Source: (StackOverflow)
I have written an application that must parse and retrieve some data from a few thousand large docx files. It will run on a high-performance production server with many CPUs, a large amount of RAM and fast SSDs in RAID arrays, so obviously I want to make full use of the available capacity.
I found that my application successfully does all its other work in many concurrent threads, but it fails to parse many docx files concurrently with the docx4j library. Moreover, the library cannot safely support more than one instance of the WordprocessingMLPackage class (which holds the data from a docx file) across separate threads.
Googling and examining the library's source code confirm that it is not thread-safe at all (its classes, for example, contain many static fields and instances that cannot be used concurrently).
So I have some questions to ask:
- Are there any other libraries with the same capabilities that are guaranteed to be thread-safe?
- Can I launch my workers in separate processes instead of separate threads to work around this issue? If so, how much will that decrease my application's performance?
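On the second question: each JVM process has its own copies of static fields, so separate worker processes do not share docx4j's static state. The rough trade-off is JVM start-up time and a separate heap per process, which is amortized if each process handles a large batch of files. A minimal sketch of the split, assuming a hypothetical worker main class `com.example.DocxWorker` that takes file paths as arguments; it only builds the `ProcessBuilder` commands, and calling `start()` on each would launch them.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch: split the input files round-robin into one batch per worker process
// and build a "java -cp <classpath> <mainClass> file..." command for each batch.
final class WorkerPlanner {
    // workers must be >= 1
    static List<List<String>> partition(List<String> files, int workers) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            batches.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            batches.get(i % workers).add(files.get(i));
        }
        return batches;
    }

    static List<ProcessBuilder> commands(List<String> files, int workers, String mainClass) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        String classpath = System.getProperty("java.class.path"); // reuse this JVM's classpath
        List<ProcessBuilder> builders = new ArrayList<>();
        for (List<String> batch : partition(files, workers)) {
            if (batch.isEmpty()) {
                continue; // fewer files than workers
            }
            List<String> cmd = new ArrayList<>();
            cmd.add(javaBin);
            cmd.add("-cp");
            cmd.add(classpath);
            cmd.add(mainClass);
            cmd.addAll(batch);
            builders.add(new ProcessBuilder(cmd).inheritIO());
        }
        return builders;
    }
}
```

With thousands of files, a batch may exceed the OS command-line length limit; passing each worker the path of a file that lists its inputs avoids that.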
Source: (StackOverflow)
I have been trying to convert my docx files to a custom XML format I have designed. My users want their data converted to this XML for easier content queries in their web app, and they want the input to come from their docx files.
I have looked for a converter API in Java, but none seems to fit my requirements. I looked into docx4j but realized that it only converts to HTML and PDF. I am wondering whether there is a converter API into which I can plug, say, an intermediate translator (XSLT), so that the output is my custom XML complete with the data from my docx.
Is there an existing tool for this? If not, any suggestions on the approach to take in coding my own converter, e.g. going from Open XML to XSL-FO first and then to the custom XML?
Would love to hear from the community.
Thank you very much.
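The JDK's built-in XSLT processor (`javax.xml.transform`) can already apply a custom stylesheet directly to the package's `word/document.xml`, which fits the "intermediate translator" idea without an XSL-FO step. A minimal sketch; the stylesheet and its `<doc>`/`<para>` output vocabulary are hypothetical stand-ins for the users' custom XML, and only paragraph text is carried over. `document.xml` itself can be read out of the .docx ZIP with `java.util.zip`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class DocxToCustomXml {
    // Hypothetical stylesheet: one <para> per body paragraph, holding its text
    // (value-of "." concatenates the w:t text of all runs in the paragraph).
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
        + " exclude-result-prefixes=\"w\">"
        + "<xsl:template match=\"/\"><doc>"
        + "<xsl:for-each select=\"//w:body/w:p\">"
        + "<para><xsl:value-of select=\".\"/></para>"
        + "</xsl:for-each>"
        + "</doc></xsl:template></xsl:stylesheet>";

    // Applies an XSLT stylesheet to the WordprocessingML of document.xml.
    static String transform(String documentXml, String xslt) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(documentXml)), new StreamResult(out));
        return out.toString();
    }
}
```

Because the stylesheet is just a string or file, users' format changes stay in XSLT rather than in Java code.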
Source: (StackOverflow)
So here we go again. I've been banging my head against my PC for a few hours and can't figure out what to do. On my local PC I run the Java code from IntelliJ IDEA, and it works. Now I have to create a jar file so that I can use it on a remote server. I added all the libraries and jars my program needs in the project settings (in the Artifacts section), but it doesn't work when run on the remote server. These are the imports my program needs:
import org.docx4j.dml.CTBlip;
import org.docx4j.jaxb.XPathBinderAssociationIsPartialException;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.relationships.RelationshipsPart;
import org.docx4j.relationships.Relationship;
import javax.xml.bind.JAXBException;
import java.io.File;
import java.util.List;
Error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/docx4j/openpackaging/exceptions/Docx4JException
Caused by: java.lang.ClassNotFoundException: org.docx4j.openpackaging.exceptions.Docx4JException
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: Main. Program will exit.
So is the problem in how I create the jar? Did I miss something?
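The `NoClassDefFoundError` for `Docx4JException` most likely means the docx4j jar is not on the runtime classpath on the server, even though IntelliJ provided it locally; note that jars nested inside another jar are not loaded by the standard class loader, so the dependencies usually have to sit next to the jar (referenced via `Class-Path` in the manifest or `java -cp`) or be unpacked into a single "fat" jar. A small diagnostic, run with the same command line as the real program, shows which required classes are missing; `ClasspathCheck` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: reports which of the given class names cannot be
// loaded by the current class loader, i.e. are missing from the runtime classpath.
final class ClasspathCheck {
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                Class.forName(name, false, loader); // load without running static initializers
            } catch (ClassNotFoundException | LinkageError e) {
                result.add(name);
            }
        }
        return result;
    }
}
```

For example, `ClasspathCheck.missing("org.docx4j.openpackaging.exceptions.Docx4JException", "javax.xml.bind.JAXBException")` returns an empty list only when both are reachable.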
Source: (StackOverflow)
I have a docx document with some placeholders. I need to replace them with other content and save a new docx document. I started with docx4j and found this method:
public static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
List<Object> result = new ArrayList<Object>();
if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();
if (obj.getClass().equals(toSearch))
result.add(obj);
else if (obj instanceof ContentAccessor) {
List<?> children = ((ContentAccessor) obj).getContent();
for (Object child : children) {
result.addAll(getAllElementFromObject(child, toSearch));
}
}
return result;
}
public static void findAndReplace(WordprocessingMLPackage doc, String toFind, String replacer){
List<Object> paragraphs = getAllElementFromObject(doc.getMainDocumentPart(), P.class);
for(Object par : paragraphs){
P p = (P) par;
List<Object> texts = getAllElementFromObject(p, Text.class);
for(Object text : texts){
Text t = (Text)text;
if(t.getValue().contains(toFind)){
t.setValue(t.getValue().replace(toFind, replacer));
}
}
}
}
But that only works occasionally, because the placeholders are usually split across multiple text runs.
I tried UnmarshallFromTemplate, but it also works only occasionally.
How could this problem be solved?
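Since the placeholder is split across runs, one approach (a sketch, not docx4j's own code) is to concatenate the texts of a paragraph's runs, replace in the joined string, write the result into the first text and empty the others. This drops formatting differences between runs inside that paragraph. Plain strings stand in for the `Text` values here, and `RunJoiner` is hypothetical; docx4j's `VariablePrepare.prepare`, used before `variableReplace` elsewhere, is intended to tidy split runs for `${...}`-style placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper. texts: the values of a paragraph's w:t elements in order.
// Returns new values: the replaced text in the first slot, "" in the others.
final class RunJoiner {
    static List<String> replaceAcrossRuns(List<String> texts, String toFind, String replacer) {
        StringBuilder joined = new StringBuilder();
        for (String t : texts) {
            joined.append(t);
        }
        List<String> result = new ArrayList<>(texts);
        String whole = joined.toString();
        if (texts.isEmpty() || !whole.contains(toFind)) {
            return result; // no match: leave every run untouched
        }
        result.set(0, whole.replace(toFind, replacer));
        for (int i = 1; i < result.size(); i++) {
            result.set(i, "");
        }
        return result;
    }
}
```

Applied to `findAndReplace` above, the input would be the `getValue()` of each `Text` from `getAllElementFromObject(p, Text.class)`, and each result would be written back with `setValue`.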
Source: (StackOverflow)
I'm currently trying to run docx4j in WebLogic Server 12c, which comes with EclipseLink 2.3.2.
There is a similar post describing the situation, which unfortunately received no answer.
Docx4j does not work with the JAXB (MOXy) implementation that is part of EclipseLink 2.3.2. I got docx4j running standalone with EclipseLink 2.5, so I am fairly confident that using EclipseLink 2.5 with WebLogic Server 12c will solve the issue.
How can I replace the EclipseLink version 2.3.2 that WebLogic Server 12c runs on with EclipseLink 2.5?
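Whatever server configuration is changed, it helps to confirm at runtime which jar actually supplies the MOXy classes. A minimal sketch using the standard `ProtectionDomain`/`CodeSource` API; `ClassOrigin` is hypothetical, and checking `org.eclipse.persistence.jaxb.JAXBContext` (MOXy's context class) is an assumption about what is worth inspecting.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: returns the jar or directory a class was loaded from,
// "(no code source)" for bootstrap/platform classes, or null if it is not found.
final class ClassOrigin {
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(no code source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }
}
```

Logging `ClassOrigin.locate("org.eclipse.persistence.jaxb.JAXBContext")` from inside the deployed application shows whether the server's EclipseLink or the bundled 2.5 jar wins.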
Source: (StackOverflow)
I'd like to remove all the comments from a docx file using docx4j.
I can remove the actual comments with code like the snippet below, but I think I also need to remove the comment references from the main document part (otherwise the document is corrupted), and I can't figure out how to do that.
CommentsPart cmtsPart = wordMLPackage.getMainDocumentPart().getCommentsPart();
org.docx4j.wml.Comments cmts = cmtsPart.getJaxbElement();
List<Comments.Comment> coms = cmts.getComment();
coms.clear();
Any guidance appreciated!
I also posted this question on the docx4j forum: http://www.docx4java.org/forums/docx-java-f6/how-to-remove-all-comments-from-docx-file-t1329.html.
Thanks.
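In the main document part, comments are anchored by `w:commentRangeStart`, `w:commentRangeEnd` and `w:commentReference` elements (the last one sits inside a run), so clearing the comments part leaves those anchors pointing at nothing. A plain-DOM sketch over `word/document.xml` that deletes all three; `CommentMarkerRemover` is hypothetical. In docx4j the corresponding objects are `CommentRangeStart`, `CommentRangeEnd` and `R.CommentReference`, which could be removed with a traversal instead.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: returns document.xml with every comment anchor removed.
final class CommentMarkerRemover {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String[] MARKERS = {"commentRangeStart", "commentRangeEnd", "commentReference"};

    static String strip(String documentXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
        for (String local : MARKERS) {
            NodeList found = doc.getElementsByTagNameNS(W_NS, local);
            // The NodeList is live, so copy it before removing nodes
            List<Node> toRemove = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                toRemove.add(found.item(i));
            }
            for (Node n : toRemove) {
                n.getParentNode().removeChild(n);
            }
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

A run that only held a `w:commentReference` is left empty, which is still valid WordprocessingML.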
Source: (StackOverflow)
I got this sample code to replace variables with text, and it works perfectly.
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File("c:/template.docx"));
VariablePrepare.prepare(wordMLPackage);
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
HashMap<String, String> mappings = new HashMap<String, String>();
mappings.put("firstname", "Name"); //${firstname}
mappings.put("lastname", "Name"); //${lastname}
documentPart.variableReplace(mappings);
wordMLPackage.save(new java.io.File("c:/replace.docx"));
But now I have to replace the variables with HTML. I tried something like this, but of course it does not work:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File("c:/template.docx"));
VariablePrepare.prepare(wordMLPackage);
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
String html = "<html><head><title>Import me</title></head><body><p style='color:#ff0000;'>Hello World!</p></body></html>";
AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/hw.html"));
afiPart.setBinaryData(html.toString().getBytes());
afiPart.setContentType(new ContentType("text/html"));
Relationship altChunkRel = documentPart.addTargetPart(afiPart);
CTAltChunk ac = Context.getWmlObjectFactory().createCTAltChunk();
ac.setId(altChunkRel.getId());
HashMap<String, String> mappings = new HashMap<String, String>();
mappings.put("firstname", ac.toString()); //${firstname}
mappings.put("lastname", "Name"); //${lastname}
documentPart.variableReplace(mappings);
wordMLPackage.save(new java.io.File("c:/replace.docx"));
Is there any way to achieve this?
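`variableReplace` substitutes string values into text, so `ac.toString()` only inserts the object's string form. An `altChunk` is block-level content: it has to take the place of a whole paragraph in the body (in docx4j, roughly by putting `ac` into `documentPart.getContent()` at the index of the `P` that holds the placeholder). A plain-DOM sketch of that swap on `word/document.xml`; `AltChunkSwap` is hypothetical, and `relId` is assumed to be the id returned by `addTargetPart` (`altChunkRel.getId()`).

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: swaps the first <w:p> whose text contains the placeholder
// for <w:altChunk r:id="relId"/>, which points at the imported HTML part.
final class AltChunkSwap {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // doc must be parsed namespace-aware; returns true if a paragraph was replaced.
    static boolean replaceParagraph(Document doc, String placeholder, String relId) {
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            // getTextContent joins all runs, so a placeholder split across runs still matches
            if (p.getTextContent().contains(placeholder)) {
                Element alt = doc.createElementNS(W_NS, "w:altChunk");
                alt.setAttributeNS(R_NS, "r:id", relId);
                p.getParentNode().replaceChild(alt, p); // whole paragraph is replaced
                return true;
            }
        }
        return false;
    }
}
```

Any other text in the placeholder's paragraph is lost, so the placeholder works best on a line of its own.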
Source: (StackOverflow)
I have a paragraph of text which I would like to appear in the center of the document. How can I do this in docx4j? I am currently using:
PPr paragraphProperties = factory.createPPr();
//creating the alignment
TextAlignment align = new TextAlignment();
align.setVal("center");
paragraphProperties.setTextAlignment(align);
//centering the paragraph
paragraph.setPPr(paragraphProperties);
but it isn't working.
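`w:textAlignment` (docx4j's `TextAlignment`) sets the vertical alignment of characters on a line, not horizontal justification; horizontal centering is `w:jc` with the value `center`. The sketch below shows the target paragraph XML as a string; `CenteredParagraphXml` is hypothetical and assumes text that needs no XML escaping.

```java
// Hypothetical helper: a paragraph whose <w:jc w:val="center"/> centers it horizontally.
// Assumes the text needs no XML escaping.
final class CenteredParagraphXml {
    static String centered(String text) {
        return "<w:p><w:pPr><w:jc w:val=\"center\"/></w:pPr>"
                + "<w:r><w:t>" + text + "</w:t></w:r></w:p>";
    }
}
```

In docx4j the same property is expressed with `Jc`: `Jc jc = factory.createJc(); jc.setVal(JcEnumeration.CENTER); paragraphProperties.setJc(jc);` in place of the `TextAlignment` lines.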
Source: (StackOverflow)
I am trying to generate PDF/DOCX files on Android.
I have tried a lot of libraries (Apache POI, docx4j and PDFBox), but I always get these messages in the console.
Any ideas?
For example, with this docx4j example code:
public class ExportNotebookToWordTask extends RoboAsyncTask<Void> {
private ProgressDialog exportProgress;
private Activity activity;
protected ExportNotebookToWordTask (Context context, Activity activity) {
super(context);
this.activity = activity;
exportProgress = new ProgressDialog(activity);
exportProgress.setIndeterminate(true);
exportProgress.setCancelable(false);
exportProgress.setCanceledOnTouchOutside(false);
exportProgress.setMessage(context.getString(R.string.export_notebook_to_pdf_progress));
}
@Override
protected void onPreExecute() throws Exception {
super.onPreExecute();
exportProgress.show();
}
@Override
public Void call() throws Exception {
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
wordMLPackage.getMainDocumentPart().addParagraphOfText("Hello Word!");
File notebookDir = new File(Environment.getExternalStorageDirectory().getAbsolutePath() + File.separator + Constants.NOTEBOOKS_DIR);
if(!notebookDir.exists()) {
notebookDir.mkdir();
}
wordMLPackage.save(new File(notebookDir, course.getName() + Constants.DOCX_EXTENSION_FILE));
return null;
}
@Override
protected void onSuccess(Void result) throws Exception {
super.onSuccess(result);
DigitalNotebookActivity.this.finish();
}
@Override
protected void onFinally() throws RuntimeException {
super.onFinally();
if (exportProgress != null && exportProgress.isShowing()) {
exportProgress.dismiss();
}
}
}
Log:
05-25 22:41:42.927 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.JaxbValidationEventHandler>
05-25 22:41:42.927 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.JaxbValidationEventHandler>
05-25 22:41:42.957 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.JaxbValidationEventHandler>
05-25 22:41:42.977 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.JaxbValidationEventHandler>
05-25 22:41:43.027 29302-31419/com.digitalnotebook I/System.out﹕ 22:41:43.041 [pool-4-thread-8] INFO org.docx4j.jaxb.Context - java.vendor=The Android Project
05-25 22:41:43.027 29302-31419/com.digitalnotebook I/System.out﹕ 22:41:43.043 [pool-4-thread-8] INFO org.docx4j.jaxb.Context - java.version=0
05-25 22:41:43.137 29302-31419/com.digitalnotebook I/System.out﹕ 22:41:43.152 [pool-4-thread-8] DEBUG org.docx4j.utils.ResourceUtils - Attempting to load: org/docx4j/wml/jaxb.properties
05-25 22:41:43.147 29302-31419/com.digitalnotebook I/System.out﹕ 22:41:43.160 [pool-4-thread-8] DEBUG org.docx4j.utils.ResourceUtils - Not using MOXy, since no resource: org/docx4j/wml/jaxb.properties
05-25 22:41:43.147 29302-31419/com.digitalnotebook I/System.out﹕ 22:41:43.161 [pool-4-thread-8] INFO org.docx4j.jaxb.Context - No MOXy JAXB config found; assume not intended..
05-25 22:41:43.147 29302-31419/com.digitalnotebook I/System.out﹕ 22:41:43.161 [pool-4-thread-8] DEBUG org.docx4j.jaxb.Context - org/docx4j/wml/jaxb.properties not found via classloader.
05-25 22:41:43.147 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.NamespacePrefixMapperSunInternal>
05-25 22:41:43.157 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.NamespacePrefixMapperSunInternal>
05-25 22:41:43.157 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.NamespacePrefixMapper>
05-25 22:41:43.157 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.NamespacePrefixMapper>
05-25 22:41:43.157 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.NamespacePrefixMapperRelationshipsPartSunInternal>
05-25 22:41:43.157 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.NamespacePrefixMapperRelationshipsPartSunInternal>
05-25 22:41:43.177 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.NamespacePrefixMapperRelationshipsPart>
05-25 22:41:43.177 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.NamespacePrefixMapperRelationshipsPart>
05-25 22:41:43.177 29302-31419/com.digitalnotebook I/art﹕ Rejecting re-init on previously-failed class java.lang.Class<org.docx4j.jaxb.NamespacePrefixMapper>
Source: (StackOverflow)
In my project I have a requirement to show the number of pages in Word documents (.doc, .docx) and the number of sheets in Excel documents (.xls, .xlsx). I tried reading the .docx file with docx4j, but the performance is very poor, and I only need the word count. I also tried Apache POI, but I get an error like:
"trouble writing output: Too many methods: 94086; max is 65536. By package:"
I want to know whether there is any paid or open source library available for Android.
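The quoted error is Android's limit of 65,536 method references per dex file, so lighter dependencies help. For .docx specifically, Word stores document statistics in `docProps/app.xml` (elements such as `Pages` and `Words`), which can be read with only `java.util.zip` and the JDK XML parser. A minimal sketch covering .docx only; `DocxStats` is hypothetical, and the values are whatever the authoring application last saved, so they may be missing or stale for files produced by other tools. For .xlsx, the same ZIP approach could count the `<sheet>` elements in `xl/workbook.xml`.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads a stored statistic ("Pages", "Words", ...) from
// docProps/app.xml; returns -1 if the part or the element is missing.
final class DocxStats {
    static final String EP_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    static int readStat(InputStream docx, String element) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"docProps/app.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setNamespaceAware(true);
                Document doc = dbf.newDocumentBuilder().parse(zip); // parses this entry only
                NodeList nodes = doc.getElementsByTagNameNS(EP_NS, element);
                if (nodes.getLength() == 0) {
                    return -1;
                }
                return Integer.parseInt(nodes.item(0).getTextContent().trim());
            }
        }
        return -1;
    }
}
```

This reads one small XML part instead of loading the whole document model, which is why it avoids the performance cost described above.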
Source: (StackOverflow)