Generate Office 2007 documents using OpenXML SDK
Overview
With Office 2007, the format in which Office documents are
stored is no longer proprietary. Office 2007 documents are stored in an XML
format based on the Open XML specification.
The content of a document, together with its resources, is
packaged in ZIP format.
This allows third-party tools to read and
write Office 2007 documents.
Recently Microsoft released the first version of the managed
OpenXML SDK, which helps in working with Office 2007 documents on the fly from
managed code.
The SDK helps in creating new documents and in modifying
existing ones: search and replace, inserting new sections, deleting existing
content, etc.
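As a rough illustration of the search-and-replace case, the sketch below opens an existing document with the SDK, reads the XML of its main document part, replaces a piece of text and writes the result back. The class and method names (DocxTextReplacer, ReplaceInMainPart) are made up for this example, and it assumes that the DocumentFormat.OpenXml assembly is referenced, that the part is UTF-8 encoded, and that both strings are already XML-safe. It also only finds text that sits inside a single run; Word may split a sentence across several runs.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper, not part of the SDK.
static class DocxTextReplacer
{
    // Replaces every occurrence of 'search' in the main document part's XML.
    // Returns true if the part was changed and written back.
    public static bool ReplaceInMainPart(string path, string search, string replacement)
    {
        // 'true' opens the package for editing.
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, true))
        {
            MainDocumentPart part = wordDoc.MainDocumentPart;

            // Read the whole part (word/document.xml) as text.
            string xml;
            using (StreamReader reader = new StreamReader(part.GetStream()))
            {
                xml = reader.ReadToEnd();
            }

            // Plain string replacement on the raw markup (assumption: the
            // caller passes strings that are valid inside XML text).
            string updated = xml.Replace(search, replacement);
            if (updated == xml)
            {
                return false; // nothing matched, leave the part untouched
            }

            // FileMode.Create truncates the part before the new XML is written.
            using (StreamWriter writer = new StreamWriter(part.GetStream(FileMode.Create)))
            {
                writer.Write(updated);
            }
            return true;
        }
    }
}
```

A call such as DocxTextReplacer.ReplaceInMainPart("Hello World.docx", "Hello World", "Hi everyone") would then update the file in place.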
Office 2007 document package format
As mentioned earlier, Office 2007 documents are packaged in
ZIP format. To better understand this, create a new Word 2007 document named
‘Hello World.docx’, type ‘Hello World’ in it and save it.
Change the file extension from .docx to .zip and extract the
archive into a folder.
The extracted folder will typically contain a ‘[Content_Types].xml’ file and
the folders ‘_rels’, ‘docProps’ and ‘word’.
As you can see, a single Word 2007 file actually consists of
multiple files and folders packaged into ZIP format.
The ‘word’ folder contains the main ‘document.xml’ part along with
supporting parts such as ‘styles.xml’, ‘settings.xml’ and ‘fontTable.xml’.
Open ‘document.xml’ and you can see the text you
typed, wrapped in the document's XML markup.
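The same inspection can be done from code instead of renaming and extracting the file by hand. The sketch below treats the .docx file as an ordinary ZIP archive and uses the System.IO.Compression classes of the .NET Framework (an assumption: these need .NET 4.5 or later and a reference to System.IO.Compression.FileSystem, so they are newer than the SDK discussed here). The PackageInspector class is a made-up name for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper that reads a .docx file as a plain ZIP archive.
static class PackageInspector
{
    // Returns the full names of all entries in the package,
    // for example "word/document.xml".
    public static List<string> ListEntries(string path)
    {
        List<string> names = new List<string>();
        using (ZipArchive archive = ZipFile.OpenRead(path))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                names.Add(entry.FullName);
            }
        }
        return names;
    }

    // Reads a single entry of the package as text.
    public static string ReadEntry(string path, string entryName)
    {
        using (ZipArchive archive = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
            {
                throw new FileNotFoundException("Entry not found in package.", entryName);
            }
            using (StreamReader reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling PackageInspector.ListEntries("Hello World.docx") lists the files inside the package, and PackageInspector.ReadEntry("Hello World.docx", "word/document.xml") returns the markup that contains the typed text.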
OpenXML SDK
The OpenXML SDK provides a rich set of managed APIs which can be
used to generate Office 2007 documents at runtime.
Given content like that seen in the ‘document.xml’ file
above, the SDK APIs can create a new .docx file which is packaged according to
the specification.
Walkthrough: Create a Word document using OpenXML SDK
· Make sure you have Visual Studio installed on your machine, and
Word 2007 to view the generated document.
· Download OpenXML SDK 1.0.
· After installation, the SDK creates its own folder
structure at the install location.
· The documentation for the SDK can be found in the ‘doc’ folder.
· Create a new console application.
· Add a reference to the DocumentFormat.OpenXml assembly.
· Add the following namespaces (System.Text is needed for UTF8Encoding):
using System.IO;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
· Modify the Main method as below:
static void Main(string[] args)
{
    // File name of the document to generate.
    string document = "VikasGoyal.docx";

    // Create the package; disposing it saves and closes the file.
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Create(document, WordprocessingDocumentType.Document))
    {
        // Add the main document part (stored as word/document.xml).
        MainDocumentPart part = wordDoc.AddMainDocumentPart();

        // Markup for one paragraph with a single bold run.
        const string docXml =
@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
  <w:body>
    <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hello world</w:t></w:r></w:p>
  </w:body>
</w:document>";

        // Write the markup into the part as UTF-8 bytes.
        using (Stream stream = part.GetStream())
        {
            byte[] buf = (new UTF8Encoding()).GetBytes(docXml);
            stream.Write(buf, 0, buf.Length);
        }
    }
}
· WordprocessingDocument is the main class for creating
Word documents. Its Create method is called with the file name and the
document type as arguments.
· A document is composed of many parts. After creating the
document package, we add the main document part using the AddMainDocumentPart
method.
· docXml contains the content in XML format, similar to the one in
document.xml above. The format of the document is based on the Open XML format
specification.
· A few important tags here:
o p – paragraph
o r – run (a stretch of text sharing the same formatting)
o b – bold
o t – text
· Run the above program and the .docx file will be generated. You can open it
in Word, or unzip it after changing its extension to .zip. The package will have
the same structure as discussed above, and the content will be in the
document.xml file.
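The markup in the walkthrough is written as a string literal, which is easy to get wrong once the text contains characters such as < or &. One possible alternative, sketched here, builds the same paragraph, run and bold structure with LINQ to XML, which escapes text automatically. The DocumentXmlBuilder class and its method name are made up for this example; it assumes .NET 3.5 or later for System.Xml.Linq.

```csharp
using System.Xml.Linq;

// Hypothetical helper that builds the walkthrough's markup with LINQ to XML.
static class DocumentXmlBuilder
{
    // WordprocessingML namespace, the same URI used in the walkthrough.
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a document containing one paragraph (p) with one bold run (r).
    public static XDocument BuildBoldParagraph(string text)
    {
        return new XDocument(
            new XDeclaration("1.0", "UTF-8", "yes"),
            new XElement(W + "document",
                // Declare the "w" prefix so the output uses w:p, w:r and so on.
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                new XElement(W + "body",
                    new XElement(W + "p",
                        new XElement(W + "r",
                            new XElement(W + "rPr",
                                new XElement(W + "b")),
                            new XElement(W + "t", text))))));
    }
}
```

The resulting XDocument could then be saved into the main document part, for example with doc.Save(stream) on the stream returned by part.GetStream() (the Save(Stream) overload needs .NET 4 or later).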
Summary
Because Open XML is a documented, XML-based format, Office documents can be
read and written by tools on various platforms. Content can be stored as plain
XML and packaged as a .docx file at the time of distribution using the OpenXML
SDK. OpenOffice has already started supporting Office 2007 documents:
http://marketing.openoffice.org/3.0/announcementbeta.html
Useful Links
· Download source code of the above program:
http://www.codeplex.com/Samples/Release/ProjectReleases.aspx?ReleaseId=14719
· http://www.openxml4j.org/ (SDK for Java)
· http://openxmldeveloper.org