
Generate Office 2007 documents using OpenXML SDK

Overview

With Office 2007, the format in which Office documents are stored is no longer proprietary. Office 2007 documents are stored in an XML format based on the Open XML specification.

The content of a document, together with its resources, is packaged in ZIP format.

This allows third-party tools to read and write Office 2007 documents.

Microsoft recently released the first version of the managed OpenXML SDK, which helps in manipulating Office 2007 documents on the fly from managed code.

The OpenXML SDK helps in creating new documents and in modifying existing ones: search and replace, inserting new sections, deleting existing content, and so on.

Office 2007 document package format

As mentioned earlier, Office 2007 documents are packaged in ZIP format. To see this for yourself, create a new Word 2007 document, type ‘Hello World’ in it, and save it as ‘Hello World.docx’.

Change the file extension from .docx to .zip and extract the archive to a folder.

You will see a directory structure similar to the following:

·         [Content_Types].xml

·         _rels

·         docProps

·         word

As you can see, a single Word 2007 file actually consists of multiple files and folders packaged into ZIP format.
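The same listing can be produced from code without renaming the file. The following is a minimal sketch, assuming .NET 4.5 or later (for System.IO.Compression, with a reference to System.IO.Compression.FileSystem on the .NET Framework); the class and method names are made up for illustration and are not part of the OpenXML SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;

// Hypothetical helper (not part of the OpenXML SDK).
static class PackageLister
{
    // Returns the full name of every entry in the ZIP package at 'path'.
    public static List<string> ListEntries(string path)
    {
        var names = new List<string>();

        // A .docx file is an ordinary ZIP archive, so it opens as-is.
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                names.Add(entry.FullName);
            }
        }
        return names;
    }
}
```

Calling PackageLister.ListEntries("Hello World.docx") and printing each name should show entries such as word/document.xml alongside the other files in the package.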

The contents of the ‘word’ folder will look something like this:

·         _rels

·         theme

·         document.xml

·         fontTable.xml

·         settings.xml

·         styles.xml

·         webSettings.xml

Open ‘document.xml’ and you will see the text you typed in the document.
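To illustrate that the typed text really lives in that XML file, here is a small sketch that pulls the text out of a .docx file using only the .NET base class library. It assumes .NET 4.5 or later and that the main part is stored at word/document.xml, which is where Word puts it; the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of the OpenXML SDK).
static class DocxTextReader
{
    // Namespace used by WordprocessingML elements such as w:p, w:r and w:t.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the concatenated contents of all w:t (text) elements.
    public static string ReadText(string path)
    {
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            // Assumption: the main document part is at word/document.xml.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found.");
            }

            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                return string.Concat(xml.Descendants(W + "t").Select(t => t.Value));
            }
        }
    }
}
```

Note that this sketch joins the text of all runs without any separator, so paragraph breaks are lost; it is meant only to show where the content is stored.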

OpenXML SDK

The OpenXML SDK provides a rich set of managed APIs that can be used to generate Office 2007 documents at runtime.

Given content like that in the ‘document.xml’ file above, the SDK APIs can create a new .docx file that is packaged according to the specification.

Walkthrough: Create a Word document using OpenXML SDK

·         Make sure you have Visual Studio installed on your machine, and Word 2007 to view the generated document.

·         Download and install OpenXML SDK 1.0.

·         After installation, the SDK files are placed in a folder structure under the install location.

·         The documentation for the SDK can be found in the ‘doc’ folder.

·         Create a new console application.

·         Add a reference to the DocumentFormat.OpenXml assembly.

·         Add the following namespaces:

using System.IO;

using System.Text;

using DocumentFormat.OpenXml;

using DocumentFormat.OpenXml.Packaging;

·         Modify the Main method as below:

        static void Main(string[] args)
        {
            // Name of the document to generate.
            string document = "VikasGoyal.docx";

            // Create the package; disposing it saves and closes the file.
            using (WordprocessingDocument wordDoc = WordprocessingDocument.Create(document, WordprocessingDocumentType.Document))
            {
                // Add the main document part, which holds the document body.
                MainDocumentPart part = wordDoc.AddMainDocumentPart();

                // Document content: one paragraph with a single bold run.
                const string docXml =
@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
<w:body>
<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hello world</w:t></w:r></w:p>
</w:body>
</w:document>";

                // Write the XML into the main document part as UTF-8.
                using (Stream stream = part.GetStream())
                {
                    byte[] buf = (new UTF8Encoding()).GetBytes(docXml);
                    stream.Write(buf, 0, buf.Length);
                }
            }
        }

·         WordprocessingDocument is the main class for creating Word documents. Its Create method is called with the file name and the document type as arguments.

·         A document is composed of many parts. After creating the document, we add the main document part using the AddMainDocumentPart method.

·         docXml contains the content in XML format, similar to what we saw in document.xml above. The format of the document is based on the OpenXML format specification.

·         A few important tags here:

o   p – paragraph

o   r – run

o   b – bold

o   t – text
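As an alternative to a hand-written string, the same paragraph/run/bold structure can be built with LINQ to XML, which escapes special characters in the text for you. This is only a sketch, assuming .NET 3.5 or later; the class and method names are invented for illustration.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper (not part of the OpenXML SDK).
static class DocumentXmlBuilder
{
    // Namespace bound to the "w" prefix in WordprocessingML.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a document with one paragraph (w:p) holding one bold run (w:r).
    public static XDocument BuildBoldParagraph(string text)
    {
        return new XDocument(
            new XDeclaration("1.0", "UTF-8", "yes"),
            new XElement(W + "document",
                // Declare the "w" prefix so the output uses w:p, w:r, etc.
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                new XElement(W + "body",
                    new XElement(W + "p",
                        new XElement(W + "r",
                            new XElement(W + "rPr", new XElement(W + "b")),
                            new XElement(W + "t", text))))));
    }
}
```

The resulting XDocument can be written to the stream returned by part.GetStream() using its Save method, in place of the byte array built from the docXml string. Keep in mind that ToString() on an XDocument omits the XML declaration, while Save includes it.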

·         Run the program and the .docx file will be generated. You can open it in Word, or change its extension to .zip and extract it. The package has the same basic structure as discussed above, and the content is in the document.xml file.
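The overview mentioned that the SDK can also modify existing documents by search and replace. A minimal sketch of that idea follows, assuming the project references DocumentFormat.OpenXml as in the walkthrough; the class and method names are hypothetical. It operates on the raw XML of the main document part, so it only finds text that sits inside a single run, and the replacement string must already be valid XML text (escape characters such as & and < yourself).

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper built on the SDK's packaging classes.
static class DocxSearchReplace
{
    // Replaces every occurrence of 'search' in the main document part's XML.
    public static void ReplaceText(string path, string search, string replacement)
    {
        // Open the existing package for editing (second argument: isEditable).
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, true))
        {
            MainDocumentPart part = wordDoc.MainDocumentPart;

            // Read the whole part into a string.
            string docXml;
            using (StreamReader reader = new StreamReader(part.GetStream()))
            {
                docXml = reader.ReadToEnd();
            }

            docXml = docXml.Replace(search, replacement);

            // FileMode.Create truncates the part before the new content is written.
            using (StreamWriter writer = new StreamWriter(part.GetStream(FileMode.Create)))
            {
                writer.Write(docXml);
            }
        }
    }
}
```

For example, DocxSearchReplace.ReplaceText("VikasGoyal.docx", "Hello world", "Hello again") would change the text in the document generated by the walkthrough.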

Summary

With OpenXML, Office documents become interoperable across platforms and tools. Content can be stored as plain XML and packaged as a .docx file at the time of distribution using the OpenXML SDK.

OpenOffice has already started supporting Office 2007 documents: http://marketing.openoffice.org/3.0/announcementbeta.html

 

Useful Links

·         Download the source code of the above program: http://www.codeplex.com/Samples/Release/ProjectReleases.aspx?ReleaseId=14719

·         http://www.openxml4j.org/ (SDK for Java)

·         http://openxmldeveloper.org

 

 

 
