Microsoft has released its Open XML SDK roadmap and is preparing to ship Version 1.0 in May, with Version 2.0 scheduled for later this year. Although a community technology preview (CTP) has been available since last June, the final release includes a number of updates that will make it easier to write more efficient XML processing applications with less effort, said Doug Mahugh, Senior Product Manager Interoperability Topics for Office Clients at Microsoft.
Anthony Oliver, CEO of MadCap Software, which makes an XML-based help authoring toolset, noted, "One of the big challenges up to this point is that most things you did in XML were completely 'roll your own' solutions. We are leveraging .NET considerably and can do things in an afternoon that used to take weeks."
Some of the key elements of Version 1.0 include strongly typed access to parts within Open XML documents, simple access to parts within Open XML documents, and LINQ-friendly annotation capabilities.
When the CTP was first released, its core goal was to introduce strongly typed parts. But Mahugh said the CTP was incomplete: it was missing some types and did not use the same methods for all of the parts. Those loose ends have now been cleaned up, and the SDK is now a complete API for the Open XML format.
In addition, Version 1.0 introduces capabilities such as annotations. Developers can add an annotation to an object, which lets them dynamically change the object's properties, events, and methods at runtime. Mahugh said, "You can take a part in the package and add an annotation that is an XML DOM version of the part. Then whenever you open that part, you have immediate access to the XML inside of the DOM. That is part of the larger strategy of making the SDK work better with the latest changes of the C# and VB languages."
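The annotation idea Mahugh describes can be sketched in a few lines. The example below is an illustration in Python, not the SDK's .NET API: the `Part` class and the `add_annotation`, `annotation`, and `get_dom` names are hypothetical. It attaches a parsed DOM to a part the first time the part is read, so later reads reuse the same tree.

```python
import xml.etree.ElementTree as ET
from typing import Any, Optional


class Part:
    """Hypothetical stand-in for a package part: a name plus raw XML bytes."""

    def __init__(self, name: str, data: bytes) -> None:
        self.name = name
        self.data = data
        self._annotations: list[Any] = []

    def add_annotation(self, obj: Any) -> None:
        # Attach an arbitrary object to this part.
        self._annotations.append(obj)

    def annotation(self, kind: type) -> Optional[Any]:
        # Return the first attached object of the requested type, if any.
        for obj in self._annotations:
            if isinstance(obj, kind):
                return obj
        return None


def get_dom(part: Part) -> ET.Element:
    """Return the part's DOM, parsing once and caching it as an annotation."""
    dom = part.annotation(ET.Element)
    if dom is None:
        dom = ET.fromstring(part.data)
        part.add_annotation(dom)
    return dom
```

Calling `get_dom` twice on the same part returns the same tree object, which is the "immediate access to the XML" behavior described in the quote.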
Part of the feedback that Microsoft got on the CTP was that developers wanted to write fewer lines of code to get at the content. The CTP version had no capability for doing anything with the XML inside the parts. There was a mechanism for getting the part, but the developer was then on his own when it came to processing the XML.
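To show what "getting the part" and then processing the XML by hand looks like, here is a minimal sketch in Python using only the standard library. It is not the SDK: it relies on the fact that an Open XML document is a ZIP package of XML parts. The toy package built here holds a single part at the conventional `word/document.xml` path; a real document contains more parts, and the helper names are made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>World</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def build_package() -> bytes:
    """Build a toy package in memory (assumption: one part is enough here)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", DOCUMENT_XML)
    return buf.getvalue()


def read_part(package: bytes, part_name: str) -> bytes:
    """Step 1: fetch the raw bytes of one part from the package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(part_name)


def paragraph_texts(part_xml: bytes) -> list[str]:
    """Step 2: walk the XML by hand to pull out the text of each paragraph."""
    root = ET.fromstring(part_xml)
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        for p in root.iterfind(".//w:p", NS)
    ]
```

Even for this trivial document, the caller has to know the part path, the namespace, and the paragraph/run/text nesting, which is the burden developers were asking Microsoft to reduce.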
Mahugh said, "Developers want capabilities more like the Office object model, where you think about the document and paragraphs, and the syntax reflects that rather than just a chunk of XML. We want to make those logical concepts. In Version 1.0 of the SDK, you are seeing concepts that are more in line with the concepts of documents structure, and not just the underlying plumbing of XML. We see both of them as relevant in the long term."
In the long run, both ways of looking at XML are relevant. Working directly with the package is useful when you want more control than the SDK gives you, but you then have to write more code to get at the content.
Version 2.0 will add a content object model, search functionality, validation functionality, high-level scenario functionality, and shared ML functionality when released later this year.
The content object model includes additional classes and methods for simplifying developer work within parts. For example, it will include methods for retrieving or modifying a specific paragraph, style, cell, or shape within a part, making it easier to work with content like end-users see it, rather than the underlying document information.
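As a rough idea of what such a content-level layer might feel like, the Python sketch below wraps a main document part so callers can read or replace a paragraph by position. The `WordDocument` class and its methods are hypothetical and are not the Version 2.0 API, which had not shipped at the time of writing.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>First paragraph</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second </w:t></w:r><w:r><w:t>paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


class WordDocument:
    """Hypothetical content-level wrapper over a main document part."""

    def __init__(self, part_xml: str) -> None:
        self._root = ET.fromstring(part_xml)

    def _paragraphs(self) -> list[ET.Element]:
        return self._root.findall(".//w:p", NS)

    def paragraph_text(self, index: int) -> str:
        """Concatenate the text of every run in the paragraph."""
        p = self._paragraphs()[index]
        return "".join(t.text or "" for t in p.iterfind(".//w:t", NS))

    def set_paragraph_text(self, index: int, text: str) -> None:
        """Replace the paragraph's runs with one run holding `text`.

        Simplification: formatting carried by the removed runs is lost.
        """
        p = self._paragraphs()[index]
        for run in p.findall("w:r", NS):
            p.remove(run)
        run = ET.SubElement(p, f"{{{W_NS}}}r")
        ET.SubElement(run, f"{{{W_NS}}}t").text = text
```

The caller works with "paragraph 1" rather than with runs and text nodes, which is the shift from plumbing to document concepts that Mahugh describes.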
The search functionality will enable simple searching of content in all document types.
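One way to picture cross-format search is a loop over every XML part in a package that looks for text elements. The Python sketch below is an assumption about the general approach, not the SDK's design. It leans on the fact that WordprocessingML, DrawingML, and SpreadsheetML shared strings all store literal text in elements whose local name is `t`; the function names and toy packages are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def make_package(parts: dict[str, str]) -> bytes:
    """Build a toy package in memory from {part name: XML text}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, xml_text in parts.items():
            zf.writestr(name, xml_text)
    return buf.getvalue()


def search_package(package: bytes, needle: str) -> list[tuple[str, str]]:
    """Return (part name, text) for each text element containing `needle`.

    Simplification: a phrase split across several runs will not match.
    """
    hits = []
    wanted = needle.lower()
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            if not name.endswith(".xml"):
                continue
            root = ET.fromstring(zf.read(name))
            for elem in root.iter():
                local_name = elem.tag.rsplit("}", 1)[-1]
                if local_name == "t" and elem.text and wanted in elem.text.lower():
                    hits.append((name, elem.text))
    return hits


docx = make_package({
    "word/document.xml":
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Annual Budget</w:t></w:r></w:p>"
        "</w:body></w:document>",
})
xlsx = make_package({
    "xl/sharedStrings.xml":
        f'<sst xmlns="{S_NS}"><si><t>Budget 2008</t></si><si><t>Notes</t></si></sst>',
})
```

The same `search_package` call works on both toy packages because it ignores namespaces and matches on the local element name only.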
With the CTP, it is possible to validate a part against a schema, but that is only one part of validation. The final validation support is waiting for possible modifications to the final standard. Although Open XML has been accepted as an ECMA standard, it is still going through ISO standardization, which Peter O'Kelly, an industry analyst with The Burton Group, said was important for certain large-scale deployments. This standardization process means there might be some minor modifications to the final Open XML standard. Once the standard is finalized, the SDK will provide a tool that lets the developer confirm that a document is valid and conforms to the standard.
The high-level scenario-based functionality will make it easier to do things like creating a document from a template or accepting all revisions in a document. Mahugh said this would make it easier to deal with things like track changes and comments, which have been nasty with the binary format.
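To give a sense of what "accept all revisions" involves at the XML level, here is a simplified Python sketch. In WordprocessingML, tracked insertions are wrapped in `w:ins` elements and tracked deletions in `w:del` elements. Accepting changes means keeping the inserted content and dropping the deleted content. This sketch handles only those two markers and ignores other revision types such as formatting changes and moves; the function names are hypothetical and this is not the SDK's implementation.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
INS_TAG = f"{{{W_NS}}}ins"
DEL_TAG = f"{{{W_NS}}}del"

SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    "<w:r><w:t>Keep </w:t></w:r>"
    '<w:del w:id="1" w:author="a"><w:r><w:delText>old</w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="a"><w:r><w:t>new</w:t></w:r></w:ins>'
    "</w:p></w:body></w:document>"
)


def accept_all_revisions(elem: ET.Element) -> None:
    """Drop tracked deletions and unwrap tracked insertions, in place."""
    kept = []
    for child in list(elem):
        if child.tag == DEL_TAG:
            continue  # accepted deletion: the content goes away
        accept_all_revisions(child)  # clean nested content first
        if child.tag == INS_TAG:
            kept.extend(list(child))  # accepted insertion: keep the content
        else:
            kept.append(child)
    elem[:] = kept


def visible_text(root: ET.Element) -> str:
    """Concatenate the text of all w:t elements in document order."""
    return "".join(t.text or "" for t in root.iterfind(".//w:t", NS))
```

A caller would parse the part with `ET.fromstring(SAMPLE)`, pass the root to `accept_all_revisions`, and then serialize the tree back into the package.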
The shared ML functionality will improve developer support for functionality that is shared between Word, Excel, and PowerPoint, such as classes for DrawingML, which allows drawing objects to appear in multiple document types. Mahugh said, "The SDK as it currently stands does not support parts of the Shared ML. We are taking those things that don't fall into the three tidy categories and adding that functionality to the API." For example, with DrawingML, you can take a shape created in a word processing document and plug it into a spreadsheet or presentation document.