Office Document Processing Platform Emerging
Update, July 2005

The following is the full text of an article published by Directions on Microsoft, an independent research firm focused exclusively on Microsoft strategy & technology. Each month we make one or more key articles available to non-subscribers.

Digital document processing is one major focus of Visual Studio 2005 Tools for Office (VSTO 2005), currently in beta testing. Digital document processing speeds business processes and reduces errors by replacing paper forms and documents with electronic ones. VSTO 2005 delivers an Office platform and tools that extend document processing beyond forms solutions offered by vendors such as Adobe, the current leader in the field. However, the VSTO 2005 technology is missing some pieces and will have to make its case against competing forms technologies both inside and outside of Microsoft.

A Method for Gradual Automation

Digital document processing is a method of automating business processes by replacing paper documents with digital documents that incorporate data from an organization's key applications. Instead of filing and distributing paper documents, an organization distributes digital documents, especially forms, and creates automated solutions to generate documents from application data, extract data from documents, and route documents to the correct people for approval and input. The resulting solutions can reduce errors and shorten turnaround time for business processes. (See the illustration "A Digital Document Processing Scenario".)

Adobe has gained an early foothold in digital document processing, especially in government, where use of particular forms and documents is often prescribed by regulation. Adobe's Portable Document Format (PDF) enables digital forms that closely match the appearance of existing paper ones, with data fields for input and scripts to validate input. PDF-based forms are displayed and printed by the Acrobat Reader, a free multiplatform client, and Adobe provides server products for tasks such as extracting data from PDF forms, automatically routing forms and documents through a business process (workflow), and restricting access to documents to authorized users (rights management). Microsoft and its partners offer capabilities similar to those of PDF forms through InfoPath, a forms design and data entry product, which some features of VSTO 2005 also support. (See the sidebar "Forms Processing with InfoPath".)

Making Office the Platform

However, digital document processing is still nascent, and Microsoft believes it can take the lead by expanding digital document processing beyond data entry in forms to capturing data from Office documents. Specifically, VSTO 2005 enables solutions that capture and display data in Word and Excel documents, and synchronize the data between documents and enterprise resource planning, customer relationship management, and line-of-business applications. A solution for the building permit scenario, for example, might process Word documents, enabling users such as builders and planners to enter data and generate form letters from within the Word interface.

Word and Excel documents give users capabilities that simple data entry forms do not, such as the ability to add new sections to a document, reformat text, add calculations based on data in the document, and so on. For example, a builder might insert an Excel-based table of traffic calculations into a Word-based building permit application, something that would be difficult to do in a fixed form. In general, an Office-based document processing system can offer greater flexibility to users. Office document processing solutions could also be simpler to understand and use than forms solutions, because they exploit the normal user interface of Office.

For Microsoft and its partners, Office document processing solutions could drive upgrades of the Office suite and help sales of related server products such as SharePoint Portal Server and Windows Rights Management Services. Office-based document processing solutions have one natural advantage: 200 million Office users and 8 million developers who create Office solutions.

Document Processing with VSTO 2005

VSTO 2005, currently in beta with release scheduled for late 2005, delivers a development environment based on Visual Studio and APIs for the digital processing of Word and Excel documents. In a typical document processing system built with VSTO 2005, users work with documents in Word or Excel, while behind-the-scenes components provided by the developer manage data entry and other aspects of the overall business process.

A VSTO 2005 document processing system would normally consist of a client-side user interface for simplifying and validating data input in documents, and a server-side component for managing workflow and synchronizing data between documents and the organization's business applications. Application data are stored in XML data islands within Word and Excel document files. (See the illustration "A VSTO 2005 Document Processing Solution".)
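As a rough illustration of the data island idea (not the VSTO 2005 file format or API), the following Python sketch treats application data as an XML fragment kept apart from the document's visible text. The element names (`permit`, `applicant`, `lotSize`, `status`) and the helpers `read_island` and `update_field` are hypothetical, chosen only to match the building permit scenario.

```python
import xml.etree.ElementTree as ET

# Hypothetical data island for a building permit application.
# Element names are illustrative assumptions, not a VSTO 2005 schema.
DATA_ISLAND = """
<permit id="BP-1001">
  <applicant>Example Builders</applicant>
  <lotSize unit="sqft">8500</lotSize>
  <status>submitted</status>
</permit>
"""


def read_island(xml_text):
    """Return the island's child elements as a dict of field name -> text."""
    root = ET.fromstring(xml_text.strip())
    fields = {child.tag: child.text for child in root}
    fields["id"] = root.get("id")
    return fields


def update_field(xml_text, field, value):
    """Return a new island with one field changed and the rest untouched."""
    root = ET.fromstring(xml_text.strip())
    node = root.find(field)
    if node is None:
        raise KeyError(field)
    node.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(read_island(DATA_ISLAND))
    approved = update_field(DATA_ISLAND, "status", "approved")
    print(read_island(approved)["status"])
```

Because the data live in their own XML fragment, both the client-side user interface and a server-side component can read or change a field without having to interpret the document's formatting.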

Client-Side User Interface

VSTO 2005 delivers a client-side engine that installs on top of Word 2003 and Excel 2003 and supports document processing solutions. Major features of the engine include the following:

Host controls. The engine delivers "host controls," managed components that display, accept, and validate data in Word and Excel documents. For example, an XMLNode control in Word enables text data in an embedded form field to be captured as XML and validated. The controls support data binding to XML data islands, letting the data island be viewed and updated automatically through a control with relatively little code from the developer.

Windows Forms control hosting. The VSTO 2005 engine enables Word and Excel to host Windows Forms user interface controls. This enables the developer to create Word and Excel documents that contain buttons, drop-down menus, calendar views for selecting dates, and other user interface controls from the extensive Windows Forms library and third parties.

Simplified Office APIs. The client engine provides simplified APIs for important Office programming tasks such as displaying the Office task pane and managing Smart Tags.

Automatic updating. The VSTO 2005 client engine supports the automatic execution and updating of the client components of a solution. Client components are deployed in the form of a .NET Framework assembly, the standard executable code format for VB.NET and C# applications. When the user opens a Word 2003 or Excel 2003 document, the VSTO 2005 client engine automatically loads and runs the most recent version of any assembly associated with the document. Note, however, that VSTO 2005's updating mechanism differs from the "ClickOnce" deployment technologies used elsewhere in Visual Studio.
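The data binding and validation behavior described for host controls above can be pictured with a small sketch. This is Python rather than VB.NET or C#, and it does not use the VSTO 2005 APIs: the `BoundField` class and the `is_positive_integer` validator are hypothetical stand-ins that show a control reading from, and writing validated input back to, one node of an XML data island.

```python
import xml.etree.ElementTree as ET


class BoundField:
    """Hypothetical stand-in for a host control bound to one data island node."""

    def __init__(self, root, path, validator):
        self._node = root.find(path)
        if self._node is None:
            raise KeyError(path)
        self._validator = validator

    @property
    def value(self):
        # Reading the control shows whatever the data island currently holds.
        return self._node.text

    @value.setter
    def value(self, text):
        # Reject invalid input before it reaches the data island.
        if not self._validator(text):
            raise ValueError(f"invalid value for <{self._node.tag}>: {text!r}")
        self._node.text = text


def is_positive_integer(text):
    """Accept only strings of digits that represent a number above zero."""
    return text.isdigit() and int(text) > 0


if __name__ == "__main__":
    island = ET.fromstring("<permit><lotSize>8500</lotSize></permit>")
    lot_size = BoundField(island, "lotSize", is_positive_integer)
    lot_size.value = "9200"
    print(ET.tostring(island, encoding="unicode"))
    try:
        lot_size.value = "large"
    except ValueError as err:
        print(err)
```

The point of the design is that the developer supplies only the binding path and the validation rule; the control itself keeps the displayed value and the underlying data in step.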

VSTO 2005 also delivers a customized version of the Visual Studio development environment for Office programming. Developers write their code in VB.NET or C#. They can run Word and Excel inside the VSTO 2005 development environment to test and debug their client-side code.

The VSTO 2005 client APIs and tools require Office 2003. Microsoft has said that these APIs will carry forward to Office 12, the version of Office currently planned for 2006. That means that document processing systems built with VSTO 2005 and Office 2003 will probably continue to work on Office 12.

Server-Side Data Synchronization

VSTO 2005 also delivers server APIs for accessing XML data islands in Word and Excel binary files and creating Word and Excel files that contain data islands. The developer can use these APIs to pull data from documents and enter them into the organization's applications and databases, or create new documents that are populated with application data.

The VSTO 2005 server APIs do not require Word or Excel to be run on the server, which avoids performance and configuration problems that have hampered server-side processing of Office documents in the past. Microsoft intends for the server APIs to carry forward to Office 12, just like the client APIs.
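The two server-side tasks described above, pulling data out of documents and creating documents populated with application data, might be sketched as follows. This is a Python illustration under stated assumptions, not the VSTO 2005 server API: `StoredDocument`, `extract_record`, and `create_document` are hypothetical names, a dict stands in for the organization's database, and real Word and Excel binary files are considerably more complex than the two-field object used here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class StoredDocument:
    """Hypothetical document: visible body text plus an XML data island."""
    body: str
    data_island: str


def extract_record(doc):
    """Pull the data island fields out of a document as a dict."""
    root = ET.fromstring(doc.data_island)
    return {child.tag: child.text for child in root}


def create_document(template_body, record, root_tag="permit"):
    """Build a new document whose island and body are filled from a record."""
    root = ET.Element(root_tag)
    for name, value in record.items():
        ET.SubElement(root, name).text = str(value)
    return StoredDocument(
        body=template_body.format(**record),
        data_island=ET.tostring(root, encoding="unicode"),
    )


if __name__ == "__main__":
    # A dict stands in for the organization's line-of-business database.
    database = {}
    incoming = StoredDocument(
        body="Permit application from Example Builders",
        data_island="<permit><applicant>Example Builders</applicant>"
                    "<status>submitted</status></permit>",
    )
    database["BP-1001"] = extract_record(incoming)

    letter = create_document(
        "Dear {applicant}, your application is {status}.",
        {**database["BP-1001"], "status": "approved"},
    )
    print(letter.body)
```

Nothing in the sketch opens the document in a word processor; it reads and writes only the data portion, which is the property that lets this kind of processing run on a server without Word or Excel.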

A First Step, More to Come

VSTO 2005 represents a first step toward document processing with Office, but additional work is required. Future improvements will probably come in the following areas:

Workflow. VSTO 2005 does not provide any server components for automatically routing documents or tracking them through a defined business process, an ability that has proved key in current digital document processing solutions. However, document processing solutions based on VSTO 2005 could use some of the same third-party workflow solutions currently used with InfoPath. (See the sidebar "Forms Processing with InfoPath".)

Management. The VSTO 2005 platform lacks a number of capabilities for application management. In particular, it lacks client scanning capabilities to check the health of application components on clients or to determine which client computers need the application patched. It also cannot push application patches to clients, although clients will get patched versions of an application's components from the server each time they run the application while online.

In the short run, developers who need these capabilities will have to build or buy them on their own, although Microsoft intends to provide documentation and sample code for the most important capabilities.
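To give a sense of the routing and tracking logic that a developer would have to build or buy, here is a minimal Python sketch. It is not based on any Microsoft or third-party workflow product: the `RoutedDocument` class and the route steps (`builder`, `planner`, `inspector`, `approved`) are assumptions made up for the building permit scenario.

```python
# Hypothetical route for the building permit scenario; step names are assumptions.
ROUTE = ["builder", "planner", "inspector", "approved"]


class RoutedDocument:
    """Tracks where one document sits in a fixed sequence of review steps."""

    def __init__(self, doc_id, route=ROUTE):
        self.doc_id = doc_id
        self.route = list(route)
        self.position = 0
        self.history = []  # (step, action) pairs, kept for tracking

    @property
    def current_step(self):
        return self.route[self.position]

    def approve(self):
        """Record approval and move the document to the next step."""
        if self.position == len(self.route) - 1:
            raise RuntimeError("document has already completed its route")
        self.history.append((self.current_step, "approved"))
        self.position += 1

    def reject(self):
        """Record rejection and send the document back to the first step."""
        self.history.append((self.current_step, "rejected"))
        self.position = 0


if __name__ == "__main__":
    permit = RoutedDocument("BP-1001")
    permit.approve()   # builder submits, document moves to the planner
    permit.reject()    # planner sends it back to the builder
    print(permit.current_step, permit.history)
```

A production workflow system would add persistence, notifications, and access control on top of this; the sketch shows only the core state that must be tracked for each document.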

Tough Competition Inside and Outside Microsoft

Regardless of its capabilities, VSTO 2005 faces tough competition as a document processing platform from forms technologies both inside and outside Microsoft. VSTO 2005 will also have to fight for mindshare with a plethora of other Office client development platforms, although code built on VSTO 2005 seems to have some of the best long-term prospects. (See the chart "Platforms for Office-Based Clients".)

The main argument for VSTO 2005 document processing is the ability to deliver documents to users that they can work with in Office: changing formats, adding sections and comments, and performing calculations. However, many document processing solutions today have made do with less powerful forms, in order to reach more clients through a simpler client platform. Adobe's PDF is first out of the starting gate owing to the large installed base of its Reader, its multiplatform support, and its ability to replicate existing forms. In particular, customers could find that any additional capabilities of an Office 2003-based document processing solution are outweighed by the cost of deploying Office 2003 to all of their clients.

In addition, Microsoft itself plans to ship a forms technology with high graphics fidelity similar to that of PDF. Called XAML (pronounced "Zamel," rhyming with "camel"), the technology will have a forms engine that is preinstalled on the next version of Windows, code-named Longhorn, and that can be retrofitted onto Windows XP. Either PDF or XAML will be more suitable for digital document processing than Office 2003 and VSTO 2005 when an organization can't readily deploy Office 2003 to all its clients.

To compete, the VSTO team will have to demonstrate solutions where the familiarity of the Office suite, and the ability to change formatting, add custom calculations, and make other ad hoc changes to Office documents, outweigh the simpler management, higher graphics fidelity, and broader reach of XAML and PDF.

Resources

VSTO 2005 Beta 2 is available as part of the Visual Studio 2005 Team System beta program; see lab.msdn.microsoft.com/vs2005/get/default.aspx#vsto.

An overview of VSTO 2005 for developers appeared at msdn.microsoft.com/library/default.asp?url=/library/en-us/odc_vsto2005_ta/html/OfficeWhatsNewInVSTO2005.asp.

Microsoft's recommendations for smart client application technologies are at msdn.microsoft.com/smartclient/understanding/default.aspx?pull=/library/en-us/odc_ip2003_ta/html/odc_ipoffice2003smartclient.asp.

A developer starting point for XAML is msdn.microsoft.com/msdntv/episode.aspx?xml=episodes/en/20031218xamldb/manifest.xml.

An overview of InfoPath for developers appeared in the Dec. 2004 Microsoft Developer Platform Roadmap.

BizTalk Server workflow functions were outlined in the Oct. 2004 Research Report, "BizTalk Server 2004 Drives Microsoft Integration Strategy."

K2.net is at www.k2workflow.com.

Captaris is at www.captaris.com.

Adobe's forms processing products are outlined at www.adobe.com/products/server/main.html.