Sometimes an application needs to interact with the user interface (UI) of a second application.

The first application might be a test application that drives the UI of the target to run through some automated tests. It might describe the UI out loud, as an aid to users that are blind. It might be a speech application that allows users to give vocal commands. In each of these cases, the application needs a way to inspect and interact with the UI of the system and other running applications.

Microsoft® UI Automation is a tool for doing just that. It provides an abstracted model of the UI, and allows a client application to both investigate and manipulate the UI of all running applications on the system. In this article, I explain how you can use UI Automation for automated testing, and I walk through code samples that demonstrate the basic features of UI Automation.

Using UI Automation for Automated Testing

The goal of many automated test tools and scenarios is the consistent and repeatable manipulation of the user interface. This manipulation can involve simple unit testing of specific controls, or complex recording and playback of test scripts that iterate through a series of generic actions on a group of controls.

One complication that arises with automated test applications is the difficulty of synchronizing a test with a dynamic target. As an example, consider a list box control, such as the one in Windows Task Manager, that displays a list of currently running applications. Because the items in the list box are updated dynamically, outside the control of the test application, consistently repeating the selection of a specific item in the list box is impossible. Similar problems arise when attempting to repeat simple focus changes in a UI that is outside the control of the test application.

Programmatic access provides the ability to imitate, through code, any interaction and experience exposed by traditional mouse and keyboard input. UI Automation enables programmatic access through five components:

  • The UI Automation tree facilitates navigation through the structure of the UI. The tree is built from the collection of window handles (HWNDs).
  • Automation elements are individual components in the UI. These can often be more granular than a window handle.
  • Automation properties provide specific information about UI elements.
  • Control patterns define a particular aspect of a control's functionality. They can consist of information about properties, methods, events, and structures.
  • Automation events provide event notifications and information.

Basics of a UI Automation Client Application

For the Windows® 7 operating system, UI Automation has a new COM client API for both unmanaged and managed clients. Unmanaged applications can now use UI Automation without requiring a change in languages or loading the common language runtime (CLR).

Perhaps the best way to understand UI Automation in general, and the COM client specifically, is to see them in action. Listing 1 demonstrates a complete program that gets the UI element at the current cursor position, and then prints its name and control type.

Next I’ll walk through the various pieces that make up this example.

Creating the CUIAutomation Object

The heart of the new COM API is the IUIAutomation interface, which lets clients get automation elements, register event handlers, create various helper objects, and access other helper methods.

To get started using UI Automation in your application, do the following:

  1. Include Uiautomation.h in your project headers. This header will bring in the other headers that define the API.
  2. Declare a global pointer to an IUIAutomation interface.
  3. Initialize COM.
  4. Create an instance of the CUIAutomation class and retrieve the IUIAutomation interface in your global pointer.

The following example function creates the object instance and stores the retrieved interface address in the global pointer g_pAutomation:

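// Global pointer to the UI Automation object.
IUIAutomation *g_pAutomation = NULL;

HRESULT InitializeUIAutomation()
{
    // Initialize COM on this thread before creating the object.
    HRESULT hr = CoInitialize(NULL);
    if (FAILED(hr))
        return hr;

    hr = CoCreateInstance(__uuidof(CUIAutomation),
            NULL, CLSCTX_INPROC_SERVER,
            __uuidof(IUIAutomation),
            (void**)&g_pAutomation);
    return hr;
}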

Directly Obtaining UI Automation Elements

Once you have the UI Automation object, you can discover the entire UI. The UI is modeled as a tree of automation elements (IUIAutomationElement interface objects), where each element represents a single piece of UI. The IUIAutomationElement interface has methods relevant to all controls, such as checking properties or setting focus.

The root element of the UI Automation tree is the desktop. You can obtain this element by calling either the IUIAutomation::GetRootElement or IUIAutomation::GetRootElementBuildCache method. Each method retrieves an IUIAutomationElement interface pointer, from which you can search or navigate the rest of the tree, as described later in this article.

If you have screen coordinates (such as the cursor position in this example), you can retrieve an IUIAutomationElement interface by calling the IUIAutomation::ElementFromPoint method.

To retrieve an IUIAutomationElement interface from a window handle (HWND), call the IUIAutomation::ElementFromHandle method.

You can retrieve an IUIAutomationElement interface that represents the focused control by calling the IUIAutomation::GetFocusedElement method.

UI Automation Properties for Clients

Properties of IUIAutomationElement objects contain information about UI elements, usually controls. The properties of an element are generic to all elements and not specific to a control type. Control patterns, discussed later, expose control-specific properties.

UI Automation properties are read-only. To set properties of a control, you must use the methods of the appropriate control pattern; for example, a client uses the IUIAutomationScrollPattern::Scroll method to change the position values of a scrolling window.

To improve performance, you can cache property values of controls and control patterns when you retrieve elements.

Some generic properties, and all control pattern properties, are available as properties on the IUIAutomationElement interface or control pattern interface, and you can retrieve them with accessor methods.

When using the generic accessors, you must specify properties with the property identifiers defined in UIAutomationClient.h. You use them to specify properties when retrieving property values, constructing property conditions, and subscribing to property-changed events.

If a UI Automation provider does not implement a property, UI Automation is able to supply a default value for that property. For example, if the provider does not support the property identified by UIA_IsDockPatternAvailablePropertyId, UI Automation returns FALSE.

Control Patterns

Control patterns complement properties. Control patterns are collections of associated properties, events, and methods that describe an element. More than one pattern can apply to a single element.

A control pattern represents a collected group of capabilities. The pattern might be something simple like the Invoke pattern, which lets clients invoke a control. In contrast, the more complicated Value pattern supports getting and setting a control’s value, and checking whether the value is read-only.

You can obtain a control pattern by calling the IUIAutomationElement::GetCurrentPattern or IUIAutomationElement::GetCurrentPatternAs method, or their cached versions. Once you get a control pattern interface, use it as you would the element itself, either by directly calling control pattern methods or by accessing control pattern properties.

The TurnItUp method in Listing 2 uses the RangeValue pattern to set a control (such as a volume slider) to its maximum value.

In addition to checking for errors when getting the current pattern, this example also checks for NULL because NULL is a valid return value when a control does not support the pattern requested.

Control Types

While a control type is mechanically a simple enumeration property, on a conceptual level it is much more important. An element’s control type is a broad classification of the control, such as button, window, or list box. Each control type has certain expected control patterns. For example, a button control should support either the Invoke pattern or the Toggle pattern, and a hyperlink should support the Invoke pattern and possibly the Value pattern.

Some controls have conditional support for several control patterns. For example, the menu item control has conditional support for the Invoke, ExpandCollapse, Toggle, and SelectionItem control patterns, depending on the menu item control’s function within the menu control.

Searching and Navigation

There are two major ways of getting to elements. The first is to use a Find method, which lets the UI Automation core minimize cross-process calls and improve searching time. The second is to use a tree walker to walk a UI tree.

Using Find Methods

The Find methods require three things: a parent element, a scope to search on, and (most importantly) a condition to search for.

You create conditions by calling various APIs on the CUIAutomation object. You can create simple Property conditions, or combine them with And, Or, and Not operators to produce more complex conditions.

Once you create a condition, you need to specify the element to start searching from and the search scope. The element you want to start from is the element on which you call the FindFirst or FindAll method. The search scope is typically one of three choices: the element itself, its children, or its descendants.

You usually want to limit search scope as much as possible to improve performance and to reduce the chance of finding an element you weren’t looking for.

Listing 3 is a simple example of searching for a named application window using FindFirst methods. In the example, the search condition (a particular name) is quite simple. However, conditions can be considerably more complex.

Using Tree Walkers

Another method for navigating through the tree is the use of tree walkers. Tree walkers allow use of direct-navigation methods such as moving between parent, child, and sibling to walk through a filtered view of the tree. There are several built-in filtered views:

  • Raw or unfiltered, which shows all elements.
  • Control view (the default), which filters out elements that either are redundant or are just used for layout.
  • Content view, which filters controls even more selectively than Control view does.

Users can also create their own custom views with Conditions.

The following simple example shows how to walk to the first child of the control referenced by pElement:

// Get the control view walker
IUIAutomationTreeWalker * pWalk = NULL;
HRESULT hr = g_pAutomation->get_ControlViewWalker(&pWalk);

// Go to the element's first child (pFirst stays NULL if there is none)
IUIAutomationElement * pFirst = NULL;
if (SUCCEEDED(hr))
{
    hr = pWalk->GetFirstChildElement(pElement, &pFirst);
    pWalk->Release();
}

UI Automation Events for Clients

UI Automation lets clients subscribe to events of interest. This capability improves performance by eliminating the need to continually poll all UI elements in the system to see if any information, structure, or state has changed.

Efficiency is also improved by the ability to listen for events only within a defined scope. For example, a client can listen for focus change events on all UI Automation elements in the tree, or on just one element and its descendants.

Client applications subscribe to events of a particular kind by using methods such as AddAutomationEventHandler or AddFocusChangedEventHandler to register an event handler. These methods take a callback interface, which should be implemented in an object created by the client. A method such as IUIAutomationEventHandler::HandleEvent will then be called on the appropriate thread whenever the event occurs.

On shutdown, or when UI Automation events are no longer of interest to the application, UI Automation clients should remove the events. They do so either by calling a removal method for specific events such as RemoveAutomationEventHandler or RemoveFocusChangedEventHandler, or by calling the RemoveAllEventHandlers method to remove all events.

UI Automation and Screen Scaling

Starting with Windows Vista®, users can change the screen resolution (in dots per inch or dpi) so that most UI elements appear larger. Previously, the scaling had to be implemented by applications. Now, the Desktop Window Manager performs default scaling for all applications that do not handle their own scaling. UI Automation client applications must take this feature into account.

The UI Automation API does not use logical coordinates. Methods and properties either return physical coordinates or take them as parameters.

By default, UI Automation applications that run at a screen resolution other than 96 dpi will get incorrect results from these methods and properties. For example, because the cursor position is in logical coordinates, the client cannot simply pass these coordinates to the IUIAutomation::ElementFromPoint method to get the element that is under the cursor. In addition, the application will not be able to correctly place windows outside its client area.
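
To make the mismatch concrete, the following sketch works through the arithmetic. It assumes the simplest case, in which a logical coordinate maps to a physical one by the factor dpi/96 on a single monitor with its origin at (0, 0); the type PixelPoint and both function names are hypothetical, and this is not a substitute for the system APIs.

```cpp
// Hypothetical point type for this illustration (not the Win32 POINT).
struct PixelPoint
{
    long x;
    long y;
};

// Scale one non-negative logical coordinate to a physical coordinate,
// rounding to the nearest pixel. Assumes physical = logical * dpi / 96.
long ScaleToPhysical(long logical, long dpi)
{
    return (logical * dpi + 48) / 96;
}

// Convert a logical point to the physical point that UI Automation expects.
PixelPoint LogicalToPhysical(PixelPoint logical, long dpi)
{
    PixelPoint physical;
    physical.x = ScaleToPhysical(logical.x, dpi);
    physical.y = ScaleToPhysical(logical.y, dpi);
    return physical;
}
```

Under these assumptions, a logical cursor position of (400, 300) at 120 dpi corresponds to the physical point (500, 375), so passing (400, 300) to a method that expects physical coordinates would address a different spot on the screen.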

The solution to this situation has two parts:

  • Make your client application aware of screen resolution by calling the Win32 function SetProcessDPIAware at startup. This function makes the entire process aware of screen resolution, so that no windows belonging to the process are scaled.
  • Call GetPhysicalCursorPos to get the current cursor coordinates.

User Account Control and User Rights

Finally, you need to consider User Account Control and user rights to ensure your client can access the information it needs. Before a user launches an administrative application, the operating system prompts the user for consent to run the application with elevated user rights. Similarly, when a user attempts to perform a task that requires administrative user rights, Windows presents a dialog box that asks the user for consent to continue. This dialog box is protected from cross-process communication, so that malicious software cannot simulate user input.

Only signed client applications are trusted to communicate with applications running at a higher user-rights level. To gain access to the protected system UI, applications must be built with a manifest file that includes a special attribute. This uiAccess attribute is set in the <requestedExecutionLevel> tag, as follows; the value of the level attribute shown here is only an example:

<trustInfo xmlns=
    "urn:schemas-microsoft-com:asm.v3">
    <security>
        <requestedPrivileges>
            <requestedExecutionLevel
                level="highestAvailable"
                uiAccess="true" />
        </requestedPrivileges>
    </security>
</trustInfo>

The uiAccess attribute value is "false" by default; that is, if the attribute is omitted, or if there is no manifest for the assembly, the application will not be able to gain access to protected UI.

With access to the system and target application’s UI, your client applications can navigate the UI, retrieve properties and control patterns for UI elements, and listen to relevant UI events.