Want to Fix a System? Break It First

Want to see what happens to your trading desk or matching engines when you shut down a part of the electronic systems that support them? Try it.

That, in effect, is what the U.S. equities industry’s post-trade processing utility, the Depository Trust and Clearing Corp., does. At the Securities and Exchange Commission’s Market Technology Roundtable Tuesday, DTCC managing director and chief development officer Albert Gambale detailed a tactic his firm routinely uses to test its systems.

The clearer selects a branch office and has it halt operations and stand down to see how the rest of the company’s nationwide system handles the shutdown.

Testing how well a system keeps working without one of its parts could have helped, for instance, Knight Capital on August 1, when it unleashed a flood of erroneous orders onto stock exchanges. Shutting down the misfiring part, in theory, would not necessarily have meant shutting down all of its market-making operations.
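For readers who want to see the idea in concrete terms, a shutdown drill of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how DTCC or any other firm actually runs its tests: the `Site`, `Network` and `shutdown_drill` names, the round-robin routing and the office labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One office or component that can process work (hypothetical model)."""
    name: str
    online: bool = True
    handled: list = field(default_factory=list)

    def process(self, job):
        if not self.online:
            raise RuntimeError(f"{self.name} is offline")
        self.handled.append(job)


class Network:
    """Routes each job to the next online site, skipping any that are down."""

    def __init__(self, sites):
        self.sites = list(sites)
        self._next = 0  # round-robin cursor (an assumed routing policy)

    def submit(self, job):
        # Try each site at most once before giving up.
        for _ in range(len(self.sites)):
            site = self.sites[self._next % len(self.sites)]
            self._next += 1
            if site.online:
                site.process(job)
                return site.name
        raise RuntimeError("no site available to process the job")


def shutdown_drill(network, target_name, jobs):
    """Take one site offline, push jobs through, then restore the site.

    Returns a mapping of job -> name of the site that handled it.
    """
    target = next(s for s in network.sites if s.name == target_name)
    target.online = False
    try:
        return {job: network.submit(job) for job in jobs}
    finally:
        target.online = True  # always restore, even if the drill fails


if __name__ == "__main__":
    net = Network([Site("office_a"), Site("office_b"), Site("office_c")])
    handled_by = shutdown_drill(net, "office_b", ["t1", "t2", "t3", "t4"])
    print(handled_by)
```

The drill passes if every job is still handled and none lands on the site that was stood down. If the drill raises an error instead, the system depends on that one part more than it should.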

Crisis preparation, including these shutdowns and the tests that go with them, is important because in the moment of crisis responses must be handled with “military precision,” according to Saro Jahani, Chief Information Officer of Direct Edge, operator of the EDGA and EDGX exchanges. “Often, there is no time to find the absolute best person to handle it. Rather, you have to focus on doing the right thing for the market.”

Direct Edge has its own experience with this type of crisis. Last October, the SEC cited Direct Edge for two system failures that allegedly caused millions of dollars in trading losses.

The first incident occurred in November 2010 when untested computer code changes caused the two exchanges to overfill orders. The unwanted trades involved an estimated 27 million shares in about 1,000 stocks, totaling roughly $773 million.

A second incident occurred in April 2011 when an exchange employee accidentally disabled database connections and disrupted the exchange’s ability to process orders and cancellations. EDGX members filed claims for more than $668,000 in losses.

Following the SEC’s citation, Direct Edge undertook “a comprehensive plan” of regulatory compliance, which included significant investments in technology and personnel, to harden its systems. Indeed, yesterday Direct Edge partnered with software tester Coverity, Inc., to improve how it tests software before its exchanges put it to use. The “Edgile” approach automatically tests code and identifies issues as the code is being written.

Dr. M. Lynne Markus, Professor of Information and Process Management at Bentley University in Waltham, Mass., outlined four basic strategies that exchanges and trading firms should follow to prevent glitches or limit their impact.

First, reduce the complexity of systems, so there are fewer steps and fewer opportunities for errors, Prof. Markus explained.

Second, test systems thoroughly, to make sure they work on their own and in conjunction with systems with which they interact.

Third, monitor activity, at all times, so difficulties or aberrant actions are spotted instantly.

Fourth, keep backup capacity in place, to take over trading or speed recovery from a shutdown, so normal trading can resume as soon as possible.

“To ensure a robust system, these four strategies have to be addressed and utilized together,” explained Prof. Markus. “Otherwise, I am afraid that the crises we’ve seen over the past year or so may become the new normal.”

Communication is also key in managing a crisis. Anna Ewing, Executive Vice President and Chief Information Officer of Nasdaq OMX, said that during any market event, her firm has teams dedicated to direct outreach to both regulators and customers, often while continually updating the exchange’s status alerts.

Throughout a crisis, it is essential that an exchange communicate with these groups—something that cannot be left to computers or software, she added. “In this, people matter,” Ewing said. “This is a part of the culture where people’s common sense is needed.”

Nasdaq OMX has set aside $62 million to compensate its customers for problems experienced in handling orders in its IPO Cross system on the day Facebook shares went public, and in transferring those orders to the Nasdaq Stock Market itself.

Experts at the SEC roundtable also recommended segregating key duties so that no function depends on a single piece of software or a single person.

They also recommended first identifying whether a problem is internal or external, and then having responses planned out in advance where possible.

“It’s not only about identifying the trigger points that lead to problems, but you have to identify what customers need during this crisis and how those obligations can be fulfilled,” said David Bloom, Head of UBS Group Technology at UBS.

Of course, fixing trading software systems during a market malfunction has its own particular problems.

“In most cases, you’d still be trying to triage it,” explained Lou Steinberg, Chief Technology Officer at TD Ameritrade, the online brokerage. “At that moment, you may not know what is happening and whether to fix it, or try to leverage the capacity you need to recover from it.”