Air University Review, November-December 1974
In July 1970 the Blue Ribbon Defense Panel passed the following severe judgment on Department of Defense Operational Test and Evaluation (OT&E) efforts:
There has been an increasing desire, particularly at OSD level, to use data from OT&E to assist in the decision-making process. Unquestionably, it would be extremely useful to replace or support critical assumptions and educated guesses with quantitative data obtained from realistic and relevant operational testing.
Unfortunately, it has been almost impossible to obtain test results which are directly applicable to decisions or useful for analyses. Often test data do not exist. When they do, they frequently are derived from tests which were poorly designed or conducted under insufficiently controlled conditions to permit valid comparisons. It is especially difficult to obtain test data in time to assist in decision-making. Significant changes are essential if OT&E is to realize its potential for contributing to important decisions, particularly where the tests and the decisions must cross Service lines.1
Since that time there have been important policy changes that significantly increase the role of OT&E in the systems acquisition process. On 13 July 1971 the Department of Defense (DOD) decisively linked OT&E to the important decisions to buy large-scale production quantities.
Test and evaluation shall commence as early as possible. A determination of operational suitability, including logistic support requirements, will be made prior to large-scale production commitments, making use of the most realistic test environment possible and the best representation of the future operational system available. The results of this operational testing will be evaluated and presented to the DSARC at the time of the production decision.2
On 19 January 1973, DOD took further steps to assure that OT&E is responsive to the decision process.3 This directive stressed that OT&E should be independent of the developer, timely, and realistic.
In September 1974 the United States Air Force began using a Separate Operating Agency, the Air Force Test and Evaluation Center, to carry out service OT&E management functions.
The defense policy and management structure for OT&E is well advanced, but what of the execution of the tests themselves? Will their quality rise above the condition reported by the Blue Ribbon Defense Panel in 1970? To some extent OT&E has already improved, simply because there is now a feeling that the results are needed at a level where important decisions are made. It is the premise of this article that further improvement can be had by careful attention to some fundamental considerations. The mechanism now exists to use OT&E results as inputs to decision-making. The work that remains is to make sure that OT&E quality is worthy of this important purpose.
In the USAF, the test and evaluation process for systems acquisition has been divided into two types. The first, called development test, is concerned primarily with the engineering function of the design. Development test may also be thought of as one of the later refining steps in the design process, where the entire design or its components are subjected to selected test conditions that have been chosen to qualify or pass the engineering design. The development test is largely quantitative and may also be linked to the development contract as an incentive to contractor performance.
Another type of systems acquisition test, which is the topic of this article, is called operational test. The focus of the operational test is on the intended operation or use of the system. The dominant consideration for operational test is the relationship of the system to other enemy and friendly systems with which it may operate. The operational test will be active, will involve people, support, communications, and tactics, and will try to judge the contribution of the test system to the overall military effectiveness of the forces in which it will operate.
Another aspect that may need clarification is the use of the term “evaluation.” In current USAF usage, “test” refers to physical activities designed to secure data, while “evaluation” refers to the mental activity used in processing the test results and other relevant information to get useful conclusions. From this usage have evolved the terms “development test and evaluation” (DT&E) and “operational test and evaluation” (OT&E).
The proper conduct of OT&E, in my opinion, requires that the OT&E tester give attention to some basic considerations that are derived from the purposes served by his test. He must be attuned to his role in the larger context of systems acquisition and be able to direct his efforts toward the assigned task.
OT&E serves two main purposes. As previously noted, it provides information about the system for decisions in the systems acquisition process. OT&E also provides detailed information to support operational introduction of the weapon system. This second function has been carried out over several years without significant controversy and has not been the subject of recent OT&E policy changes. In the second function OT&E information supports the development of training programs, logistic planning, verification of manning levels and operating rates, and employment planning. All these uses require information about the expected characteristics of the system when employed in an actual operating situation. In contrast to early OT&E efforts that support production decisions, information for operational introduction can be served by later, more extensive OT&E conducted with production equipment in an environment more closely resembling actual operations.
Both procurement decisions and operational introduction require two kinds of operational information, one relating to effectiveness, the other to suitability. Operational effectiveness refers to the ability of the system to perform its intended military task; operational suitability refers to the compatibility of the weapon system with its surroundings. These are not completely separate questions since suitability factors (i.e., how well the system can be supported) may also indirectly influence combat effectiveness. Still, these classifications provide a useful way to think about test objectives, and they are commonly used.
There are several vital considerations that must be addressed in planning and conducting an OT&E; they are fundamental to a sound test that will convincingly answer the critical questions. These points may seem obvious, but the importance to the USAF of a strong OT&E program, one that produces high-quality results, warrants continuing attention to fundamentals.
The situation in OT&E may be compared to that of a football team. No matter how sophisticated the game plan becomes, everything rests on the execution of fundamental skills. Also, it is important to realize that these are “considerations,” not shortcut methods to “get a handle” on the problem. In these quickly changing times, each OT&E is a new event, and the burden of proof must be on those who would bypass these basic considerations and treat a new OT&E as a repetition of any past OT&E.
The starting point for an operational test must be a definition of the operational mission, in as much detail as possible. This definition should consider all intended missions, including combat, training, and other uses of the system, and should include the likely range of operating conditions for each mission. Also needed is a complete description of the enemy threats that may be encountered, with their expected capabilities and characteristics. Finally, the definition must consider the friendly supporting systems with which the system will operate.
This mission definition should be as thorough and detailed as possible, for consideration of specific questions makes the critical test factors more readily identifiable. For example, consider the questions “What kind of runways will an aircraft normally use?” or “How much loiter time is needed in the target area?” These questions are important to the evaluation of close air support systems, and a complete evaluation requires some answers. To cite another case, in the counterair mission much depends on the enemy defensive capabilities in the intended operating area, and complete evaluation of an air-to-air fighter system cannot be made until this hostile environment is defined. These cases briefly illustrate the importance of trying to answer specific questions about the intended use of the system as a fundamental starting point in planning an OT&E. It must also be recognized that many of the detailed questions cannot be resolved with a definitive quantitative answer. A question concerning range or loiter-time requirements may be answered by trade-off analysis to show that there is a range of “acceptable” values, each with associated penalties in other capabilities. Also, not every specific question that may be raised need be answered. The value of the procedure is realized if a judgment is made as to which factors are important enough to define clearly and which are not.
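To make the idea of such a trade-off concrete, the following short sketch works through a purely hypothetical loiter-time calculation. The fuel-flow, fuel-reserve, and payload figures are invented for illustration and do not describe any actual system; the point is only that each added increment of loiter time is bought with a penalty in another capability, so the analysis yields a range of acceptable values rather than a single answer.

```python
# Purely illustrative loiter-time trade-off with invented numbers; these
# figures do not describe any actual close air support system.

LOITER_FUEL_FLOW_LB_PER_MIN = 80      # assumed fuel burn while loitering (lb/min)
FUEL_AVAILABLE_FOR_LOITER_LB = 6000   # assumed fuel remaining after cruise and reserves
PAYLOAD_TRADE_LB_PER_LB_FUEL = 1.0    # assumed: each extra pound of loiter fuel
                                      # displaces one pound of ordnance

def payload_penalty(loiter_minutes: float) -> float:
    """Ordnance (lb) given up to buy loiter time beyond what baseline fuel allows."""
    fuel_needed = loiter_minutes * LOITER_FUEL_FLOW_LB_PER_MIN
    extra_fuel = max(0.0, fuel_needed - FUEL_AVAILABLE_FOR_LOITER_LB)
    return extra_fuel * PAYLOAD_TRADE_LB_PER_LB_FUEL

if __name__ == "__main__":
    # Sweep the requirement to show a range of "acceptable" loiter times,
    # each with its associated penalty in another capability.
    for minutes in (30, 60, 90, 120):
        print(f"{minutes:3d} min loiter -> {payload_penalty(minutes):6.0f} lb ordnance penalty")
```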
The test reference mission may be derived from the same source that provided the basis for the development program, but it cannot be identical. A number of years will have passed since the requirement studies were done, and significant updating may be needed to reflect changes in the threat, supporting systems, logistics, or deployment posture, or even the addition of new missions. The essential point is that there must be a reference (operational mission definition) in order to make a comparison (operational evaluation).
Some mission definition is inevitable. Even if it is not written down and carefully considered, it will nevertheless exist in the minds of the evaluators, where it may be erroneous, fuzzy, or incomplete. This informal, personally held mission definition might be correct, but it is not readily available for review by decision-makers.
It is almost self-evident that an adequate mission definition must exist as a standard against which to measure the weapon system.
Spelling out test objectives may be straightforward if two things are available: first, the mission definition; and, second, a definitive statement of the information wanted from the OT&E. Defining these information needs is largely a management function. If the test supports a production decision, the key factors in that decision should be identified so that they may be purposefully satisfied by the test from its inception. Under current DOD directives these may be derived from the “critical questions and issues” that are pertinent to the specific decision.4 These key factors must be understood before a test plan is prepared, because an operational test that supports a production decision will usually use limited quantities of development hardware. With limited time and resources, the test must specifically address the questions in the minds of decision-makers. Such specific management questions are the primary reason that the early OT&E exists, and the capability to answer them was the primary incentive for the recent OT&E policy changes. Later OT&E that supports operational introduction also requires specific information, but these needs are more varied. They may never come to focus in a single key event like the production decision, but they are no less important. Operational data are the lubricating knowledge that should make the introduction of the system smooth and avoid the slow and painful process of relearning in actual operations the techniques and procedures that have already been learned in a test program.
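One practical way to keep the key decision factors in front of the test planner is to maintain an explicit trace from each critical question to the test objectives, measures, and data that will answer it. The sketch below is a notional illustration of such a trace; the questions, objectives, measures, and data sources are invented examples, not drawn from any actual OT&E plan or DOD format.

```python
# Notional traceability sketch: each critical question behind the production
# decision is tied to the test objectives, measures, and data that answer it.
# All entries are invented examples, not taken from any actual OT&E plan.

from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class CriticalQuestion:
    question: str
    test_objectives: list[str] = field(default_factory=list)
    measures: list[str] = field(default_factory=list)
    data_sources: list[str] = field(default_factory=list)

plan = [
    CriticalQuestion(
        question="Can the system find and hit targets against realistic defenses?",
        test_objectives=["Fly two-sided attack profiles against a defended target array"],
        measures=["Target acquisition rate", "Weapon delivery accuracy"],
        data_sources=["Range scoring", "Aircrew debriefings"],
    ),
    CriticalQuestion(
        question="Can field units sustain the planned sortie rate?",
        test_objectives=["Operate from an austere base with unit-level maintenance only"],
        measures=["Sorties per aircraft per day", "Maintenance man-hours per flying hour"],
        data_sources=["Maintenance records", "Supply transaction logs"],
    ),
]

# A critical question with no objective, measure, or data source is a gap
# in the test plan that should be resolved before testing begins.
for cq in plan:
    covered = all((cq.test_objectives, cq.measures, cq.data_sources))
    print(("covered: " if covered else "GAP:     ") + cq.question)
```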
Operational testing must be designed to reflect adequately the conditions that will exist in actual operation. The answers provided by OT&E must deliberately be made relevant to the real employment of the system because there are many obstacles to realism. The test cannot possibly have total realism, for the only full measure of combat reality is combat itself. Furthermore, each instance of combat has been unique, and it is impossible to predict the future unique combat situation that a new system will experience. Yet, if the purposes of OT&E are to be met responsibly, someone must create an acceptable description of this unknown future reality.
Realism is vital to keep the OT&E from becoming a repeat of earlier development analysis. Some analysis and evaluation will always be needed to convert test results into a usable form that can be projected into the future; but if the test itself has few elements of realism, then a greater amount of analysis and judgment (or guessing) is needed to bridge the gap between test and reality. The basic reason for performing a test is to confirm the utility of a design resulting from earlier analysis. It therefore follows that the test should take as large a step as possible away from analysis and toward full operational reality. An active effort is needed to achieve realism. If realism is not earnestly sought and operational tests are conducted in the test environment that just “naturally happens” at a test site, the test situation will be primarily oriented to the restraints imposed by engineers, range and traffic controllers, safety supervisors, data collectors, and many others whose support is needed. The dominant factor will then be convenience, not realism. Although total realism is not possible, there are some steps that can be taken to introduce this vitally needed realism into the test situation:
Use of two-sided tests. War is a two-sided affair. Move and countermove come in an endless stream. Sometimes the action is fast-paced, and sometimes events move slowly as each side thinks about the situation and devises new approaches to the contest. The human gifts of ingenuity and adaptation are constantly in use as military tacticians try to employ men and materials in a more advantageous way. This innovative process has an uncanny way of quickly exposing and exploiting the strengths and weaknesses of weapon systems. These same desirable effects can be realized in a test situation simply by making the test “two-sided.” Even a small amount of two-sidedness is helpful. For example, one-on-one engagements between tactical fighters are a useful way to bring out critical design features for evaluation, even though it is recognized that the real world is usually larger than one-on-one. Limited two-sided tests are valuable to the extent that they represent key competing elements of the larger situation.
In the past, one of the problems with two-sided tests has been organizational. For example, the resources needed for a two-sided test of bombers versus fighters were in different Air Force commands, while the forces needed to conduct an air-versus-ground engagement were in different services. While no intentional bias has existed against two-sided tests, the various organizations naturally tended to focus attention on their own pressing problems to the neglect of objective two-sided operational tests. Recently some favorable changes have come about, and one excellent two-sided test, COMBAT HUNTER, was conducted in 1972 using Army and Air Force resources. Further two-sided tests are now being planned, and this trend may be expected to continue, in view of expressed DOD support for joint tests.5 Also, recent emphasis on coordinated efforts at the service level should help remove this organizational obstacle to two-sided tests.
Increased scale. As football coaches know, a partial two-sided drill is not as helpful in assessing a team as a full-dress scrimmage against a competent opponent. Likewise, the larger the scale of the test, the more likely it is to include all the important force elements. In the example of a one-on-one fighter engagement, the test becomes more comprehensive when other aircraft are introduced (perhaps four-on-four) and elements of the ground environment are added, e.g., radar sites, surface-to-air missiles, etc. With the scale of the test increased in this way, the results may reveal deficiencies in communications links or problems in the pilot-to-aircraft interface that occur only when the pilot workload becomes high. The major obstacle to large-scale tests is their increased cost and complexity. The operating cost of each element in active test time may be small, but these same resources will, in all probability, be lost to other uses for a greater period of time because of the inherent difficulties in coordinating and scheduling a large and complex test operation. One must therefore approach increases in the scale of a test in a selective way, choosing those elements which experience or analysis shows to be important while omitting for the sake of economy those which are expected to have a minor influence on the results.
Removal of unnecessary constraints. Realism may also be improved simply by removal of the unrealistic and unnecessary restraints of the normal test environment that will not exist in the expected employment situation. The key word is “unnecessary,” and if a restraint is to be kept, one must ask, “Why is the condition necessary?” Often a closer look at the restraints will reveal ways that they can be avoided. Following are typical test restraints:
Data systems. The requirement for engineering data will usually result in limits on altitude or operating area, to remain within the instrumented range area. Telemetry reception, photo coverage, and positioning information all have their own characteristics that may limit the way a test is conducted.
Weather. The tactically difficult “bad weather” needed for an operationally realistic mission may simply not be readily available at the test site. In other cases, safety or data considerations may require clear weather even when realistically bad weather is present. In most cases, operational realism will call for consideration of a wide range of weather conditions, while typical test restraints will favor good-weather tests during daylight hours.
Airspace. Airspace for operational testing is often smaller than desired and located over land used for totally unrelated purposes, such as farming, residences, game preserves, or national parks, thus ruling out supersonic flight and the dropping or firing of various objects from an aircraft. Unfortunately, little can be done about these restrictions in existing operating areas. Recognizing this difficulty, the USAF has initiated the Continental Operations Range program, which seeks to make larger, more useful airspace areas available for testing and operational training.
Safety. The most difficult limitations to relieve are those related to safety. Safety limitations are usually imposed for good reason, based on experience with accidents. The desire for safety may be even more compelling during a test program than it normally is, because the test resources may be “one of a kind” prototypes, the loss of which would have serious consequences for the entire program. It is very difficult to press for test realism in the face of a potentially hazardous situation. The elements of realism that are sought at the expense of safety must be essential to a convincing test that will answer important questions.
Use of representative hardware. Realism is enhanced when the most representative test and supporting items available are used. In the past, most newly developed systems have used development hardware for operational testing. Under present systems acquisition policy, the basic structure of a development program is designed to provide a reasonably mature system for operational evaluation. Representative test supporting items are also important. In recent years one of the most difficult test problems has involved the targets supporting air-to-air missile tests. Target drones are often destroyed during such tests, so drone development has emphasized low-cost vehicles. At the same time, a target drone that can adequately reflect the speed, maneuverability, and radar and infrared signature of an aircraft tends to be almost as large as an aircraft. In fact, one solution has been to convert aircraft retired from active service into unpiloted targets. This approach has provided more representative targets, but with these large targets there is a basic conflict between the objectives of a fully realistic missile test and the desire to conserve the target. The difficulties in obtaining fully representative test support items suggest the need for a continuing effort to develop improved test techniques and supporting hardware as a part of the overall OT&E capabilities program.
point of view
It seems self-evident that a test should be objective and should represent the situation as viewed by a prospective user, but there is a strong human tendency against objectivity when one is personally involved in a project. This tendency, which might be called the “success syndrome,” occurs when the tester desires to be associated with a successful weapon system program rather than an unsuccessful one. This attitude, which stems from a desire for personal career success, will inevitably creep into the selection of test conditions and the subjective interpretation of results.
In contrast to a successful weapon system program, a successful test program does not depend on the test outcome. A test program is successful, whatever its result, if it is valuable to the decision process. A successful test program might very well spell the end of a weapon system program and save production funds from being spent on a lemon.
The tester must be neither success-oriented nor excessively critical, for by his actions in test planning and evaluation he can influence the outcome for the weapon system. The tester must be objective and faithful to his purpose, which is to provide reliable, accurate facts and considered judgments as a basis for good decisions. The decision-maker must also take care that he does not inadvertently encourage the success syndrome by praising the tester for the successful system. Plaudits for a successful weapon system belong to those who participated in its development. Testers, by contrast, must be rewarded for sound test execution, thoughtful evaluation, and honest reporting.
The tangible outputs of a test are the reports it provides. These reports support key decisions or other events and must meet the schedule of the events they support. From the outset, a test must be organized based on knowledge of which organization needs what information, when, and for what purpose. If these things are not known, the test tends to serve itself and its internally generated ends, and one might properly ask, “Why is this test being done?”
The frequency, format, style, and means of communication of test reports should be specifically adapted to the test at hand and not simply patterned after precedents. Interim reports, TV or film reports, briefings, and letter, message, or telephone reports should all be considered as possible means to get needed information into the proper hands on time.
Tests alone do not provide simple answers totally applicable to operational reality. Evaluation is needed to apply reasoning and judgment to the test results and answer the operational questions about a weapon system’s effectiveness and suitability. In considering this process, it is important to remember that judgment is a personal, subjective quality. It resides with individual people and reflects their knowledge, attitudes, and experiences. For an operational evaluation, this background resides with individuals who possess significant military experience of a kind most closely related to the projected military environment.
But experience alone does not insure an adequate evaluation. These same individuals, while possessing relevant experience, must then apply themselves with an eye to the future. Their task is not to measure tomorrow’s weapons against yesterday’s battlefield but to envision the conditions of the future and evaluate test results against that future. Evaluators must not take for granted that any particular aspect of past experience will apply in full measure to the future, but at the same time they must make full use of the insights gained from this experience to produce an operational evaluation oriented to the future.
There are two perspectives that may be used to view OT&E costs. One viewpoint stresses the program cost implications or those costs associated with arranging a weapon system development and production program so that adequate OT&E may be conducted before committing funds for production equipment. The other viewpoint could be called a preventive costs approach, for it stresses the use of adequate OT&E as a means to minimize the probability of a serious mistake.
In a somewhat oversimplified explanation, these two viewpoints may be related to the systems acquisition concepts of “concurrency” versus “fly before buy.” In a fully concurrent program, the decision to design, develop, and produce the weapon system is made at the outset. All activities proceed together so that the time to complete the full program is minimized and efficient use of design and production resources is possible. This is undoubtedly the preferred approach—if there are no mistakes. But people do make mistakes, and in a concurrent development program the only way to rectify a mistake is to stretch out the program, slow down the planned production, and then retrofit the defective items already produced. To avoid these very significant consequences of a mistake, the “fly before buy” concept plans for an orderly “stretched out” program, which uses OT&E to reduce the probability of buying weapon systems that must later be fixed. A detailed consideration of program costs related to OT&E is not really necessary here because a “fly before buy” policy has been adopted, and the somewhat higher initial program costs associated with that decision are accepted, both to achieve a better product and to control risks.
On the other hand, the direct costs of OT&E are not a closed question. These costs will remain vulnerable to the financial pressures that may exist in a weapon system program. In such circumstances, an OT&E program, like a safety program, should be considered in relation to the disasters it prevents. It is penny-wise and pound-foolish to cut corners on a test program that is intended to answer major questions in support of a production decision. Test resources must, of course, be managed efficiently to get the most from each test dollar. However, when allocating test resources, it is better to err on the side of a more-than-adequate test than to risk a significant error in a production decision. A production decision error may result in the purchase of large quantities of ineffective or unsupportable systems, causing expensive retrofit programs and substantial delay in reaching a combat capability. It is this sobering possibility that should be balanced against the direct costs of an OT&E program.
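The preventive-cost argument can be put in rough quantitative terms. The following sketch compares the expected cost of a production-decision error with the added cost of a more thorough OT&E; the probabilities and dollar figures are assumptions chosen only for illustration, not estimates for any actual program.

```python
# Illustrative expected-cost comparison; the probabilities and dollar amounts
# below are assumptions chosen for the sketch, not estimates for any program.

RETROFIT_AND_DELAY_COST = 400e6    # assumed cost if a major deficiency reaches production
P_MISS_WITH_MINIMAL_TEST = 0.30    # assumed chance a minimal OT&E misses the deficiency
P_MISS_WITH_THOROUGH_TEST = 0.05   # assumed chance a thorough OT&E misses it
ADDED_TEST_COST = 25e6             # assumed added cost of the more thorough OT&E

expected_loss_minimal = P_MISS_WITH_MINIMAL_TEST * RETROFIT_AND_DELAY_COST
expected_loss_thorough = P_MISS_WITH_THOROUGH_TEST * RETROFIT_AND_DELAY_COST + ADDED_TEST_COST

print(f"Expected loss with minimal OT&E:  ${expected_loss_minimal / 1e6:6.1f} million")
print(f"Expected loss with thorough OT&E: ${expected_loss_thorough / 1e6:6.1f} million")
# Under these assumptions the added test cost is repaid several times over,
# which is the sense in which skimping on OT&E is penny-wise and pound-foolish.
```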
Operational testing is now firmly established as a part of the systems acquisition process. In the future, new systems exploiting expanding technology will continue to create possibilities for operational employment that cannot be closely linked to our previous experience. This situation will, in turn, demand more careful consideration and greater ingenuity in the design of operational tests and will require greater management skill to carry out these new tests. The emphasis must shift away from the routine use of established test procedures and toward developing methods of test problem analysis. Such analyses should include the basic considerations discussed here and stress a tailor-made OT&E for each application.
The tester must keep one thought constantly in mind: the purpose of the test. He must plan, execute, evaluate, and report with a concern for producing the information needed by others. He must conduct a deliberate, orderly, and well-considered program that addresses the fundamental considerations which may be important in his situation. He must avoid the complacent view which holds that all OT&E is basically alike and one may simply pattern the current test after a convenient precedent. He must recognize that each test has a unique set of circumstances and may demand a near-total reconsideration of the answers which applied to any previous test. All this he must do with a feeling for the combat operations of the past and an eye to the possibilities of the future.
Air War College
1. Blue Ribbon Defense Panel, Gilbert W. Fitzhugh, Chairman, Report to the President and the Secretary of Defense on the Department of Defense, Washington, GPO, 1970, p. 89. Also called the Fitzhugh Report.
2. Department of Defense, DOD Directive Number 5000.1, 13 July 1971, para III C 6.
3. Department of Defense, DOD Directive Number 5000.3, 19 January 1973, para III C 4.
4. Ibid., para III H 1.
5. Ibid., para VII E.
Lieutenant Colonel Clyde R. Robbins (M.S., University of Southern California) is Assistant Chief of Staff, Operations and Plans (J-3), Iceland Defense Force. His tactical fighter experience includes operational duty in the F-84F, F-100D, F-101A, and F-4C, and he flew 100 combat missions over North Vietnam. He has been an operational pilot with the F-4E Joint Test Team and an operations staff officer and division chief in DCS/Requirements, Hq Tactical Air Command. Colonel Robbins is a 1974 graduate of Air War College.
The conclusions and opinions expressed in this document are those of the author, cultivated in the freedom of expression, academic environment of Air University. They do not reflect the official position of the U.S. Government, the Department of Defense, the United States Air Force, or Air University.