Volume 2 - Use Case Document
- 1 Context
- 2 Structural Test (ST)
- 3 Configuration / Tuning / Instrumentation (CTI)
- 4 Software Debug (SD)
- 5 Built-In Self Test (BIST)
- 6 Fault Injection (FI)
- 7 Programming / Updates (PU)
- 8 Root Cause Analysis / Failure Mode Analysis (RCA/FMA)
- 9 Power-on Self Test (POST)
- 10 Environmental Stress Test (EST)
- 11 Device Versioning (DV)
- 12 References
About this Document
This document, Volume 2 of 5 volumes, provides an introduction to the System Level Use Cases identified for SJTAG. Each Use Case, or application field, is described and potential benefits and penalties are set out to allow the reader to make an assessment of the appropriateness of the SJTAG Use Cases within their own operations.
Readers seeking further information on specific topics are directed to the following volumes:
- Volume 1 – Overview
- Scope of SJTAG
- Purpose of SJTAG
- Primary Constraints
- Relationship to Other Standards
- Volume 2 – Use Cases
- Structural Test
- Software Debug
- Built-In Self Test
- Fault Injection
- Root Cause Analysis/Failure Mode Analysis
- Power-on Self Test
- Environmental Stress Test
- Device Versioning
- Volume 3 – Hardware Architectures
- Hardware Topologies
- Device Under Test Connectivity Schemes
- XBST - External BST
- Built-in Self-Test
- In-System Programming/Configuration (ISP/ISC)
- EBST - Embedded BST
- Tooling Requirements
- Conformance Levels
- Volume 4 – Languages and Data Formats
- Language Mapping to Circuit Model
- Role of Languages
- Dynamic and Static Information
- Implementation Issues
- Volume 5 – Business Case
- Topic 1
- Abbreviations and Glossary
Structural Test (ST)
Arguably the most common and most mature use of the JTAG interface and the Boundary Scan resources implemented in IEEE 1149.x compliant devices is what many refer to as "Structural Test".
ST Application Fields
"Structural Test" in this sense refers to the verification of interconnects between digital I/O pins (in the case of IEEE 1149.1) and/or analog or mixed-signal I/O pins (in the case of IEEE 1149.4) on a printed circuit board (PCB) or between PCBs/modules within a system. The purpose of most structural test applications is the detection and diagnosis of quasi-static faults, such as stuck-at-0/1, shorted signals, or open pins.
ST Detailed Description
Automated Test Pattern Generation (ATPG) tools are widely available for the generation of test vectors that verify connections between Boundary Scan enabled device pins. Various tool sets available on the market also provide ATPG tools that include non-Boundary Scan circuits (such as memory devices, glue logic, and other digital components) in such connectivity tests. The use of Boundary Scan for structural testing can be extended to include analog circuits, such as Digital-to-Analog Converters (DACs), by providing stimulus on the digital side via Boundary Scan and by measuring the analog value of the DAC output directly or via an ADC interface that may be part of the circuitry. One could argue that this kind of cluster testing has less to do with structural test than with semi-functional test. Still, Boundary Scan access to the signals under test drastically simplifies the test development for this kind of cluster test application (no complicated CPU access cycles or DSP control cycles are needed to set up the DAC or the ADC devices, for example).
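As an illustration of what an interconnect ATPG tool produces, the following is a minimal sketch of a counting-sequence pattern generator. It is not taken from any particular tool; the function names and the simple fault classification are assumptions made for this example. Each net receives a unique binary code that is neither all zeros nor all ones, so that a stuck or shorted net returns a code that differs from the expected one.

```python
def counting_sequence_vectors(nets):
    """Assign each net a unique code and derive the parallel test vectors.

    Codes 0...0 and 1...1 are reserved so that a net stuck at 0 or 1
    can never return its own expected code.
    """
    if not nets:
        raise ValueError("at least one net is required")
    # Smallest width w such that 2**w >= len(nets) + 2 (two reserved codes).
    width = (len(nets) + 1).bit_length()
    codes = {net: format(i + 1, f"0{width}b") for i, net in enumerate(nets)}
    # Vector k drives bit k of every net's code at the same time.
    vectors = [{net: int(code[k]) for net, code in codes.items()}
               for k in range(width)]
    return codes, vectors


def diagnose(codes, observed):
    """Classify nets whose observed response differs from the driven code."""
    faults = {}
    for net, expected in codes.items():
        seen = observed[net]
        if seen == expected:
            continue
        if set(seen) == {"0"}:
            faults[net] = "reads constant 0 (stuck-at-0, open, or short)"
        elif set(seen) == {"1"}:
            faults[net] = "reads constant 1 (stuck-at-1, open, or short)"
        else:
            faults[net] = "code mismatch (possible short to another net)"
    return faults


codes, vectors = counting_sequence_vectors(["NET_A", "NET_B", "NET_C", "NET_D"])
observed = dict(codes, NET_B="000")   # pretend NET_B never toggles
print(len(vectors), diagnose(codes, observed))
```

The plain counting sequence can alias some shorts to stuck-at responses; production tools typically use refined sequences to improve diagnostic resolution.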
For the dynamic test of high-speed signals one can utilize IEEE 1149.6 test resources if devices in the design provide respective capabilities. This would provide additional testability not achievable with IEEE 1149.1 alone (e.g. improved fault detection and diagnostics on differential signals, or detection of an Open fault along AC-coupled signals). IEEE 1149.6 is most useful in testing board-to-board interconnects (e.g. the fabric interconnects in ATCA), but it requires synchronization between two or more of the scan chains on the boards. This imposes a constraint on the system scan chain topology and gateway devices that needs to be accounted for: At a board test level with an external test controller it's not uncommon to have two or more TAPs that can be run synchronously, but in a system we seem to generally expect that only a single chain is presented to the controller (ignoring any dual-redundant architectures).
The major benefit of using IEEE 1149.x / Boundary Scan for structural test is the elimination of the need for external test access (beyond the JTAG test bus signals required to transmit the test pattern). This allows the application of structural tests for boards/modules plugged into a system chassis and even the verification of connections between system modules (provided a respective test bus infrastructure is implemented). This also enables the execution of structural test during HALT/HASS testing.
SJTAG related special considerations:
- Slots are dynamically populated with different types of boards and therefore change the topology of the system model (static system data models do not work);
- Multi-drop architectures create problems for vector management unless the UUT manages its own vectors;
- Board-to-board testing is dependent on the current population of the system; not all telecom systems are configured the same.
- How can the attributes of a board be identified, for example board ID, JTAG chain numbers, length, order, chip location, and so on?
- How can the test vector files related to a board be found? (They may be stored in a flash chip on the board or on an external storage medium. The access path might have to be specified.)
- Unify the methods used to identify board attributes, for example accessing JTAG chip IDs, accessing a special signature of a board, or specifying signal lines to be pulled up or pulled down → opportunity to define a BSDL-like file for a board.
- Unify the vector format of a board and the access methods, for instance by adopting STAPL or SVF uniformly and by specifying how vector files are named and where they are stored.
- System MTC software can reconfigure test topology structure after original hardware configuration is changed.
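The considerations above suggest that the system model has to be derived from the boards actually present rather than from a static description. The following is a minimal sketch of that idea. The `BoardDescriptor` fields, the catalog contents, and the vector path are hypothetical placeholders for this example and do not represent an SJTAG-defined format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardDescriptor:
    """Hypothetical per-board attributes, as a BSDL-like board file might hold."""
    board_id: str
    chain_lengths: tuple   # boundary-register bits in each JTAG chain
    vector_path: str       # assumed location of the board's test vectors


def build_system_model(slot_population, catalog):
    """Map populated slots to board descriptors; report unidentified boards."""
    model, unknown = {}, []
    for slot, board_id in sorted(slot_population.items()):
        if board_id is None:          # empty slot
            continue
        descriptor = catalog.get(board_id)
        if descriptor is None:        # board ID not recognized
            unknown.append((slot, board_id))
        else:
            model[slot] = descriptor
    return model, unknown


def populated_pairs(model):
    """Slot pairs that are candidates for board-to-board interconnect test."""
    slots = sorted(model)
    return [(a, b) for i, a in enumerate(slots) for b in slots[i + 1:]]


catalog = {
    "LINE-A": BoardDescriptor("LINE-A", (120, 64), "flash:/vectors/line_a.svf"),
}
model, unknown = build_system_model(
    {1: "LINE-A", 2: None, 3: "MYSTERY", 4: "LINE-A"}, catalog)
print(sorted(model), unknown, populated_pairs(model))
```

Re-running `build_system_model` after a board is inserted or removed yields the updated topology, which is the behaviour the last consideration asks of the system software.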
ST Alternative Techniques
There are no real alternatives to Boundary Scan for structural test at system level. One can consider functional test as an alternative method, but diagnostics in functional test are typically not as granular and detailed as with Boundary Scan, and developing functional tests, especially with a somewhat useful level of diagnostics, is a time consuming, manual process, compared to the quick, automated generation of Boundary Scan tests.
ST Tooling Requirements
- Ability to import board/module net list files
- Ability to merge board/module net list files to create hierarchical description of the system
- Import of BSDL files
- Modeling of system structure (including BScan and non-BScan pin functions) in preparation for test generation
- Automated test pattern generation
- Test vectors should be able to detect open, short, and stuck-at 0/1 faults at board/module level and provide for pin level diagnostics
- Automated handling of system level scan chain infrastructure
- If tool is embedded in the system, remote control and test pattern / result transfer is beneficial
- Control of number of simultaneously switched nets
- Control and management of board-to-board interfaces
ST Value Proposition
- Re-use of board-level tests
- Lower cost/faster development of system level tests
- Better diagnostic granularity
- Deterministic knowledge of fault coverage
Ground bounce can be a problem in structural test because one could be exercising many more nets simultaneously than one might do during mission operation.
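One way a tool might limit the number of simultaneously switched nets is to split a large transition into several smaller steps. The sketch below assumes that test vectors are represented as dictionaries of net name to driven value and that responses are only evaluated once the final step has been applied; both are assumptions of this example rather than properties of any specific tool.

```python
def count_toggles(previous, following):
    """Number of nets whose driven value changes between two vectors."""
    return sum(1 for net, value in following.items()
               if previous.get(net) != value)


def stagger_transition(previous, following, limit):
    """Return intermediate vectors so that no step toggles more than `limit` nets.

    The last returned vector equals `following` merged onto `previous`.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    changing = [net for net, value in following.items()
                if previous.get(net) != value]
    current = dict(previous)
    steps = []
    for start in range(0, len(changing), limit):
        for net in changing[start:start + limit]:
            current[net] = following[net]
        steps.append(dict(current))
    return steps


before = {f"N{i}": 0 for i in range(5)}
after = {f"N{i}": 1 for i in range(5)}
steps = stagger_transition(before, after, limit=2)
print(len(steps), [count_toggles(a, b) for a, b in zip([before] + steps, steps)])
```

Each extra step costs one additional scan, so the limit trades test time against switching noise.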
Boards may interact with each other, and test design needs to take this into consideration to avoid unexpected behaviour.
Configuration / Tuning / Instrumentation (CTI)
This topic is similar to the role of the Programming/Updates use case, but is targeting a very different set of system features. For Programming/Updates, the target is a programmable device to change its logical behavior (e.g., its design structure).
CTI Application Fields
Although still often underutilized, Configuration and Tuning via Boundary Scan (using special registers) is starting to grow, making Boundary Scan a real-time component for system maintenance. With the advent of voltage monitors and other instrumentation devices, more people are leveraging the 1149.1 bus as an interface to the instrumentation since it is already available for testing purposes. As more analog devices with settable registers become available, programmable resistors and other programmable features are appearing in designs.
CTI Detailed Description
The Configuration/Tuning/Instrumentation use case involves programming of functional features within the system. This primarily consists of various forms of instruments built into devices which control functions like power conditioning and temperature monitoring, and it can even overlap with aspects of BIST control. The IEEE P1687 working group is actively pursuing IEEE 1149.1 solutions for interfacing to these instruments in a standardized form to allow for more automation of instrument control in the future. Once this standard is adopted, this use case has the potential to become more important for the system level as many of these instruments are also accessible from the system interface. Moreover, since these instruments are managing more functional aspects of the system, this use case might be required for normal system operation and setup/configuration.
Some of the applications IEEE P1687 covers, however, will require a parallel access mechanism in addition to the IEEE 1149.1 TAP. Such requirements would then need to be considered at the system level, too.
An example of instrumenting devices using current technologies is given in the Xilinx article "Using the JTAG Interface as a General-Purpose Communication Port", which makes use of the Silicon and Software Systems' General-purpose Native jtAg Tester (GNAT) interface.
CTI Alternative Techniques
Often, functional interfaces / protocols (such as I2C) are used to tune/configure such devices.
CTI Tooling Requirements
Most device vendor tools work well with that vendor's devices but not with other devices integrated into the communications bus. Especially at the system level, it is important that different devices from different vendors can coexist and be accessed on the same communication interface. Tools need to be aware of Chain Management requirements; for example, power management devices with a JTAG interface may need to be excluded from the chain at certain times.
CTI Value Proposition
Traditionally, people have been using many other mechanisms in their system to provide the same interface capability to the instruments. The most prominent one is the I2C interface. The justification for IEEE P1687 is that, since most of the devices containing instruments already feature IEEE 1149.1 resources to support manufacturing test, it makes sense to support a single interface access mechanism to the features inside a device. Thus, the designer would not be required to support both an 1149.1 interface and an I2C interface to the device.
Also essential for the Configuration/Tuning/Instrumentation use case is the description of the capabilities (i.e. how to describe what resources are available for configuration and tuning and how to use them, what is the meaning of those resources, etc.); such description is needed to support automated tooling. Description of ports, algorithms, and protocol communications seem to be the key elements missing in order to better support configuration and tuning at the SJTAG level.
Many configurable devices will not support direct JTAG access. Indirect access, e.g. emulation of an I2C interface using JTAG controlled I/O pins, may be too onerous for many applications.
Indirect access may be time consuming if the scan chains are of significant length. In some cases it may be necessary to create short chains within e.g. an FPGA.
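The effect of chain length on indirect access can be estimated with simple arithmetic: every change of a JTAG-controlled pin costs one full scan of the chain. The sketch below makes that explicit. The function names, the optional `overhead_cycles` parameter (TAP state transitions per scan), and the example figures are assumptions for illustration only.

```python
def scan_time_s(chain_bits, tck_hz, overhead_cycles=0):
    """Approximate time for one data-register scan of the whole chain."""
    if chain_bits <= 0 or tck_hz <= 0:
        raise ValueError("chain_bits and tck_hz must be positive")
    return (chain_bits + overhead_cycles) / tck_hz


def emulated_bus_time_s(pin_updates, chain_bits, tck_hz, overhead_cycles=0):
    """Time to emulate a bus transfer needing `pin_updates` pin-state changes."""
    return pin_updates * scan_time_s(chain_bits, tck_hz, overhead_cycles)


# Assumed figures: 40 pin updates for one emulated transfer, 10 MHz TCK.
long_chain = emulated_bus_time_s(40, chain_bits=1000, tck_hz=10_000_000)
short_chain = emulated_bus_time_s(40, chain_bits=50, tck_hz=10_000_000)
print(f"{long_chain * 1e3:.2f} ms vs {short_chain * 1e3:.2f} ms")
```

With these assumed figures the short chain is 20 times faster, which is the motivation for creating short chains within a device such as an FPGA.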
It may be problematic if coordination is needed between two devices, especially if these are on separate chains, e.g. tuning SERDES links between boards.
Software Debug (SD)
This use case is in its infancy in regard to system level applications, for reasons outlined in this section. The Software Debug use case has many overlaps with other use case examples and provides the opportunity to begin to bridge the gap between structural test and functional test.
SD Application Fields
The primary goal of Software Debug is to gain access to a processor which is hung in some unknown code and to interrogate the PC, stack, and registers, and possibly to get a handle on the state of the processor and system, in order to better understand where in the software execution has ended up. Software Debug would greatly assist the root cause analysis of the software failure. Ideally, this type of troubleshooting and analysis could be performed at the system level with the system covers on, without the need to use extender boards or open up the system to run the experiment.
SD Detailed Description
Emulation interfaces between the various processors are not compatible and for many tools there seems to be a difficulty supporting more than one processor at a time with the emulation interfaces. Furthermore, current emulation tools do not support any kind of system level JTAG connection protocols (e.g., SCANBRIDGE, ASP, etc.). Current emulator interfaces, except for possibly the Nexus interface, require the processor to use other signals in addition to the standard JTAG test bus signals to force the processor into an emulation mode. To do this at the system level, one needs to also route these additional signals to the outside and have a way of controlling their access to the processor under emulation through the SJTAG connection process. This makes it difficult to gain access to these signals from a system level. This is especially true if there are multiple processors on a board that need to be accessible through an emulation interface (DSPs, CPU cores in FPGAs, multiple CPUs). Entering emulation mode often also wipes out the state of the processor (the emulation interface is typically only able to be activated following a reset of the processor, although there are some processor designs appearing which allow read only access to the state of the registers within the processor while not in direct emulation mode). Once ratified, IEEE P1149.7 will resolve some of these problems.
One activity where the emulation environment is playing a larger role in manufacturing test is the area of performing supplemental functional test via the emulation interface. Some 1149.1 tool vendors are now beginning to provide integrated solutions with their tools to provide a set of canned functional tests that cover portions of the circuit which are not covered with straight Boundary Scan tests. In addition to emulation functions being utilized for connectivity tests, certain types of emulation based tests may also be able to test a circuit at speed - something that IEEE 1149.1 is unable to solve. In addition to testing and debugging, on-chip programming can also be done through the JTAG port and emulation functions, especially for programming FLASH.
An important issue is also system security. We have to differentiate between security issues related to someone trying to gain access to the physical system, and concerns related to IP (intellectual property) inside the systems, which may be accessible even with covers-on. Getting access to the system could also allow someone to interrupt the service the system provides; safeguards are needed to avoid accidental Boundary Scan access to boards that are in service when troubleshooting other parts of the system (this is a general problem whenever entering emulation / debug modes).
SD Alternative Techniques
The IEEE-ISTO "NEXUS 5001" program attempts to introduce standardization of the emulation/debug ports of embedded processors. This is slightly different from the Use Case presented here for SJTAG in that the SJTAG case can incorporate the drive and sense of signals external to the processor(s).
SD Tooling Requirements
Coordinated scan operations to handle different processors simultaneously via a common interface.
SD Value Proposition
Access can be from the board edge, avoiding the attachment of emulators and the need to fit the board on an extender, thereby maintaining the operational signal paths. Some newer gateway devices provide digital I/O lines as part of the multi-drop interface, which may be utilized to control signals such as HRESET and SRESET.
JTAG can offer access to data indicating system states that are not available to conventional debugging methods, e.g. signals or registers external to the processor.
NEXUS illustrates something of the capabilities that will become available through 1149.7, where the processor state can be accessed at any time, including where the processor is hung, and avoiding the need to have control of HRESET or SRESET signals.
JTAG is unable to support at-speed operation, due to the relatively slow TCK and the length of scan chains.
It is not currently possible to have different interfaces coexisting within a single chain, e.g. processors and DSPs: 1149.7 will make provision for such configurations providing that support is also present within the tooling.
In the case where devices have multiple cores, e.g. SoC or MCM, having a single TAP on the device as described by 1149.7 can offer significant advantages.
Built-In Self Test (BIST)
Built-In Self Test (BIST) is becoming more popular, with device level implementations becoming more and more complex. BIST is not a full functional test, but rather a functional test of sub-blocks, focused on structural defects, often operating at functional speed.
BIST Application Fields
Functional Test of integrated logic functions; At-speed testing
BIST Detailed Description
SOC (System-On-Chip) applications implementing several CPU, DSP, Logic and Memory blocks on the same die are not uncommon today. To test these circuits efficiently, device internal test access is required. BIST addresses functional test rather than structural test, helping to determine if the logical function of a device (or group of devices) is correct. Furthermore, it becomes more and more important to test at least a few segments of modern circuits at speed. Boundary Scan does not provide for at-speed testing. However, the TAP defined in IEEE 1149.1 can be used to access internal test structures. Such internal test structures could be used to stimulate and/or observe parts of the circuitry implemented in an IC or they may even provide access to circuitry outside that device. After completion, the test results (e.g. a signature pattern) can be read out via the test bus interface.
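To illustrate what a "signature pattern" is, the following sketch models a multiple-input signature register (MISR), a structure commonly used to compress BIST responses into a single word that is then read out and compared with a known-good value. The register width, the feedback polynomial `0x1021`, and the seed are arbitrary assumptions for this example; a real device defines its own.

```python
def misr_signature(responses, width=16, poly=0x1021, seed=0):
    """Compress a sequence of response words into one signature.

    Each clock: shift left by one, apply the feedback polynomial if the bit
    shifted out was 1, then XOR in the next parallel response word.
    """
    mask = (1 << width) - 1
    state = seed & mask
    for word in responses:
        msb = (state >> (width - 1)) & 1
        state = (state << 1) & mask
        if msb:
            state ^= poly & mask
        state ^= word & mask
    return state


good = misr_signature([0x00FF, 0x1234, 0xABCD])
bad = misr_signature([0x00FF, 0x1235, 0xABCD])   # single-bit response error
print(hex(good), hex(bad), good != bad)
```

Only the final signature has to be transferred over the test bus, which keeps readout short but also explains why BIST diagnostics tend to be coarse: the signature shows that something failed, not which response word was wrong.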
BIST allows self-contained testing; its modular structure makes a hierarchy of BIST blocks possible, where the definition of BIST depends on the role of the test and where it is located/applied. BIST may be driven with IC internal registers, or it may be driven by stimulus on external device pins. Also to be considered is a concurrent execution of BIST (vs. sequential tests).
SOC type components may include hundreds if not thousands of BIST blocks. BIST, at a board level, may exploit internal BIST features of a device, and similarly BIST, at a system level, may exploit the BIST features of the boards in that system. During BIST, the board/FRU boundaries must remain in a "safe" state to ensure that false stimuli are not propagated through the remainder of the system. Does this imply that board level BIST is compromised within a system?
BIST Alternative Techniques
- System functional code (software/firmware enabled register access sequence) to initiate BIST
- I2C or other bus used to transact BIST request and response, e.g. to command MIL-STD-1553 transceiver to perform local loop-back test
- A CPLD that initiates BIST blocks and monitors returns
- Proprietary protocols
- BIST may not give as detailed a diagnostic as some other tests
BIST Tooling Requirements
Tools must provide that, during BIST, the board/FRU boundaries remain in a "safe" state to ensure that false stimuli are not propagated through the remainder of the system.
BIST Value Proposition
As designs adopt more SoC type architectures, SJTAG will offer a standardized access mechanism for the control and execution of device BIST routines.
BIST can be considered self-contained testing. Modern BIST can offer gains from concurrency and modularity, as a lot of testing is traditionally done sequentially. In a System-On-Chip that has 50 or more Memory BIST (MBIST) blocks a number of those can be kicked off together, running them concurrently (it may not be possible to run them all at once due to power limitations).
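The concurrency trade-off described above can be sketched as a simple scheduling problem: group BIST blocks into batches that run concurrently without exceeding a power budget. The block names, power figures (in arbitrary integer units), and the first-fit-decreasing heuristic are assumptions chosen for this illustration; they are not prescribed by any standard.

```python
def schedule_bist(block_power, power_budget):
    """Group BIST blocks into concurrent batches within a power budget.

    Uses a first-fit-decreasing heuristic: largest consumers are placed
    first, each into the first batch that still has room.
    """
    batches = []   # each entry: {"used": total power, "names": [block names]}
    ordered = sorted(block_power.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, power in ordered:
        if power > power_budget:
            raise ValueError(f"{name} alone exceeds the power budget")
        for batch in batches:
            if batch["used"] + power <= power_budget:
                batch["names"].append(name)
                batch["used"] += power
                break
        else:
            batches.append({"used": power, "names": [name]})
    return [batch["names"] for batch in batches]


blocks = {"mbist0": 3, "mbist1": 3, "mbist2": 2, "mbist3": 2}
print(schedule_bist(blocks, power_budget=5))
```

With these example values the four blocks run in two batches instead of four sequential runs.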
For FPGA based architectures, it has been demonstrated that vendor foundry test loads can be re-used within a BIST scheme to verify the internal structures of the device. For speed of execution, parallel loading may be used to apply the test loads, although the results can be recovered using JTAG.
- Power loading may be high.
- Typically gives less detailed diagnostics compared to a full structural test.
- BIST execution time may be constrained by other design requirements. This will consequently limit the testing that is possible.
Fault Injection (FI)
At the system level, JTAG presents the opportunity to inject faults without disturbing the system configuration by offering a non-physical means of setting pin/net states to simulate fault conditions.
FI Application Fields
- Setting of pin/net states to simulate fault conditions.
- Comprehensive design validation under environmental test conditions (thermal, vibration, EMC).
- Fault Injection can be used to show that software gracefully recovers from a fault condition.
FI Detailed Description
JTAG presents the opportunity to support hardware and software design verification activities, offering a physically non-disruptive means of setting pin/net states to simulate fault conditions. At a system level, this means that faults may be injected without disturbing the system configuration, e.g. introducing extender cards or attaching temporary "wire links".
In turn, this increases the scope for more comprehensive design validation under environmental test conditions (thermal, vibration, EMC), an area which is traditionally considered difficult.
One problem with the IEEE 1149.1 EXTEST instruction is that putting a device into EXTEST takes all of its pins out of functional mode - this may limit usability, although there are some methods that have been used with FPGAs to allow selected pins to be used for fault injection. Also, JTAG does not readily allow for simulation of AC/transient faults, only DC/"stuck at" faults, although IEEE 1149.6 may open up some possibilities here. However, JTAG does offer the opportunity to re-test using new or updated fault models without the need to modify the target system.
If a device supports the HIGHZ instruction, then this can be used to effectively "remove" that part from the board, without physically doing so, to emulate a device failure or missing part. This can be useful in proving in the diagnostics for functional tests.
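A stuck-at fault injection via EXTEST can be pictured as editing a boundary-scan register image before it is shifted in. The sketch below assumes that the register image is a list of bits, that a previously sampled image is used as the starting point so that other outputs keep their last observed values, and that the net-to-cell mapping is known (in practice it would come from the BSDL file). All names and cell indices are hypothetical.

```python
def build_fault_vector(snapshot, output_cells, faults):
    """Return a copy of the register image with selected nets forced.

    snapshot     -- list of 0/1 values, one per boundary-scan cell
    output_cells -- mapping of net name to the index of its output cell
    faults       -- mapping of net name to the forced value (0 or 1)
    """
    vector = list(snapshot)
    for net, value in faults.items():
        if value not in (0, 1):
            raise ValueError(f"fault value for {net} must be 0 or 1")
        vector[output_cells[net]] = value
    return vector


snapshot = [1, 0, 1, 1, 0, 0, 1, 0]
output_cells = {"ALARM_N": 2, "RESET_N": 6}      # hypothetical cell indices
stuck = build_fault_vector(snapshot, output_cells, {"ALARM_N": 0})
print(stuck)
```

Because the snapshot is copied, the same baseline can be reused for a series of different fault models without touching the target system.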
Overlaps with Configuration and Tuning, Root Cause Analysis/Failure Mode Analysis.
FI Alternative Techniques
Physical fault injection may result in degradation of the Unit Under Test and/or influence the test environment enough to cause side-effects or introduce related but undesired faults. For example, the alternative generally means attaching wires to the board, which means the technique is not really usable under environmental test conditions. The soldering operations can stress the board, and for EMC, the attached wires would essentially represent extra antennae.
The effects of faults may be modeled by simulation of the system behaviour. Often this will be limited to only the hardware and firmware, but co-verification, incorporating both hardware and software simulations, has been demonstrated to be effective although it is difficult to set up and time-consuming to do.
Another technique is to reprogram CPLDs or FPGAs with a fault induced image to replace the normal one while the board is alive, but before it is activated/on-line. Synchronization here can be difficult to achieve.
A further technique is to program the FPGA image with a "time bomb" that after a period of time or system clock ticks, will use an alternate logic path instead of the normal path. The original FPGA image is replaced by the time bomb image during the boot sequence. The concern is to ensure that the time bomb version is not accidentally shipped to the customer.
FI Tooling Requirements
Provision of pin-level control.
FI Value Proposition
By avoiding physical disturbance to the UUT, design proving activities can be undertaken without the need to set aside dedicated assets for test, and the "virtual" application of fault conditions avoids the risk of applying physical stresses (both electrical and mechanical) to the test subject.
- Injected signals may propagate through the system in an unexpected manner, and could conceivably result in damage.
- It may be necessary to add hardware such as additional gating or switching to accommodate Fault Injection if the necessary resources are not inherent within the design, thereby adding to board cost.
Programming / Updates (PU)
In a low-volume production environment, the use of in-system programming for first time programming of devices (including some large Flash memories) during manufacture is common practice. Furthermore, it is common to change preloaded firmware as boards progress through assembly stages (Module then System) to support different types of functional test and the final "mission" firmware.
PU Application Fields
A set of items that are able to be updated using boundary-scan:
- FPGA, CPLD, and PLD family of logic devices
- FPGA Configuration PROMs
- Parallel FLASH*
- Serial FLASH* [Most often I2C based]
- Embedded memory* (in DSP, microprocessor, microcontroller, etc.)
- Configuration Parameters (temperature/voltage thresholds, etc.)
- Programmable operational parameter (programmable resistors, etc.)
- Any kind of non-volatile storage
* Indirectly programmable using JTAG accessible ports/resources on adjoining devices (clusters)
PU Detailed Description
Building boards using pre-programmed devices may be preferred by sub-contract assemblers (may work for high-volume production), but managing pre-programmed parts, including the consequent changes to supplier's drawing sets and amendments to purchase orders to incorporate new programmed parts, can be quite costly. In some environments it may be common to change the loaded firmware as boards progress through assembly stages (Module then System) to support different types of functional test prior to loading the final "mission" firmware. In a low-volume production environment, the use of in-system programming for first time programming of devices (including some large Flash memories) during manufacture is common practice.
Large Flash memories (and some large FPGAs) may give rise to JTAG programming times that are considered unacceptable in many volume manufacturing scenarios, and faster, (parallel) methods will generally be preferred - to some extent, technology has moved on beyond what JTAG can realistically support in this area. Several JTAG "tricks" have been developed to speed up Flash programming over JTAG (see examples below), but these really just reduce the scale of the problem rather than providing a cure; yet, for in-the-field updates, or for recovering a system that may have suffered gross corruption, JTAG remains a viable (or even essential) fall-back option. Lab development activities will likely continue to rely on JTAG as a means of programming boards, or at least certain devices.
Categorization of programmable targets:
- Predictable areas to change (aka, Software)
- Application level programming
- Mission data
- Non-predictable areas to change (aka, Firmware)
- PLDs (FPGAs, CPLDs) and FPGA Configuration PROMs
- FLASH (including FPGA configuration via uP/Flash)
The "predictable areas" are those that either by explicit statement or by inference from the requirement for the product may be expected to be updated a number of times during the life of the product. Consequently, it is likely that a provision will be made in one or more of the mission buses to facilitate these updates. The "non-predictable areas" cover the remainder of the re-programmable entities: There is no requirement for these to be reprogrammable, it's just the way the design has been implemented. Updating via a mission bus is probably unsupported (if not actually impossible), and may well require the use of a dedicated Test Connector.
In some cases, an FPGA may need to be configured (for a specific I/O voltage family) prior to programming FLASH devices relying on a certain I/O voltage interface. That also means that a post-configuration BSDL file is required for that FPGA.
PU Alternative Techniques
"Application Software" is usually loadable using a high speed bus, such as Ethernet, but this rarely applies to embedded controllers, PLDs or FPGAs within the system, and for these JTAG is the norm. Another alternative is JTAG/On-Chip Emulation access to FLASH devices, which can program faster than JTAG/Boundary Scan access.
PU Tooling Requirements
Some considerations for tooling:
- Special software to control scan chain(s)
- Models for FLASH devices
- Some programming targets may require setup/configuration of other devices or module resources prior to the programming/update of the actual target device
- Support for concurrent programming of several similar targets in parallel.
- Local vs. Remote JTAG controllers
- Support for Flash programming "speed-up" techniques
PU Value Proposition
Opportunity for in-the-field reprogramming, especially for those elements which have not been designed to be routinely reprogrammable as part of a mission requirement. Avoids disassembly and/or return to factory/depot for upgrades.
"Back-door" facility to recover boards or systems that have become corrupt either through electrical disturbance or failure of primary programming method.
In some cases, an FPGA may need to be configured (for a specific I/O voltage family) prior to programming FLASH devices relying on a certain I/O voltage interface. That also means that a post-configuration BSDL file is required for that FPGA.
Programming times for large flash devices may be quite long, even to the extent that they may be considered unacceptable for routine production activities. The following are some of the JTAG "tricks" that have been developed to partially alleviate this concern:
- Skipping 0xFF datablocks in blank sectors of FLASH devices
- Shorten the scan chain during programming
- Provide a parallel access write pulse (rather than toggling a write signal using Boundary Scan access) - may be difficult to do in an embedded environment utilizing pure JTAG access
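The first of these tricks can be sketched in a few lines. The example assumes that the target sectors have already been erased and that the erased state of the device is `0xFF`, so that blocks consisting only of `0xFF` need no programming cycles. The function name and block size are illustrative choices.

```python
def blocks_to_program(image, block_size, blank=0xFF):
    """Yield (offset, data) for each block that contains non-blank bytes.

    Blocks made up entirely of the erased value are skipped, since an
    erased flash sector already holds that value.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        if block.count(blank) == len(block):
            continue
        yield offset, block


image = b"\x01\x02" + b"\xff" * 4 + b"\x03\xff"
for offset, data in blocks_to_program(image, block_size=2):
    print(f"program {len(data)} bytes at offset {offset:#06x}")
```

The saving depends entirely on how sparse the image is; a densely filled image gains nothing from this technique.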
Root Cause Analysis / Failure Mode Analysis (RCA/FMA)
At first glance, this use case does not seem to be appropriate for 1149.1 since failures which occur in the field are usually cleared by replacing the field replaceable unit (FRU) and sending it to the repair depot for repair. In reality, this process only adds to the problem of No Trouble Found/No Faults Found (NTF/NFF) cases at the repair depot. There are classes of faults which only manifest themselves in the environment they occur in. Many of these are thermal related and occur under specific system conditions. Thus, it is best to identify failures in the environment where the failure is experienced. Once captured in the failing environment, NTFs would be able to be associated with a previously recorded failing condition to narrow down the problem space in the repair depot. This use case is an advanced use case and is probably more useful for high-availability systems than others.
RCA/FMA Application Fields
- Identify failures in the environment where the failure is experienced; once captured in the failing environment, NTFs would be able to be associated with a previously recorded failing condition to narrow down the problem space in the repair depot.
- Trending the same failures across boards, which may indicate a manufacturing problem, a design problem, or possibly a thermal hot spot problem in a chassis design;
- Trending the same failure of a device, especially the same code response from BIST operations (this helps to identify whether a failure is isolated to a particular lot of devices);
- Dump the contents of a device configuration to identify if a configuration changed, causing the failure.
One aspect of IEEE 1149.1 that provides value for this use case is the SAMPLE instruction available in all boundary-scan devices. The SAMPLE instruction is able to capture the state of the boundary-scan register (BSR) without applying changes to the device pins. This makes it possible to capture a snapshot of the state of the system signals at a point in time, which is particularly useful for identifying which alarm signals have indicated a problem when the normal event reporting system in the architecture is no longer working. From this state snapshot, the data can be analyzed to understand what caused the failure to trigger and what changed the state of the board, causing it to go out of service. This information is important for the software developers to identify events which could affect the state of their software model of the system and thus gives insight into why the software responded the way that it did. It is important to note that while the quasi-static SAMPLE can help identify the state of a board, giving clues as to where a failure lies when looking at signal states, it cannot provide the granularity that may be needed to capture dynamic problems / failures.
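As a rough illustration of how such a snapshot might be analyzed, the sketch below maps captured BSR bits to signal names and reports which signals differ from a known-good baseline. The signal names, cell numbers and bit ordering (string index 0 is cell 0) are assumptions made for the example; in practice the cell assignments would come from the device BSDL file and the board netlist.

```python
def decode_snapshot(bsr_bits, cell_map):
    """Map captured boundary-scan register bits to named signals.

    bsr_bits: captured BSR as a string of '0'/'1' (index 0 = cell 0, assumed).
    cell_map: signal name -> BSR cell number (hypothetical assignments).
    """
    return {name: int(bsr_bits[cell]) for name, cell in cell_map.items()}


def changed_signals(before, after):
    """Return {signal: (old, new)} for every signal whose state differs."""
    return {n: (before[n], after[n]) for n in before if before[n] != after[n]}


# Hypothetical cell assignments for three monitored signals.
CELL_MAP = {"PWR_GOOD": 0, "OVER_TEMP_ALARM": 3, "LINK_UP": 5}

baseline = decode_snapshot("100001", CELL_MAP)    # captured while healthy
at_failure = decode_snapshot("100100", CELL_MAP)  # captured after the fault
print(changed_signals(baseline, at_failure))
# {'OVER_TEMP_ALARM': (0, 1), 'LINK_UP': (1, 0)}
```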
RCA/FMA Detailed Description
The importance of applying tests in the system when a failure condition occurs is to record the failures with a granularity of diagnostics to pinpoint the location of the failure on the board instead of just a PASS/FAIL result. Many functional tests are able to identify a failing functional block, but the granularity as to where in the circuit a failure exists is poor. This is because functional tests typically target function features and not structural features. An 1149.1 based test is able to target specific structural features, such as open pins, and to identify devices that are not operating properly due to some environmental condition, such as over-temperature or under-voltage where a device is not responding properly to the scan operations. By keeping track of failures at the net and device pin level, designers are able to identify trends of similar failures in a circuit which could indicate a design problem requiring rework to improve a product's reliability. For example, if the same device exhibits open pins over time, this could indicate a thermal problem for that location of the board or not enough heat dissipation features applied to the circuit. It could also indicate a mechanical clearance problem during installation and removal of the board.
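The trending idea can be sketched as a simple count of failures per device pin across many recorded results. The record layout, the serial numbers and the threshold of two occurrences are all hypothetical; a real system would define its own failure record format.

```python
from collections import Counter


def recurring_failures(records, threshold=2):
    """Return {(device, pin): count} for locations failing at least `threshold` times."""
    counts = Counter((r["device"], r["pin"]) for r in records)
    return {loc: n for loc, n in counts.items() if n >= threshold}


# Hypothetical failure records collected from boards in the field.
records = [
    {"board": "SN001", "device": "U12", "pin": "A4", "fault": "open"},
    {"board": "SN007", "device": "U12", "pin": "A4", "fault": "open"},
    {"board": "SN009", "device": "U3", "pin": "B1", "fault": "stuck-at-0"},
    {"board": "SN015", "device": "U12", "pin": "A4", "fault": "open"},
]
print(recurring_failures(records))  # {('U12', 'A4'): 3}
```

A location that recurs across several boards, as U12 pin A4 does here, is a candidate for the thermal or mechanical investigation described above.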
RCA/FMA Alternative Techniques
- Functional registers read by software.
- Periodic (scheduled) execution of functional tests.
RCA/FMA Tooling Requirements
While there are tools available for device level failure analysis, availability of complementary tools at the system level is limited. Some instrumentation interfaces may lend themselves to provide access to device level functions that support board level failure analysis (e.g. iBIST and some BERT embedded tooling).
RCA/FMA Value Proposition
Some of these features may be useful during design prove-in and testing of prototype systems in other system types.
- Reduce "No Trouble Found/No Faults Found (NTF/NFF)" cases at the repair depot.
- Diagnostic information can be made available (and used for troubleshooting / fault analysis) prior to module/board being received at repair depot.
- Trending: Problems (e.g. thermal issues) may be identified before they actually cause failures / break the system.
- More storage needed to capture data recording / failure information.
- Format of that information.
- Method of access if module/board is destroyed or cannot power up or boot up.
Power-on Self Test (POST)
Power-On Self Test (POST) has been a basic staple for system level test since systems were first manufactured. POST traditionally comprised a set of functional tests that would target key modules of a system that could be functionally separated from the rest of the system, to verify that each circuit module was still functioning according to written requirements.
POST Application Fields
Verification of circuit modules and system functions during system power-up
POST Detailed Description
Power-On Self-Test (POST) has traditionally comprised a set of functional tests for functionally separate circuit modules. In most cases, a failure of the functional test meant nothing more than that the module did not perform as expected, with little to no insight into what caused the test to fail (e.g., GO/NO-GO diagnostics). For field service calls, where a board was required to be pulled if a failure occurred, down time to diagnose where a failure occurred needed to be kept to a minimum. Unfortunately, functional testing was not able to isolate a fault down to a single board, i.e. a Field Replaceable Unit (FRU), but rather isolated it down to a set of boards. The craft operator would replace the entire set of boards (FRUs) to get the system back into operation. This led to a problem of No Trouble Found (NTF) or No Fault Found (NFF) conditions of boards while they were being tested at the Repair Depot. As boards become more complex, the cost associated with a board increases significantly. Thus, pulling out a board that is suspect is a costly inventory proposition and wastes testing resources at the Repair Depot, which further increases the cost of maintenance of a product.
To improve the granularity of test coverage, more detailed functional tests may be written that target finer granularity of the system. Unfortunately, not all systems lend themselves to this partitioning for test. Further, writing a functional test that targets specific failure models is time consuming and requires special expertise in the circuit targeted by the test. Therefore, writing detailed functional tests for circuit boards tends to be cost prohibitive as market windows shrink and the overall life span of a product diminishes. To aid in the development of tests that provide a finer granularity of diagnostics, the use of Boundary-Scan technology is being introduced into the system as part of system test.
Since 1149.1 targets the structure of the circuit and not the functional behavior, tests may be automatically generated from the design CAD data. The ability to automatically generate these tests significantly reduces the cost of test generation for a board. Further, these tests provide very precise test coverage metrics as to which parts of the circuit are tested and which are not. More importantly, the 1149.1 based tests are able to diagnose a failure down to a device pin and board net level giving the ability to isolate a failure to a single FRU instead of a set of FRUs. It is this latter case that may be leveraged to reduce the number of NTFs ending up at the Repair Depot for a system by implementing Boundary Scan Enhanced (BSE) POST in a system.
Boundary Scan has long been established as a test methodology for digital boards, and so is commonly used as part of the board production process. The opportunity exists to leverage some or all of these pre-existing tests as part of the on-board POST. This need not be a replacement for a functional POST but rather a complement to it, providing a "Boundary Scan Enhanced POST".
Once the infrastructure is in place on a board, tests that were designed for manufacturing may be migrated to the system test environment directly or further constrained to eliminate the stimulation of signals that propagate off the board during testing. BSE POST would be run prior to or as an initial part of the boot process for the board. The diagnostic results provided by BSE POST could be as simple as a PASS/FAIL indication or as detailed as the failing device pin and net information.
To achieve Boundary Scan Enhanced POST, the design must provide an 1149.1 test controller on the board to be tested that is either under software control by a hosting processor or incorporated into a test co-processor in an FPGA, CPLD, or dedicated IC. The test controller hardware interface may be a dedicated device or may leverage 4 or 5 spare general purpose input/output (GPIO) pins on the hosting microcontroller.
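When the test controller is implemented in software on GPIO pins, the software has to track the state of the IEEE 1149.1 TAP controller as it drives TMS and pulses TCK. The sketch below models only that state tracking; it performs no GPIO access, and the names `TAP_NEXT` and `walk` are illustrative rather than part of any existing tool.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Return the TAP state reached after one TCK pulse per TMS value."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# Five TCK pulses with TMS held high reach Test-Logic-Reset from any state.
assert all(walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT)

# From reset, TMS = 0, 1, 0, 0 moves the controller into Shift-DR.
print(walk("Test-Logic-Reset", [0, 1, 0, 0]))  # Shift-DR
```

In a real bit-banged controller, each step of `walk` would correspond to setting the TMS pin, driving TDI, pulsing TCK and sampling TDO.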
Some issues that need to be addressed are:
- What diagnostic resolution is required for POST?
- What information should be reported?
- How does BSE POST integrate into the usual POST process?
- What value does BSE POST bring to the POST process?
POST Alternative Techniques
Product-specific Functional Test.
In some designs, POST may be omitted altogether in favor of an on-demand BIST, followed by a system reset. This approach avoids many of the issues associated with indeterminate system states potentially existing on completion of POST.
POST Tooling Requirements
The design must provide an 1149.1 test controller on the board to be tested that is either under software control by a hosting processor or incorporated into a test co-processor.
Tooling requirements include:
- The type of test actions that should be performed during POST
- The interface boundary between external test tools and embedded tooling
- Real-time diagnostic results vs. Off-line diagnostics from stored failure information
- Formats of information passed from external test tools to embedded test tools
POST Value Proposition
- Functional Test usually provides go/no-go
- With Functional Test we usually cannot determine specific test coverage
- BScan tests have already been created for manufacturing test, why not reuse them?
- BScan may be able to test more in the same amount of time compared to functional test
- Infrastructure is already there, why not use it?
The additional hardware cost to support BSE POST can range from $0 to $20 depending on the complexity of the test controller and the performance requirements.
The time allowed for POST to complete may be restricted by a requirement of the product specification and this may mean that some tests have to be omitted from POST, possibly being run at some point after boot-up.
Some tests may only be possible after firmware has loaded or devices have been configured; this will tend to militate against those tests being incorporated within POST.
BScan tests can disturb the core logic states of devices, so some form of reset is usually required on completion of POST (note that this reset should take care not to re-initiate the POST!).
Environmental Stress Test (EST)
Environmental Stress Test (EST), also known as Environmental Stress Screening (ESS), is commonly used within the manufacturing process to reveal latent defects, mainly those originating within the assembly operations. Typically, EST will consist of thermal cycling of the test subject and may also include vibration cycles or humidity cycles, depending on the product type.
EST Application Fields
While EST is predominantly a test used within the production process, the same techniques are often used much earlier in the life cycle as part of the product design verification, although the applied stresses in this case will usually be more extreme than those used during production.
Within the scope of "Design Verification" (which is generally focused on the functional compliance with the Design Requirement) there is also the concept of "Qualification Testing". This establishes that the product continues to function over the range of temperature, vibration, shock, humidity, etc., stated in the requirement. Qualification Testing is usually mandated for products supplied for automotive, railway, marine, airborne or military use.
EST Detailed Description
Environmental Stress Test includes the following subjects:
- Design Verification Testing (functional proving, typically done before a product release)
- Qualification Testing (environmental endurance proving, typically done before a product release)
- Highly Accelerated Life Test (HALT) (used in design for product ruggedization)
- Highly Accelerated Stress Screening (HASS) (used in production for process monitoring)
What is HALT?:
- HALT is used to find the weak links in the design and fabrication processes of a product during the design phase.
- The stresses are not meant to simulate the field environments at all, but to find the weak links in the design and processes using only a few units. The stresses are stepped up to well beyond the expected field environment until the "fundamental limit of the technology" is reached.
What is HASS?:
- HASS is a screening process that uses accelerated techniques to uncover manufactured product weakness and flaws. The process requires the use of HALT results, and other product specific information to design the initial profile, and then tune it for optimal effectiveness.
Since EST is a test of comparatively long duration, it is often run with minimal supervision, and it is also common to have several UUTs in the same chamber. This can lead to some dilemmas:
- Are there some critical failures which should cause the whole test to be abandoned without operator intervention (e.g. gross over-current)?
- If UUTs progressively fail during test, is one working UUT sufficient reason to continue the test?
- Can tests be coordinated in such a way that failed units can be swapped for new ones between cycles?
Some EST arrangements may not exactly replicate the system configuration, but instead will bring several similar boards together in a manner that allows an enhanced test through the configuration of external loop backs or loop overs. This however requires additional modeling of the external paths that would not be required simply for a board test.
The 5-wire JTAG TAP does not lend itself to use with the long cableforms usually associated with EST. As a result, repeaters/buffers may be required for externally controlled BScan testing and these can themselves become points of failure. Embedded testing (either with a fully embedded Test Manager or with just an embedded TAP Controller) may alleviate this by allowing tests to be controlled and return results over a more robust medium such as Ethernet.
An additional advantage of using an embedded Test Manager is that it becomes easier to arrange for parallel execution of board-level tests.
EST Alternative Techniques
- Functional test: Typically a cut-down version of the Acceptance Test for the unit.
- Passive test: Unpowered environmental cycling. The unit is normally tested before and after cycling under normal ambient conditions.
In production, ESS may be either "passive" or "active". In passive EST, the UUT is unpowered while it is subject to the environmental conditions. This approach may be taken for a number of reasons, generally driven by cost:
- The cost of supporting functional test equipment is deemed to be too great.
- External functional test cabling degrades quickly within the environment making the tests unreliable (this is more often true for vibration testing).
- Supplying suitable stimulus within the environmental facility is deemed to be too difficult to arrange (e.g. optical or RF excitation).
In these cases, JTAG offers a relatively low cost alternative to a functional EST, making an active EST more attractive.
EST Tooling Requirements
- Create loops of tests
- Synchronize or be able to correlate test execution with external events, such as the start of a thermal ramp.
- Testing of external data loops
- Stop testing in event of gross failure
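The looping, correlation and abort requirements above might be combined along the following lines. This is only a sketch: `run_test`, `is_gross_failure` and `clock` are hypothetical callables supplied by the test environment, and the result strings are invented for the example.

```python
def run_est_loop(run_test, is_gross_failure, max_cycles, clock):
    """Repeat a test, time-stamping each result, and stop on a gross failure."""
    log = []
    for cycle in range(max_cycles):
        result = run_test(cycle)
        # The time stamp lets results be correlated later with external
        # events such as the start of a thermal ramp.
        log.append({"cycle": cycle, "time": clock(), "result": result})
        if is_gross_failure(result):
            break  # abandon the test without operator intervention
    return log


# Stand-ins for real test execution and a real time source.
fake_results = ["pass", "fail", "overcurrent", "pass"]
ticks = iter(range(0, 1000, 30))  # pretend each cycle takes 30 seconds

log = run_est_loop(
    run_test=lambda cycle: fake_results[cycle],
    is_gross_failure=lambda result: result == "overcurrent",
    max_cycles=len(fake_results),
    clock=lambda: next(ticks),
)
print([(entry["time"], entry["result"]) for entry in log])
# [(0, 'pass'), (30, 'fail'), (60, 'overcurrent')]
```

Note that an ordinary test failure is logged and the loop continues; only the gross failure ends the run early.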
EST Value Proposition
Active or powered EST is conventionally functional. However, EST cycle times can be quite short compared to the execution time of the functional test. For example, a typical Thermal EST may require the UUT to be powered and tested during the warm-up ramp and hot dwell; the warm-up may only take 20 minutes with a functional test run time of maybe seven minutes, so fewer than three tests can be conducted during the ramp. This provides very poor granularity for detecting thermal failures. JTAG tests will usually have a much shorter execution time, allowing more test cycles to be run during each ramp. In addition, functional testing will tend to test sections of the UUT in a sequential manner, so sections of the design will not actually be tested over most of the temperature range. Part of the optimization of generated JTAG structural test vectors will typically result in multiple discrete data paths being exercised simultaneously, providing a more comprehensive thermal coverage.
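The arithmetic behind this comparison is shown below. The 20-minute ramp and seven-minute functional test come from the example above; the 30-second JTAG test time is purely an assumed figure for illustration.

```python
ramp_minutes = 20             # warm-up ramp duration (from the example above)
functional_test_minutes = 7   # functional test run time (from the example above)
jtag_test_minutes = 0.5       # ASSUMED run time of a JTAG structural test

# Number of complete test runs that fit within one warm-up ramp.
functional_runs = ramp_minutes // functional_test_minutes
jtag_runs = int(ramp_minutes // jtag_test_minutes)

print(functional_runs)  # 2
print(jtag_runs)        # 40
```

Under these assumptions the structural test samples the ramp at roughly twenty times the resolution of the functional test.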
Qualification Testing (Design Proving) can be quite a lengthy process, so a typical scenario is that several facilities (and UUTs) may be used in parallel to expedite the testing (e.g. one for Vibration Qualification, one for Thermal Qualification, another for Salt Spray, etc.). Equipping each facility with a set of functional test equipment is costly and the setups become largely redundant on completion of the Qualification activity. While some of the testing is necessarily functional, the ability to conduct the majority of testing at a structural level allows the more expensive functional test assets to be shared amongst the various facilities.
- Embedded testing requires that some resources are made available within the UUT and that these are functional.
- May need additional ports to synchronize with external equipment
Device Versioning (DV)
By obtaining information accessible at the device level it becomes possible to establish information relating to the build standard and configuration of the boards and the system comprising those boards.
DV Application Fields
The Device Versioning Use Case was developed originally for the ability to discover the IDCODE of all the devices assembled on the board in order to validate that the order and type of devices assembled on the board were indeed what was expected to be there. This ordered set of defined devices might have an impact on what software must be installed on a board.
This base application may be extended to verification of the system composition: Iteration through possible scan path verification tests can help identify each board installed in a system (the passing test identifies the UUT). This may be further extended to encompass the verification of installed mezzanines. It is worth noting that since the scan path verification test is non-intrusive, it may be run at any time.
However, device IDCODES alone may not always be enough to uniquely identify the type or version of a board, and it may be necessary to read some additional feature, whether it be a directly accessible boundary-scan register or an indirectly accessible memory location which is read using boundary-scan controlled signals to stimulate the device to provide the information (e.g., FLASH memory contents).
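The iteration over candidate board types, and the ambiguity that can result, can be sketched as follows. The board names and IDCODE values are invented for illustration, and each chain position holds a set of acceptable IDCODEs so that alternative device vendors can be accommodated.

```python
# Hypothetical board types: one set of acceptable IDCODEs per chain position.
KNOWN_BOARDS = {
    "LINE_CARD_A": [{0x1234A093}, {0x0ABCD0DD, 0x0ABCE0DD}],
    "LINE_CARD_B": [{0x1234A093}, {0x0ABCD0DD}],
    "CPU_CARD": [{0x0567B0C1}, {0x1234A093}, {0x0ABCD0DD}],
}


def matching_boards(observed, known=KNOWN_BOARDS):
    """Return every board type whose expected chain matches the observed IDCODEs."""
    matches = []
    for name, chain in known.items():
        same_length = len(chain) == len(observed)
        if same_length and all(c in ok for c, ok in zip(observed, chain)):
            matches.append(name)
    return matches


print(matching_boards([0x1234A093, 0x0ABCE0DD]))  # ['LINE_CARD_A']
print(matching_boards([0x1234A093, 0x0ABCD0DD]))  # ['LINE_CARD_A', 'LINE_CARD_B']
```

The second result shows two candidates for one observed chain, which is the situation in which an additional register or memory location has to be read to tell the boards apart.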
Most programmable devices provide a USERCODE register or some user defined accessible register that may be programmed by the firmware designer. The firmware may use this register to encode the revision information of the firmware used in the device. This information may be accessed with boundary-scan to aid in determining the firmware revision and configuration of a device in the field.
DV Detailed Description
Device Versioning is perhaps a misleading term; it may be more accurate to describe this Use Case as using device characteristics or data stored in devices in order to ascertain information regarding the board types, build standard, installed firmware and system composition. This information may be used for a number of purposes:
- To determine the system configuration, e.g. for verification purposes.
- To ascertain if a unit contains a device or assembly that is subject to a recall or service action.
- To establish whether a particular update is required (or possible) for that system.
- To determine which tests may be run on the system.
Most JTAG capable devices can report manufacturer and device ID directly, and indirect methods can usually be employed to get similar information from other parts such as Flash memories. However, this may not always be enough to unambiguously identify a board, and some supplementary data may need to be held in some dedicated store: This could be as simple as a few hardwired signals, or it may be some dedicated "Module ID" device. In addition, the UserCode registers in some devices, such as FPGAs, may be used to store some form of identification for the firmware.
Any register or signal that can be accessed either directly or indirectly by JTAG may be useful in this context; for example the GNAT protocol may be used to provide a half duplex port to output boot messages from firmware on mezzanine cards.
DV Alternative Techniques
Traditionally, version information was managed through the use of documentation: "As Built Records", "Master Mod. Records" are terms sometimes used. This is obviously prone to error and records may not be maintained if the system is repaired or modified once in the field.
It may be possible to obtain version information from boot messages or by querying the system over a mission bus. This is often used to report software or firmware versions, and may be extended to include reporting of board types and revision states, but it is unlikely to be able to identify the presence of specific devices on the boards, e.g. identifying the brand of flash memory fitted.
Physical inspection of a board is not always possible - component identification marks may be obscured by heatsinks or similar structures, or conformal coatings. Additionally, removal from the system may not be desirable.
In some cases an "inventory PROM" may be used to store device information; however, this adds cost and may affect overall reliability.
For Board versioning one option is to use external pull-up or pull-down resistors. An advantage here is that it may be possible to recover that information without powering the board. This will still need a look-up for the coded ID, which again may not have been properly maintained.
Bar code or RFID tagging or other labelling forms are often used but may be subject to confusion arising from different stages of manufacture each adding their own labels.
DV Tooling Requirements
JTAG tooling often masks some bits of the IDCODE register that contain, e.g., the silicon revision, to avoid spurious test fails. For Device Versioning it may be necessary to interrogate those additional bits.
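The 32-bit IDCODE register defined by IEEE 1149.1 holds a fixed 1 in bit 0, an 11-bit manufacturer identity in bits 11:1, a 16-bit part number in bits 27:12 and a 4-bit version in bits 31:28. The sketch below decodes those fields and shows the effect of masking the version bits; the function names and the sample values are chosen only for illustration.

```python
def decode_idcode(idcode):
    """Split a 32-bit IEEE 1149.1 IDCODE into its defined fields."""
    return {
        "marker": idcode & 0x1,                  # bit 0, always 1
        "manufacturer": (idcode >> 1) & 0x7FF,   # bits 11:1
        "part_number": (idcode >> 12) & 0xFFFF,  # bits 27:12
        "version": (idcode >> 28) & 0xF,         # bits 31:28
    }


VERSION_MASK = 0x0FFFFFFF  # clears bits 31:28 (the version field)


def same_device_type(a, b):
    """Compare two IDCODEs while ignoring the version field."""
    return (a & VERSION_MASK) == (b & VERSION_MASK)


fields = decode_idcode(0x4BA00477)  # sample value
print({name: hex(value) for name, value in fields.items()})
# {'marker': '0x1', 'manufacturer': '0x23b', 'part_number': '0xba00', 'version': '0x4'}

# A masked comparison passes both revisions; versioning needs the full value.
print(same_device_type(0x4BA00477, 0x2BA00477))  # True
print(0x4BA00477 == 0x2BA00477)                  # False
```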
The current BSDL really only supports a single IDCODE and USERCODE for each device. Tooling needs to support multiple mappings for these values to accommodate optional vendors and different device configurations.
Many actions require knowledge of the contents of individual registers within a device; current vector level operations can make this difficult. STAPL provides some access to sub-sets of vectors, but still loses the context of the data.
Ability to manipulate signals to control and observe non-JTAG devices to read persistent version information.
DV Value Proposition
- JTAG may be the only way to access device versioning information if the board / system cannot boot.
- System Configuration can be identified in order to determine if subsystems can work together. JTAG may also be the only way to determine if a specific version or brand of device has been fitted to a board; this may be essential information to determine if a particular service action is required.
- Conformal coating or the fitment of heatsinks may obscure part numbers making physical verification of parts difficult or impossible.
- It may be possible to perform the verification remotely, avoiding the need to send a service agent into the field.
- Device Versioning is non-intrusive, meaning that the board or system can remain in service during interrogation.
- Provides more definitive and up-to-date information than physical labelling or other paper systems will achieve.
Board designs may be sufficiently similar that the IDCODES and the scan chain topology do not uniquely identify the board type: It may be necessary to add circuit elements, not otherwise required for the design functions, in order to provide sufficient fidelity in board or system identification. A PROM or pull-up/pull-down links or other similar feature may be required to store a board type number or build revision. In this case a standardized method of accessing similar types of information, such as board revisions, may be beneficial.
An update of a device's USERCODE is rarely automatically generated by the vendor tooling when the firmware is revised, so if these are being used to indicate firmware build states, then care must be taken to ensure that any USERCODE is correctly maintained on each re-programming of a board.
- Simulation-Based System Level Fault Insertion Using Co-Verification Tools, Eklow, Toai, Vo, Chi Khuong, Shyam, Pullela, Anoosh, Hosseini, Hien Chau, BTW 2003.