In most microcontroller architectures, there is a "smart" CPU surrounded by a set of relatively "dumb" peripherals. Peripherals have limited functions, usually just converting data from one form to another. For example, an I2C peripheral basically converts data between serial and parallel formats, while an ADC converts signals from analog to digital. The CPU has to perform all the work to process the data and actually do something useful with it. This, plus close management of the peripherals, can result in great complexity in the CPU's firmware, and may require a fast and powerful CPU to execute that firmware within real-time timing constraints. That complexity in turn leads to more obscure bugs and to more complex and expensive debugging equipment.
If the peripherals were complex enough, flexible enough, and ultimately "smart" enough to relieve the CPU of many of its tasks, a complex design could be restructured as a group of simple designs distributed among the CPU and the peripherals. The CPU would have fewer tasks and fewer interrupts to handle, which makes bugs easier to find and fix. The overall design would become more robust, and portions of the design more easily reused. A CPU with less to do may be run at a slower speed to save power, or the freed bandwidth could be used for additional tasks. However, the peripherals would still need to be designed in a cost-effective manner, or the overall microcontroller might become too expensive.
There are two general ways to construct a smart configurable peripheral. The first is to use a Programmable Logic Device (PLD). As shown in Figure 1, a PLD has a sum-of-products logic gate array driving a number of macrocells. The "T" and "C" notations indicate that each product term can generate either a true or complement (inverted) output, so that both positive and negative logic can be supported.
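As a rough illustration of the sum-of-products structure, the Python sketch below models a single macrocell input: each product term ANDs a selection of inputs, taken either true ("T") or complemented ("C"), and the terms are then ORed together. The function names and the dictionary encoding of a product term are assumptions made for this example only; they do not describe any particular PLD's configuration format.

```python
# Behavioural sketch of a sum-of-products array (illustrative encoding only).
# A product term is a dict mapping an input index to "T" (use the input as is)
# or "C" (use its complement). Inputs that are not listed are ignored.

def product_term(term, inputs):
    """AND together the selected inputs, each taken true or complemented."""
    for index, polarity in term.items():
        bit = inputs[index]
        if polarity == "C":
            bit = 1 - bit
        if bit == 0:
            return 0
    return 1


def sum_of_products(terms, inputs):
    """OR together all product terms to form one macrocell input."""
    return int(any(product_term(term, inputs) for term in terms))


# Example: XOR of inputs 0 and 1 needs two product terms:
#   (in0 AND NOT in1) OR (NOT in0 AND in1)
XOR_TERMS = [{0: "T", 1: "C"}, {0: "C", 1: "T"}]


def xor_truth_table():
    """Evaluate the XOR example for inputs 00, 01, 10, 11 in that order."""
    return [sum_of_products(XOR_TERMS, (a, b)) for a in (0, 1) for b in (0, 1)]
```

Even this two-input XOR consumes two product terms, which gives a feel for how quickly the term count can grow for wider functions.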
Large-scale PLDs can be used to form complex logic functions. However, a lot of gates may be needed to implement even simple logic functions, and it can become expensive to scale up a PLD-based solution for more complex functions. At some point, it makes more sense to just use an actual CPU. A very simple form of such a CPU, sometimes called a "nano-processor", is a datapath based on an arithmetic logic unit (ALU). A datapath implements just a few common functions but does so more efficiently than an implementation using PLDs.
Figure 2 shows a simple datapath with an ALU. A typical ALU can do a variety of operations, usually on 8-bit operands: count up (increment), count down (decrement), add, subtract, logical AND, logical OR, logical XOR, shift left, and shift right. There are two 8-bit accumulators that can act as either input data registers or storage for ALU output. A single operation takes place on each active edge of an input clock signal. A function select register controls which operation takes place, the source register(s) for that operation, and the destination register for the output. Depending on the specific design of the datapath, it is possible to do a series of complex operations. The function select block can actually be a small SRAM, preloaded with the desired function select bits; the SRAM's address lines then select which operation is to be done. Finally, multiple datapaths can be chained together with carry and shift signals so that operations can be done on multi-byte operands.
Since a datapath does only a few specific functions, it is possible to optimize its design so that it is inexpensive to build. However, a datapath is not nearly as flexible as a PLD for implementing complex logic. Separately, neither one works well as a smart peripheral, but together they can work very well.
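The datapath described above can be approximated in a few lines of Python. This is only a behavioural sketch under stated assumptions: the class name, the operation mnemonics, and the tuple layout of a control-store entry are invented for illustration, and carry/shift chaining between datapaths is not modelled.

```python
# Behavioural sketch of an 8-bit ALU datapath (names and encoding are illustrative).
MASK = 0xFF  # results are truncated to 8 bits

# Each ALU function takes operands a and b and returns an 8-bit result.
# Single-operand functions simply ignore b.
ALU_OPS = {
    "INC": lambda a, b: (a + 1) & MASK,
    "DEC": lambda a, b: (a - 1) & MASK,
    "ADD": lambda a, b: (a + b) & MASK,
    "SUB": lambda a, b: (a - b) & MASK,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "SHL": lambda a, b: (a << 1) & MASK,
    "SHR": lambda a, b: a >> 1,
}


class Datapath:
    def __init__(self, control_store):
        # control_store plays the role of the small SRAM: each entry holds
        # (operation, source A, source B, destination).
        self.control_store = control_store
        self.regs = {"A0": 0, "A1": 0}  # the two 8-bit accumulators

    def clock(self, address):
        """Execute the one operation selected by `address` for one clock edge."""
        op, src_a, src_b, dst = self.control_store[address]
        result = ALU_OPS[op](self.regs[src_a], self.regs[src_b])
        self.regs[dst] = result
        return result


def demo():
    dp = Datapath([
        ("INC", "A0", "A0", "A0"),  # address 0: A0 = A0 + 1
        ("ADD", "A0", "A1", "A1"),  # address 1: A1 = A0 + A1
        ("SHL", "A1", "A1", "A1"),  # address 2: A1 = A1 << 1
    ])
    for address in (0, 0, 0, 1, 2):  # drive the SRAM address lines, one per clock
        dp.clock(address)
    return dp.regs
```

In this model the "program" is nothing more than the sequence of addresses presented to the control store, which mirrors how the SRAM address lines select the operation on each clock.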
UDBs have a lot of features in common with both the PLDs and the datapath described above. In addition, signals can be routed among the PLDs and datapaths throughout the entire set of UDBs, and elsewhere in the device, to form a complex fabric called the Digital System Interconnect (DSI). There are similar routing features for analog signals, as well as interfaces between the analog and digital domains and the device pins.
In a basic example, we can use one UDB datapath to create an 8-bit counter with reload capability. To do this, we connect one status condition back to a control store SRAM address line (Figure 3). In this design, A0 is the counter register and D0 is the reload (period) register. We need two functions, one to decrement the counter and one to reload the counter from the period register; these functions are preloaded in the Control Store RAM. The logic is as follows. When A0 is not zero, the condition output is low and the decrement operation at address 0 is executed. When A0 is zero, the condition output is high and the reload operation at address 1 is executed. All operations take place on the rising edge of the clock input, so the counter counts clock edges. The clock input can be routed from a variety of sources. The condition output can be routed throughout the DSI, including to DMA and interrupt request inputs. Using datapath chaining and a mask block, the counter can be any number of bits wide and is not limited to a multiple of eight bits.
This simple design can be expanded, with the use of PLDs, to create a more complex application. Consider a traffic light controller. It cycles through three states (green, yellow, and red), so a state machine is required. Each state lasts for a certain amount of time before changing to the next state, so a counter is also required. For simplicity, assume that the "green" time is the same as the "red" time but that the "yellow" time is different.
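Before looking at how this maps onto hardware, it may help to see the intended behaviour in software. The Python sketch below combines the decrement-or-reload counter with a three-state machine. It is a behavioural model only, not a UDB configuration: the function name, the parameter names `green_red_reload` and `yellow_reload`, and the choice to change state at the moment of reload are all assumptions made for illustration.

```python
from itertools import islice

# Fixed state sequence for the light.
NEXT_STATE = {"green": "yellow", "yellow": "red", "red": "green"}


def traffic_light(green_red_reload, yellow_reload, start="green"):
    """Yield the light state once per clock tick, forever.

    green_red_reload: counter reload value shared by the green and red states.
    yellow_reload:    counter reload value for the yellow state.
    """
    reload_for = {
        "green": green_red_reload,
        "yellow": yellow_reload,
        "red": green_red_reload,
    }
    state = start
    count = reload_for[state]  # the down-counter
    while True:
        yield state
        if count != 0:
            count -= 1  # counter not zero: decrement
        else:
            state = NEXT_STATE[state]  # counter is zero: advance the state...
            count = reload_for[state]  # ...and reload for the new state


def first_ticks(n, green_red_reload=2, yellow_reload=1):
    """Return the first n light states, using small example reload values."""
    return list(islice(traffic_light(green_red_reload, yellow_reload), n))
```

Because the counter runs from the reload value down to zero before reloading, each state in this model lasts one tick longer than its reload value: with reload values of 2 and 1, green and red each last three ticks and yellow lasts two.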
Only one datapath is needed (assuming an 8-bit count value) to implement this timing structure, and three of the datapath registers are used. A0 is the count register, D0 contains the counter reload value for the "green" and "red" states, and D1 contains the counter reload value for the "yellow" state. The operations to be saved in the Control Store RAM are: