Maciej FRANKIEWICZ Ryszard GAŁ Adam GOŁDA Ireneusz BRZOZOWSKI Andrzej KOS

# ASIC IMPLEMENTATION OF HIGH EFFICIENCY 8-BIT 'OCTALYNX' RISC MICROPROCESSOR

**ABSTRACT** The paper presents the structure of an 8-bit RISC microcontroller with a 16-bit address bus, called OctaLynx. The processor behavior is described in the Verilog hardware description language, and the design was fabricated as an ASIC in CMOS LF 0.15 $\mu$m (1.8 V) technology. FPGA tests were run before fabrication. The integrated circuit consists of the core and several peripherals (8-bit general purpose input-output ports, timers/counters, USART, SPI). The controller was designed for tests of dynamic power management systems.

**Keywords**: *Microcontroller, RISC, ASIC, CMOS, ALU, Timers/Counters, USART, SPI, Verilog* 

# 1. INTRODUCTION

High efficiency microcontrollers are key elements of modern measurement and data processing systems [1]. Consequently, a good understanding of how they work is crucial for designing any application. When designing a microprocessor, two features are especially important: high operating speed and low power consumption.

Maciej FRANKIEWICZ, M.Sc., Eng., Ryszard GAŁ, M.Sc., Eng., e-mail: frankiew@agh.edu.pl, ryszard.gal@gmail.com

Adam GOŁDA, D.Sc., Ireneusz BRZOZOWSKI, D.Sc., prof. Andrzej KOS, D.Sc. hab. e-mail: golda@agh.edu.pl, brzoza@agh.edu.pl, kos@agh.edu.pl

AGH University of Science and Technology, Department of Electronics

PROCEEDINGS OF ELECTROTECHNICAL INSTITUTE, Issue 260, 2012

To apply a system efficiently, the proper controller structure has to be chosen. The idea of a RISC (Reduced Instruction Set Computer) controller is to reduce the number of instructions and to unify the number of clock cycles needed to execute each instruction [2]. As a result, the instructions can be pipelined and the number of connections between functional blocks can be reduced [3]. Additionally, the instruction decoder can be simplified. This gives a smaller and faster processor.

Understanding the structure of the microprocessor in use and a good knowledge of its data processing path are especially important when one wants to implement and test circuits cooperating with the controller core. This need is even more urgent if the designed circuit is implemented on the same die as the processor. For this reason, the authors decided to implement their own design, which gives better control of the tasks executed by the processor.

## 2. MOTIVATION AND REQUIREMENTS

The interests of the authors' team include the synthesis of digital VLSI circuits and their use in Dynamic Power Management (DPM) systems. The presented controller, called OctaLynx, is intended for implementation as an Application-Specific Integrated Circuit (ASIC) and is designed in the Verilog hardware description language [4]. Before fabrication of the ASIC prototype, the design had to be verified. Therefore, an evaluation board with an FPGA device was designed in order to implement a prototype of the microcontroller. After verification, the structure was implemented in LF CMOS 0.15 $\mu$m technology with a 1.8 V supply voltage.

As has been shown, it is possible to measure the temperature of a standard integrated circuit [5], and it is also possible to create software implementations of DPM methods using existing processors [6]. The situation is different when a new hardware solution is created, as in the described case. To create a DPM system at the hardware level, the management circuit must be placed on the same silicon die as the processor itself, so that it can measure the chip temperature and produce control signals. This is impossible with standard processors available on the market. The authors decided to design their own microcontroller architecture instead of using existing IP cores, for better understanding and control of the processes inside the processor core. The structure of the Dynamic Power Management system that produces control signals for the OctaLynx processor was described by the authors in [7].

# 3. CONTROLLER STRUCTURE

The designed microcontroller is a fairly complex structure. The system consists of four fundamental units: the programmer unit, the memory driver unit (the processor communicates with external memory), the core and the peripherals. The most important element of the microcontroller is the internal 16-bit main data bus. It consists of an 8-bit data bus, a 6-bit internal address bus and two control lines. This structure was chosen to improve communication between blocks: all of them are connected to one main bus, which is controlled by a part of the system core. The structure makes it easy to add new peripherals and blocks that are not included in this design but may become necessary during further development of the controller. The block diagram of the controller architecture is presented in Figure 1 [8]. The functional blocks are described in turn below.

Fig. 1. Block diagram of the OctaLynx microcontroller architecture

The programmer unit communicates with a PC through the existing SPI (Serial Peripheral Interface) unit. This makes it possible to write program code to the program memory and to read and verify it. Erasing the program memory and reading the chip signature are also possible. Programming mode is entered by setting logical "0" on the reset line.

Fig. 2. Block diagram of the OctaLynx core

The Memory Driver unit connects the microcontroller with external memory, which consists of RAM (Random Access Memory) and PM (Program Memory). In addition, the internal main data bus is brought out, so external devices (for example additional IO ports) can be connected. The controller has a 16-bit RAM address bus and a 16-bit program memory address bus. It can address up to 64 kB of RAM and 128 kB of PM (program memory is organized in 16-bit words). Multiplexing was used to reduce the number of IO lines needed. As a result, the controller uses one 16-bit output bus for addresses, one 16-bit bidirectional multifunctional bus and 4 memory control lines.
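The memory capacities quoted for the Memory Driver unit follow directly from the bus widths. The short Python sketch below reproduces that arithmetic; the function name `capacity_bytes` is introduced here only for illustration and is not part of the design.

```python
def capacity_bytes(address_bits: int, word_bits: int) -> int:
    """Return the number of bytes reachable with the given bus widths."""
    locations = 2 ** address_bits          # number of distinct addresses
    return locations * word_bits // 8      # each location holds word_bits bits


# 16-bit address bus with 8-bit RAM cells
ram_bytes = capacity_bytes(address_bits=16, word_bits=8)
# 16-bit address bus with 16-bit program words
pm_bytes = capacity_bytes(address_bits=16, word_bits=16)

print(ram_bytes // 1024, "kB of RAM")             # 64 kB of RAM
print(pm_bytes // 1024, "kB of program memory")   # 128 kB of program memory
```

The factor of two between the two capacities comes only from the word width, since both memories share the same 16-bit address range.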

The most important unit of every microcontroller is its core. In this block all instructions are decoded and executed. All additional devices, the internal main bus and the peripherals are driven by this unit. The block diagram of the core is presented in Figure 2. The core consists of the GPRU (General Purpose Register Unit), the SP (Stack Pointer) counter, the control unit with the instruction decoder, and the ALU (Arithmetic Logic Unit) with SREG (Status REGister).

The Arithmetic Logic Unit is the most important part of the controller core. All instructions that modify data are executed inside it. These instructions can be divided into three basic groups:

- arithmetic operations, including addition, subtraction and multiplication,
- logic operations, including the functions OR, XOR, AND and NOT as well as the clear byte and set byte operations (which write 0x00 and 0xFF respectively),
- bit operations, including bit shift, clearing and setting a single bit in a byte, the SWAP instruction (exchange the less and more significant halves of the byte) and the MIR (mirror) instruction (reverse the order of the bits in the byte, b<sub>n</sub>↔b<sub>7-n</sub>).
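The two less common bit operations in the list, SWAP and MIR, can be illustrated with a short behavioral sketch in Python. It models only the byte transformations described above, not the Verilog implementation; the function names are chosen here for illustration.

```python
def swap(byte: int) -> int:
    """SWAP: exchange the low and high halves (nibbles) of an 8-bit value."""
    byte &= 0xFF
    return ((byte << 4) | (byte >> 4)) & 0xFF


def mir(byte: int) -> int:
    """MIR: reverse the bit order, so that bit n moves to position 7 - n."""
    byte &= 0xFF
    result = 0
    for n in range(8):
        if byte & (1 << n):
            result |= 1 << (7 - n)
    return result


print(hex(swap(0x3C)))  # 0xc3
print(hex(mir(0x01)))   # 0x80
print(hex(mir(0xB2)))   # 0x4d
```

Both operations are their own inverse: applying SWAP or MIR twice returns the original byte.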

To decrease the power consumption of the ALU, it has been implemented as a set of independent units, one for each group of operations. While a given instruction is executed, only the corresponding unit is active; the others are disabled and disconnected from the main data bus. This minimizes the number of switching connections between transistors, and such switching is the most significant cause of power losses in CMOS circuits.

The GPRU consists of 32 8-bit registers. These registers are used as the source for all arithmetic and logic operations. As a result, an accumulator register is unnecessary and the amount of data transferred to and from RAM is reduced. Two 8-bit output buses are connected to the ALU, so both arguments of a function can be transmitted at the same time. The result of an ALU operation is transmitted over an 8-bit input bus and stored in one of the registers. Additionally, one 16-bit output bus and one 16-bit input bus are connected to the ALU for 16-bit operations. Three pairs of registers, named X, Y and Z, can be used as 16-bit address pointers for indirect addressing. For this reason, the 16-bit output bus can be switched to the RAM address bus.

The Control Unit controls all operations in the microcontroller. It consists of the ID (Instruction Decoder), the IC (Interrupt Controller) and the PC (Program Counter). The instruction decoder performs two functions: it reads the instruction code from program memory, and it decodes the code and sets the proper control lines. Most instructions are executed in one clock cycle, but some require a few clock cycles (for example, a subroutine call). For this reason the ID is realized as a state machine. Instructions are pipelined, which means that in one clock cycle one instruction is executed while the next instruction is decoded. Because the instruction decoder is complicated and slow (its propagation time is comparable with that of the ALU), this solution improves the timing of the controller.
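The benefit of overlapping decode and execute can be estimated with a simple cycle count. The Python sketch below is an idealized model under stated assumptions, not a description of the actual OctaLynx timing: decoding is assumed to take one clock cycle, to overlap fully with the execution of the previous instruction, and no stalls (for example after jumps) are modeled. The instruction mix is hypothetical.

```python
def cycles_without_overlap(execute_cycles):
    """Each instruction is decoded (1 cycle) and then executed, one after another."""
    return sum(1 + c for c in execute_cycles)


def cycles_with_overlap(execute_cycles):
    """Decode of the next instruction runs in parallel with the current execution."""
    if not execute_cycles:
        return 0
    # Only the first decode is not hidden behind an execute stage.
    return 1 + sum(execute_cycles)


# Hypothetical program: four single-cycle instructions and one 3-cycle call.
program = [1, 1, 1, 3, 1]

print(cycles_without_overlap(program))  # 12
print(cycles_with_overlap(program))     # 8
```

For long sequences of single-cycle instructions the overlapped model approaches one instruction per clock cycle, which is the effect the pipelining is meant to achieve.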

The OctaLynx controller is able to call 11 interrupts, so an interrupt control (IC) unit is necessary. The list of implemented interrupts is presented in Table 1. The IC is connected with all units that can trigger interrupts and with the instruction decoder. When one of the units sends a request to the IC and interrupts are enabled, the IC sends a request and a vector to the ID. When the return from interrupt has been executed, the IC receives an acknowledgment from the ID and passes it to the proper unit, which then clears its request. When more than one interrupt is requested, the IC decides which interrupt will be executed first.

| Vector No | Program address | Interrupt name                           |
|-----------|-----------------|------------------------------------------|
| 1         | 0x00            | RESET                                    |
| 2         | 0x01            | External interrupt 0                     |
| 3         | 0x02            | External interrupt 1                     |
| 4         | 0x03            | Timer0 input capture                     |
| 5         | 0x04            | Timer0 output compare A                  |
| 6         | 0x05            | Timer0 output compare B                  |
| 7         | 0x06            | Timer0 overflow                          |
| 8         | 0x07            | Timer1 output compare                    |
| 9         | 0x08            | Timer1 overflow                          |
| 10        | 0x09            | Timer2 output compare                    |
| 11        | 0x0A            | Timer2 overflow                          |
| 12        | 0x0B            | SPI transfer complete                    |
| 13        | 0x0C            | USART receive complete                   |
| 14        | 0x0D            | USART buffer empty                       |
| 15        | 0x0E            | USART transmit complete                  |
| 16-32     | 0x0F-0x1F       | Reserved for external and future devices |

TABLE 1

Interrupt vectors


Interrupt vectors are located at the beginning of the program memory. This memory is organized in 16-bit words in a 16-bit address space. The data memory in fact comprises three independent address spaces. The first is a 6-bit space reserved for 8-bit control registers; it ends with the Stack Pointer and the Status Register. The second is a 16-bit address space reserved for RAM. The third space holds the 32 General Purpose Registers. The map of the address space of the OctaLynx processor is presented in Figure 3.

Fig. 3. Map of address space of the OctaLynx processor

The peripheral unit groups all units responsible for additional functionality that is not related to the basic operation of the microcontroller. All peripheral units are connected to the internal main data bus, so their control and status registers can be read and written. The structure of the peripheral unit is presented in Figure 4. After a reset signal, all peripherals are disabled to decrease power consumption. This block consists of three GPIO (General Purpose Input/Output) ports, SPI (Serial Peripheral Interface) and USART (Universal Synchronous and Asynchronous Receiver and Transmitter). Another very useful block is the timers/counters unit, which allows accurate timing of program execution. It consists of one 16-bit timer/counter (T/C0) and two 8-bit timers/counters (T/C1 and T/C2). All counters can be triggered from an external pin (A2) or from the internal clock through a configurable 10-bit prescaler. The counters can run until overflow or until a previously selected value is reached (after that they are cleared). Additionally, T/C0 has an input capture function. As a result, the timers can be set to different modes: CTO (Clear Timer on Overflow), CTC (Clear Timer on Compare) or PWM (Pulse Width Modulation).

Fig. 4. Block diagram of the OctaLynx peripheral unit
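The difference between the CTO and CTC timer modes can be illustrated with a small behavioral model in Python. This is a sketch under assumptions, not the OctaLynx register-level behavior: the prescaler is assumed to divide the input clock by an integer from 1 to 1024, and the counter is assumed to be cleared on the tick after it reaches its limit. The class and attribute names are hypothetical.

```python
class TimerSketch:
    """Up-counter with a prescaler; an illustrative model, not the OctaLynx RTL."""

    def __init__(self, width_bits=8, prescale=1, compare=None):
        if not 1 <= prescale <= 1024:        # assumed range of a 10-bit prescaler
            raise ValueError("prescale must be in 1..1024")
        max_count = (1 << width_bits) - 1
        if compare is not None and not 0 <= compare <= max_count:
            raise ValueError("compare value does not fit the counter width")
        # CTO mode counts up to the overflow limit, CTC mode up to the compare value.
        self.top = max_count if compare is None else compare
        self.prescale = prescale
        self.divider = 0
        self.count = 0
        self.events = 0                      # number of overflow or compare events

    def clock(self):
        """Advance the model by one input clock cycle."""
        self.divider += 1
        if self.divider < self.prescale:
            return                           # prescaler has not produced a tick yet
        self.divider = 0
        if self.count >= self.top:
            self.count = 0                   # counter is cleared after the limit
            self.events += 1
        else:
            self.count += 1


cto = TimerSketch(width_bits=8)                          # clear on overflow
ctc = TimerSketch(width_bits=8, prescale=8, compare=99)  # clear on compare
for _ in range(1600):
    cto.clock()
    ctc.clock()

print(cto.events, cto.count)   # 6 64
print(ctc.events, ctc.count)   # 2 0
```

In this model the CTC timer with a prescaler of 8 and a compare value of 99 produces one event every 800 input clock cycles, which shows how the two settings together determine the timing period.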

The architecture of the microcontroller allows additional peripherals to be implemented, both internal (in future designs) and external (through the main bus, which is brought out).

# 4. PREFABRICATION TESTS

To verify that the structure was designed properly, tests were performed before fabrication. In the first stage, each functional block was tested at the simulation level using the Active-HDL environment. The second stage was an FPGA implementation of the processor. For this reason, an evaluation board for tests of the OctaLynx controller was designed and built [5]. Figure 5 shows a photograph of the evaluation board with the measurement station. The main unit is a Microsemi IGLOO nano AGL250 FPGA device, in which the whole controller was synthesized and implemented. The FPGA was programmed over a JTAG interface using a 10-pin connector. The FPGA chip was connected to two DRAM chips. One of them was used as RAM and has a capacity of 64k x 8 bits. The second chip stores the program memory and has a capacity of 64k x 16 bits. Both memories are connected by a common 16-bit address bus. One 24-bit connector was used to connect and test external peripherals. Three SPI lines (MISO, MOSI and SCK) are connected to a 6-pin programmer socket, as are the reset wire and the power supply lines.

Fig. 5. OctaLynx verification board with FPGA

The FPGA tests included programs that exercised all instructions from the instruction list (arithmetic, logic, jumps, etc.) and all listed interrupts. After all detected problems had been fixed, the controller was ready for implementation as an ASIC prototype.

## 5. ASIC IMPLEMENTATION

The last stage of the OctaLynx design was its implementation as an ASIC. The prototype was fabricated in LF CMOS 0.15 $\mu$m technology with a 1.8 V supply voltage. The structure of the controller was divided into several blocks, and the layout of each was designed separately, to allow easy modification of the processor topography in later versions of the controller. Two different techniques were used. Most of the blocks (core, peripherals and programmer) were synthesized from the Verilog hardware description language code using a bottom-up technique, which saved a lot of design time. The General Purpose Register Unit (GPRU) and the Memory Multiplexer were designed using a full-custom technique. The aim was to save chip area through good organization of the registers; manual design was feasible because memory structures are highly repetitive. Manual design of the memory multiplexer allowed short, area-saving connections between all functional blocks and leaves room for adding an internal memory block in the next version of the controller. All blocks were connected to each other manually using the full-custom technique.



Fig. 6. OctaLynx layout with main functional blocks marked

The layout of the microcontroller is presented in Figure 6. The structure is divided into several separate blocks: CORE, PERIPHERALS, PROG (programmer), MMUX (memory multiplexer) and CMUX (clock multiplexer). This division makes it easy to change the processor structure, in particular to change the peripherals and to add internal memory in the future. One functional block not described before appears in the circuit topography: the clock multiplexer (CMUX), which selects the source of the clock signal for the processor. The clock can come from an external source, from the internal generator or from the internal Dynamic Power Management (DPM) system.

The blank space in the middle of the layout is left free for those internal devices (generators, sensors, control logic, etc.). These additional circuits are placed there because of the proximity to the Arithmetic Logic Unit, which is expected to be one of the hottest blocks of the microcontroller. Consequently, the readings of temperature sensors placed in this part of the layout will probably refer to the hottest spot of the chip. This feature is especially significant for the Dynamic Power Management system that is planned to be used with the processor.

The DPM system that cooperates with the processor was described in more detail in [7]. In general, the idea of DPM methods is to dynamically manage the power dissipated in the circuit by adapting the current operating conditions to the workload. This can be done, for example, by scaling the supply voltage $V_{DD}$, scaling the operating frequency $f_0$ or gating the clock signal, according to (1), where $P$ is the dissipated dynamic power, $K$ is the switching factor and $C_L$ is the load capacitance.

$$P = K C_L f_0 V_{\rm DD}^2 \tag{1}$$

The presented circuit uses a combination of Dynamic Frequency Scaling (DFS) and Clock Gating (CG). Consequently, the most important DPM requirement for the processor is that it operates properly at all frequencies in the generator tuning range and that it can remain stopped until it has cooled down.
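The effect of frequency scaling and clock gating on the dynamic power in equation (1) can be checked numerically. In the Python sketch below only the 1.8 V supply voltage comes from the paper; the switching factor, load capacitance and clock frequencies are hypothetical values chosen purely for illustration.

```python
def dynamic_power(k, c_load, f0, v_dd):
    """Equation (1): P = K * C_L * f0 * V_DD^2, in watts for SI inputs."""
    return k * c_load * f0 * v_dd ** 2


# Hypothetical values, chosen only for illustration.
K = 0.2          # switching factor
C_L = 50e-12     # load capacitance [F]
V_DD = 1.8       # supply voltage [V], as in the fabricated technology

p_full = dynamic_power(K, C_L, 50e6, V_DD)   # assumed nominal 50 MHz clock
p_half = dynamic_power(K, C_L, 25e6, V_DD)   # DFS: frequency scaled to one half
p_gated = dynamic_power(K, C_L, 0.0, V_DD)   # CG: clock stopped, f0 = 0

print(p_half / p_full)   # 0.5
print(p_gated)           # 0.0
```

Because the dependence on $f_0$ is linear, halving the clock frequency halves the dynamic power, while gating the clock removes it entirely; static leakage is not covered by equation (1) and is not modeled here.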

| Topography block   | Height [µm] | Width [µm] | Area [µm<sup>2</sup>] |
|--------------------|-------------|------------|-----------------------|
| Total              | 736         | 462        | 340032                |
| Core               | 212         | 374        | 79288                 |
| GPRU               | 109         | 323        | 35207                 |
| Peripherals        | 236         | 374        | 88264                 |
| Programmer         | 114         | 138        | 15732                 |
| Clock multiplexer  | 30          | 44         | 1320                  |
| Memory multiplexer | 720         | 80         | 57600                 |

TABLE 2

Chip topography area used by the controller blocks

The layout covers a relatively large area of about 0.74 mm x 0.46 mm. Most of the area is used by the core and peripherals blocks. The presented layout does not include the chip ring with bonding pads. The pad cells include bidirectional buffers and pull-up resistors, which are necessary for proper operation of the microcontroller. The area covered by each part of the circuit layout is given in Table 2. The areas of the individual blocks do not sum up to the total area because of the blank gaps between them, but the chip area is still used very efficiently.
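The share of the layout occupied by the listed blocks can be estimated from the heights and widths in Table 2. The Python sketch below assumes that the listed blocks do not overlap (in particular, that the GPRU area is not already included in the core figure); if that assumption does not hold, the computed share is an overestimate.

```python
# (height, width) in micrometres, taken from Table 2
blocks_um = {
    "Core": (212, 374),
    "GPRU": (109, 323),
    "Peripherals": (236, 374),
    "Programmer": (114, 138),
    "Clock multiplexer": (30, 44),
    "Memory multiplexer": (720, 80),
}
total_um = (736, 462)


def area(dimensions):
    """Rectangle area in square micrometres."""
    height, width = dimensions
    return height * width


block_area = sum(area(d) for d in blocks_um.values())
total_area = area(total_um)

print(total_area)                          # 340032
print(block_area)                          # 277411
print(round(block_area / total_area, 3))   # 0.816
```

Under this assumption roughly four fifths of the layout rectangle is covered by functional blocks, with the remainder left as gaps and as the free space reserved in the middle of the layout.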

Analysis of the propagation times inside the structure was a very significant part of the layout design. The aim was to ensure synchronization between the fastest and the slowest blocks of the processor and to improve its operating speed. To achieve this, the longest propagation paths had to be identified and optimized. Most of this work focused on the arithmetic unit (the block responsible for addition and multiplication) [9] [10]. In most cases the propagation time was improved at the cost of a more complex circuit structure and larger area.

Another problematic block was the General Purpose Register Unit. The first (time-saving) attempt was to generate it from the Verilog code. The result was unsatisfactory because the connections were too long (large propagation time) and the block covered too much area. Consequently, this block was designed manually. A good result was possible because the memory block topography consists of repeated parts.

The final chip topography, presented in Figure 6, meets all requirements for this microcontroller. The design was prepared and sent for fabrication. ASIC tests of the microcontroller are planned for the near future.

## 6. SUMMARY

This paper gives a concise description of the implementation of an 8-bit RISC microcontroller named OctaLynx. The structure of the consecutive functional blocks has been described.

The microcontroller was first implemented in an FPGA chip. Tests and experiments confirmed the proper operation of the controller. Subsequently, the ASIC implementation was presented. The circuit was synthesized in CMOS LF 0.15 $\mu$m technology with a 1.8 V supply voltage. The circuit structure and topography have been presented.

The processor is planned to be used in tests of the authors' dynamic power management system and on other measurement boards.

#### **Acknowledgements**

The work has been supported by the National Science Center (Narodowe Centrum Nauki), research project NCN N N515 500340.

### LITERATURE

- 1. Megalingam R.K., Mohan A., Thavalengal S.H.: Low power Single Core CPU for a Dual Core Microcontroller. Proceedings of the 3rd International Conference on Emerging Trends in Engineering and Technology ICETET'2010, pp. 791-796.
- 2. Ye P., Ling C.: A RISC CPU IP Core. Proceedings of the 2nd International Conference on Anti-counterfeiting, Security and Identification ASID'2008, pp. 356-359.
- 3. Lee R., Mahon M., Morris D.: Pathlength Reduction Features in the PA-RISC Architecture. Digest Papers of the 37th IEEE Computer Society International Conference COMPCON'92, pp. 129-135.
- 4. Lilja D.J., Sapatnekar S.S.: Designing Digital Computer Systems with Verilog. Cambridge University Press, 2005.
- 5. Frankiewicz M., Kos A.: Measurement of the Temperature Inside Standard Integrated Circuits. Proceedings of Electrotechnical Institute, Iss. 251, 2011, pp. 109-116.
- 6. Hanson H., Keckler S.W.: Power and Performance Optimization: A Case Study with the Pentium M Processor. Texas University Austin, IBM Austin Research Laboratory, 2005.
- 7. Frankiewicz M., Gołda A., Kos A.: Overheat Security System for High Speed Embedded Systems. Proceedings of the 19th International Conference Mixed Design of Integrated Circuits and Systems MIXDES'2012, pp. 309-312.
- 8. Gał R., Gołda A., Frankiewicz M., Kos A.: FPGA implementation of 8-bit RISC microcontroller for embedded systems. Proceedings of the 18th International Conference Mixed Design of Integrated Circuits and Systems MIXDES'2011, pp. 323-328.
- 9. Singh R.P.R., Kumar P., Singh B.: Performance Analysis of Fast Adders Using VHDL. Proceedings of the 2009 International Conference on Advances in Recent Technologies in Communication and Computing, pp. 189-193.
- 10. Yousuf R., Najeeb-ud-din: Synthesis of Carry Select Adder in 65 nm FPGA. Proceedings of the IEEE Region 10 Conference TENCON'2008, pp. 1-6.

#### ASIC IMPLEMENTATION OF THE 8-BIT "OCTALYNX" RISC MICROPROCESSOR

#### Maciej FRANKIEWICZ, Ryszard GAŁ, Adam GOŁDA, Ireneusz BRZOZOWSKI, Andrzej KOS

**SUMMARY** The article presents the structure of an 8-bit RISC microcontroller with a 16-bit address bus, named OctaLynx. The processor was designed using the Verilog hardware description language and fabricated as an ASIC in CMOS LF 0.15 μm (1.8 V) technology. Before fabrication, tests were carried out in an FPGA. The integrated circuit consists of a core and peripherals (8-bit I/O ports, counters, SPI, USART). The controller is intended for tests of on-chip dynamic power management systems.

**Keywords**: microcontroller, RISC, ASIC, CMOS, ALU, counters, USART, SPI, Verilog



**Maciej FRANKIEWICZ, M.Sc., Eng.** Received his M.Sc. in electronics at AGH University of Science and Technology in Cracow, Poland in 2010. Currently a Ph.D. student at the Department of Electronics, AGH University of Science and Technology in Cracow, Poland. Author of over 10 papers published in journals and international conference proceedings. His area of interest includes temperature-aware design of integrated circuits and systems.



**Ryszard GAŁ, M.Sc., Eng.** Graduated in electronics at AGH University of Science and Technology in Cracow, Poland in 2012. Co-worker in research projects run at the Department of Electronics, AGH University of Science and Technology. His master's thesis and scientific interests include the design of digital electronic systems.



**Adam GOŁDA, D.Sc.** Received his B.Sc. and M.Sc. degrees in Electrical Engineering from AGH University of Science and Technology in 2001, and his Ph.D. degree from AGH University of Science and Technology in 2008. Since 2001 he has been employed at the Department of Electronics at AGH-UST. His research interests are in the general area of energy loss minimization in integrated circuits, with an emphasis on temperature influence, and include modeling and estimation of power losses of VLSI circuits, thermal phenomena in microelectronics, artificial intelligence, and distance learning.

**Ireneusz BRZOZOWSKI, D.Sc.** Received his M.Sc. degree in Electronic Engineering from the AGH University of Science and Technology, Krakow, Poland in 1997, and his Ph.D., also in Electronics, from AGH-UST in 2006. Since 1998 he has worked in the Group of Microsystems Design at AGH-UST, Krakow, Poland on low-power circuit design and logic synthesis. His research interests also concentrate on thermal issues in microsystems.





**Prof. Andrzej KOS, D.Sc. hab.** Received Ph.D. in 1983 at AGH University of Science and Technology in Cracow, Poland in electronics, professor title since 2001. Since 1995 head of the Micro- and Nanoelectronics Systems Team in Department of Electronics, AGH University of Science and Technology. Author of over 190 articles, international conference papers and patents, author of 3 books including one printed in United Kingdom. Scientific interests focus on thermal issues in integrated circuits design and testing. Member of the Committee of Electronics and Telecommunication of Polish Academy of Sciences, many scientific committees. European Commission and Polish Ministry of Science and Higher Education expert.