## A New Quantum-Dot Cellular Automata Construct for the Future of Nano-Scale Computing

L Hook and S Lee

Department of Electrical and Computer Engineering, University of Oklahoma, Norman, OK, 73019, USA

lhook@ou.edu and samlee@ou.edu

#### **ABSTRACT**

By the time CMOS reaches its physical scaling limits in the next several years, computing will very likely have evolved to meet the demands imposed by increasing application complexity and the processing of enormous amounts of digital information. This evolution has been rapid and constant, and it can currently be seen in computing architectures in the form of multi-core processors, GPU-based parallel computing, interconnected computing clusters, and hybrid architectures which combine parallel and scalar processors to increase computing efficiency. Many designs which have been proposed to replace CMOS and continue the scaling of computing devices have largely underappreciated this recent shift. This work seeks to aid in the development of a new architecture suited for the future of nano-scale computing by creating a new construct capable of high speed, low power, molecularly scalable, fault-tolerant, reconfigurable, reversible, and parallel computing.

*Keywords*: quantum-dot cellular automata, reversible, fault-tolerant, parallel

### 1 INTRODUCTION

Over 60 years after the invention of the transistor, and after more than 40 years of semiconductor improvements, the integrated circuit (IC) has become one of the most successfully proliferated devices of our time. With an ever-widening range of uses in fields such as control, communications, and computing, the sustained explosion of IC improvements is rivaled only by the expansion of uses brought forth through the imagination of designers who exploit the IC's ever-growing capabilities. In particular, the CMOS paradigm of scaling has allowed the sustained growth of the IC industry and led to a marked shift in culture and connectivity.
As the physical limits of CMOS scaling [1] approach, and the cost of continued scaling becomes prohibitive, new techniques, devices, and architectures must be developed to continue the advance of IC functionality, and it is important that these advances occur in lockstep with each other. This will prevent new devices and materials from becoming functionally obsolete even before their fabrication begins. Therefore, the current state and future direction of computing technologies should be reviewed before any new devices and architectures are developed, so that these directions can be incorporated into the new designs. New devices can then be easily integrated into the existing computing environment if and when their implementation becomes a reality.

#### 1.1 Current State and Future Direction of Computing Technologies

Computing directions have changed continuously throughout the years to keep up with technological advances, constraints, and the requirements of computer users. The winds of change are again blowing through the computing industry. Just as the mainframe computer, in time, gave way to the desktop computer, the single scalar processor computing paradigm is giving way to more parallel schemes. This shift has occurred in stages, going from serial techniques such as instruction pipelining to multiple processors, usually on a single chip. The shift continues with cluster computing, which allows multiple desktops to be connected via local or wide area networks. The resulting parallel execution occurs concurrently in computers which may be in the same vicinity or several thousand miles apart. The success of cluster computing has led institutions to begin replacing expensive supercomputers with many cheap, connected desktop computers. However, cluster computing is not without its issues.
In particular, network latencies determine the speed, and therefore the scalability, of the applications running on a cluster of computers [2].

Along with this move to parallel computing schemes, another trend which should be considered is the move to more specialized hardware. Current computers typically contain separate ICs, or sections of ICs, specialized to perform specific computationally intensive tasks faster than traditional general purpose microprocessors. The traditional Von Neumann architecture, made possible through the universality and generality of traditional CPUs, is being supplemented by more efficient hardware implementations. This trend also seems to be leading to more local processing to alleviate the I/O and communication bottlenecks that result from the increase in functional units inside the architecture. Device specialization takes on its most extreme form in reconfigurable ICs such as FPGAs. These devices can be "programmed" to implement a wide range of hardware, and the addition of processor cores to them has further strengthened the marriage between general purpose and specialized computing elements.

Ideally, new architectures would be developed which integrate parallelism, specialization, and/or reconfigurability into their designs. Other important attributes, including fault tolerance and low power needs, should be designed in as well, because devices on the size scales that might replace CMOS are different in nature.

Fault tolerant schemes have been a part of computer design for many decades. Because faults have been relatively rare to date, these methods usually rely on the vast majority of the information being correct. However, it is well understood that as devices become smaller, more densely packed, faster clocked, and more complex, the probability of faults and errors corrupting data will increase. Also, traditional lithographic techniques may not be able to fabricate new designs, even if they are two-dimensional.
For these designs, a bottom-up self-assembly process, which is inherently probabilistic, will have to be developed. Currently, a dramatic increase in complexity must be accepted to achieve fault tolerance in the presence of a large number of faults or errors. A particularly useful attribute would be the ability of an architecture to self-repair or to work around incorrectly assembled sections and permanent faults. Also, due to the size scale of these devices and the resulting susceptibility to noise, tolerance to transient faults will have to play a major role in the implementation of these architectures. For these reasons, research to provide fault tolerance in the presence of multiple errors without dramatically increasing complexity is extremely important for the current deterministic computing paradigms and should be factored into the designs of new architectures for future computing.

Additionally, at these size scales, power requirements, and in particular heat dissipation, will pose a significant challenge. Current processor chips already show that the dissipation of energy is of paramount concern, considering the amount of area taken up by devices that dissipate heat. Future ICs will have to contend with higher density and higher clock rates than current ICs and will thus need new ways to deal with this problem. Reversibility may provide a means to reduce logic device power requirements by removing the fundamental limit on the energy that such a device must lose.

Together these directions paint a challenging picture for the design of new device architectures. For instance, in order to integrate a highly parallel architecture into very dense and scalable ICs, the connectivity required to reach the individual computing elements may end up dominating the floor plan. Novel designs based on cellular automata (CA) have been proposed to alleviate these concerns and have established the requirement of local-only interaction.
Implementations of CA-based designs such as quantum-dot cellular automata (QCA) have become very popular due to their promise of high speed, low power, molecularly scalable operation. However, these designs do not seem to utilize the full potential of CA for parallelism and local interaction. The need for reconfigurability, fault tolerance, and reversibility has also been explored only briefly in work on QCA. In order to develop an architecture which incorporates the attractive properties of current QCA designs while adding the attributes which will be important for future computing, this work presents a new QCA construct. This construct is called the super cell QCA (SCQCA). Its operation is discussed below, along with an introduction to QCA and an account of how the SCQCA fulfills the previously mentioned properties.

### 2 QUANTUM-DOT CELLULAR AUTOMATA

Quantum-dot cellular automata (QCA) [3] have been widely studied for the past 15 years as a replacement for CMOS. This research has revealed attractive attributes of QCA, including ultra-fast switching, very low power, and scalability to the molecular regime. However, other considerations do not paint the QCA architecture in such a favorable light. To explore them, the single-bit full-adder (FA) circuit implementation shown in [4] will be used. This circuit provides a good example for understanding the QCA paradigm from the general viewpoint of the future of nano-scale computing.

In the FA circuit, 3 inverters and 5 majority gates, or 8 gates in total, are required to perform the calculations. The majority gate consists of four cells and the inverter of only one cell. Therefore the 8 gates are implemented by 23 QCA cells (5 × 4 + 3 × 1). However, the complete QCA implementation requires nearly 200 QCA cells. The remaining cells serve only to carry signals between gates and to separate the gates from one another.
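
The gate and cell counts above can be checked with a short sketch. It assumes one common majority-logic decomposition of the full adder that matches the stated count of 3 inverters and 5 majority gates; whether this is the exact netlist laid out in [4] is an assumption, and the names `majority`, `inv`, and `full_adder` are illustrative only. The cell figures (4 per majority gate, 1 per inverter, about 200 in total) are taken from the text. The sketch models logic only, not cell placement or clocking.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote, the basic QCA logic primitive."""
    return 1 if a + b + c >= 2 else 0


def inv(a: int) -> int:
    """Logical inverter on a 0/1 value."""
    return 1 - a


def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Return (sum, carry) using 3 inverters and 5 majority gates."""
    # Three inverters, one per input.
    na, nb, nc = inv(a), inv(b), inv(c_in)
    # One majority gate produces the carry directly.
    carry = majority(a, b, c_in)
    # Four majority gates produce the sum bit.
    m1 = majority(a, b, nc)
    m2 = majority(a, nb, c_in)
    m3 = majority(na, b, c_in)
    total = majority(m1, m2, m3)
    return total, carry


# Cell bookkeeping using the figures quoted in the text.
MAJORITY_GATES, INVERTERS = 5, 3
CELLS_PER_MAJORITY, CELLS_PER_INVERTER = 4, 1
LOGIC_CELLS = MAJORITY_GATES * CELLS_PER_MAJORITY + INVERTERS * CELLS_PER_INVERTER
TOTAL_CELLS_APPROX = 200  # "nearly 200" in the text, so this is approximate
LOGIC_FRACTION = LOGIC_CELLS / TOTAL_CELLS_APPROX

if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, full_adder(a, b, c))
    print("logic cells:", LOGIC_CELLS, "fraction of layout:", LOGIC_FRACTION)
```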
The QCA architecture, like the CMOS architecture, therefore suffers from the fact that communication paths ultimately dominate the floor plan.

The FA circuit is also extremely prone to logic faults. If only one of the nearly 200 cells is flipped, the entire circuit may produce an error. A more robust clocking operation may improve this, but the problem remains. Because every cell, including the ones providing the actual logic operations, is equally susceptible to faults, multiple redundancy would be required for error-free operation. If, for example, triple modular redundancy were used, 3 full copies of this circuit would need to be generated and roughly 600 cells would be required to carry out the FA operation.

Additional nano-computing paradigms such as reconfigurability also do not fit well within the current QCA architecture. The cells must be placed in a precise relation to the other cells, which locks them into their operation. This makes each chip extremely susceptible to the device imperfections which will certainly be seen in any molecular self-assembly process.

Due to these traits, it is clear that another construct is needed if the requirements brought by the current and future state of computing technologies are to be addressed. For these reasons, this paper proposes a new architecture utilizing the "2-dot" quantum cellular automata to provide a structure in which parallel, reconfigurable, fault tolerant, and reversible computing can be developed, while retaining the low power, high speed, and scalability to the molecular scale promised by traditional quantum-dot cellular automata designs.

### 3 SUPER CELL QUANTUM-DOT CELLULAR AUTOMATA

The construct at the core of the new architecture, the super cell Quantum-Dot Cellular Automata (SCQCA), is shown in Figure 1. Initial analysis of the SCQCA architecture has revealed 4 equations which dictate the super cell's operation.
They are:

$$
\begin{aligned}
TO &= (\overline{LI} \cdot RI) + (RI \cdot \overline{BI}) + (RI \cdot \overline{TI}) + (\overline{LI} \cdot \overline{TI} \cdot \overline{BI}) \\
BO &= (\overline{RI} \cdot LI) + (LI \cdot \overline{BI}) + (LI \cdot \overline{TI}) + (\overline{RI} \cdot \overline{TI} \cdot \overline{BI}) \\
LO &= (\overline{TI} \cdot BI) + (\overline{TI} \cdot \overline{RI}) + (\overline{TI} \cdot \overline{LI}) + (BI \cdot \overline{RI} \cdot \overline{LI}) \\
RO &= (\overline{BI} \cdot TI) + (\overline{BI} \cdot \overline{RI}) + (\overline{BI} \cdot \overline{LI}) + (TI \cdot \overline{RI} \cdot \overline{LI})
\end{aligned}
\tag{1}
$$

These equations have been verified using tools created specifically for this architecture, which were based on experimentally verified simulation and design tools created for traditional QCA designs. They reveal very interesting features of the SCQCA, including the fact that the SCQCA is both reversible and logically universal.

Figure 1: Representation of a super cell, which consists of 12 two-dot QCAs, 4 inputs (shown in red), and 4 outputs (bordered in white). Inputs and outputs are labeled (e.g. TI stands for Top Input).

#### 3.1 Logical Universality and Reversibility

The SCQCA is a logically universal logic element, meaning that any desired logic truth table can be created using only SCQCA. This can be verified by examining the equations used to describe its operation. When two variable inputs are applied to selected inputs of the SCQCA, it produces NAND, NOR, AND, OR, and INV, as well as combinations of these gates (e.g. AND((INV A), B)). The output combination generated by the SCQCA depends on which inputs the variables are applied to and on the constant values applied to the other input locations. Sequential designs are also easily enabled by using two or more SCQCA or by feeding back outputs through delay feedback loops.
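
As a hedged illustration of how the equations in (1) can be examined, the sketch below evaluates them as plain Boolean functions over all 16 input combinations. The function name `scqca`, the 0/1 encoding, and the argument order are assumptions made for this example. It models only the Boolean equations, not the physical cell behavior or clocking. It checks that the input-to-output map is one-to-one (logical reversibility) and confirms a few of the gates named above (NAND, NOR, INV, and AND((INV A), B)) for particular input assignments; it does not attempt to reproduce the full list of gates.

```python
from itertools import product


def scqca(li: int, ri: int, ti: int, bi: int) -> tuple[int, int, int, int]:
    """Evaluate equations (1); returns (TO, BO, LO, RO) as 0/1 values."""
    nl, nr, nt, nb = 1 - li, 1 - ri, 1 - ti, 1 - bi
    to = (nl & ri) | (ri & nb) | (ri & nt) | (nl & nt & nb)
    bo = (nr & li) | (li & nb) | (li & nt) | (nr & nt & nb)
    lo = (nt & bi) | (nt & nr) | (nt & nl) | (bi & nr & nl)
    ro = (nb & ti) | (nb & nr) | (nb & nl) | (ti & nr & nl)
    return to, bo, lo, ro


ALL_INPUTS = list(product((0, 1), repeat=4))

# Reversibility: the map is a bijection if all 16 outputs are distinct.
TRUTH_TABLE = {x: scqca(*x) for x in ALL_INPUTS}
IS_REVERSIBLE = len(set(TRUTH_TABLE.values())) == len(ALL_INPUTS)

# When the map is a bijection, it can be inverted by table lookup.
INVERSE = {out: inp for inp, out in TRUTH_TABLE.items()}


def nand_on_lo(a: int, b: int) -> int:
    """NAND(a, b): variables on LI and RI, constants TI = 0 and BI = 0, read LO."""
    return scqca(a, b, 0, 0)[2]


def nor_on_to(a: int, b: int) -> int:
    """NOR(a, b): variables on LI and TI, constants RI = 0 and BI = 0, read TO."""
    return scqca(a, 0, b, 0)[0]


def inv_on_lo(a: int) -> int:
    """INV(a): variable on LI, constants RI = 1, TI = 0, BI = 0, read LO."""
    return scqca(a, 1, 0, 0)[2]


def and_not_a_b_on_to(a: int, b: int) -> int:
    """AND((INV a), b): variables on LI and RI, constants TI = 1 and BI = 1, read TO."""
    return scqca(a, b, 1, 1)[0]


if __name__ == "__main__":
    for inputs, outputs in TRUTH_TABLE.items():
        print(inputs, "->", outputs)
    print("reversible:", IS_REVERSIBLE)
```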
When more than one SCQCA is used, the four-phase clock provides a natural clocking scheme for feedback between two or more SCQCA. The inputs of one SCQCA are the outputs of another, and the calculations in each occur half a clock cycle out of phase with the other. This feedback produces a sequential machine built from SCQCA.

It can also be shown from the equations in (1) that the SCQCA is logically reversible. This should be of particular interest to QCA designers because one of the major challenges facing designers of future nano-scale computational devices is how to dissipate energy given the proposed device densities and clock rates. Although reversible logic does not by itself eliminate energy dissipation, it removes the theoretical lower bound on the energy dissipated per irreversible operation, which is on the order of kT (kT ln 2 per erased bit) [8].

#### 3.2 Fault Tolerance, Reconfigurability, and Parallel Designs

It has been suggested that the SCQCA is reconfigurable when used as a gate with fewer than 4 variables. This reconfigurability is an important attribute when designing systems which must deal effectively with permanent faults caused by manufacturing defects or other permanent processes. For example, if a particular gate implemented with an SCQCA is found to have a fault, the gate can be bypassed by using a different, fault-free SCQCA in its place. This would of course require extra SCQCA to be present in the original design, but not on the scale required for entirely redundant designs. The reversibility of the SCQCA also allows faults, even multiple faults, to be identified.

Reconfigurability would also allow an architecture to dynamically allocate special hardware to data-intensive or computationally intensive tasks and then turn that hardware back over to other computing when these special tasks are complete. This would enable a shift in the way that software and hardware interact, creating more versatile hardware designs.
Lastly, connecting SCQCA together to form systolic arrays would enable a highly parallel architecture. This architecture would also be highly scalable and flexible, due in part to the reconfigurability of an SCQCA-based design. This differs from most systolic array designs, which tend to be highly specialized and able to work on only specific types of data processing tasks.

The new SCQCA construct offers a great deal to a future QCA architecture. It should also be pointed out that while the SCQCA was designed with molecular QCA [5] in mind, it is also well suited to implementation with proposed magnetic QCA [6,7] designs. Additionally, traditional QCA designs, such as those which utilize metallic and semiconductor quantum dots, could also utilize the SCQCA architecture with only minor modifications.

### 4 SUMMARY

With the tremendous success of computing technologies to date, it may seem that there is no practical need to develop new designs for the post-CMOS era. Current low-end computer processors are able to process information at over 2.5 GHz, memory price per bit continues to plummet with no end in sight, and internet bandwidths are now well over 1 Gb/s. However, even with these advances, simulation of more than a few interacting quantum particles will bring even the most high-end computing system to its knees. Patterns in data which are very easily recognizable to humans are nearly impossible for a traditional computer to spot. The ability to recover from a mistake continues to elude traditional computation except for the simplest and best understood errors. Data collections continue to explode, and the ability to sort, search, and reduce this data is a constant subject of research. The advances in computing over the last 40 years have only seemed to expand our imaginations as to the possibilities of these devices. If this is to continue over the next 40 years, fundamental changes in the way we compute must occur.
Taking direction from the current state of computing technologies, new architectures and devices are being implemented which will not only continue the scaling of components but also increase the function and purpose of computers. It is for these purposes that the SCQCA has been developed and presented in this paper. The SCQCA provides a construct upon which a highly parallel, fault tolerant, reconfigurable, and reversible architecture can be developed.

#### REFERENCES

- [1] The International Technology Roadmap for Semiconductors (ITRS), 2009 Edition. Available: http://www.itrs.net/
- [2] H. El-Rewini and M. Abd-El-Barr, Advanced Computer Architecture and Parallel Processing, Hoboken, NJ, Wiley, 2005.
- [3] C. Lent, P. Tougaw, W. Porod, and G. Bernstein, "Quantum Cellular Automata," Nanotechnology, 4, 49-57, 1993.
- [4] P. Tougaw and C. Lent, "Logical devices implemented using quantum cellular automata," Journal of Applied Physics, 75, 1818-1825, 1994.
- [5] B. Isaksen and C. Lent, "An architecture for molecular computing using quantum-dot cellular automata," Proc. IEEE 3rd Conference on Nanotechnology, IEEE-NANO'03, 402-405, 2003.
- [6] R. Cowburn and M. Welland, "Room-temperature magnetic quantum cellular automata," Science, 287, 1466-1468, 2000.
- [7] A. Imre, G. Csaba, L. Ji, A. Orlov, G. Bernstein, and W. Porod, "Majority Logic Gate for Magnetic Quantum-Dot Cellular Automata," Science, 311, 205-208, 2006.
- [8] R. Landauer, "Irreversibility and heat generation in the computing process," IBM J. Res. Develop., 5, 183-191, 1961.