Article - Issue 49, December 2011
Future for mobile handsets
A microprocessor incorporates the functions of a computer’s central processing unit on an integrated circuit. The increase in capacity of microprocessors has followed Moore’s law, which suggests that the number of transistors that can be fitted onto a chip doubles every two years © ARM
How microprocessors have enabled step changes to be made to mobile computing
Mobile handsets, not PCs, look set to be the future template for consumer computing. At the heart of this revolution is ARM, a UK company whose low-power processor designs provide the intelligence in over 95% of today’s smartphones. Ingenia talked to people in the company about how it has reached this position and how graphics processors will transform the world of mobile computing.
Mobile phones have evolved rapidly over the last decade from nice-to-have accessories to fixtures in our pockets alongside wallets and house keys. Over the next few years, ARM sees new types of mobile handsets emerging that will become central to our lives. They will not only be able to help us navigate when we are lost and access the internet on the go, but will also be capable of all the things we use desktop PCs for and more, including editing and viewing media, such as video.
The broadband mobile access standard, Long Term Evolution (LTE), is an important part of this changing scene because it will deliver the bandwidth and performance of home networks to mobile devices. Unlike desktop PCs, however, modern mobiles also come equipped with multiple sensing elements such as cameras, GPS sensors, compasses and accelerometers that can provide context-based information. They allow someone’s location or position to augment cloud- or handset-based applications given the right computing resources.
Since its formation in 1990 as a spin-off from the computer firm Acorn, ARM’s business has been designing and selling low-power processors (for example, central processing units, graphics processors and bus systems), which it licenses as intellectual property (IP). Customers integrate the IP with design components to produce complete systems-on-chip. The company currently licenses 800 processor designs to over 250 semiconductor makers that sell these systems-on-chip to the original equipment manufacturers who build the devices you and I use every day. It ships out 250 of its microprocessors every second, 60% of which end up in mobile phones.
If ARM had been founded in Silicon Valley as a microprocessor chipmaker, the company would probably not be around today, says Chief Executive Warren East FREng, who has been with the company since 1994. Based in Cambridge, with no easy access to venture capital, it had to be more imaginative with its business model and looked early on for high-volume global sales. Its first licenses were with a UK company (GEC Plessey), a US company (VLSI Technology) and a Japanese company (Sharp).
ARM was perhaps also fortunate to start at the same time as mobile phones were turning from analogue to digital and when there was no established low-power microprocessor for such devices. But since then, says East, it has made its own luck, not least by working closely with technology leaders. Its microprocessors were in all the famous precursors to the smartphone such as the Nokia Communicator, the Psion Series 5 Organiser (pictured) and the Apple Newton. More recently, in 2006, having seen the potential of graphics and video to transform mobiles, the company bought the Norwegian start-up Falanx, whose graphics technology, in the form of the Mali family of graphics processing units (GPUs), is now bringing PC-style graphics to the next generation of handsets.
The company’s roots go back to the early 1980s, when the BBC commissioned Acorn to make a home computer for a public education television series. The first BBC Micro used the 8-bit Rockwell 6502 microprocessor, but when it came to boosting performance, Acorn engineers Sophie Wilson FREng and Steve Furber FREng decided they would design their own chip inspired by work by IBM Research and others in reduced instruction set computing (RISC) (see RISC history).
Acorn launched the first ARM-based computer (the Archimedes) in June 1987, based around Wilson and Furber’s production processor the ARM2, which was a 32-bit, 10 MHz RISC device with only 25,414 transistors. As a comparison, at the time, Motorola’s popular 32-bit 68000 processor contained 68,000 transistors. The relative simplicity of ARM’s processors has been one of the reasons the company has come to dominate the mobile market. It now has 20 different types of processor, which (putting aside optional extensions that have been developed over the years) still share the same basic instruction set.
The ARM1 was one of the world’s first production RISC processors, with only 25k transistors.
Reduced power, increased performance
While we might think of it as a communications device, the modern smartphone is mainly occupied with computing and graphics tasks, as well as the connectivity required to carry these out. The central application processor might have two CPU cores running at 1 GHz clock speeds and a GPU capable of many gigaflops (Gflops) of performance. It is this system-on-chip that allows us to post messages and photographs on social networking sites, send emails, and play computer games all day with typically only a 1200 mAh battery.
Mobile handsets will be no less power-constrained in the future, and yet they will need to do much more, according to Ian Smythe, director of the company’s Multimedia Processing Division. He says we can expect multiplayer real-time video gaming over broadband mobile networks, augmented reality applications that require image recognition and the ability to track objects in real time, photo and video editing, and even 3D gesture-based user interfaces, all presented with the beautiful, seamless video and graphics we enjoy on high-definition TV and PC screens.
A smartphone such as the Samsung Galaxy SII can already drive a high-definition 1080p (1920×1080 pixel) digital TV screen. The ability to connect handsets to higher-resolution screens of 2048×1080 pixels or 4096×2160 pixels is coming next, says Laurence Bryant, the company’s segment director for mobile applications. Mobile devices will take on multiple personalities, working with many different operating systems and offering a range of user experiences across multiple form factors. What feels like a phone while you are out and about, says Bryant, may seem more like a PC or a multimedia server at home when it is connected to, and driving, a larger display.
Delivering such graphics-intensive, instantly scalable performance in a handheld battery-powered device is an immense challenge. For example, to play a state-of-the-art video game like id Software’s RAGE on a desktop PC with a 1080p screen requires a graphics processing card (such as the Radeon HD5550) that can calculate the positions and colours of 2 million pixels, each containing 32 bits of information, at a frame rate of 60 fps in real time. Typically, such a high-end video card would feature multiple graphics engines working in parallel providing a quoted 352 Gflops, powered by the manufacturer’s recommended minimum of a 400 W power supply.
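To put the figures above in context, the following is a rough back-of-envelope sketch of the raw pixel workload for a 1080p screen. It uses only the resolution, colour depth and frame rate quoted in the text; the variable names are illustrative, and the calculation counts only the final pixels written per second, not the far larger amount of arithmetic needed to compute them.

```python
# Back-of-envelope pixel throughput for a 1080p display at 60 fps.
# All three inputs come from the article; the names are illustrative.
WIDTH, HEIGHT = 1920, 1080      # 1080p resolution
BITS_PER_PIXEL = 32             # colour information per pixel
FRAMES_PER_SECOND = 60          # target frame rate

pixels_per_frame = WIDTH * HEIGHT
bytes_per_frame = pixels_per_frame * BITS_PER_PIXEL // 8
pixels_per_second = pixels_per_frame * FRAMES_PER_SECOND
bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND

print(f"pixels per frame : {pixels_per_frame:,}")
print(f"bytes per frame  : {bytes_per_frame:,}")
print(f"pixels per second: {pixels_per_second:,}")
print(f"bytes per second : {bytes_per_second:,}")
```

Each frame holds just over 2 million pixels (about 8.3 MB), so the finished frames alone amount to roughly 124 million pixels, or close to 500 MB, every second.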
Today’s smartphone GPUs such as ARM’s Mali-400 deliver tens of Gflops from a typical 1200 mAh battery, moving rapidly to hundreds of Gflops with devices like the Mali-T600 series in the next two years. And yet smartphones manage to provide usable operation time while remaining in a power envelope two orders of magnitude below the desktop.
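The gap between the two power envelopes can be illustrated with a simple energy calculation. This is a sketch under stated assumptions: the nominal cell voltage of 3.7 V is a typical lithium-ion value and is not given in the article, the 400 W figure is a power supply rating and therefore an upper bound on what a desktop actually draws, and the function name is hypothetical.

```python
def runtime_hours(capacity_mah, cell_voltage, load_watts):
    """Hours a battery lasts at a constant load, ignoring conversion losses."""
    energy_wh = capacity_mah / 1000 * cell_voltage
    return energy_wh / load_watts

CAPACITY_MAH = 1200                 # battery capacity quoted in the article
CELL_VOLTAGE = 3.7                  # ASSUMPTION: nominal lithium-ion voltage
DESKTOP_WATTS = 400                 # PSU rating from the article (upper bound)
PHONE_WATTS = DESKTOP_WATTS / 100   # two orders of magnitude lower

desktop_seconds = runtime_hours(CAPACITY_MAH, CELL_VOLTAGE, DESKTOP_WATTS) * 3600
phone_hours = runtime_hours(CAPACITY_MAH, CELL_VOLTAGE, PHONE_WATTS)

print(f"battery life at {DESKTOP_WATTS} W: {desktop_seconds:.0f} seconds")
print(f"battery life at {PHONE_WATTS:.0f} W: {phone_hours:.2f} hours")
```

Under these assumptions the battery would last about 40 seconds at desktop power levels and a little over an hour at a sustained 4 W, which suggests that a handset lasting all day must average well below even that lower figure.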
Scalable processing enables high-performance, energy-efficient consumer electronics, including mobiles, tablets and HD digital TVs
Running computing tasks across multiple CPUs, GPUs, and other processors (known as multiprocessing) is generally thought to be the way to bring applications like RAGE to mobile handsets. When tasks are balanced in parallel across a number of processors, each individual processor no longer has to be used at its peak clock frequency, which turns out to be more power-efficient. This is how symmetric multiprocessing works. Other battery-preserving tricks can also be used, such as powering up only the cores working on a specific task.
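One way to see why lower clock speeds save power is the standard approximation for the switching power of CMOS logic, P = C × V² × f, where a slower clock usually also permits a lower supply voltage. The sketch below uses normalised values; the 0.7 voltage ratio is an illustrative assumption rather than an ARM figure, and the model ignores leakage and assumes the workload splits perfectly across two cores.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Switching power of CMOS logic: P = C * V^2 * f (leakage ignored)."""
    return capacitance * voltage ** 2 * frequency

# Normalised, illustrative values (assumptions, not ARM figures).
C, V_FULL, F_FULL = 1.0, 1.0, 1.0
V_REDUCED = 0.7   # ASSUMPTION: supply voltage that suffices at half the clock

# Same nominal throughput: one core at full clock vs two cores at half clock.
one_core_fast = dynamic_power(C, V_FULL, F_FULL)
two_cores_slow = 2 * dynamic_power(C, V_REDUCED, F_FULL / 2)

print(f"one core at full clock : {one_core_fast:.2f}")
print(f"two cores at half clock: {two_cores_slow:.2f}")
```

With these assumed values, two cores at half speed deliver the same nominal throughput for roughly half the switching power, because power falls with the square of the voltage.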
Multiprocessing using CPUs and GPUs has become more viable since graphics processors have become programmable, with their functions defined using sets of software instructions called shaders, which describe the traits of a point in 2D or 3D space (known as a vertex) or of a particular pixel. Shaders give the application programmer freedom to define colours, textures, depth and lighting effects very precisely in order to produce the increasingly realistic images we are used to, as inspired by computer-generated animations such as Pixar’s Toy Story and Cars.
To efficiently use cycles from programmable GPUs for general purpose computing, ARM has been working with a number of partners (as part of a standards body called Khronos) on a cross-platform, parallel programming standard called OpenCL that allows software developers to harness a mix of multi-core CPUs, GPUs, and other processors (see box).
The company’s Mali-T604 is the first mobile multicore GPU core suitable for general purpose computing to support Full Profile OpenCL. The T604 has up to four shader cores that can work independently or in parallel. The T604 also features a ‘Job Manager’ function, which can dynamically allocate program elements among the shader cores. For instance, if the operating system decides in the middle of a graphics task that it needs to save power by turning off a core, the Job Manager will handle the request to turn it off without degrading performance.
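The idea of reallocating work when a core is switched off can be pictured with a toy model. The sketch below is purely hypothetical: the class, its methods and the shortest-queue policy are invented for illustration and do not describe how the Mali-T604 Job Manager is actually implemented. It shows only that pending jobs can be handed to the remaining cores so that no work is lost.

```python
from collections import deque

class JobManagerSketch:
    """Toy model only: hypothetical, not ARM's actual Job Manager design."""

    def __init__(self, num_cores=4):
        # One queue of pending jobs per active shader core.
        self.queues = {core: deque() for core in range(num_cores)}

    def submit(self, job):
        # Give the job to the active core with the shortest queue.
        core = min(self.queues, key=lambda c: len(self.queues[c]))
        self.queues[core].append(job)
        return core

    def power_off(self, core):
        # Remove the core, then hand its pending jobs to the remaining cores.
        if len(self.queues) == 1:
            raise RuntimeError("cannot power off the last active core")
        pending = self.queues.pop(core)
        for job in pending:
            self.submit(job)

    def pending_jobs(self):
        # All jobs still waiting, across every active core.
        return sorted(job for q in self.queues.values() for job in q)

manager = JobManagerSketch(num_cores=4)
for job_id in range(8):
    manager.submit(job_id)

manager.power_off(3)   # the operating system asks to save power
print("active cores:", sorted(manager.queues))
print("pending jobs:", manager.pending_jobs())
```

After core 3 is powered off, all eight jobs are still queued on the three remaining cores. A real scheduler would also have to deal with jobs already in flight and with the cost of moving data, which this sketch leaves out.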
We can expect to see Mali-T604 devices in mobile handsets in the next one to two years, working in tandem with multicore CPUs like the Cortex-A15, which can be scaled up to two clusters of four cores for non-mobile applications. To get a sense of the computing power they will deliver, the Mali-T604 has up to five times the performance of the Mali-400 at over 50 Gflops and up to 2 Gpixels/s, and is scalable up to four cores. Mobile configurations of the Cortex-A15 are expected to deliver over five times the performance at the same power budget of today’s top-of-the-line smartphones such as the Motorola Atrix and Samsung Galaxy SII that use the company’s dual Cortex-A9 cores.
Meeting the systems challenge
For GPUs and CPUs to share data efficiently, the on-chip communications between them also need to be re-engineered. ARM has risen to this challenge with the AMBA Bus CoreLink CCI-400 Cache Coherent Interconnect that enables system-level cache coherency across clusters of multicore processors, such as the Cortex-A15 processor and the Mali-T604 GPU.
If you think of a system-on-chip as a city, Cache Coherent Interconnect is like a highway system that keeps interactions between CPUs and GPUs within ‘city limits’. In today’s systems, the point of coherence is the external memory, which is like travelling to another city a few hundred miles away. Historically, cache maintenance has required a lot of software overhead to clean caches when moving any shared data between on-chip processing engines. Hardware coherency reduces software cache maintenance, saves processor cycles and reduces external memory accesses. ARM has worked on its new on-chip communications protocol with Arteris, Cadence, Jasper, Marvell, Mentor, Sonics, ST-Ericsson, Synopsys and Xilinx.
As mobile computing devices become our primary computing platforms, further innovations will emerge, says Warren East. Today you can get a weather forecast on your smartphone, but in five to 10 years’ time, its successor might act as the server to send the weather forecast to a wireless chip embedded in a bathroom mirror or toaster in the morning. Because they have a common instruction set, the same software that runs on state-of-the-art processor-based smartphone chips like the Cortex-A15 can run on much simpler processors that only consume nanowatts embedded in household objects.
For a company whose roots lie in desktop computers, what is notable two decades later is how the latest ARM processors are beginning to complete the circle. Combinations of cores like the Cortex-A15 and Mali-T604 are looking increasingly attractive for use in PCs and server farms as the most power-efficient form of computing.