Monday, September 28, 2009

GOOGLE CHROME OS

The big news this morning was that Google is to release its own Operating System during the second half of 2010.

Initially targeted at Netbooks (incredibly small laptops with relatively low specifications), Google Chrome OS will be a lightweight, open source alternative to Windows. It will be designed primarily for online use, with the entire OS essentially consisting of the Google Chrome browser running on a Linux backend.

The vision is that in the future rather than a developer producing a software package that requires a download and installation they would instead create a web based application that could be run from any Internet browser. In many ways the idea makes a great deal of sense; you wouldn’t have to worry about updating your software, transporting files from one machine to another or indeed creating backups of your files. Everything would be stored online and as such none of these usual factors would be an issue anymore.

Developers also wouldn’t have to worry about creating multiple versions of the same application for different Operating Systems because as long as the user had an up to date browser they would be able to run the software. Regardless of whether you were using a computer that ran Google Chrome OS, Mac OS or Windows, you would still have access to all your favourite online applications.

Google’s ambition is that Chrome OS will eventually develop into a viable alternative for all types of computer, not just Netbooks. Personally, I love and hate this idea of shaking up the way we use our computers in equal measure; the possibilities are huge but the disadvantages are potentially crippling and too obvious to ignore.

First and foremost, consider the fact that the whole idea is pretty much reliant on the user having a continuous connection to the Internet. For many this isn’t a problem, as most home and office based users already have an ‘always on’ broadband connection; however, if you find yourself without an Internet connection then your Operating System immediately becomes useless. While mobile broadband is becoming faster and cheaper for those that travel away from a fixed Internet connection, it won’t help if you’re stuck on a plane for eight hours!

A workaround to this would be to allow the OS to download web applications to your computer then run them as if you were connected to the Internet. Unfortunately, as soon as this becomes a consideration we neglect the primary purpose of having an online based OS in the first place.

It is also undeniable that at present most Internet applications are a little primitive. They deserve credit for evolving incredibly quickly in recent years; however, they’re still a long way off being a viable alternative to the large, installed applications most of us currently use. In the past I have discussed Internet based software such as Google Docs, and in the future I plan to cover advanced online applications such as the drawing application SplashUp, but these online versions still lag behind their desktop equivalents in both speed and functionality.

No one knows exactly what the future holds. It is undeniable that in the last couple of years our computing activities have become a lot more oriented around the Internet but whether we are ready for them to be entirely transferred remains to be seen.


Source: www.computerarticles.co.uk

TECHNOLOGY FAILURES

The world of technology is full of bright new ideas that promise to change the way we live; however, for every success there are numerous failures. Some technologies genuinely surprise us when they fall by the wayside, but others we realise were doomed to failure from the very beginning:

Domesday Project – The BBC Domesday Project was a partnership between Acorn Computers, Philips, Logica and the BBC and was designed to mark the 900th anniversary of the original Domesday Book. It was compiled over a period of three years and was published in 1986 after over one million people had contributed to the project. The material included maps, colour photos, statistical data, videos, virtual reality tours of major landmarks and the entire 1981 census.

This information was stored on specially adapted laserdiscs with the intention that future generations could look back on the material in years to come; however, the laserdisc standard never lasted and, as such, it is close to impossible nowadays to find a machine capable of reading the discs. Eventually a project was started to emulate the old system and publish the information to the Internet; however, the gentleman who was reverse engineering the project suffered an untimely death and as a result the Domesday Project website remains offline.

The Internet Connected Fridge – Although it is too early to say that such an invention will never take off, the Internet Fridge has spent ten years in the making and doesn’t show any real promise of becoming mainstream. Personally I’ve never liked the idea of my fridge managing my kitchen inventory and then automatically buying my weekly shopping online.

I’d like to think that even in this modern day society individuals would want to go out and select their own weekly produce based on what’s freshest at the time rather than having their fridge do it for them. It’s also a little surreal when a machine decides that because you had strawberries and cream after tea this evening that you would necessarily want the same thing delivered to your doorstep the following morning.

The Paperless Office – For years there has been talk of the paperless office; a world where everything is digital and printers are verging on redundancy. Fortunately for our business (although somewhat unfortunately for the environment) the real world situation is that the modern day office is far from paperless. I’m not sure what it is but there is something a little reassuring about paper; if faced with a fifty page report it is bizarrely easier to read it off sheets of bleached bark than off a state of the art liquid crystal display.

Video Phones – The longest video phone conversation that I have ever had lasted two seconds, and that was simply to test that my mobile phone SIM card supported 3G. There have been real attempts to push video calling on to the general public and it is one of the main sales pitches of the mobile operators when touting their new high speed networks however the service still struggles to find an audience. There are now no real technological barriers preventing all of us from video calling, however the simple truth of the matter is that people don’t want to see who they’re calling.


Source: www.computerarticles.co.uk

802.11N WIRELESS NETWORKING

It’s been years in development but this September it looks like 802.11n Wi-Fi will finally become a standard… well, an official standard anyway.

Presently the majority of the wireless hardware you will buy (routers, wireless network cards, printers etc) will use a networking specification called 802.11g which has a maximum speed of 54Mbps. This maximum speed is being increasingly seen as inadequate as applications become more complex and require more bandwidth.

The successor, 802.11n is being ratified to increase both the speed and range of wireless devices however it should be noted that due to the time the IEEE Task Group n have been arguing about the intricacies, equipment manufacturers got bored and decided to run with the draft specification. As a result, the fact that 802.11n is becoming ‘official’ is unlikely to change a great deal as hardware utilising the new standard has been available for some time now. Although these devices have been produced working on the draft specification, the reality is that there are very few differences between this and the anticipated final ‘official’ release.

Essentially based on the current 802.11g standard, 802.11n uses some new technology and tweaks to give Wi-Fi more speed and range. The most notable part of this technology is called ‘multiple input, multiple output’ or MIMO for short. MIMO uses several antennas to transmit multiple data streams simultaneously rather than a single antenna transmitting just one stream of data. This allows more data to be transmitted in the same period of time while also increasing the potential range of the network.

Other technologies include payload optimisation, which results in more data being transmitted in each packet, and channel bonding, which can use two separate non-overlapping channels at the same time to transmit data. The result of all this is achievable data transmission rates of around 100Mbps and double the potential range of 802.11g.

There are no security enhancements as they simply aren’t needed; the WPA2 encryption standard provided by existing network hardware is considered by most to be ‘extremely secure’.

It’s worth checking the box of any network equipment you have purchased in the last couple of years as you may find it is already compatible with 802.11n and simply needs setting up correctly. It goes without saying that in order to benefit from the faster speed the transmitting and receiving devices both have to support 802.11n; an 802.11n router working with an 802.11g laptop will result in slower 802.11g speeds.
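The compatibility rule and the speeds quoted in this article can be turned into a rough calculation. The following is a minimal sketch in C, assuming a link simply runs at the slower of the two devices' rates and ignoring all protocol overhead; the function names and the 700 MB file size used below are illustrative and do not come from any standard. Bear in mind that 54Mbps is the maximum for 802.11g while 100Mbps is an achievable figure for 802.11n, so real 'g' transfers will be slower than this suggests.

```c
#include <stdio.h>

/* Rates quoted in the article, in megabits per second. */
#define RATE_G_MBPS 54.0   /* 802.11g maximum */
#define RATE_N_MBPS 100.0  /* 802.11n achievable figure */

/* Assumption: the link runs at the slower of the two devices' rates. */
double link_rate_mbps(double router_mbps, double client_mbps)
{
    return router_mbps < client_mbps ? router_mbps : client_mbps;
}

/* Idealised transfer time in seconds: 1 megabyte = 8 megabits, no overhead. */
double transfer_seconds(double megabytes, double rate_mbps)
{
    return megabytes * 8.0 / rate_mbps;
}

/* Hypothetical helper that prints one router/client combination. */
void print_scenario(const char *label, double router_mbps,
                    double client_mbps, double megabytes)
{
    double rate = link_rate_mbps(router_mbps, client_mbps);
    printf("%s: %.0f Mbps, about %.0f seconds for %.0f MB\n",
           label, rate, transfer_seconds(megabytes, rate), megabytes);
}
```

Calling print_scenario("n router, g laptop", RATE_N_MBPS, RATE_G_MBPS, 700.0) reports the 'g' rate, because the laptop is the limiting device; passing RATE_N_MBPS for both devices shows the gain from upgrading both ends.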

At present ‘n’ rated hardware is more expensive than the older ‘g’ standard however not prohibitively so; our ‘n’ rated wireless router typically retails for around a tenner more than the £25 ‘g’ rated equivalent. Of course, if you are already happy with your wireless network and the upgrade will mean replacing perfectly functional hardware it is certainly worth considering whether your needs warrant the faster hardware.

At present 802.11n will only be required by those with blisteringly fast broadband connections or those that regularly copy large volumes of data across a wireless network however it will soon become the norm. If upgrading your hardware, therefore, it may well be worth paying a couple of extra pounds now to ensure that you remain future proof.


Source: www.computerarticles.co.uk

ISO FORMAT

One of the most common file types used in the distribution of software over the Internet is the .ISO format; these single files contain within them an exact copy of an entire CD or DVD disc. They are ideal because rather than having dozens, hundreds or even thousands of files to transport you only have one.

There are of course other methods of achieving the same goal; some of you may be familiar with .zip or .rar files which have the added advantage of being able to not only take many files and store them temporarily as one but also compress the data, making the total file size smaller. Unfortunately, when using this method on a media disc you strip out important characteristics of the original such as boot code, disc structures and file attributes which can often prevent an application from running.
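The disc structures mentioned above are what distinguish an ISO image from an ordinary archive. As a minimal sketch in C, assuming the image follows the ISO 9660 layout (2048-byte sectors, with the primary volume descriptor at sector 16 carrying the type byte 1 followed by the identifier "CD001"), the function below checks whether a buffer holding the start of a file looks like such an image. The function and macro names are made up for this example, and DVD images that use only UDF would not pass this test.

```c
#include <stddef.h>
#include <string.h>

#define ISO_SECTOR_SIZE 2048u
#define ISO_PVD_OFFSET  (16u * ISO_SECTOR_SIZE)  /* byte 32768 */

/*
 * Returns 1 if the buffer contains an ISO 9660 primary volume descriptor
 * at the expected offset, 0 otherwise. 'image' holds the first 'len'
 * bytes of the file; only 7 bytes past the offset are inspected.
 */
int iso_has_primary_descriptor(const unsigned char *image, size_t len)
{
    const unsigned char *descriptor;

    if (image == NULL || len < ISO_PVD_OFFSET + 7u)
        return 0;

    descriptor = image + ISO_PVD_OFFSET;

    /* Byte 0 is the descriptor type (1 = primary), bytes 1-5 are "CD001". */
    return descriptor[0] == 1 && memcmp(descriptor + 1, "CD001", 5) == 0;
}
```

In practice the buffer would be filled by reading a little over 32 KB from the start of the downloaded file. A .zip of the same disc's files has no such descriptor, which is one way of seeing what is lost when a disc is archived rather than imaged.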

As a result of this exact copy process, the format has become a popular method of transporting pirated software; the ISO file is an exact match of the original and as such there is no reason the software would realise it had been copied and refuse to run. There are plenty of legal uses for the format, however, and it is highly likely that even the most legitimate users will come into contact with an ISO file sooner or later.

Unfortunately they are not the most user friendly file format; you can’t simply complete the download then double click to run the application as they first have to be returned to their original format. There are typically two methods that can be used to complete this process; burn the data back to a physical disc or create a virtual drive on your computer.

The first method relies on a piece of disc burning software such as the fantastic and free CDburnerXP (www.cdburnerxp.se). Simply click ‘burn an image to disc’, point the application in the direction of your ISO file, insert a blank CD or DVD and your computer will then spit out an exact copy of the original media as if it had come direct from the manufacturer.

The second method uses a piece of software such as the free Alcohol 52% (www.alcohol-soft.com) to create a virtual drive that can simulate an actual CD or DVD disc. By asking Alcohol 52% to ‘mount’ an ISO file downloaded from the Internet, Windows will be tricked into thinking there is a physical drive connected to your computer containing the original disc. You simply access it through Windows Explorer as you would any other regular drive connected to your computer.

This latter method prevents us from having to waste a blank CD or DVD which is especially handy in instances where it would have only been used to install the application before being discarded. We can also keep the original ISO file and mount it whenever it’s required rather than having to store and locate a physical disc when the data is required again in the future.

There are additional benefits, especially when it comes to speed. Not only does it take seconds to mount an ISO file, rather than the minutes it would take to burn it, but the data is also read considerably faster. A virtual CD, for example, will read at 200x speed whereas the fastest CD drives on the market are limited to 52x speed.
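To put the 200x and 52x figures in perspective, here is a minimal sketch in C that estimates how long a full disc takes to read. It assumes the conventional 1x CD-ROM data rate of 150 KB per second and a drive that sustains its rated speed across the whole disc, which real drives do not; the function name and the 700 MB disc size are illustrative.

```c
/* Conventional 1x CD-ROM data rate in kilobytes per second. */
#define CD_1X_KB_PER_SEC 150.0

/* Idealised time in seconds to read a disc at a given speed multiple. */
double cd_read_seconds(double disc_megabytes, double speed_multiple)
{
    return disc_megabytes * 1024.0 / (speed_multiple * CD_1X_KB_PER_SEC);
}
```

For a 700 MB disc this gives roughly a minute and a half at 52x against about 24 seconds at 200x, a little under four times faster.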

Some users may also appreciate the ability to create multiple virtual drives that can function simultaneously; this is useful if you have more than one disc that you need frequent access to without the need for multiple readers. You can create copies of your own discs and load them up on to multiple virtual drives in this fashion and as long as you own the original it is perfectly legal.


Source: www.computerarticles.co.uk

PAINT.NET V3.5

Paint.net is an application I’ve found myself using on a day to day basis over the last couple of years. In my opinion it’s a perfect bridge between the ridiculously simple but underpowered ‘Microsoft Paint’ that’s bundled with Windows and the powerful but expensive to buy and difficult to learn ‘Adobe Photoshop’.

The majority of users will find the features of Paint.NET more than enough for day to day photo manipulation however, also thrown in to the mix, are a number of extremely powerful tools.

Firstly, the application has layers; unless you’ve used them before it’s difficult to explain in words exactly what they are and how they can help you. The simplest way is to think of them as a number of transparency slides which, when stacked and viewed together, form the whole image. If you change one of the individual slides the overall image will look different, but no changes will be made to the other slides. This means that you can later alter or undo a change made on one layer without damaging the work you have done on the others.

Another good feature is the way that the application allows you to view multiple images all at once; rather than having a number of windows that can be minimised and reopened as and when necessary, Paint.NET has a clever style of tabbed interface. In order to navigate between open files you simply click a thumbnail of the image from a scrollable selection on the top right hand side of the screen.

I did fear that development on this fantastic free application had become stagnant; the last release (3.36) was well over a year ago and even that only offered minor improvements on earlier versions. Having obtained the beta for version 3.5, which is due to be released as a finished product next month, it would appear the developers are back on track.

Notable improvements include improved memory usage, a Vista / Windows 7 style glass look, new effects along with the usual helping of bug fixes and rendering improvements.

Speaking of special effects, Paint.Net has a large number built in as standard; ink sketch, oil painting, blurs, distortions, noise control, red eye removal, sharpening, softening and so on. The image can also be manipulated by way of adjustments such as hue, saturation, level, brightness, contrast and sepia controls. The standard drawing and selection tools are provided and of course the intensity, tolerance or size of these tools can all be easily adjusted as necessary.

The size of the application has increased quite considerably since the last release, which weighed in at 1.6MB, but the application is still a tiny 4.8MB, which in relative terms is about the size of one MP3 music file.


Source: www.computerarticles.co.uk

Bounds Checking for C

Approaches to bounds checking

One response to this analysis is to discard C, since this lack of efficient checkability is responsible for many software failures.

A second approach is to extend the language to make checking easier. There are various proposals for doing this, and it is an opportunity to add other features such as assertion checking.

A third more-or-less workable scheme is to modify the representation of pointers to include three items: the pointer itself, and the lower and upper bounds of the object to which it is supposed to point. Experience with this has shown the benefits of bounds checking (e.g. see the bcc and rtcc compilers cited below), but there are difficulties:

  • Although some optimisation is possible, execution time of the resulting code increases by a large factor (ten or more, apparently).

    Even if the checking code can be optimised away, there remains the cost of passing triples for every pointer - which essentially prevents their being allocated to registers.

  • Because the representation of pointers has been changed, checked code is incompatible with normal code. This means that special versions of all libraries and system calls must be provided, and all the constituent modules of a program must be run with checking on. This adds to the performance problem.

    Some automatic support for interfacing checked code with normal code can be given, but this only works for straightforward cases. GUI code with call-backs, for example, is tricky.

  • Code which interfaces to hardware (e.g. a DMA controller) requires special attention since the hardware must be presented with standard addresses.
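The "three items" representation described above can be sketched as follows. This is a minimal illustration in C, not the representation used by bcc or rtcc: the type and function names are hypothetical, and for simplicity the sketch refuses arithmetic that would leave the object instead of recording it and faulting on use.

```c
#include <stddef.h>

/* Hypothetical checked pointer: the pointer plus the bounds of its object. */
typedef struct {
    char *ptr;    /* current position */
    char *lower;  /* first byte of the object */
    char *upper;  /* one past the last byte of the object */
} checked_ptr;

/* Builds a checked pointer to the start of an object of 'size' bytes. */
checked_ptr checked_from(char *object, size_t size)
{
    checked_ptr p = { object, object, object + size };
    return p;
}

/*
 * Moves the pointer by n bytes. Returns 1 on success, or 0 (leaving the
 * pointer unchanged) if the result would fall outside the object.
 * Pointing one past the end is allowed, as in standard C.
 */
int checked_add(checked_ptr *p, ptrdiff_t n)
{
    ptrdiff_t offset = p->ptr - p->lower;
    ptrdiff_t size = p->upper - p->lower;

    if (offset + n < 0 || offset + n > size)
        return 0;
    p->ptr += n;
    return 1;
}

/* Reads one byte through the pointer. Returns 0 if it is out of bounds. */
int checked_load(checked_ptr p, char *out)
{
    if (p.ptr < p.lower || p.ptr >= p.upper)
        return 0;
    *out = *p.ptr;
    return 1;
}
```

The struct is three machine words where a plain pointer is one, which is the register pressure and calling-convention incompatibility the list above describes.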

How we solved the problem

Our technique provides full checking without changing the representation of pointers. We therefore avoid most of the problems noted above. Some efficiency problems remain, but bounds checking need not be used in all of the files which make up a program, so trusted, performance-critical code can run at full speed.

The key idea is this:

  • Every pointer expression derives a new pointer from a unique original pointer.

    For example, in "p+2*k+1" we derive a new pointer from "p".

    By contrast, in "p-q" we derive an integer from two pointers. The integer is nonsense as a pointer.

    We call this unique original pointer the expression's "base" pointer.

  • Every pointer value is valid for just one allocated storage region.

    An allocated storage region may be a global, static, automatic or heap-allocated variable, structure or array.

  • We can check whether a pointer arithmetic expression is valid by finding its base pointer's storage region, then checking that the expression's result points into the same storage region.

  • If the base pointer appears not to refer to any valid region, then it must refer to a region originating in unchecked code. In this case we cannot check the result of the expression.

  • If the base pointer's storage region is an array, say A[100], then (according to the ANSI standard) it is valid to calculate the address of the element just past the last valid one (in this example, the address of A[100]).

    This is so that a pointer can be incremented and then tested for the loop exit condition.

    To prevent false alarms, we pad the storage layout of arrays so that A[100] is a valid pointer (we still check it when it is used).
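The checks listed above can be sketched in C as follows. This is an illustration under stated assumptions, not the code added to gcc: all names are hypothetical, the region table is a small linear array instead of the real data structure, addresses are compared as uintptr_t integers (which assumes a flat address space), and regions are assumed never to be adjacent, which is the job the array padding does in the real scheme.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t start; size_t size; } region;
typedef enum { CHECK_OK, CHECK_VIOLATION, CHECK_UNKNOWN } check_result;

#define MAX_REGIONS 64
static region regions[MAX_REGIONS];
static size_t region_count = 0;

/* Records an allocated storage region. Returns 0 if the table is full. */
int region_register(const void *start, size_t size)
{
    if (region_count == MAX_REGIONS)
        return 0;
    regions[region_count].start = (uintptr_t)start;
    regions[region_count].size = size;
    region_count++;
    return 1;
}

/* Finds the region holding addr; the one-past-the-end address counts. */
static const region *region_find(uintptr_t addr)
{
    for (size_t i = 0; i < region_count; i++)
        if (addr >= regions[i].start && addr - regions[i].start <= regions[i].size)
            return &regions[i];
    return NULL;
}

/* Checks base + byte_offset: the result must stay in base's region. */
check_result check_arith(const void *base, ptrdiff_t byte_offset)
{
    uintptr_t b = (uintptr_t)base;
    const region *r = region_find(b);
    uintptr_t result;

    if (r == NULL)
        return CHECK_UNKNOWN;            /* region from unchecked code */
    result = b + (uintptr_t)byte_offset; /* integer arithmetic only */
    if (result < r->start || result - r->start > r->size)
        return CHECK_VIOLATION;
    return CHECK_OK;
}

/* Checks an access of 'width' bytes: one past the end is not usable. */
check_result check_use(const void *p, size_t width)
{
    uintptr_t a = (uintptr_t)p;
    const region *r = region_find(a);

    if (r == NULL)
        return CHECK_UNKNOWN;
    return (a - r->start) + width <= r->size ? CHECK_OK : CHECK_VIOLATION;
}
```

With an array int A[100] registered, check_arith accepts the address of A[100] but check_use rejects reading through it, which mirrors the distinction between calculating the one-past-the-end address and using it.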

Implementation

We made some small modifications to the C front-end of gcc, the Gnu C compiler, to add code to check pointer arithmetic and use, and to maintain a table of known allocated storage regions.

We went to some trouble to ensure that gcc's optimiser could handle the added code, and employed modest inlining for efficiency.

The table of known allocated storage regions has to handle insertions, deletions and range lookups extremely fast, but since programs display a high degree of locality the access pattern is highly skewed. For these reasons a splay tree was used, in which objects are migrated to the root when accessed.
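To illustrate why a splay tree suits this access pattern, here is a minimal sketch in C of a region table keyed by start address, using the standard top-down splay operation. It is not the authors' implementation: the names are hypothetical, nodes are allocated by the caller, regions are assumed not to overlap, and deletion is omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct node {
    uintptr_t start;             /* first address of the region */
    size_t size;                 /* region length in bytes */
    struct node *left, *right;
} node;

/* Top-down splay: moves the node nearest to key to the root. */
static node *splay(node *t, uintptr_t key)
{
    node header, *l, *r, *y;

    if (t == NULL)
        return NULL;
    header.left = header.right = NULL;
    l = r = &header;
    for (;;) {
        if (key < t->start) {
            if (t->left == NULL)
                break;
            if (key < t->left->start) {          /* rotate right */
                y = t->left; t->left = y->right; y->right = t; t = y;
                if (t->left == NULL)
                    break;
            }
            r->left = t; r = t; t = t->left;     /* link right */
        } else if (key > t->start) {
            if (t->right == NULL)
                break;
            if (key > t->right->start) {         /* rotate left */
                y = t->right; t->right = y->left; y->left = t; t = y;
                if (t->right == NULL)
                    break;
            }
            l->right = t; l = t; t = t->right;   /* link left */
        } else {
            break;
        }
    }
    l->right = t->left;                          /* reassemble */
    r->left = t->right;
    t->left = header.right;
    t->right = header.left;
    return t;
}

/* Inserts caller-allocated node n and returns the new root. */
node *region_insert(node *root, node *n)
{
    n->left = n->right = NULL;
    if (root == NULL)
        return n;
    root = splay(root, n->start);
    if (n->start < root->start) {
        n->left = root->left; n->right = root; root->left = NULL;
    } else if (n->start > root->start) {
        n->right = root->right; n->left = root; root->right = NULL;
    } else {
        return root;                             /* already present */
    }
    return n;
}

/* Range lookup: the region containing addr (moved to the root), or NULL. */
node *region_lookup(node **root, uintptr_t addr)
{
    node *t = *root = splay(*root, addr);

    if (t != NULL && t->start > addr) {          /* root is the successor */
        t = t->left;                             /* predecessor is the max */
        while (t != NULL && t->right != NULL)    /* of the left subtree */
            t = t->right;
    }
    if (t == NULL || addr - t->start >= t->size)
        return NULL;
    *root = splay(*root, t->start);
    return *root;
}
```

Because a successful lookup leaves the matching region at the root, repeated checks against the same array, the common case in a loop, find it immediately.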

Performance

  • nfib (dumb doubly-recursive Fibonacci): no slowdown.
    • Execution time: same.
    • Compile-time: slowdown of 3 (very small)
    • Executable size: much larger due to inclusion of library.
  • Matrix multiply (ikj, using array subscripting):
    • Execution time: slowdown of around 30 compared to unoptimised.
    • Compile-time: slowdown of around 2.
    • Executable size: roughly the same.

Source: www.doc.ic.ac.uk

Tuesday, September 15, 2009

CHILD LABOUR

CHILD LABOR: 11 YEAR-OLD HALIMA SEWS CLOTHING FOR HANES

READ CHILD LABOUR ACT, OFFICIALS TOLD

Dharwad: Principal District and Sessions Judge John Michael D’Cunha has asked the government officials to properly study the law against child labour before booking cases against persons employing children for various works.

Inaugurating the one-day workshop for inspectors and heads of taskforce against child labour, in Dharwad on Friday, Mr. John Michael D’Cunha said that the officials should first get themselves acquainted with the law and then take steps to take action against the guilty and also take steps to rehabilitate the child labourers. He said officials should collect adequate information and documents against the offenders before filing cases against them for employing children. Deputy Commissioner Darpan Jain said effective implementation of Child Labour (Prohibition and Regulation) Act had not been possible as there was lack of proper understanding of the provisions of the law.

Mr. Jain asked the officials to first understand the provisions of the law and acquaint themselves with the issues concerning its implementation, and then go ahead with discharging their responsibilities in a proper manner.

Mr. Jain said that if requisite steps were not taken to prevent child labour, it would indirectly lead to promotion of immoral and illegal activities.

Principal Civil Judge Devendrappa N. Biradar said owners of hotels and restaurants should be sensitised on the issue.


Source: www.sadashivan.com

COURT CRITICISES STATE ON CHILD RESCUE SYSTEM

“Why don’t you stop all this tamasha?”

That’s what an irked Justice Bilal Nazki of the Bombay High Court told the state government on Tuesday. The remark was made over the government’s “efforts” to show that child labourers who were rescued and sent back to their parents were being sent to schools.

A report was submitted by advocate Rebecca Gonsalves, who had been appointed as amicus curiae (friend of the court), after visiting four rescued children sent back to their parents.

Following the court’s order, Gonsalves and advocate Vijay Hiremath visited four children to ensure they were pursuing their education as claimed by the government.

The amicus curiae’s report said three of the four children had been attending school even before they were “rescued” and sent to the Child Welfare Committee.

Two of the children had been picked up for working at a tea stall and a garage. Another was at his father’s tea stall during school vacation when he was picked up, said Gonsalves.

The fourth child had gone to learn motorcycle repair work at a garage during vacations when he was picked up, said Gonsalves.

The HC observed that such an exercise by the government did not serve any purpose. “Why spend money if it is not giving results? This propaganda has been going for so many years,” said Justice Nazki.

Additional Public Prosecutor Aruna Kamat Pai argued that rescued children were being given vocational training and those who stayed for a longer period, over two to three years, were sent to school.

Pai submitted that the state was inclined to frame rules along the lines of the Central government which state that rescued children be monitored for two years.

The HC had taken suo motu cognisance of a letter by a former judge, highlighting the issue of child labour following news reports.


Source: www.sadashivan.com

Sunday, September 13, 2009

THE STRUCTURE OF HIV

What does HIV look like?

Outside of a human cell, HIV exists as roughly spherical particles (sometimes called virions). The surface of each particle is studded with lots of little spikes.

An HIV particle is around 100-150 billionths of a metre in diameter. That's about the same as:

  • 0.1 microns
  • 4 millionths of an inch
  • one twentieth of the length of an E. coli bacterium
  • one seventieth of the diameter of a human CD4+ white blood cell.

Unlike most bacteria, HIV particles are much too small to be seen through an ordinary microscope. However they can be seen clearly with an electron microscope.

HIV particles surround themselves with a coat of fatty material known as the viral envelope (or membrane). Projecting from this are around 72 little spikes, which are formed from the proteins gp120 and gp41. Just below the viral envelope is a layer called the matrix, which is made from the protein p17.

The viral core (or capsid) is usually bullet-shaped and is made from the protein p24. Inside the core are three enzymes required for HIV replication called reverse transcriptase, integrase and protease. Also held within the core is HIV's genetic material, which consists of two identical strands of RNA.

What is RNA?

HIV belongs to a special class of viruses called retroviruses. Within this class, HIV is placed in the subgroup of lentiviruses. Other lentiviruses include SIV, FIV, Visna and CAEV, which cause diseases in monkeys, cats, sheep and goats. Almost all organisms, including most viruses, store their genetic material on long strands of DNA. Retroviruses are the exception because their genes are composed of RNA (Ribonucleic Acid).

RNA has a very similar structure to DNA. However, small differences between the two molecules mean that HIV's replication process is a bit more complicated than that of most other viruses.

How many genes does HIV have?

HIV has just nine genes (compared to more than 500 genes in a bacterium, and around 20,000-25,000 in a human). Three of the HIV genes, called gag, pol and env, contain information needed to make structural proteins for new virus particles. The other six genes, known as tat, rev, nef, vif, vpr and vpu, code for proteins that control the ability of HIV to infect a cell, produce new copies of virus, or cause disease.

At either end of each strand of RNA is a sequence called the long terminal repeat, which helps to control HIV replication.

Entry

HIV can only replicate (make new copies of itself) inside human cells. The process typically begins when a virus particle bumps into a cell that carries on its surface a special protein called CD4. The spikes on the surface of the virus particle stick to the CD4 and allow the viral envelope to fuse with the cell membrane. The contents of the HIV particle are then released into the cell, leaving the envelope behind.

Reverse Transcription and Integration

Once inside the cell, the HIV enzyme reverse transcriptase converts the viral RNA into DNA, which is compatible with human genetic material. This DNA is transported to the cell's nucleus, where it is spliced into the human DNA by the HIV enzyme integrase. Once integrated, the HIV DNA is known as provirus.
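The base-pairing rule behind this conversion can be shown with a short sketch in C. It models only the pairing step (A with T, U with A, G with C, C with G) and writes the complementary DNA strand in the conventional 5' to 3' direction, which is why the output is reversed; the real enzyme goes on to build a double-stranded DNA copy through a more involved process. The function names are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* DNA base that pairs with an RNA base, or 0 if the input is not A, C, G or U. */
static char dna_complement(char rna_base)
{
    switch (rna_base) {
    case 'A': return 'T';
    case 'U': return 'A';
    case 'G': return 'C';
    case 'C': return 'G';
    default:  return 0;
    }
}

/*
 * Writes the DNA strand complementary to 'rna' into 'dna', read 5' to 3'.
 * Returns 1 on success, 0 if the buffer is too small or a base is invalid.
 */
int reverse_transcribe(const char *rna, char *dna, size_t dna_size)
{
    size_t n = strlen(rna);

    if (dna_size < n + 1)
        return 0;
    for (size_t i = 0; i < n; i++) {
        char base = dna_complement(rna[n - 1 - i]);
        if (base == 0)
            return 0;
        dna[i] = base;
    }
    dna[n] = '\0';
    return 1;
}
```

For example, the RNA sequence AUGC yields the DNA sequence GCAT.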

Transcription and Translation

HIV provirus may lie dormant within a cell for a long time. But when the cell becomes activated, it treats HIV genes in much the same way as human genes. First it converts them into messenger RNA (using human enzymes). Then the messenger RNA is transported outside the nucleus, and is used as a blueprint for producing new HIV proteins and enzymes.

Assembly, Budding and Maturation

Among the strands of messenger RNA produced by the cell are complete copies of HIV genetic material. These gather together with newly made HIV proteins and enzymes to form new viral particles, which are then released from the cell. The enzyme protease plays a vital role at this stage of the HIV life cycle by chopping up long strands of protein into smaller pieces, which are used to construct mature viral cores.

The newly matured HIV particles are ready to infect another cell and begin the replication process all over again. In this way the virus quickly spreads through the human body. And once a person is infected, they can pass HIV on to others in their bodily fluids.


Source: www.avert.org

WHAT IS AN EXTERNAL HARD DRIVE?

Internet access regularly exposes computers to potential security threats like Trojan horses, viruses and spyware. It has become increasingly difficult to guard against these threats, even when employing firewalls and antivirus programs. Aside from online threats, multiple family members often use the same system, putting sensitive documents at higher risk of inadvertent corruption or loss. Additionally, the space required for multimedia storage has soared with MP3s, streaming video, DVD burning, and photo files taking up large amounts of space on the hard drive. All of these unrelated concerns can be addressed with one simple answer: an external hard drive.

An external hard drive sits outside the main computer tower in its own enclosure. The enclosure is slightly larger than the hard drive itself, and sometimes contains a cooling fan. This portable encasement allows the user to store information on a hard drive that is not inside the computer, but rests on a tabletop or surface near the computer. The external hard drive is connected to the computer via a high-speed interface cable. The interface cable allows the external hard drive to communicate with the computer so that data may be passed back and forth. The most common types of interfaces are USB and Firewire.

A portable or external hard drive is quite a useful piece of equipment. It allows the user to back up or store important information separate from the main internal hard drive, which could become compromised by online or offline activities. Sensitive documents, large music files, DVD images, movies, disk images, and even a backup of the contents of your main internal hard drive, can all be kept securely and safely on an external hard drive. When you are online, you can even leave the external drive turned off.

Another advantage of an external hard drive is that it is portable and operates on a plug-and-play basis. Any computer with USB or Firewire capability will recognize the external hard drive as a storage device, and assign it a letter to designate it. The drive can then be accessed like a normal internal hard drive. It's a snap to transfer huge files back and forth from work to home, to a friend's house, or between your desktop and laptop. Just plug in the interface cable to quickly reconstruct a working environment, making your favorite programs virtually portable.

If you have multiple family members using your computer system, consider an external hard drive to keep financial information and other sensitive documents secure. When you are ready to use the computer, you can plug in the external drive and have all your data and programs available. When finished, simply unplug the drive and take it with you to lock in a drawer or other secure location.

There are countless reasons to use an external hard drive. You can even buy several hard drives and swap them in and out of the same enclosure, using one for multimedia storage, one for imaging, one for backup, one for work, and so on.


Source: www.wisegeek.com


AMD IS GETTING BACK IN THE PERFORMANCE SADDLE

Ever since misery struck AMD roughly a year ago, this company has been pushing really hard to get back on its feet.

First they had to fight off a stigma that their processors definitely did not deserve, and then came a global financial crisis. And what was the word that nobody had ever heard of, yet everybody got to know within weeks? That's right... TLB.

But good news people, in this article we won't go into the TLB bug. It's been a year, it's been fixed, it's in the past and AMD has moved on, and so should you. Ever since the B3 revision of Phenom processors hit the market last year, AMD slowly but steadily started gaining ground again. And that's difficult, because that stigma was haunting them. There was something else though. The last generation architecture hit a limit with the Phenom processors, 2.6 / 2.7 GHz was the maximum with the Barcelona architecture, limiting the processors to an Intel created mid-range field.

As much as Intel has a Tick-Tock strategy (release new architecture one year, spin-off product the following year), AMD put their development into 6th gear, decided to skip a beat and moved forward with their 45nm products. And that's where we are today. AMD today will release their Dragon infrastructure. An infrastructure that has a motherboard, graphics cards and a processor.

Today specifically we will focus on that new processor being launched today. The "Tock" product is in fact based off the Barcelona core, yet is now manufactured on a much smaller fabrication process, 45 nanometer, and has different caches. The result... their processors can now run at 3.0 GHz fairly easily, stay cool and still have enough headroom for a nice tweak or two. Pretty significant, pretty interesting.

Therefore, let me introduce to you: AMD Phenom II.

In the coming weeks and months you will see new products based off that Phenom II processor line we are reviewing today. As of today these products will be available in the stores. The products are:

  • AMD Phenom II X4 920 at $236 (2.8 GHz)
  • AMD Phenom II X4 940 at $278 (3.0 GHz)

Our primary focus today will be these two processors, which are launched at pretty spectacular prices; especially when you notice how much performance they bring to the table. AMD made a big step forward with the release of Phenom II and we wanted to make sure to show you exactly that with a large amount of benchmarks, but obviously also performance tests with games as this processor might be a really good sweet spot for high-end graphics cards.


Source: www.guru3d.com

INTRODUCTION TO SYSTEM SOFTWARE

CRT DISPLAY DEVICES

INTRODUCTION TO COMPUTER GRAPHICS

STRUCTURED QUERY LANGUAGE

RELATIONAL MODEL

CONCEPTUAL DESIGNS

INTRODUCTION TO DATABASE MANAGEMENT SYSTEM

THE CALLING CARD ALTERNATIVE

How many times have you complained about high long distance costs and phone companies overcharging you? Did you try to do anything about it? Did you search for alternate solutions? Calling cards might be the answer. Their low rates to any destination make them the perfect buy for domestic and international calls.

For a few years now, the calling card business has been booming. Everywhere you go, everywhere you search you might find one: in Walmarts, grocery stores, newspaper stands, vending machines in coffee shops. But the place where you can find the most of these long distance alternatives is the internet. A quick search on Google, Yahoo or other search engines will reveal thousands of websites that sell calling cards. So, it's an easy pick, one might say. Well... not quite. According to the FCC, almost 70% of the calling card businesses are fraudulent, meaning mostly that they get your money but you don't get the calling card. That means that you have to be very careful when choosing a website to buy from. On top of that, calling cards vary in number and features, so you have to choose the one appropriate to your needs. Their low rates, however, come with a price at times. Companies selling calling cards use VoIP technology and other third party carriers to complete their calls. While not as expensive as a satellite connection (hence the low rates), this technology is still in its early days, so problems may occur from time to time. This is why calling cards are not usually recommended for emergency calls. For calls within the United States, however, calls made with calling cards (also known as phone cards) have good quality and a good connection rate, provided that you have found a good supplier.

So here are the steps you need to take to get the best out of your calling card purchase:
- Find a reliable website (this means no weird pop-ups, no advertisement of Viagra on the website - you get my point).
- Take a look at the available calling cards and rates.
- Check out any details of calling cards: usually, next to or underneath the picture of the calling card there is a link that will take you to a "Details" page. Look for maintenance fees, rounding, any other surcharges, expiration dates.
- If you intend to make a lot of long calls over a short period of time, choose a card with a maintenance fee. This means that a certain amount will be deducted from your balance each week/month until you use up the card. But if you plan to make so many calls, you'll probably use the card up by the time the maintenance fee is deducted. Calling cards with maintenance fees also tend to have lower rates.
- If you use the card just once in a while, choose a card with no maintenance fee. These cards usually have higher rates, but you don't have to worry about your balance going down if you do not use the card.
- Look for a Customer Service number. Reliable companies have Customer Service, in case their customers have questions or problems.

After this, get the card you think is best for your needs and wait for it to arrive by email. Unless otherwise specified, you should be able to use it immediately. Good luck!

Source: www.fresharticles.net

WHAT IS LOCAL SEARCH ENGINE OPTIMIZATION AND WHO ARE YOUR RELATIVES?

Your business website is finally up and now comes the task of getting found on the incredibly huge Internet neighborhood.

Where to begin?

Well, the first thing most people do is head for a major search engine like Google or Yahoo. And while that is a good idea, there is much to learn before laying out those precious dollars on ads and clicks. Most of us become totally confused by the breadth of information we must learn in order to be savvy marketers using the search engines. And, if you are busy running your business 150% of the time, who has the time?

The first rule we all like is ‘keep it simple'. So start off modestly and test the waters as you go.

Identify your market. Do your customers come from all over the country, or mostly from nearby areas?

For most brick and mortar businesses on the Internet, your bread and butter still comes from the local neighborhoods and adjoining states.

So try this:

After developing a strong list of keywords related to your specific business, look for other attractions that are geographically near your business as well.

For instance, is there a state park near you? If you sell hiking boots and there is a state park near you, wouldn't it make sense to target those people looking for the state park? Won't they need hiking boots? Add some ‘state parks in your area' related keywords to your list.

Say you live in Jamestown and you sell fishing gear. Wouldn't it make sense to take advantage of the presence of some of the other popular things related to the water attractions in your same area? People who are looking for fishing gear also look for boats, boat supplies, life preservers, local marinas and the like.

You won't need to use really specific keywords like ‘The Fish and Bones Marina in Jamestown' for instance, just add something general like ‘marinas near Jamestown' in your keyword list. Fishermen (and fisherwomen) who are planning to come to your area to fish are more likely to type in ‘marinas in Jamestown' than a specific marina name anyway if they are new to the area. Their results will not only show the marinas in Jamestown, but your fishing gear business as well, under the same keywords. Later, when they need fishing gear, they will already be familiar with your business name. Target more attractions and they will see your name again and again.

If you run a restaurant that depends on the tourist trade, you might want to consider just what other things your potential diner might be in your town for. Consider the attractions in your area. Museums, theme parks, entertainment centers and camping are all destinations your potential customer might be looking for as well as a place to dine. Include those attractions and local products in your keyword list and expand your visibility exponentially.

Remember, most travelers won't be looking for your particular business name unless they are already familiar with you. But your business name will pop up again and again whenever they search for those popular attractions near you.

You don't need to pay big bucks for your search engine results, just brainstorm a little bit and consider all those ‘relatives' in your area.

You may also find that some keywords are really expensive, but for the most part you won't need them. There are plenty of inexpensive and related keywords that will bring you exactly the same results if you use them creatively.

Source: www.fresharticles.net

THE ART OF WEBSITE OPTIMIZATION

They say "diamonds are a girl's best friend". Just like a flawless diamond in the making, a website demands significant attention to detail in order to stand up and deliver and captivate its target audience.
Ever wondered what components clearly define a website as "professionally" done? It should have a polished look! You don't need to spend thousands of dollars on web design, simply make sure it loads quickly and employs a simple navigation system. Your visitors must feel that the time spent at your site was time well spent! Pay extra close attention to your sales copy. If you have all kinds of questions popping in your head as you read your own sales material, so will your customers. There is a simple approach to creating great sales copy and it works like crazy.
Ask yourself a few simple questions: "What's in it for me?", "How can you fulfill your promise?", "Why should I believe you?", "What if I don't like it?". Write your sales copy to paint a portrait of "yes, I can!" by answering each question, and give a risk free assurance by including a money back guarantee along with a good time factor. Strive to incorporate all of your answers in your sales copy and you'll convert your readers into buyers.
One of the hottest topics on the Internet today is search engine optimization, another critical and important component of a professional website. With recent surveys showing over 40% of all search engine traffic coming from Yahoo Directory, it has never been more important to pay close attention to the selection of your title and description. First and foremost, get a head start on the competition by choosing a keyword rich domain name. It's a clear advantage you may have overlooked in preparing for search engine submission.
Make sure that your title is 5 words or less and includes your most important keywords. Try to limit your site description to 25 words or less. Remember, Yahoo doesn't look at your meta tags like a typical search engine, so make your title and description count! If you are submitting to Yahoo for the first time, make sure to submit to Google, Yahoo's search partner, then Yahoo in a few days. Google creates the description of your site by looking for the first bold word so make sure you optimize your site accordingly.
Every webmaster should analyze their site's keyword count. Your most important keywords should be 5% - 8% of your keyword count. If your percentages are too high, the search engines may consider your site as "spam". If your percentages are too low, your site will never appear in the Top 100 of any search. Keyword and relevancy optimization can easily move a site from complete obscurity to instant popularity in the search engines overnight. Do it right the first time and I'll see you at the top!
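The keyword count described above can be checked with a few lines of Python. This is only a rough sketch: the function name `keyword_share` and the word-splitting rule are assumptions made for illustration, and the 5% - 8% range is simply taken from the paragraph above rather than from any search engine's documentation.

```python
import re

def keyword_share(text, keyword):
    """Return the percentage of words in `text` that equal `keyword`.

    Words are lower-cased runs of letters, digits and apostrophes;
    this splitting rule is an illustrative assumption.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)

# Hypothetical page copy, used only to exercise the function.
page = "Fishing gear for Jamestown anglers: rods, reels and more fishing tackle."
share = keyword_share(page, "fishing")
print(f"'fishing' makes up {share:.1f}% of the words")
print("inside the 5% - 8% range" if 5.0 <= share <= 8.0 else "outside the 5% - 8% range")
```

A real page would need its HTML stripped first, and multi-word keywords would need phrase matching, which this sketch does not attempt.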
It's a painstaking process but website optimization is a necessary step towards securing market position. In an age of instant gratification, ones and zeros, gigabytes and terabytes, the simple virtues of patience and perseverance are seldom nurtured. In developing your website, take time to reflect on your mission, and if you do only one thing well, share your passion with your target audience. Do it for the sheer love of it. Your audience will appreciate your efforts. It's the first step towards creating a website that speaks the language that your visitors want to hear!
With the right attention to detail, just like a long lasting relationship, you'll have no fear in the night that your website might desert you.

Source: www.fresharticles.net

PDF OPTIMIZATION: DEATH TO SEO? BY PAUL BLISS

On April 18th 2005, Adobe announced that it was going to acquire Macromedia.

Besides delivering a critical blow to competitive balance of two highly recognized and respected companies, it has inadvertently created a new form of optimization.

That's right. PDF optimization.

The main technology that Adobe wanted from Macromedia was Flash. Now that they have it, they will be able to incorporate all the power of Flash into a PDF. With one fell swoop, they have changed the face of search engine optimization.

As a site owner, I can now potentially have my entire site reside within the content of a PDF. Sure, it was textually available before, but now I can even have compressed video, dynamically generated content and visually appealing content conveniently wrapped up into the web's only cross-compatible portable platform.

No more worries about having a Flash player installed - that will be incorporated into the PDF reading software. No more worrying about needing Quicktime and Media Player versions of video clips. They'll all be in Flash.

Not only is the PDF web friendly, but it is also PDA and Kiosk ready. Now content can be delivered anywhere to any device that can read a pdf. It can also be included on CD's, DVD's and even your cell phone.

From a user perspective, this is awesome. From a search engine perspective, it is great to push boundaries, but we may also see the end of optimizing for client sites; instead, a client will pay a one-time fee to optimize a pdf.

Anyone who makes a living optimizing sites can see the potential loss of revenue as companies move forward and place their marketing efforts into promoting a pdf instead of a web site.

Why would a company not embrace this? While it's true that a site like Amazon would not be able to take full advantage of this, they could embed pdf optimization for dvd's and cd's sent to your cell phone, based on previous selections you've made.

It's a marketer's dream, and it makes a buzz agent's job even easier. Word of mouth marketing will be coupled with a portable demonstration of the product or service being sold.

While the general public may not become aware of this technology for a few years, those who reside on the cutting edge will find great ways to use this in promotion.

Now instead of just watching a movie trailer, you could also have the script, actor bios and studio contact information. Maybe even after the movie gets released, you could get your pdf updated with box office results.

The benefits of storing information in a pdf are huge. Instead of storing all of that information in a database, you have everything you need as a portable document. No worries about server stability, access to the database or even an internet connection.

By embracing this new development, it will be another service you can add to your seo repertoire and allow for your business to adapt to this emerging technology.

Source: www.fresharticles.net

CHANGING YOUR LIFE BY THERAPY INSPIRED OF BIRMINGHAM

How Hypnotherapy Can Help

Hypnotherapy is the process of using hypnosis to unlock the capacity of the unconscious mind to bring about therapeutic changes by modifying deeply-held assumptions, fears and misconceptions.

The issues that prompt people to turn to hypnotherapy for assistance include:

· Phobias
· Pain management
· Panic attacks
· Performance enhancement
· Habits - e.g. smoking
· Obsessive Compulsive Disorders
· Stress management
· Performance anxiety
· Insomnia
· Confidence, self-esteem and assertiveness

What Hypnosis Is

Hypnosis is a natural state of mind, enhanced by deep mental and physical relaxation. Without knowing it everyone drifts into and out of mild hypnotic states daily. These periods of time are commonly referred to as "day-dreams" or "running on autopilot".

Hypnosis has nothing to do with being asleep or unconscious in any way. You are able to hear and remember everything, and will know exactly what's going on.

People often worry that, under hypnosis, they can be made to do things they would not ordinarily agree to. This is incorrect: you remain in control all the time and cannot be made to do things that you genuinely object to.

Participants in entertainment and stage hypnosis shows are fully aware that they will be asked to act in silly ways, and they implicitly agree to this at some level of their mind.

Hypnotherapy

Hypnotherapy is simply the process of using hypnosis to ‘unlock' or access the unconscious mind, and to bring about therapeutic changes by modifying deeply-held assumptions, fears and misconceptions within it.

There are two forms of hypnotherapy:

· Suggestion Hypnotherapy or Clinical Hypnotherapy

The hypnotherapist guides the client into a relaxed state and enlists the power of the client's own imagination using a wide range of techniques from story-telling, metaphor or symbolism to the use of direct suggestions for beneficial change.

· Analytical Hypnotherapy or Hypnoanalysis

This therapy is rather more intense and requires several sessions. It involves an in-depth analysis of the individual's inner fears, blocked and unresolved feelings and repressed memories and is carried out in a quiet and gentle way allowing the memories and emotion to flow and release anger, fear and hurt of the past.

Source: www.fresharticles.net

CELLPHONES ARE THE DEVIL'S WORK

Well, if I called the wrong number, why did you answer the phone? - Cartoon caption by James Thurber.

But I say, "I will not pick up my cellphone even if you called the right number. Message/Text me."

Let's put this straight: cellphones are a threat to both your privacy and your grey matter - in the literal way.

You go to a party and find a really beautiful girl. You know that you have to talk to her. And that's exactly what you do. You grab two glasses of champagne and head towards her.
"Euhem," you manage to utter. She turns to look at you. She smiles and you hand her the champagne. Then out of nowhere, your cellphone vibrates in your pocket.
"Oh shit!" you say. She glares at you and asks, "sorry?"

Damn the cellphone. What's more, it was only mum calling to check whether you are all right.

Anywhere you go somebody can call you. Little by little, you've become an answering machine. And you don't even have a moment for yourself because the cellphone always rings/vibrates when you least need it to. Because cellphones actually do obey Murphy's Law!

But then you might say that you don't even have a private life anymore, so why bother. Well you have to know that cellphones may damage people in other ways.

Only a small amount of energy is emitted by a cellphone. However even this amount will cause stress responses in your cells and affect your reflexes.

Cellphone radiation can even cause molecular changes in your cells. An experiment was performed by a Finnish team to prove this. 4500 genes in human cells, cultured in a laboratory, were exposed to cellphone radiation for only 48 hours. More than 20 genes were found to have had their activity rate interrupted. Now in your brain alone are billions of genes. Even if only a small group of cells is perturbed, the group isn't really small at all. It still contains several millions of genes. Are you prepared to lose these genes and consequently the cells then?

Cellphone radiation has yet another effect on your grey matter. It increases blood-vessel permeability in the brain. This permits molecules normally excluded from the brain to seep in. This same thing happens in rats' brains. It is now thought that this breach in the blood-brain barrier may be accompanied by the death of brain cells. If, however, you like the fact that all types of I-don't-know-what molecules are pouring into your ‘defeated' brain, then by all means continue to use your cellphone regularly. But don't blame me; blame yourself.

And yes; you're right! A rat's brain is not like a man's one. In fact the energy absorbed by the rat is really low compared to what a person gets when using a cellphone! And what if the effects add up over time? Maybe your head will literally explode.

Neuroscientist, W. Ross Adey of Loma Linda University says, "You have to ask, ‘How much can people handle before it becomes a significant problem?'"

Cellphones may come in handy when you're in need. But remember that many of your cells are dying every time you pick up the cellphone. Hang up!

Source: www.fresharticles.ne

THE BENEFIT OF YOGA BY DELLA MENECHELLA

The benefit of yoga practice goes far beyond the actual time you spend in the poses. One of the most common reasons why people begin practicing yoga is to improve their health and well-being. Yoga means union. It is a union of the mind, body and breath, so all aspects of your life are impacted by your practice.

A major benefit of yoga is physical.

Yoga improves your flexibility. - The stretching that you engage in during every practice helps lengthen and stretch muscles, which helps reduce the risk of injuries.

It helps to improve your balance. - The majority of yoga practices include some type of balancing in the poses. A significant number of people, especially as they begin to get older, start to have problems with balance, which can lead to major injuries due to falls. By having a greater sense of balance, you are able to move more easily and safely.

Yoga can help reduce pain. - Tense muscles often contribute to pain. Relaxing muscles helps to minimize muscle tension and the pain that is associated with it. Also, breathing deeply into muscles helps lessen pain by altering your perception of it.

It tones your muscles. - Yoga works all the muscles in your body. It helps strengthen and tone them and also builds endurance and stamina.

It helps to increase your level of energy. - Carrying tension in your body takes an enormous toll on your energy reserves. By learning how to relax through your yoga practice, you benefit by enjoying higher levels of energy so you can more thoroughly enjoy your daily activities.

Yoga helps promote a sense of relaxation. - Most people breathe high in their chests. This not only does not allow them to get sufficient oxygen, it also triggers the stress response, which contributes to feelings of anxiety. Breathing deeply as practiced in yoga, helps relax your muscles and also brings much needed oxygen to your cells. The deep sense of relaxation also leads to better quality sleep.

Each yoga practice ends with some type of relaxation. Since your body and mind are one, by relaxing your body you also relax your mind. Many yoga experts believe that a relaxation pose is the most beneficial pose in any yoga practice.

Another benefit of yoga is mental.

Yoga clears your mind and helps you focus your attention. - During your practice, you are focusing your attention on your breath and turning inward. This concentration allows you to withdraw from the distractions in your environment. A significant benefit of yoga practice is that you can take this ability to focus your attention into every aspect of your life. You can be fully present with whatever you are doing instead of worrying about tomorrow or regretting yesterday. Not only will your actions be more productive, you can also enjoy them in a greater way.

Yoga helps reduce stress. - Deep breathing helps reduce the hormones that are released when you are feeling overwhelmed, overloaded, and frazzled. The internal focus that accompanies the poses helps create a relaxation response in your body.

Yoga can help release stuck emotions. - Often stuck emotions find their way into our bodies. Remember, your mind and body are one, and if you are suppressing any painful emotions, you will often experience that as pain in some part of your body. A benefit of yoga is that by breathing deeply into places in your body that hold tension, you can help release the emotions that may be buried there. You can then examine these emotions and let go of those that do not serve you.

Also, as you take your body past the limits of where it has been, you start to feel that you can move past other limitations in your life as well.

You gain a sense of peace and tranquility. - Most yoga practices include some time for meditation. Regular meditation helps your mind reach a state of inner calm. It helps you gain control over your thinking instead of being at the mercy of wayward thoughts.

As you can see, the benefit of yoga has far reaching effects in every area of your life. Maintain a regular yoga practice, and you will see for yourself, how yoga can benefit you too.


Source: www.fresharticles.ne

Wednesday, September 9, 2009

Regional UNGASS Workshop Promoted by UNAIDS – Chile

Between September 8th and 11th, UNAIDS is gathering civil society and the governments of Latin America and the Caribbean in Chile for the UNGASS Regional Workshop. The aim is to consolidate the process of preparing the country reports, which are to be submitted by March 31st, 2010. In addition, the meeting will be an opportunity to analyze key indicators within the framework of the UNGASS Declaration of Commitment on HIV, based on the guidelines for the 2010 report. Gestos, invited by UNAIDS, will be represented by the activist Jair Brandão, who will present the experience of SC in monitoring UNGASS in Brazil and Gestos’s experience internationally.

Source: ungassforum.wordpress.com

Ipsilateral irradiation for well lateralized carcinomas of the oral cavity and oropharynx: results on tumor control and xerostomia

Background

In head and neck cancer, bilateral neck irradiation is the standard approach for many tumor locations and stages. Increasing knowledge on the pattern of nodal invasion leads to more precise targeting and normal tissue sparing. The aim of the present study was to evaluate the morbidity and tumor control for patients with well lateralized squamous cell carcinomas of the oral cavity and oropharynx treated with ipsilateral radiotherapy.

Methods

Twenty consecutive patients with lateralized carcinomas of the oral cavity and oropharynx were treated with a prospective management approach using ipsilateral irradiation between 2000 and 2007. This included 8 radical oropharyngeal and 12 postoperative oral cavity carcinomas, with Stage T1-T2, N0-N2b disease. The actuarial freedom from contralateral nodal recurrence was determined. Late xerostomia was evaluated using the European Organization for Research and Treatment of Cancer QLQ-H&N35 questionnaire and the National Cancer Institute Common Terminology Criteria for Adverse Events (CTCAE), version 3.

Results

At a median follow-up of 58 months, five-year overall survival and loco-regional control rates were 82.5% and 100%, respectively. No local or contralateral nodal recurrences were observed. Mean dose to the contralateral parotid gland was 4.72 Gy and to the contralateral submandibular gland was 15.30 Gy. Mean score for dry mouth was 28.1 on the 0-100 QLQ-H&N35 scale. According to CTCAE v3 scale, 87.5% of patients had grade 0-1 and 12.5% grade 2 subjective xerostomia. The unstimulated salivary flow was >0.2 ml/min in 81.2% of patients and 0.1-0.2 ml/min in 19%. None of the patients showed grade 3 xerostomia.

Conclusions

In selected patients with early and moderate stages, well lateralized oral and oropharyngeal carcinomas, ipsilateral irradiation treatment of the primary site and ipsilateral neck spares salivary gland function without compromising loco-regional control.


Source: www.biomedcentral.com

Tuesday, September 8, 2009

CPU cache

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.

When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor immediately reads from or writes to the cache, which is much faster than reading from or writing to main memory.

The diagram on the right shows two memories. Each location in each memory has a datum (a cache line), which in different designs ranges in size from 8[1] to 512[2] bytes. The size of the cache line is usually larger than the size of the usual access requested by a CPU instruction, which ranges from 1 to 16 bytes. Each location in each memory also has an index, which is a unique number used to refer to that location. The index for a location in main memory is called an address. Each location in the cache has a tag that contains the index of the datum in main memory that has been cached. In a CPU's data cache these entries are called cache lines or cache blocks.

Details of operation

When the processor needs to read or write a location in main memory, it first checks whether that memory location is in the cache. This is accomplished by comparing the address of the memory location to all tags in the cache that might contain that address. If the processor finds that the memory location is in the cache, we say that a cache hit has occurred; otherwise, we speak of a cache miss. In the case of a cache hit, the processor immediately reads or writes the data in the cache line. The proportion of accesses that result in a cache hit is known as the hit rate, and is a measure of the effectiveness of the cache.
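To see why the hit rate matters so much, the average access latency can be estimated with a simple weighted sum. The sketch below is a simplified single-level model, not a description of any particular CPU: the function name `average_access_time` and the cycle counts are assumptions chosen for illustration.

```python
def average_access_time(hit_rate, cache_latency, memory_latency):
    """Expected latency per access for a single-level cache.

    Simplified model (an assumption): a hit costs cache_latency,
    a miss costs memory_latency, and nothing else is counted.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_latency + (1.0 - hit_rate) * memory_latency

# Illustrative cycle counts only, not measurements of real hardware.
for rate in (0.50, 0.90, 0.99):
    cycles = average_access_time(rate, cache_latency=2, memory_latency=200)
    print(f"hit rate {rate:.2f}: about {cycles:.1f} cycles per access")
```

Under these made-up numbers, the average moves from being dominated by main memory to being close to the cache latency as the hit rate approaches 1.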

In the case of a cache miss, most caches allocate a new entry, which comprises the tag just missed and a copy of the data from memory. The reference can then be applied to the new entry just as in the case of a hit. Misses are comparatively slow because they require the data to be transferred from main memory. This transfer incurs a delay since main memory is much slower than cache memory, and also incurs the overhead for recording the new data in the cache before it is delivered to the processor.

In order to make room for the new entry on a cache miss, the cache generally has to evict one of the existing entries. The heuristic that it uses to choose the entry to evict is called the replacement policy. The fundamental problem with any replacement policy is that it must predict which existing cache entry is least likely to be used in the future. Predicting the future is difficult, especially for hardware caches that use simple rules amenable to implementation in circuitry, so there are a variety of replacement policies to choose from and no perfect way to decide among them. One popular replacement policy, LRU, replaces the least recently used entry.
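The LRU policy can be illustrated in software with an ordered dictionary. This is a toy model of a small fully associative cache, written for illustration; real hardware typically uses cheaper approximations, and the class name `LRUCache` and its methods are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.lines = OrderedDict()  # tag -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, tag):
        """Touch `tag`; return the evicted tag if a miss forced an eviction, else None."""
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)  # mark as most recently used
            return None
        self.misses += 1
        evicted = None
        if len(self.lines) >= self.capacity:
            evicted, _ = self.lines.popitem(last=False)  # drop the oldest entry
        self.lines[tag] = f"data@{tag}"  # placeholder for the copied cache line
        return evicted

cache = LRUCache(capacity=2)
for tag in ["A", "B", "A", "C", "B"]:
    evicted = cache.access(tag)
    print(tag, "evicted:", evicted)
print("hits:", cache.hits, "misses:", cache.misses)
```

In this run the access to C evicts B rather than A, because A was touched more recently.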

When data is written to the cache, it must at some point be written to main memory as well. The timing of this write is controlled by what is known as the write policy. In a write-through cache, every write to the cache causes a write to main memory. Alternatively, in a write-back or copy-back cache, writes are not immediately mirrored to memory. Instead, the cache tracks which locations have been written over (these locations are marked dirty). The data in these locations are written back to main memory when that data is evicted from the cache. For this reason, a miss in a write-back cache will often require two memory accesses to service: one to first write the dirty location to memory and then another to read the new location from memory.
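The difference between the two write policies shows up when the same location is written repeatedly. The following sketch models a cache with a single line and counts writes to main memory; the class `TinyCache`, the helper `count_memory_writes`, and the write-allocate behavior are all assumptions made to keep the example small.

```python
class TinyCache:
    """One-line cache that counts main-memory writes under a write policy."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.tag = None          # address currently held in the single line
        self.dirty = False       # only ever set under the write-back policy
        self.memory_writes = 0

    def _load(self, address):
        if self.tag == address:
            return  # hit: nothing to replace
        if self.tag is not None and self.dirty:
            self.memory_writes += 1  # write the dirty line back before replacing it
        self.tag = address
        self.dirty = False

    def read(self, address):
        self._load(address)

    def write(self, address):
        self._load(address)  # write-allocate: bring the line in first (assumption)
        if self.write_back:
            self.dirty = True        # defer the memory write until eviction
        else:
            self.memory_writes += 1  # write-through: every write reaches memory

def count_memory_writes(write_back):
    """Write address 0 ten times, then read address 1 to force an eviction."""
    cache = TinyCache(write_back)
    for _ in range(10):
        cache.write(0)
    cache.read(1)
    return cache.memory_writes

print("write-through:", count_memory_writes(False))
print("write-back:   ", count_memory_writes(True))
```

The final read of a different address is what triggers the single deferred write in the write-back case, which matches the two-access miss described above.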

There are intermediate policies as well. The cache may be write-through, but the writes may be held in a store data queue temporarily, usually so that multiple stores can be processed together (which can reduce bus turnarounds and so improve bus utilization).

The data in main memory being cached may be changed by other entities, in which case the copy in the cache may become out-of-date or stale. Alternatively, when the CPU updates the data in the cache, copies of data in other caches will become stale. Communication protocols between the cache managers which keep the data consistent are known as cache coherence protocols.

The time taken to fetch a datum from memory (read latency) matters because a CPU will often run out of things to do while waiting for the datum. When a CPU reaches this state, it is called a stall. As CPUs become faster, stalls due to cache misses displace more potential computation; modern CPUs can execute hundreds of instructions in the time taken to fetch a single datum from memory. Various techniques have been employed to keep the CPU busy during this time. Out-of-order CPUs (Pentium Pro and later Intel designs, for example) attempt to execute independent instructions after the instruction that is waiting for the cache miss data. Another technology, used by many processors, is simultaneous multithreading (SMT), or in Intel's terminology hyper-threading (HT), which allows an alternate thread to use the CPU core while a first thread waits for data to come from main memory.


Most modern desktop and server CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer used to speed up virtual-to-physical address translation for both executable instructions and data.

Structure

The data blocks contain the actual data fetched from the main memory. The memory address is split (MSB to LSB) into the tag, the index and the displacement (offset), while the valid bit denotes that this particular entry has valid data. The index length is log2(cache_rows) bits and describes which row the data has been put in. The displacement length is log2(data_blocks) bits and specifies which block of the ones we have stored we need. The tag length is address_length - index_length - displacement_length bits and contains the most significant bits of the address, which are checked against the current row (the row has been retrieved by index) to see if it is the one we need or another, irrelevant memory location that happened to have the same index bits as the one we want.
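The address split described above can be sketched with a few shifts and masks. The 32-bit address width, the 256 rows and the 64 addressable units per row are assumptions chosen for the example, and `split_address` is a hypothetical helper; both sizes are assumed to be powers of two.

```python
def split_address(addr, num_rows, row_units, addr_bits=32):
    """Split an address into (tag, index, displacement, tag_bits)."""
    disp_bits = row_units.bit_length() - 1    # log2(row_units)
    index_bits = num_rows.bit_length() - 1    # log2(num_rows)
    tag_bits = addr_bits - index_bits - disp_bits
    displacement = addr & (row_units - 1)             # least significant bits
    index = (addr >> disp_bits) & (num_rows - 1)      # middle bits select the row
    tag = addr >> (disp_bits + index_bits)            # most significant bits
    return tag, index, displacement, tag_bits

tag, index, disp, tag_bits = split_address(0x12345678, num_rows=256, row_units=64)
print(hex(tag), hex(index), hex(disp), tag_bits)
```

Two addresses that share the same index but differ in the tag compete for the same row, which is why the stored tag has to be compared on every lookup.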

Associativity

The replacement policy decides where in the cache a copy of a particular entry of main memory will go. If the replacement policy is free to choose any entry in the cache to hold the copy, the cache is called fully associative. At the other extreme, if each entry in main memory can go in just one place in the cache, the cache is direct mapped. Many caches implement a compromise in which each entry in main memory can go to any one of N places in the cache, and are described as N-way set associative. For example, the level-1 data cache in an AMD Athlon is 2-way set associative, which means that any particular location in main memory can be cached in either of 2 locations in the level-1 data cache.

Associativity is a trade-off. If there are ten places the replacement policy can put a new cache entry, then when the cache is checked for a hit, all ten places must be searched. Checking more places takes more power, chip area, and potentially time. On the other hand, caches with more associativity suffer fewer misses (see conflict misses, below), so that the CPU spends less time servicing those misses. The rule of thumb is that doubling the associativity, from direct mapped to 2-way, or from 2-way to 4-way, has about the same effect on hit rate as doubling the cache size. Associativity increases beyond 4-way have much less effect on the hit rate, and are generally done for other reasons (see virtual aliasing, below).

In order of increasing (worse) hit times and decreasing (better) miss rates:

  • direct mapped cache: the best (fastest) hit times, and so the best tradeoff for "large" caches
  • 2-way set associative cache
  • 2-way skewed associative cache: "the best tradeoff for .... caches whose sizes are in the range 4K-8K bytes" (André Seznec[3])
  • 4-way set associative cache
  • fully associative cache: the best (lowest) miss rates, and so the best tradeoff when the miss penalty is very high

If each location in main memory can be cached in either of two locations in the cache, one logical question is: which two? The simplest and most commonly used scheme, shown in the right-hand diagram above, is to use the least significant bits of the memory location's index as the index for the cache memory, and to have two entries for each index. One good property of this scheme is that the tags stored in the cache do not have to include that part of the main memory address which is implied by the cache memory's index. Since the cache tags are fewer bits, they take less area [on the microprocessor chip] and can be read and compared faster.
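The indexing scheme just described (low-order bits select the set, the remaining bits are stored as the tag) can be sketched as a small miss counter. It is an illustrative model only: it works on line addresses rather than byte addresses, uses per-set LRU, and the name `count_misses` is invented for this example.

```python
def count_misses(line_addrs, num_sets, ways):
    """Count misses for an N-way set-associative cache with per-set LRU."""
    sets = [[] for _ in range(num_sets)]   # each list holds tags, least recently used first
    misses = 0
    for line in line_addrs:
        index = line % num_sets            # least significant bits pick the set
        tag = line // num_sets             # only the remaining bits need to be stored
        entries = sets[index]
        if tag in entries:
            entries.remove(tag)            # hit: re-appended below as most recently used
        else:
            misses += 1
            if len(entries) >= ways:
                entries.pop(0)             # evict the least recently used tag in this set
        entries.append(tag)
    return misses

trace = [0, 8, 0, 8, 0, 8]                       # two lines that share an index
print(count_misses(trace, num_sets=8, ways=1))   # 6: direct mapped, every access conflicts
print(count_misses(trace, num_sets=4, ways=2))   # 2: same capacity, 2-way
```

Both configurations hold eight lines in total; only the two-way arrangement can keep the two conflicting lines resident at the same time.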

One of the advantages of a direct mapped cache is that it allows simple and fast speculation. Once the address has been computed, the one cache index which might have a copy of that datum is known. That cache entry can be read, and the processor can continue to work with that data before it finishes checking that the tag actually matches the requested address.

The idea of having the processor use the cached data before the tag match completes can be applied to associative caches as well. A subset of the tag, called a hint, can be used to pick just one of the possible cache entries mapping to the requested address. This datum can then be used in parallel with checking the full tag. The hint technique works best when used in the context of address translation, as explained below.

Other schemes have been suggested, such as the skewed cache[3], where the index for way 0 is direct, as above, but the index for way 1 is formed with a hash function. A good hash function has the property that addresses which conflict with the direct mapping tend not to conflict when mapped with the hash function, and so it is less likely that a program will suffer from an unexpectedly large number of conflict misses due to a pathological access pattern. The downside is extra latency from computing the hash function[4]. Additionally, when it comes time to load a new line and evict an old line, it may be difficult to determine which existing line was least recently used, because the new line conflicts with data at different indexes in each way; LRU tracking for non-skewed caches is usually done on a per-set basis. Nevertheless, skewed-associative caches have major advantages over conventional set-associative ones.[5]

Pseudo-associative cache

A true set-associative cache tests all the possible ways simultaneously, using something like a content addressable memory. A pseudo-associative cache tests each possible way one at a time. A hash-rehash cache is one kind of pseudo-associative cache.

In the common case of finding a hit in the first way tested, a pseudo-associative cache is as fast as a direct-mapped cache. But it has a much lower conflict miss rate than a direct-mapped cache, closer to the miss rate of a fully associative cache. [4]

Cache misses

A cache miss refers to a failed attempt to read or write a piece of data in the cache, which results in a main memory access with much longer latency. There are three kinds of cache misses: instruction read miss, data read miss, and data write miss.

A cache read miss from an instruction cache generally causes the most delay, because the processor, or at least the thread of execution, has to wait (stall) until the instruction is fetched from main memory.

A cache read miss from a data cache usually causes less delay, because instructions not dependent on the cache read can be issued and continue execution until the data is returned from main memory, and the dependent instructions can resume execution.

A cache write miss to a data cache generally causes the least delay, because the write can be queued and there are few limitations on the execution of subsequent instructions. The processor can continue until the queue is full.

In order to lower cache miss rate, a great deal of analysis has been done on cache behavior in an attempt to find the best combination of size, associativity, block size, and so on. Sequences of memory references performed by benchmark programs are saved as address traces. Subsequent analyses simulate many different possible cache designs on these long address traces. Making sense of how the many variables affect the cache hit rate can be quite confusing. One significant contribution to this analysis was made by Mark Hill, who separated misses into three categories (known as the Three Cs):

  • Compulsory misses are those misses caused by the first reference to a datum. Cache size and associativity make no difference to the number of compulsory misses. Prefetching can help here, as can larger cache block sizes (which are a form of prefetching). Compulsory misses are sometimes referred to as Cold Misses.
  • Capacity misses are those misses that occur regardless of associativity or block size, solely due to the finite size of the cache. The curve of capacity miss rate versus cache size gives some measure of the temporal locality of a particular reference stream. Note that there is no useful notion of a cache being "full" or "empty" or "near capacity": CPU caches almost always have nearly every line filled with a copy of some line in main memory, and nearly every allocation of a new line requires the eviction of an old line.
  • Conflict misses are those misses that could have been avoided, had the cache not evicted an entry earlier. Conflict misses can be further broken down into mapping misses, that are unavoidable given a particular amount of associativity, and replacement misses, which are due to the particular victim choice of the replacement policy.

    The graph to the right summarizes the cache performance seen on the Integer portion of the SPEC CPU2000 benchmarks, as collected by Hill and Cantin [2]. These benchmarks are intended to represent the kind of workload that an engineering workstation computer might see on any given day. The reader should keep in mind that finding benchmarks which are even usefully representative of many programs has been very difficult, and there will always be important programs with very different behavior than what is shown here.

    We can see the different effects of the three Cs in this graph.

    At the far right, with cache size labelled "Inf", we have the compulsory misses. If we wish to improve a machine's performance on SpecInt2000, increasing the cache size beyond 1 MiB is essentially futile. That's the insight given by the compulsory misses.

    The fully-associative cache miss rate here is almost representative of the capacity miss rate. The difference is that the data presented is from simulations assuming an LRU replacement policy. Showing the capacity miss rate would require a perfect replacement policy, i.e. an oracle that looks into the future to find a cache entry which is actually not going to be hit.

    Note that our approximation of the capacity miss rate falls steeply between 32 KiB and 64 KiB. This indicates that the benchmark has a working set of roughly 64 KiB. A CPU cache designer examining this benchmark will have a strong incentive to set the cache size to 64 KiB rather than 32 KiB. Note that, on this benchmark, no amount of associativity can make a 32 KiB cache perform as well as a 64 KiB 4-way, or even a direct-mapped 128 KiB cache.

    Finally, note that between 64 KiB and 1 MiB there is a large difference between direct-mapped and fully-associative caches. This difference is the conflict miss rate. The insight from looking at conflict miss rates is that secondary caches benefit a great deal from high associativity.

    This benefit was well known in the late 80s and early 90s, when CPU designers could not fit large caches on-chip, and could not get sufficient bandwidth to either the cache data memory or cache tag memory to implement high associativity in off-chip caches. Desperate hacks were attempted: the MIPS R8000 used expensive off-chip dedicated tag SRAMs, which had embedded tag comparators and large drivers on the match lines, in order to implement a 4 MiB 4-way associative cache. The MIPS R10000 used ordinary SRAM chips for the tags. Tag access for both ways took two cycles. To reduce latency, the R10000 would guess which way of the cache would hit on each access.
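The Three Cs classification from this section can be approximated in a trace-driven simulation. The sketch below follows the same compromise noted in the text: it uses a fully associative LRU cache of equal capacity as a stand-in for the perfect replacement policy that a true capacity miss rate would need. The function name and the trace are hypothetical.

```python
from collections import OrderedDict

def classify_misses(lines, num_sets, ways):
    """Label each miss of a set-associative LRU cache as compulsory, capacity or conflict."""
    capacity = num_sets * ways
    seen = set()                                     # lines referenced at least once
    full = OrderedDict()                             # reference: fully associative LRU
    sets = [OrderedDict() for _ in range(num_sets)]  # cache under test
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for line in lines:
        full_hit = line in full
        if full_hit:
            full.move_to_end(line)
        else:
            if len(full) >= capacity:
                full.popitem(last=False)
            full[line] = True
        s = sets[line % num_sets]
        if line in s:
            s.move_to_end(line)
        else:
            if len(s) >= ways:
                s.popitem(last=False)
            s[line] = True
            if line not in seen:
                counts["compulsory"] += 1   # first reference to this line
            elif not full_hit:
                counts["capacity"] += 1     # would miss even with full associativity
            else:
                counts["conflict"] += 1     # avoidable with more associativity
        seen.add(line)
    return counts

trace = [0, 4, 0, 4, 1, 2, 3, 5, 0]
print(classify_misses(trace, num_sets=4, ways=1))
# {'compulsory': 6, 'capacity': 1, 'conflict': 2}
```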

Address translation

Most general purpose CPUs implement some form of virtual memory. To summarize, each program running on the machine sees its own simplified address space, which contains code and data for that program only. Each program uses this virtual address space without regard for where it exists in physical memory.

Virtual memory requires the processor to translate virtual addresses generated by the program into physical addresses in main memory. The portion of the processor that does this translation is known as the memory management unit (MMU). The fast path through the MMU can perform those translations stored in the Translation Lookaside Buffer (TLB), which is a cache of mappings from the operating system's page table.

For the purposes of the present discussion, there are three important features of address translation:

  • Latency: The physical address is available from the MMU some time, perhaps a few cycles, after the virtual address is available from the address generator.
  • Aliasing: Multiple virtual addresses can map to a single physical address. Most processors guarantee that all updates to that single physical address will happen in program order. To deliver on that guarantee, the processor must ensure that only one copy of a physical address resides in the cache at any given time.
  • Granularity: The virtual address space is broken up into pages. For instance, a 4 GiB virtual address space might be cut up into 1048576 4 KiB pages, each of which can be independently mapped. There may be multiple page sizes supported; see virtual memory for elaboration.
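The granularity and aliasing points above can be sketched with a toy translation routine. The dictionaries standing in for the TLB and the page table, the page numbers, and the function name `translate` are all assumptions made for illustration; real MMUs do this in hardware.

```python
PAGE_SIZE = 4 * 1024             # 4 KiB pages, as in the example above
ADDRESS_SPACE = 4 * 1024**3      # 4 GiB virtual address space

def translate(vaddr, tlb, page_table):
    """Translate a virtual address; return (physical address, tlb_hit)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # virtual page number and offset within page
    hit = vpn in tlb
    if not hit:
        tlb[vpn] = page_table[vpn]           # slow path: consult the page table, fill the TLB
    return tlb[vpn] * PAGE_SIZE + offset, hit

print(ADDRESS_SPACE // PAGE_SIZE)            # 1048576 pages

page_table = {5: 42, 9: 42}                  # two virtual pages aliased to one physical page
tlb = {}
print(translate(5 * PAGE_SIZE + 12, tlb, page_table))   # first access misses in the TLB
print(translate(5 * PAGE_SIZE + 12, tlb, page_table))   # second access hits
```

Virtual pages 5 and 9 translate to the same physical page here, which is exactly the aliasing situation the cache has to handle.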

A historical note: the first virtual memory systems were very slow, because they required an access to the page table (held in main memory) before every programmed access to main memory. With no caches, this effectively cut the speed of the machine in half. The first hardware cache used in a computer system was not actually a data or instruction cache, but rather a TLB.

Caches can be divided into four types, based on whether the index and the tag correspond to physical or virtual addresses:

  • Physically indexed, physically tagged (PIPT) caches use the physical address for both the index and the tag. While this is simple and avoids problems with aliasing, it is also slow, as the physical address must be looked up (which could involve a TLB miss and access to main memory) before that address can be looked up in the cache.
  • Virtually indexed, virtually tagged (VIVT) caches use the virtual address for both the index and the tag. This can result in much faster lookups, as the MMU doesn't need to be consulted first. However, VIVT has the problem of aliasing: several virtual addresses may refer to the same physical address. Such addresses would be cached separately even though they refer to the same memory, which can cause coherency problems. Another problem is that virtual-to-physical mappings can change, which would require flushing cache lines, as the cached virtual addresses would no longer be valid.
  • Virtually indexed, physically tagged (VIPT) caches use the virtual address for the index and the physical address in the tag. The advantage over PIPT is lower latency, as the cache line can be looked up in parallel with the TLB translation; however, the tag can't be compared until the physical address is available. The advantage over VIVT is that since the tag has the physical address, the cache can detect aliasing. VIPT requires more tag bits, as the index bits no longer represent the same address.
  • Physically indexed, virtually tagged caches are only theoretical, as they would serve no practical purpose.[6]


The latency of a load (the time between presenting an address and receiving the data) is crucial to CPU performance, and so most modern level-1 caches are virtually indexed, which at least allows the MMU's TLB lookup to proceed in parallel with fetching the data from the cache RAM.

But virtual indexing is not the best choice for all cache levels. The cost of dealing with virtual aliases grows with cache size, and as a result most level-2 and larger caches are physically indexed.

Caches have historically used both virtual and physical addresses for the cache tags, although virtual tagging is now uncommon. If the TLB lookup can finish before the cache RAM lookup, then the physical address is available in time for tag compare, and there is no need for virtual tagging. Large caches, then, tend to be physically tagged, and only small, very low latency caches are virtually tagged. In recent general-purpose CPUs, virtual tagging has been superseded by vhints, as described below.

Virtual indexing and virtual aliases

The usual way the processor guarantees that virtually aliased addresses act as a single storage location is to arrange that only one virtual alias can be in the cache at any given time.

Whenever a new entry is added to a virtually-indexed cache, the processor searches for any virtual aliases already resident and evicts them first. This special handling happens only during a cache miss. No special work is necessary during a cache hit, which helps keep the fast path fast.

The most straightforward way to find aliases is to arrange for them all to map to the same location in the cache. This happens, for instance, if the TLB has e.g. 4 KiB pages, and the cache is direct mapped and 4 KiB or less.

Modern level-1 caches are much larger than 4 KiB, but virtual memory pages have stayed that size. If the cache is e.g. 16 KiB and virtually indexed, for any virtual address there are four cache locations that could hold the same physical location, but aliased to different virtual addresses. If the cache misses, all four locations must be probed to see if their corresponding physical addresses match the physical address of the access that generated the miss.

These probes are the same checks that a set associative cache uses to select a particular match. So if a 16 KiB virtually indexed cache is 4-way set associative and used with 4 KiB virtual memory pages, no special work is necessary to evict virtual aliases during cache misses because the checks have already happened while checking for a cache hit.

To take the AMD Athlon as an example again: it has a 64 KiB level-1 data cache, 4 KiB pages, and 2-way set associativity. When the level-1 data cache suffers a miss, 2 of the 16 (= 64 KiB / 4 KiB) possible virtual aliases have already been checked, and seven more cycles through the tag check hardware are necessary to complete the check for virtual aliases.
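The arithmetic in the two examples above can be reproduced in a few lines. This is only a worked calculation under one stated assumption: each pass through the tag check hardware compares as many tags as the cache has ways. The function name `alias_probe_cycles` is made up for the sketch.

```python
def alias_probe_cycles(cache_bytes, page_bytes, ways):
    """Return (alias locations, extra tag-check cycles needed on a miss)."""
    aliases = cache_bytes // page_bytes             # places one physical line could sit
    remaining = max(0, aliases - ways)              # the hit check already covered `ways`
    extra_cycles = (remaining + ways - 1) // ways   # assumes `ways` tags checked per cycle
    return aliases, extra_cycles

KIB = 1024
print(alias_probe_cycles(64 * KIB, 4 * KIB, 2))   # (16, 7): the Athlon example
print(alias_probe_cycles(16 * KIB, 4 * KIB, 4))   # (4, 0): no extra probes needed
```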

Homonym and synonym problems

A cache that relies on virtual indexing and tagging becomes inconsistent when the same virtual address is mapped to different physical addresses (the homonym problem). This can be solved by using physical addresses for tagging, or by storing an address space ID in each cache line. However, neither approach helps against the synonym problem, in which several cache lines end up storing data for the same physical address. Writing to such a location may update only one copy in the cache, leaving the others with inconsistent data. The problem might be solved by using non-overlapping memory layouts for different address spaces; otherwise the cache (or part of it) must be flushed when the mapping changes.[3]

Virtual tags and vhints

Virtual tagging is possible too. The great advantage of virtual tags is that, for associative caches, they allow the tag match to proceed before the virtual to physical translation is done. However,

  • Coherence probes and evictions present a physical address for action. The hardware must have some means of converting the physical addresses into a cache index, generally by storing physical tags as well as virtual tags. For comparison, a physically tagged cache does not need to keep virtual tags, which is simpler.
  • When a virtual to physical mapping is deleted from the TLB, cache entries with those virtual addresses will have to be flushed somehow. Alternatively, if cache entries are allowed on pages not mapped by the TLB, then those entries will have to be flushed when the access rights on those pages are changed in the page table.

It is also possible for the operating system to ensure that no virtual aliases are simultaneously resident in the cache. The operating system makes this guarantee by enforcing page coloring, which is described below. Some early RISC processors (SPARC, RS/6000) took this approach. It has not been used recently, as the hardware cost of detecting and evicting virtual aliases has fallen and the software complexity and performance penalty of perfect page coloring has risen.

It can be useful to distinguish the two functions of tags in an associative cache: they are used to determine which way of the entry set to select, and they are used to determine if the cache hit or missed. The second function must always be correct, but it is permissible for the first function to guess, and get the wrong answer occasionally.

Some processors (e.g. early SPARCs) have caches with both virtual and physical tags. The virtual tags are used for way selection, and the physical tags are used for determining hit or miss. This kind of cache enjoys the latency advantage of a virtually tagged cache, and the simple software interface of a physically tagged cache. It bears the added cost of duplicated tags, however. Also, during miss processing, the alternate ways of the cache line indexed have to be probed for virtual aliases and any matches evicted.

The extra area (and some latency) can be mitigated by keeping virtual hints with each cache entry instead of virtual tags. These hints are a subset or hash of the virtual tag, and are used for selecting the way of the cache from which to get data and a physical tag. Like a virtually tagged cache, there may be a virtual hint match but physical tag mismatch, in which case the cache entry with the matching hint must be evicted so that cache accesses after the cache fill at this address will have just one hint match. Since virtual hints have fewer bits than virtual tags distinguishing them from one another, a virtually hinted cache suffers more conflict misses than a virtually tagged cache.

Perhaps the ultimate reduction of virtual hints can be found in the Pentium 4 (Willamette and Northwood cores). In these processors the virtual hint is effectively 2 bits, and the cache is 4-way set associative. Effectively, the hardware maintains a simple permutation from virtual address to cache index, so that no content-addressable memory (CAM) is necessary to select the right one of the four ways fetched.

Page coloring

Large physically indexed caches (usually secondary caches) run into a problem: the operating system rather than the application controls which pages collide with one another in the cache. Differences in page allocation from one program run to the next lead to differences in the cache collision patterns, which can lead to very large differences in program performance. These differences can make it very difficult to get a consistent and repeatable timing for a benchmark run, which then leads to frustrated sales engineers demanding that the operating system authors fix the problem.

To understand the problem, consider a CPU with a 1 MiB physically indexed direct-mapped level-2 cache and 4 KiB virtual memory pages. Sequential physical pages map to sequential locations in the cache until after 256 pages the pattern wraps around. We can label each physical page with a color of 0–255 to denote where in the cache it can go. Locations within physical pages with different colors cannot conflict in the cache.
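The coloring in this example reduces to one division and one modulo. The sketch below uses the 1 MiB cache and 4 KiB pages from the text; the function name `page_color` is an assumption for illustration.

```python
def page_color(phys_addr, cache_bytes=1 << 20, page_bytes=4096):
    """Return the color (0-255 for the defaults) of the physical page holding phys_addr."""
    num_colors = cache_bytes // page_bytes          # 256 pages fit in the cache
    return (phys_addr // page_bytes) % num_colors   # pattern wraps after num_colors pages

print(page_color(0))               # 0
print(page_color(255 * 4096))      # 255
print(page_color(256 * 4096))      # 0: wraps around, collides with page 0
```

Two physical pages can conflict in this direct-mapped cache only if they have the same color.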

A programmer attempting to make maximum use of the cache may arrange his program's access patterns so that only 1 MiB of data need be cached at any given time, thus avoiding capacity misses. But he should also ensure that the access patterns do not have conflict misses. One way to think about this problem is to divide up the virtual pages the program uses and assign them virtual colors in the same way as physical colors were assigned to physical pages before. The programmer can then arrange the access patterns of his code so that no two pages with the same virtual color are in use at the same time. There is a wide literature on such optimizations (e.g. loop nest optimization), largely coming from the High Performance Computing (HPC) community.

The snag is that while all the pages in use at any given moment may have different virtual colors, some may have the same physical colors. In fact, if the operating system assigns physical pages to virtual pages randomly and uniformly, it is extremely likely that some pages will have the same physical color, and then locations from those pages will collide in the cache (this is the birthday paradox).
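The birthday-paradox claim can be checked numerically. The sketch assumes, as the text does, that physical pages are assigned uniformly at random, and it takes the 256 colors from the earlier 1 MiB example; the function name is hypothetical.

```python
def prob_color_collision(pages, colors=256):
    """Probability that at least two of `pages` randomly colored pages share a color."""
    p_distinct = 1.0
    for i in range(pages):
        p_distinct *= (colors - i) / colors   # page i must avoid the i colors already taken
    return 1.0 - p_distinct

for n in (5, 20, 64):
    print(n, round(prob_color_collision(n), 3))
```

Under these assumptions a collision is already more likely than not with only 20 pages in use, far fewer than the 256 pages the cache could hold without any conflict.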

The solution is to have the operating system attempt to assign different physical color pages to different virtual colors, a technique called page coloring. Although the actual mapping from virtual to physical color is irrelevant to system performance, odd mappings are difficult to keep track of and have little benefit, so most approaches to page coloring simply try to keep physical and virtual page colors the same.

If the operating system can guarantee that each physical page maps to only one virtual color, then there are no virtual aliases, and the processor can use virtually indexed caches with no need for extra virtual alias probes during miss handling. Alternatively, the O/S can flush a page from the cache whenever it changes from one virtual color to another. As mentioned above, this approach was used for some early SPARC and RS/6000 designs.

Specialized caches

Pipelined CPUs access memory from multiple points in the pipeline: instruction fetch, virtual-to-physical address translation, and data fetch (see classic RISC pipeline). The natural design is to use different physical caches for each of these points, so that no one physical resource has to be scheduled to service two points in the pipeline. Thus the pipeline naturally ends up with at least three separate caches (instruction, TLB, and data), each specialized to its particular role.

Pipelines with separate instruction and data caches, now predominant, are said to have a Harvard architecture. Originally, this phrase referred to machines with separate instruction and data memories, which proved not at all popular. Most modern CPUs have a single-memory von Neumann architecture.

Victim cache

A victim cache is a cache used to hold blocks evicted from a CPU cache upon replacement. The victim cache lies between the main cache and its refill path, and only holds blocks that were evicted from the main cache. The victim cache is usually fully associative, and is intended to reduce the number of conflict misses. Many commonly used programs do not require an associative mapping for all the accesses. In fact, only a small fraction of the memory accesses of the program require high associativity. The victim cache exploits this property by providing high associativity to only these accesses. It was introduced by Norman Jouppi in 1990.
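The effect of a victim cache can be sketched by placing a small fully associative buffer behind a direct-mapped cache. The model is a simplification with assumed details (FIFO replacement in the victim cache, line addresses instead of byte addresses, an invented function name) and is not a description of Jouppi's original design.

```python
from collections import OrderedDict

def misses_with_victim(lines, num_sets, victim_entries):
    """Count accesses that miss in both a direct-mapped cache and its victim cache."""
    main = [None] * num_sets      # direct mapped: one line per index
    victim = OrderedDict()        # fully associative, oldest entry first
    misses = 0
    for line in lines:
        idx = line % num_sets
        if main[idx] == line:
            continue                          # hit in the main cache
        if line in victim:
            del victim[line]                  # hit in the victim cache: line moves back
        else:
            misses += 1                       # must be fetched from the next level
        evicted = main[idx]
        main[idx] = line
        if evicted is not None and victim_entries > 0:
            if len(victim) >= victim_entries:
                victim.popitem(last=False)    # drop the oldest victim
            victim[evicted] = True            # only evicted blocks enter the victim cache
    return misses

trace = [0, 8, 0, 8, 0, 8]                                      # two lines sharing an index
print(misses_with_victim(trace, num_sets=8, victim_entries=0))  # 6
print(misses_with_victim(trace, num_sets=8, victim_entries=1))  # 2
```

A single victim entry is enough to absorb the ping-pong between the two conflicting lines, which is the property the paragraph above describes.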

Trace cache

One of the more extreme examples of cache specialization is the trace cache found in the Intel Pentium 4 microprocessors. A trace cache is a mechanism for increasing the instruction fetch bandwidth and decreasing power consumption (in the case of the Pentium 4) by storing traces of instructions that have already been fetched and decoded.

The earliest widely acknowledged academic publication of trace cache was by Eric Rotenberg, Steve Bennett, and James E. Smith in their 1996 paper "Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching."

An earlier publication is US Patent 5,381,533, "Dynamic flow instruction cache memory organized around trace segments independent of virtual address line", by Alex Peleg and Uri Weiser of Intel Corp., patent filed March 30, 1994, a continuation of an application filed in 1992, later abandoned.

A trace cache stores instructions either after they have been decoded, or as they are retired. Generally, instructions are added to trace caches in groups representing either individual basic blocks or dynamic instruction traces. A basic block consists of a group of non-branch instructions ending with a branch. A dynamic trace ("trace path") contains only instructions whose results are actually used, and eliminates instructions following taken branches (since they are not executed); a dynamic trace can be a concatenation of multiple basic blocks. This allows the instruction fetch unit of a processor to fetch several basic blocks, without having to worry about branches in the execution flow.

Trace lines are stored in the trace cache based on the program counter of the first instruction in the trace and a set of branch predictions. This allows for storing different trace paths that start on the same address, each representing different branch outcomes. In the instruction fetch stage of a pipeline, the current program counter along with a set of branch predictions is checked in the trace cache for a hit. If there is a hit, a trace line is supplied to fetch which does not have to go to a regular cache or to memory for these instructions. The trace cache continues to feed the fetch unit until the trace line ends or until there is a misprediction in the pipeline. If there is a miss, a new trace starts to be built.

Trace caches are also used in processors like the Intel Pentium 4 to store already decoded micro-operations, or translations of complex x86 instructions, so that the next time an instruction is needed, it does not have to be decoded again.

Multi-level caches

Another issue is the fundamental tradeoff between cache latency and hit rate. Larger caches have better hit rates but longer latency. To address this tradeoff, many computers use multiple levels of cache, with small fast caches backed up by larger slower caches.

Multi-level caches generally operate by checking the smallest Level 1 (L1) cache first; if it hits, the processor proceeds at high speed. If the smaller cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked.
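The lookup order described above can be sketched as a walk down a list of levels. This toy model ignores capacity and eviction entirely and represents each level as a plain set of addresses; the names `lookup` and `levels` are assumptions for illustration.

```python
def lookup(addr, levels):
    """Return the name of the first level holding addr, filling the levels above it."""
    for i, (name, contents) in enumerate(levels):
        if addr in contents:
            for _, upper in levels[:i]:
                upper.add(addr)          # copy the line into the smaller, faster levels
            return name
    for _, contents in levels:
        contents.add(addr)               # missed everywhere: fetch from external memory
    return "memory"

levels = [("L1", set()), ("L2", {7})]
print(lookup(7, levels))   # L2: the L1 missed, the L2 hit
print(lookup(7, levels))   # L1: the line was filled into the L1
print(lookup(9, levels))   # memory
```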

As the latency difference between main memory and the fastest cache has become larger, some processors have begun to utilize as many as three levels of on-chip cache. For example, the Alpha 21164 (1995) had a 96 KB on-die L3 cache; the IBM POWER4 (2001) had a 256 MB L3 cache off-chip, shared among several processors; the Itanium 2 (2003) had a 6 MB unified level 3 (L3) cache on-die; Intel's Xeon MP product code-named "Tulsa" (2006) features 16 MB of on-die L3 cache shared between two processor cores; the AMD Phenom II (2008) has up to 6 MB on-die unified L3 cache; and the Intel Core i7 (2008) has an 8 MB on-die unified L3 cache that is inclusive, shared by all cores. The benefits of an L3 cache depend on the application's access patterns.

Finally, at the other end of the memory hierarchy, the CPU register file itself can be considered the smallest, fastest cache in the system, with the special characteristic that it is scheduled in software—typically by a compiler, as it allocates registers to hold values retrieved from main memory. (See especially loop nest optimization.) Register files sometimes also have hierarchy: The Cray-1 (circa 1976) had 8 address "A" and 8 scalar data "S" registers that were generally usable. There was also a set of 64 address "B" and 64 scalar data "T" registers that took longer to access, but were faster than main memory. The "B" and "T" registers were provided because the Cray-1 did not have a data cache. (The Cray-1 did, however, have an instruction cache.)

Exclusive versus inclusive

Multi-level caches introduce new design decisions. For instance, in some processors, all data in the L1 cache must also be somewhere in the L2 cache. These caches are called strictly inclusive. Other processors (like the AMD Athlon) have exclusive caches: data is guaranteed to be in at most one of the L1 and L2 caches, never in both. Still other processors (like the Intel Pentium II, III, and 4) do not require that data in the L1 cache also reside in the L2 cache, although it may often do so. There is no universally accepted name for this intermediate policy, although the term mainly inclusive has been used.

The advantage of exclusive caches is that they store more data. This advantage is larger when the exclusive L1 cache is comparable to the L2 cache, and diminishes if the L2 cache is many times larger than the L1 cache. When the L1 misses and the L2 hits on an access, the hitting cache line in the L2 is exchanged with a line in the L1. This exchange is quite a bit more work than just copying a line from L2 to L1, which is what an inclusive cache does.
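
The capacity difference and the swap on an L1 miss with an L2 hit can be seen in a toy model. This is a sketch under stated assumptions: both levels are treated as fully associative with LRU replacement, and the class name `TwoLevelCache` and its methods are hypothetical, not taken from any real design.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy L1/L2 pair; fully associative with LRU replacement (assumed)."""

    def __init__(self, l1_lines, l2_lines, exclusive):
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.exclusive = exclusive

    def _insert_l2(self, line):
        self.l2[line] = True
        self.l2.move_to_end(line)
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)
            if not self.exclusive:
                # Strict inclusion: a line leaving L2 must leave L1 as well.
                self.l1.pop(victim, None)

    def _insert_l1(self, line):
        self.l1[line] = True
        self.l1.move_to_end(line)
        if len(self.l1) > self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            if self.exclusive:
                # Exclusive: the L1 victim moves down into L2.
                self._insert_l2(victim)
            # Inclusive: the victim is already held in L2, so nothing moves.

    def access(self, line):
        if line in self.l1:
            self.l1.move_to_end(line)
            return "L1 hit"
        if line in self.l2:
            if self.exclusive:
                # Exchange: the line leaves L2 and an L1 victim replaces it.
                del self.l2[line]
            else:
                # Inclusive: the line is simply copied up and stays in L2.
                self.l2.move_to_end(line)
            self._insert_l1(line)
            return "L2 hit"
        if not self.exclusive:
            self._insert_l2(line)  # inclusive fills allocate in L2 too
        self._insert_l1(line)
        return "miss"

    def distinct_lines(self):
        return len(set(self.l1) | set(self.l2))


if __name__ == "__main__":
    for exclusive in (True, False):
        cache = TwoLevelCache(l1_lines=2, l2_lines=2, exclusive=exclusive)
        for line in range(4):
            cache.access(line)
        print(exclusive, cache.distinct_lines(), cache.access(0))
```

With L1 and L2 each holding two lines, the exclusive pair retains four distinct lines after touching lines 0 to 3, so a return to line 0 hits in L2. The inclusive pair retains only two, and the same access misses.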

One advantage of strictly inclusive caches is that when external devices or other processors in a multiprocessor system wish to remove a cache line from the processor, they need only have the processor check the L2 cache. In cache hierarchies which do not enforce inclusion, the L1 cache must be checked as well. As a drawback, there is a correlation between the associativities of L1 and L2 caches: if the L2 cache does not have at least as many ways as all L1 caches together, the effective associativity of the L1 caches is restricted.

Another advantage of inclusive caches is that the larger cache can use larger cache lines, which reduces the size of the secondary cache tags. (Exclusive caches require both caches to have cache lines of the same size, so that cache lines can be swapped on an L1 miss and L2 hit.) If the secondary cache is an order of magnitude larger than the primary, and the cache data is an order of magnitude larger than the cache tags, the tag area saved can be comparable to the incremental area needed to store the L1 cache data in the L2.

Example: the K8

To illustrate both specialization and multi-level caching, here is the cache hierarchy of the K8 core in the AMD Athlon 64 CPU.[7]

The K8 has 4 specialized caches: an instruction cache, an instruction TLB, a data TLB, and a data cache. Each of these caches is specialized:

  • The instruction cache keeps copies of 64-byte lines of memory, and fetches 16 bytes each cycle. Each byte in this cache is stored in ten bits rather than eight, with the extra bits marking the boundaries of instructions (this is an example of predecoding). The cache has only parity protection rather than ECC, because parity is smaller and any damaged data can be replaced by fresh data fetched from memory (which always has an up-to-date copy of instructions).
  • The instruction TLB keeps copies of page table entries (PTEs). Each cycle's instruction fetch has its virtual address translated through this TLB into a physical address. Each entry is either 4 or 8 bytes in memory. Each of the TLBs is split into two sections, one to keep PTEs that map 4 KiB, and one to keep PTEs that map 4 MiB or 2 MiB. The split allows the fully associative match circuitry in each section to be simpler. The operating system maps different sections of the virtual address space with different size PTEs.
  • The data TLB has two copies which keep identical entries. The two copies allow two data accesses per cycle to translate virtual addresses to physical addresses. Like the instruction TLB, this TLB is split into two kinds of entries.
  • The data cache keeps copies of 64-byte lines of memory. It is split into 8 banks (each storing 8 KiB of data), and can fetch two 8-byte values each cycle so long as those values are in different banks. There are two copies of the tags, because each 64-byte line is spread among all 8 banks. Each tag copy handles one of the two accesses per cycle.
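
The banking rule for the data cache can be sketched as a simple conflict check. The mapping below is an assumption, not documented K8 behaviour: it supposes that the eight consecutive 8-byte words of a 64-byte line go to banks 0 through 7 in order. The function names are hypothetical.

```python
BANKS = 8        # number of data cache banks, from the description above
WORD_BYTES = 8   # width of one access, from the description above


def bank_of(address):
    # Assumed mapping: consecutive 8-byte words land in consecutive banks,
    # so one 64-byte line is spread across all 8 banks.
    return (address // WORD_BYTES) % BANKS


def can_issue_together(addr_a, addr_b):
    # Two accesses can proceed in the same cycle only if their banks differ.
    return bank_of(addr_a) != bank_of(addr_b)


if __name__ == "__main__":
    print(can_issue_together(0x1000, 0x1008))  # neighbouring words
    print(can_issue_together(0x1000, 0x1040))  # same word position, next line
```

Under this assumed mapping, neighbouring words in a line never conflict, while two accesses that are a multiple of 64 bytes apart always do.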

The K8 also has multiple-level caches. There are second-level instruction and data TLBs, which store only PTEs mapping 4 KiB. Both instruction and data caches, and the various TLBs, can fill from the large unified L2 cache. This cache is exclusive to both the L1 instruction and data caches, which means that any 64-byte line can be in only one of the L1 instruction cache, the L1 data cache, or the L2 cache. It is, however, possible for a line in the data cache to have a PTE which is also in one of the TLBs; the operating system is responsible for keeping the TLBs coherent by flushing portions of them when the page tables in memory are updated.

The K8 also caches information that is never stored in memory—prediction information. These caches are not shown in the above diagram. As is usual for this class of CPU, the K8 has fairly complex branch prediction, with tables that help predict whether branches are taken and other tables which predict the targets of branches and jumps. Some of this information is associated with instructions, in both the level 1 instruction cache and the unified secondary cache.

The K8 uses an interesting trick to store prediction information with instructions in the secondary cache. Lines in the secondary cache are protected from accidental data corruption (e.g. by an alpha particle strike) by either ECC or parity, depending on whether those lines were evicted from the data or instruction primary caches. Since the parity code takes fewer bits than the ECC code, lines from the instruction cache have a few spare bits. These bits are used to cache branch prediction information associated with those instructions. The net result is that the branch predictor has a larger effective history table, and so has better accuracy.

More hierarchies

Other processors have other kinds of predictors (e.g. the store-to-load bypass predictor in the DEC Alpha 21264), and various specialized predictors are likely to flourish in future processors.

These predictors are caches in that they store information that is costly to compute. Some of the terminology used when discussing predictors is the same as that for caches (one speaks of a hit in a branch predictor), but predictors are not generally thought of as part of the cache hierarchy.

The K8 keeps the instruction and data caches coherent in hardware, which means that a store into an instruction closely following the store instruction will change that following instruction. Other processors, like those in the Alpha and MIPS family, have relied on software to keep the instruction cache coherent. Stores are not guaranteed to show up in the instruction stream until a program calls an operating system facility to ensure coherency. The idea is to save hardware complexity on the assumption that self-modifying code is rare.

Implementation

Cache reads are the most common CPU operation that takes more than a single cycle. Program execution time tends to be very sensitive to the latency of a level-1 data cache hit. A great deal of design effort, and often power and silicon area, is expended on making the caches as fast as possible.

The simplest cache is a virtually indexed direct-mapped cache. The virtual address is calculated with an adder, the relevant portion of the address extracted and used to index an SRAM, which returns the loaded data. The data is byte aligned in a byte shifter, and from there is bypassed to the next operation. There is no need for any tag checking in the inner loop — in fact, the tags need not even be read. Later in the pipeline, but before the load instruction is retired, the tag for the loaded data must be read, and checked against the virtual address to make sure there was a cache hit. On a miss, the cache is updated with the requested cache line and the pipeline is restarted.

An associative cache is more complicated, because some form of tag must be read to determine which entry of the cache to select. An N-way set-associative level-1 cache usually reads all N possible tags and N data in parallel, and then chooses the data associated with the matching tag. Level-2 caches sometimes save power by reading the tags first, so that only one data element is read from the data SRAM.
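
The difference between the two read strategies can be illustrated by counting data reads for one set. This is a software sketch only: hardware performs the tag comparisons in parallel rather than in a loop, and the function names and the use of `None` for an invalid way are assumptions made for the example.

```python
def parallel_lookup(tags, data, tag):
    """Level-1 style: read all N tags and all N data entries, then select."""
    fetched = list(data)        # every way's data entry is read
    data_reads = len(fetched)
    for way, stored_tag in enumerate(tags):
        if stored_tag == tag:
            return fetched[way], data_reads
    return None, data_reads     # miss: the data reads were wasted


def serial_lookup(tags, data, tag):
    """Level-2 style: check the tags first, then read one data entry."""
    for way, stored_tag in enumerate(tags):
        if stored_tag == tag:
            return data[way], 1
    return None, 0              # miss: no data entry is read at all


if __name__ == "__main__":
    set_tags = [0x12, None, 0x7F, 0x03]   # None marks an invalid way
    set_data = ["a", "b", "c", "d"]
    print(parallel_lookup(set_tags, set_data, 0x7F))
    print(serial_lookup(set_tags, set_data, 0x7F))
```

Both strategies return the same data; the serial form reads fewer data entries, at the cost of waiting for the tag comparison before the data read can begin.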

The diagram to the right is intended to clarify the manner in which the various fields of the address are used. Address bit 31 is most significant, bit 0 is least significant. The diagram shows the SRAMs, indexing, and multiplexing for a 4 KiB, 2-way set-associative, virtually indexed and virtually tagged cache with 64 B lines, a 32-bit read width and a 32-bit virtual address.

Because the cache is 4 KiB and has 64 B lines, there are just 64 lines in the cache, and we read two at a time from a Tag SRAM which has 32 rows, each with a pair of 21 bit tags. Although any function of virtual address bits 31 through 6 could be used to index the tag and data SRAMs, it is simplest to use the least significant bits.

Similarly, because the cache is 4 KiB and has a 4 B read path, and reads two ways for each access, the Data SRAM is 512 rows by 8 bytes wide.
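
The arithmetic behind these figures can be checked with a short script. The sketch below assumes the index is taken from the least significant bits above the line offset, as the text suggests is simplest; the constant and function names are illustrative only.

```python
CACHE_BYTES = 4 * 1024   # 4 KiB cache
LINE_BYTES = 64          # 64 B lines
WAYS = 2                 # 2-way set-associative
ADDRESS_BITS = 32        # 32-bit virtual address
READ_BYTES = 4           # 32-bit read width

LINES = CACHE_BYTES // LINE_BYTES                    # 64 lines in total
SETS = LINES // WAYS                                 # 32 rows in the Tag SRAM
OFFSET_BITS = LINE_BYTES.bit_length() - 1            # 6 bits select a byte in a line
INDEX_BITS = SETS.bit_length() - 1                   # 5 bits select a row
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS   # 21 bits remain for the tag
DATA_ROWS = CACHE_BYTES // (READ_BYTES * WAYS)       # 512 rows of 8 bytes


def split_address(address):
    """Split a 32-bit virtual address into (tag, index, offset)."""
    offset = address & (LINE_BYTES - 1)
    index = (address >> OFFSET_BITS) & (SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print(LINES, SETS, TAG_BITS, DATA_ROWS)
    tag, index, offset = split_address(0x12345678)
    print(hex(tag), index, hex(offset))
```

For the arbitrary example address 0x12345678, this gives tag 0x2468a, index 25, and offset 0x38.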

A more modern cache might be 16 KiB, 4-way set-associative, virtually indexed, virtually hinted, and physically tagged, with 32 B lines, 32b read width and 36b physical addresses. The read path recurrence for such a cache looks very similar to the path above. Instead of tags, vhints are read, and matched against a subset of the virtual address. Later on in the pipeline, the virtual address is translated into a physical address by the TLB, and the physical tag is read (just one, as the vhint supplies which way of the cache to read). Finally the physical address is compared to the physical tag to determine if a hit has occurred.

Some SPARC designs have improved the speed of their L1 caches by a few gate delays by collapsing the virtual address adder into the SRAM decoders. See Sum addressed decoder.


Source: en.wikipedia.org