Classical papers/books/articles.
....After trying to convince you that I cannot be trusted, I wish to moralize. I would like to criticize the press in its handling of the "hackers," the 414 gang, the Dalton gang, etc. The acts performed by these kids are vandalism at best and probably trespass and theft at worst. It is only the inadequacy of the criminal code that saves the hackers from very serious prosecution. The companies that are vulnerable to this activity (and most large companies are very vulnerable) are pressing hard to update the criminal code. Unauthorized access to computer systems is already a serious crime in a few states and is currently being addressed in many more state legislatures as well as Congress.
There is an explosive situation brewing. On the one hand, the press, television, and movies make heroes of vandals by calling them whiz kids. On the other hand, the acts performed by these kids will soon be punishable by years in prison.
I have watched kids testifying before Congress. It is clear that they are completely unaware of the seriousness of their acts. There is obviously a cultural gap. The act of breaking into a computer system has to have the same social stigma as breaking into a neighbor's house. It should not matter that the neighbor's door is unlocked. The press must learn that misguided use of a computer is no more amazing than drunk driving of an automobile.
The only such paper I could find was The Relative Importance of Memory Latency, Bandwidth, and Branch Limits to Performance (1997) by Norman P. Jouppi and Parthasarathy Ranganathan. But its numbers come from a simulator, and their memory model is really bad. I don't think this is that good a paper. I loved their color 3-D bar charts :-)
Other papers/articles/interviews
The first time a fragment of Java code is executed, the JIT compiler transparently converts the Java byte codes to highly optimized RISC primitives, then parallelizes them, so multiple RISC primitives can be executed in one machine cycle. The VLIW code is saved in a portion of main memory not visible to the Java architecture. Subsequent executions of the same fragment do not require translation (unless cast out). We describe fast compiler algorithms for dynamic translation of Java byte codes to VLIW code. These algorithms parallelize across multiple paths and loop iteration boundaries. In addition, they map the Java stack and local variables to real registers, thereby eliminating the pushes and pops between local variables and the stack by appropriate register allocation.
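The stack-to-register mapping mentioned at the end of the abstract can be illustrated with a toy translator. This is only a minimal sketch, not the paper's algorithm: it handles a straight-line fragment, does no parallelization, and assumes that local variable n lives in a register named rn and that temporaries are named t0, t1, and so on. The function name `translate`, the `iconst N` shorthand, and the output mnemonics (`add`, `mov`, ...) are made up for illustration. The idea is that the operand stack is simulated at translation time, so loads become register renames and no push or pop is ever emitted.

```python
def translate(bytecodes):
    """Translate a straight-line stack-bytecode fragment into register ops.

    Hypothetical conventions: local n is register rn, temporaries are tN,
    constants are written #N. None of this comes from the paper.
    """
    stack = []       # symbolic operand stack: holds operand names, not values
    out = []         # emitted three-address instructions
    next_temp = 0
    binops = {"iadd": "add", "isub": "sub", "imul": "mul"}

    for ins in bytecodes:
        op, *args = ins.split()
        if op == "iload":
            # No code emitted: the "push" is just a rename to the local's register.
            stack.append(f"r{args[0]}")
        elif op == "iconst":
            stack.append(f"#{args[0]}")
        elif op in binops:
            rhs = stack.pop()
            lhs = stack.pop()
            dst = f"t{next_temp}"
            next_temp += 1
            out.append(f"{binops[op]} {dst}, {lhs}, {rhs}")
            stack.append(dst)
        elif op == "istore":
            src = stack.pop()
            dst = f"r{args[0]}"
            # If an earlier load of this local is still pending on the stack,
            # save the old value in a temporary before overwriting it.
            if dst in stack:
                tmp = f"t{next_temp}"
                next_temp += 1
                out.append(f"mov {tmp}, {dst}")
                stack = [tmp if s == dst else s for s in stack]
            out.append(f"mov {dst}, {src}")
        else:
            raise ValueError(f"unsupported bytecode: {ins}")
    return out


# c = a + b * 2, with a, b, c in locals 0, 1, 2
for line in translate(["iload 0", "iload 1", "iconst 2", "imul", "iadd", "istore 2"]):
    print(line)
```

Six stack bytecodes turn into three register instructions here, and the three loads vanish entirely. A real translator would also have to deal with branches, exceptions, and a finite register file, none of which this sketch attempts.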