Important Dates
- Proposal due: Friday 3/28/03
- Revised proposal due: Wednesday 4/9/03
- Note that this is a fairly late date in the semester, because of Spring
break and my travel at a rather inconvenient time. You should therefore
plan on being well into your project by this point, and on using my
feedback to tune things rather than starting after you turn this in.
- Final writeup due: Friday 5/9/03
Parameters
- You can form groups of up to 3 people. The more people, the more significant
the project should be.
- Your project should be experimental in nature. It should explore some aspect
of architecture, of architecture as it relates to the OS, or of specialized
architectures for embedded systems, for example.
- I will try to post some thoughts on potential projects over the next two
weeks. However, you should not depend on this; rather, you should be trying
to come up with ideas concurrently and use my ideas to tune yours.
- Some places to browse to get ideas:
Some Ideas
- We have 2 IBM secure co-processors that can be used to implement various
security-related functions on PCs such as verifying that the OS version is
appropriate before allowing it to boot. One interesting thing to do with these
co-processors is to try to port the LGI controller to them and see whether
performance would be acceptable.
- Recently, the developers of Jalapeño, a research open source Java compiler
and VM, added code to monitor performance counters on the PowerPC to give
detailed performance information per Java thread. It would be interesting to
extend this implementation to read the performance counters on the Pentium
processors instead. Then, run some Java programs and use this information to
see what you can learn about how they interact with the architecture of
current PCs.
- Locally, we have a fault-injection engine called Mendosus and a methodology
for studying performance and availability. Consider RAID systems. Often, people
use RAID 5, which can only tolerate a single faulty disk at a time. This makes
RAID 5 vulnerable to human errors, such as pulling the wrong disk when trying
to replace a faulty one, and to latent errors, where the faulty disk is left
in place until a second disk goes bad. How does the expense of designing a RAID
system that tolerates 2 concurrent failures compare with the expense of trying
to avoid operator errors and latent errors?
- At a recent HPCA conference, there was a paper from IBM describing a machine
where the content of main memory is compressed. This content is uncompressed
when it is brought into L3. Use analytical modeling or simulation to compare
this approach to that of GMS, a project at UW where an infrastructure was
built such that nodes in a cluster can use each other's memories.
- Talk to Uli and/or Ricardo for small projects involving low power computing.
- Develop a model for studying the effect on power consumption of turning off
memory banks and disks when they are unused. There have been multiple studies
of this, so in effect you would be repeating something that someone else has
done. However, for this study, I'm thinking that you might try to build a
queueing model, since you cannot do actual experiments. It would be interesting
to see how close your model comes to measured experimental results.
- Tandem systems are used for critical, highly available applications. The
approach they take is to provide hardware redundancy for just about everything
and extensive error checking. Would it be possible to emulate this approach
in a cluster using a high-performance SAN and the multiple components in the
cluster for redundancy?
- Build a unified error detection and reporting system for a cluster. What
is already implemented? What do you need to do in addition to the various
monitoring systems that already exist?
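
For the RAID idea above, a first-order reliability estimate may be a useful starting point before running any fault-injection experiments. The sketch below uses the textbook mean-time-to-data-loss (MTTDL) approximations for arrays that tolerate one or two disk failures, assuming independent disk failures and a repair time much shorter than the disk MTTF. The `p_operator_error` parameter (the chance that a repair action pulls the wrong disk) and all function names are hypothetical additions for illustration; they do not come from Mendosus or from any measured data.

```python
def mttdl_single_parity(n_disks, mttf_hours, mttr_hours, p_operator_error=0.0):
    """Approximate MTTDL (hours) for an array that survives one disk failure.

    After a first failure (rate n_disks / MTTF), data is lost if the repair
    goes wrong (assumed probability p_operator_error) or if a second disk
    fails during the repair window (probability ~ (n_disks - 1) * MTTR / MTTF).
    """
    p_second_failure = (n_disks - 1) * mttr_hours / mttf_hours
    p_loss_per_failure = p_operator_error + p_second_failure
    return mttf_hours / (n_disks * p_loss_per_failure)


def mttdl_double_parity(n_disks, mttf_hours, mttr_hours):
    """Approximate MTTDL (hours) for an array that survives two disk failures.

    Simplification: operator error is ignored here, because a single wrong
    pull degrades the array further but does not by itself lose data.
    """
    return mttf_hours ** 3 / (
        n_disks * (n_disks - 1) * (n_disks - 2) * mttr_hours ** 2
    )


if __name__ == "__main__":
    # Illustrative parameters only, not measurements.
    n, mttf, mttr = 8, 1_000_000.0, 24.0
    for p in (0.0, 0.001, 0.01):
        print(f"single parity, p_operator_error={p}: "
              f"{mttdl_single_parity(n, mttf, mttr, p):.3e} h")
    print(f"double parity: {mttdl_double_parity(n, mttf, mttr):.3e} h")
```

Varying `p_operator_error` gives a rough sense of how quickly human error comes to dominate the single-parity estimate, which is the comparison the project idea asks about. Latent errors could be approximated by lengthening `mttr_hours`.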
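
For the power-management idea above, one possible starting point is a simple idle-period model rather than a full queueing model. The sketch below assumes requests arrive as a Poisson process, so idle periods are exponentially distributed, and that the disk spins down after a fixed timeout. It computes the expected energy per idle period in closed form and checks it against a small Monte Carlo simulation. All parameter names and the numbers in the example are hypothetical, and the model ignores the extra latency that requests see while the disk spins back up.

```python
import math
import random


def expected_idle_energy(rate, timeout, p_idle, p_standby, e_transition):
    """Expected energy (J) per idle period with a fixed spin-down timeout.

    rate: request arrival rate (1/s); idle lengths are assumed Exp(rate).
    timeout: seconds of idleness before spinning down.
    p_idle, p_standby: power (W) while spinning idle and while spun down.
    e_transition: energy (J) for one spin-down plus spin-up cycle.
    """
    p_spin_down = math.exp(-rate * timeout)           # P(idle > timeout)
    mean_time_spinning = (1.0 - p_spin_down) / rate   # E[min(idle, timeout)]
    mean_time_standby = 1.0 / rate                    # memoryless remainder
    return (p_idle * mean_time_spinning
            + p_spin_down * (e_transition + p_standby * mean_time_standby))


def simulated_idle_energy(rate, timeout, p_idle, p_standby, e_transition,
                          n_periods=200_000, seed=1):
    """Monte Carlo estimate of the same quantity, for cross-checking."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_periods):
        idle = rng.expovariate(rate)
        if idle <= timeout:
            total += p_idle * idle
        else:
            total += (p_idle * timeout + e_transition
                      + p_standby * (idle - timeout))
    return total / n_periods


if __name__ == "__main__":
    # Illustrative parameters only, not measurements.
    params = dict(rate=0.05, timeout=10.0, p_idle=5.0,
                  p_standby=0.5, e_transition=30.0)
    print("analytic :", expected_idle_energy(**params))
    print("simulated:", simulated_idle_energy(**params))
```

Comparing the two numbers is a small-scale version of the exercise suggested in the project idea: seeing how closely an analytical model tracks observed behavior. Replacing the simulation with published measurements would be the real test.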