Time: Tuesday 2:50pm-5:50pm
Place: CoRE A
Instructor: Thu D. Nguyen
Email: tdnguyen@cs.rutgers.edu
Office hours: Thursday and Friday afternoons; knock if the door is closed.
On other days, you are welcome to drop in if the door is open. Otherwise, please wait until Thursday or
Friday.
Office: CoRE 326
TA: Murali Rangarajan
Email: muralir@cs.rutgers.edu
Mailing list: dcs_545@email.rutgers.edu
Announcements
Text: Andrew Tanenbaum and Maarten van Steen. Distributed Systems: Principles and Paradigms. Prentice Hall. Note: we will draw most of our material from papers, so this text is now optional.
Project
You will be required to work on a project independent of any ongoing research. The project may be related to, or an offshoot of, something you are working on, but it must be distinct. You may work in teams of up to 3 students; larger groups will be expected to take on projects of larger scope. I will help (through discussion, etc.), but you will be primarily responsible for defining your own project.
- 3/4/02: project proposal due
- 4/2/02: interim report due
- 5/6/02: final report due
Schedule
1/22
Administrative
- Slides (important because they contain all the ground rules for the course)
1/29
Communication Medium
- Intro slides.
- Robert M. Metcalfe and David R. Boggs. Ethernet: distributed packet switching for local computer networks. Communications of the ACM, Volume 19, Issue 7 (July 1976).
- Presenter: Chris Peery. Slides. Estimated time: 25 minutes
- Submit summary. View summaries.
- Evan Speight, Hazim Abdel-Shafi, and John K. Bennett. Realizing the Performance Potential of the Virtual Interface Architecture. In Proceedings of the International Conference on Supercomputing, 1999.
- Presenter: Constantin Serban. Slides. Estimated time: 50 minutes
- Submit summary. View summaries.
- Not required, but worth reading: Philip Buonadonna, Andrew Geweke, and David Culler. An Implementation and Analysis of the Virtual Interface Architecture, Proceedings of SC98, November 1998.
- Mellanox Technologies. Introduction to InfiniBand.
- Presenter: Aniruddha Bohra. Slides. Estimated time: 40 minutes
- Submit summary. View summaries.
- Specs: Vol1, Vol2
- Bonus paper: T. Heath, S. Kaur, R. Martin, and T. D. Nguyen. Quantifying the Impact of Architectural Scaling on Communication. In Proceedings of the Seventh International Symposium on High Performance Computer Architecture (HPCA), January 2001.
- Covered briefly in Intro slides
2/12
Communication Protocols
- David Patterson's keynote address at HPCA-8 (estimated time: 20-30 minutes)
- D. Clark. The design philosophy of the DARPA internet protocols. In Proceedings of the ACM SIGCOMM'88, 1988.
- Presenter: Rahul Pupala. Slides. Estimated time:
- Submit summary. View summaries.
- J. H. Saltzer, D. P. Reed, D. D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems (TOCS), Volume 2, Issue 4 (November 1984).
- Presenter: Jerry Hom. Slides. Estimated time: 30 minutes
- Submit summary. View summaries.
- Andrew D. Birrell and Bruce Jay Nelson. Implementing remote procedure calls. ACM Transactions on Computer Systems (TOCS), Volume 2, Issue 1, February 1984.
- Presenter: Kiran Nagaraja. Slides. Estimated time: 40 minutes
- Submit summary. View summaries.
- Matt Welsh and David Culler. Jaguar: Enabling Efficient Communication and I/O in Java. Concurrency: Practice and Experience, Vol. 12, pp. 519-538, Special Issue on Java for High-Performance Applications, December, 1999.
- Presenter: Thu. Slides. Estimated time: 30 minutes
2/19
Group Communication and Transactions
- Kenneth P. Birman, The Process Group Approach to Reliable Distributed Computing. Communications of the ACM (CACM), 36(12):37-53, December 1993.
- Presenter: Constantin Serban. Slides. Estimated time:
- Submit summary. View summaries.
- D. R. Cheriton and D. Skeen, Understanding the Limitations of Causal and Totally Ordered Multicast. Proceedings of the 14th Symposium on Operating Systems Principles (SOSP '93), December 1993.
- Presenter: Zhijun He. Slides. Estimated time:
- Submit summary. View summaries.
- J. Gray. The transaction concept: virtues and limitations. Proceedings of the 7th VLDB Conference, 1981.
- Presenter: Thu. Slides. Estimated time:
- Submit summary. View summaries.
- This paper has been downgraded to useful but not required: K. P. Eswaran, J. N. Gray, R. A. Lorie, I. L. Traiger. The notions of consistency and predicate locks in a database system. Communications of the ACM, Volume 19, Issue 11 (November 1976).
2/26
Naming
- Paul V. Mockapetris and Kevin J. Dunlap. Development of the Domain Name System. In Proceedings of the ACM SIGCOMM'88, 1988.
- Presenter: Thu. Slides. Estimated time: 20 minutes
- Submit summary. View summaries.
- William Adjie-Winoto, Elliot Schwartz, Hari Balakrishnan, and Jeremy Lilley. The design and implementation of an intentional naming system. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP ’99), Dec 1999.
- Presenter: Vincent Matossian. Slides. Estimated time:
- Submit summary. View summaries.
Algorithms
- Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, Volume 21, Issue 7 (July 1978).
- Presenter: Rahul Pupala. Slides. Estimated time:
- Submit summary. View summaries.
3/5
- K. Mani Chandy and Leslie Lamport. Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS), Volume 3, Issue 1 (February 1985).
- Presenter: Rahul Pupala. Slides. Estimated time: 50 minutes.
- Submit summary. View summaries.
- M. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374-382, April 1985.
- Presenter: Zhijun He. Slides. Estimated time:
- Submit summary. View summaries.
3/12
- H. Garcia-Molina. Elections in a distributed computing system. IEEE Transactions on Computers, C-31(1): 48-59, January 1982.
- Presenter: Srinath Rao. Slides. Estimated time:
- Submit summary. View summaries.
- The library did a terrible job scanning this paper. I have a couple of books that cover elections quite well, so feel free to drop by and borrow them.
- B. Lampson. The ABCDs of Paxos. Presented at Principles of Distributed Computing, 2001, as one of the papers celebrating Leslie Lamport’s 60th birthday.
- Presenter: Thu.
- Submit summary. View summaries.
- L. Lamport. The Part-Time Parliament. ACM Transactions on Computer Systems, Vol. 16, No. 2 (May 1998), 133-169.
- L. Lamport. Paxos Made Simple. SIGACT News.
Clustering
- Matt Welsh, David Culler, and Eric Brewer. SEDA: An Architecture for Well-Conditioned, Scalable Internet Services. In Proceedings of the 18th Symposium on Operating Systems Principles (SOSP-18), October 2001.
- Presenter: Aniruddha Bohra.
- Submit summary. View summaries.
3/19
Spring break - no class
3/26
- E. V. Carrera, S. Rao, L. Iftode, and R. Bianchini. User-Level Communication in Cluster-Based Servers. In Proceedings of the 8th IEEE International Symposium on High-Performance Computer Architecture (HPCA 8), February 2002.
- Presenter: Srinath Rao.
- Submit summary. View summaries.
- Jeffrey S. Chase, Darrell C. Anderson, Prachi N. Thakar, Amin M. Vahdat, Ronald P. Doyle. Managing energy and server resources in hosting centers. In Proceedings of the 18th Symposium on Operating Systems Principles (SOSP-18), October 2001.
- Presenter: Jerry Hom
- Submit summary. View summaries.
4/2
Fault Tolerance
- Flaviu Cristian. Understanding fault-tolerant distributed systems. Communications of the ACM, Volume 34, Issue 2 (February 1991).
- Presenter: Kiran Nagaraja.
- Submit summary. View summaries.
- Rodrigo Rodrigues, Miguel Castro, Barbara Liskov. BASE: using abstraction to improve fault tolerance. In Proceedings of the 18th ACM symposium on operating systems principles, 2001.
- Presenter: Thu
- Submit summary. View summaries.
- This paper is really the third in a series. The following are definitely NOT required, but they are quite good (at least the first one is; I haven't read the second one yet) and would help in understanding this paper.
- Miguel Castro and Barbara Liskov. Practical Byzantine Fault Tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI '99), New Orleans, USA, February 1999.
- Miguel Castro and Barbara Liskov. Proactive Recovery in a Byzantine-Fault-Tolerant System. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation (OSDI '00), San Diego, USA, October 2000.
- David E. Lowell, Subhachandra Chandra, and Peter M. Chen. Exploring Failure Transparency and the Limits of Generic Recovery. Proceedings of the 2000 Symposium on Operating Systems Design and Implementation (OSDI), October 2000.
- Presenter: Jerry Hom
- Submit summary. View summaries.
- B. Chandra, M. Dahlin, L. Gao, A. Nayate. End-to-end WAN Service Availability. Third Usenix Symposium on Internet Technologies and Systems (USITS01). March 2001.
- This paper has been dropped for now, although it is quite interesting. If we have time, we will cover it later.
4/9
Consistency and Replication
- John Ousterhout. The Role of Distributed State. CMU Computer Science: A 25th Anniversary Commemorative, ACM Press Anthology Series, R. Rashid (Ed.), July 1991.
- Presenter: Robin Carnow
- Haifeng Yu and Amin Vahdat. The costs and limits of availability for replicated services. In Proceedings of the 18th ACM symposium on operating systems principles, 2001.
- Presenter: Aniruddha Bohra
- D. B. Terry, K. Petersen, M. J. Spreitzer, and M. M. Theimer. The Case for Non-transparent Replication: Examples from Bayou. IEEE Data Engineering, December 1998, pages 12-20.
- D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J. Spreitzer, and C. Hauser. Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System. In Proceedings of the 15th Symposium on Operating Systems Principles (SOSP-15), December 1995, pages 172-183.
- Presenter: Chris Peery.
- Submit summary. View summaries.
- These two papers will be presented together. More information on Bayou.
- Not required but interesting. K. Petersen, M. J. Spreitzer, D. B. Terry, M. M. Theimer, and A. J. Demers. Flexible Update Propagation for Weakly Consistent Replication. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP-16), October 1997, pages 288-301.
4/16
- Chris will finish presenting Bayou.
Security
- R. M. Needham, M. D. Schroeder. Using encryption for authentication in large networks of computers. Communications of the ACM, Volume 21, Issue 12 (December 1978).
- Presenter: Robin Carnow
- Submit summary. View summaries.
- B. Clifford Neuman and Theodore Ts'o. Kerberos: An Authentication Service for Computer Networks, IEEE Communications, 32(9):33-38. September 1994.
- Steven M. Bellovin and Michael Merritt. Limitations of the Kerberos Authentication System. USENIX Conference Proceedings, pp. 253-267, Winter 1991.
- Presenter: Srinath Rao (will do both Kerberos papers)
- Submit summary. View summaries.
- Not required but interesting: M. Burrows, M. Abadi, R. Needham. A logic of authentication. ACM Transactions on Computer Systems, Volume 8, Issue 1 (February 1990).
4/23
- R. L. Rivest and B. Lampson. SDSI - A Simple Distributed Security Infrastructure.
- Presenter: Aniruddha Bohra.
- Submit summary. View summaries.
- A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones, F. Tchakountio, S. T. Kent, and W. Timothy Strayer. Hash-based IP traceback. In Proceedings of the ACM SIGCOMM'01, August, 2001.
- Presenter: Robin Carnow
- Submit summary. View summaries.
- K. Park and H. Lee. On the effectiveness of route-based packet filtering for distributed DoS attack prevention in power-law internets. In Proceedings of the ACM SIGCOMM'01, August, 2001.
4/30
Overlay Networks
- David Andersen, Hari Balakrishnan, Frans Kaashoek, Robert Morris. Resilient overlay networks. In Proceedings of the 18th ACM symposium on operating systems principles, 2001.
- Alex C. Snoeren, Kenneth Conley, David K. Gifford. Mesh-based content routing using XML. In Proceedings of the 18th ACM symposium on operating systems principles, 2001.
Applications
- Antony Rowstron, Peter Druschel. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. In Proceedings of the 18th ACM symposium on operating systems principles, 2001.