In Search of an Understandable Consensus Algorithm (Raft)

Paxos Made Simple

ZooKeeper: Wait-free coordination for Internet-scale systems

Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore

Impossibility of Distributed Consensus With One Faulty Process

Consensus in the presence of partial synchrony

Viewstamped Replication Revisited

Replication

Don’t be lazy, be consistent: Postgres-R, a new way to implement Database Replication

PacificA: Replication in Log-Based Distributed Storage Systems

Chain Replication for Supporting High Throughput and Availability

Byzantine Chain Replication

A Comprehensive Study of Convergent and Commutative Replicated Data Types

Optimistic Replication

Causality/Transactions

Stronger Semantics for Low-Latency Geo-Replicated Storage (Eiger)

Calvin: Fast Distributed Transactions for Partitioned Database Systems

Sinfonia: a new paradigm for building scalable distributed systems

Understanding the Limitations of Causally and Totally Ordered Communication

A Response to Cheriton and Skeen’s Criticism of Causal and Totally Ordered Communication

MDCC: Multi-Datacenter Consistency

Spanner: Google’s globally distributed database

Concurrency

Transactional Memory: Architectural Support for Lock-Free Data Structures

Software Transactional Memory

Sharing Memory Robustly in Message-Passing Systems

Wait-free Synchronization

ZooKeeper’s atomic broadcast protocol: Theory and practice

Kafka (LinkedIn)

Omega: flexible, scalable schedulers for large compute clusters

Thialfi: A Client Notification Service for Internet-Scale Applications

Large-scale Incremental Processing Using Distributed Transactions and Notifications

Note: We haven’t included anything already covered in 6.824, but you should read those papers too.

Paxos Made Live: An Engineering Perspective

Viewstamped Replication: A new primary copy method to support highly-available distributed systems

Time, Clocks, and the Ordering of Events in a Distributed System

The Part-Time Parliament

Paxos Made Practical

The papers from SOSP 2013

Distributed Systems and Parallel Computing

No matter how powerful individual computers become, there are still reasons to harness the power of multiple computational units, often spread across large geographic areas. Sometimes this is motivated by the need to collect data from widely dispersed locations (e.g., web pages from servers, or sensors for weather or traffic). Other times it is motivated by the need to perform enormous computations that simply cannot be done by a single CPU.

From our company’s beginning, Google has had to deal with both issues in our pursuit of organizing the world’s information and making it universally accessible and useful. We continue to face many exciting distributed systems and parallel computing challenges in areas such as concurrency control, fault tolerance, algorithmic efficiency, and communication. Some of our research involves answering fundamental theoretical questions, while other researchers and engineers are engaged in the construction of systems to operate at the largest possible scale, thanks to our hybrid research model.

Some of our teams.

Algorithms & optimization

Graph mining

Network infrastructure

System performance

distributed systems research paper topics

Distributed Computing

  • Covers topics from design and analysis of distributed algorithms to architectures and protocols for communication networks.
  • Includes discussions on synchronization protocols, concurrent programming, and distributed operating systems.
  • Explores areas of fault-tolerance, reliability, availability, and security in distributed computing.
  • Features content on mobile, sensor, and ad hoc networks, internet applications, and concurrency theory.
  • Emphasizes specification, semantics, verification, and testing of distributed systems.

Editor-in-Chief: Hagit Attiya

Latest issue

Volume 37, Issue 1

Latest articles

On the power of bounded asynchrony: convergence by autonomous robots with limited visibility.

  • David Kirkpatrick
  • Irina Kostitsyna
  • Nicola Santoro

Good-case early-stopping latency of synchronous byzantine reliable broadcast: the deterministic case

  • Timothé Albouy
  • Davide Frey
  • François Taïani

Early adapting to trends: self-stabilizing information spread using passive communication

  • Amos Korman
  • Robin Vacus

Component stability in low-space massively parallel computation

  • Artur Czumaj
  • Peter Davies-Peck
  • Merav Parter

Distributed computing with the cloud

  • Yehuda Afek
  • Boaz Patt-Shamir

Journal information

  • ACM Digital Library
  • Current Contents/Engineering, Computing and Technology
  • EI Compendex
  • Google Scholar
  • Japanese Science and Technology Agency (JST)
  • Mathematical Reviews
  • Norwegian Register for Scientific Journals and Series
  • OCLC WorldCat Discovery Service
  • Science Citation Index Expanded (SCIE)
  • TD Net Discovery Service
  • UGC-CARE List (India)

© Springer-Verlag GmbH Germany, part of Springer Nature

distributed computing Recently Published Documents

Reliability of Trust Management Systems in Cloud Computing

Cloud computing delivers services such as software, platform, and infrastructure over the internet. This computing model is widespread and dynamic; it operates on a pay-per-use basis and supports virtualization. Cloud computing is growing quickly among consumers, and many organizations offer services through the internet. It provides flexible, on-demand service but still faces various security threats. Its dynamic nature lets it be customized to the needs of both client and provider, which is one of its notable advantages. On the other hand, this also creates trust issues and problems with security, privacy, identity, and authenticity. The major challenge in the cloud environment is therefore selecting a suitable provider. Here the trust mechanism plays a critical role, based on the evaluation of QoS and feedback ratings. Nonetheless, various challenges remain in trust management systems for monitoring and evaluating QoS. This paper discusses the current obstacles in trust systems and reviews the available trust models. It also addresses how insufficient trust between provider and client has hindered data sharing. Finally, it lays out the limitations and possible improvements to help researchers who intend to explore this topic.

Guest Editorial: Special Section on Parallel and Distributed Computing Techniques for Non-Von Neumann Technologies

Asynchronous RPC Interface in Distributed Computing System

Developing an Efficient Secure Query Processing Algorithm on Encrypted Databases Using Data Compression

Distributed computing involves storing data with third-party storage and being able to access it from any place at any time. With the advancement of distributed computing and databases, highly critical data are placed in databases. However, when the information is saved in outsourced services such as Database as a Service (DaaS), security issues arise on both the server and client sides. In addition, query processing by different clients through time-consuming methods in a shared-resource environment may cause inefficient data processing and retrieval. Secure and efficient data retrieval can be obtained with the help of an efficient data processing algorithm among different clients. This work proposes an Efficient Secure Query Processing Algorithm (ESQPA), which processes queries efficiently by compressing the encrypted results before sending them from the server to clients. Security is addressed by securing the data at the server side in an encrypted database using CryptDB. Encryption techniques have recently been proposed to give clients confidentiality in cloud storage. This method allows queries to be processed on encrypted data without decryption. To analyze the performance of ESQPA, it is compared with the current query processing algorithm in CryptDB. The results show that ESQPA needs less storage space, saving up to 63%.

Preparing Distributed Computing Operations for the HL-LHC Era With Operational Intelligence

As a joint effort from various communities involved in the Worldwide LHC Computing Grid, the Operational Intelligence project aims at increasing the level of automation in computing operations and reducing human interventions. The distributed computing systems currently deployed by the LHC experiments have proven to be mature and capable of meeting the experimental goals, by allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams is needed to efficiently manage such heterogenous infrastructures. Under the scope of the Operational Intelligence project, experts from several areas have gathered to propose and work on “smart” solutions. Machine learning, data mining, log analysis, and anomaly detection are only some of the tools we have evaluated for our use cases. In this community study contribution, we report on the development of a suite of operational intelligence services to cover various use cases: workload management, data management, and site operations.

Deep distributed computing to reconstruct extremely large lineage trees

Distributed Computing and Artificial Intelligence, Volume 2: Special Sessions 18th International Conference

Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

Chinese Keyword Extraction Model with Distributed Computing

On Allocation of Systematic Blocks in Coded Distributed Computing

Advanced Distributed Systems

Research Seminar at Columbia University

    A more detailed course description prepared for the CEE program is available, as is a course preview briefing containing more detailed information on requirements and expectations. The course outline is given below.

    To provide additional support to the CEE program, Professor Clifton will be available during office hours through H.323/T.120 desktop videoconferencing (e.g., SunForum, Microsoft NetMeeting). Please send email if you wish to make use of this, or you might try opening an H.323 connection to blitz.cs.purdue.edu.

    More course information may be available in WebCT.

    Please add yourself to the course mailing list. Send mail to [email protected] containing the line:

    add your email to cs603

    Feel free to send things to the course mailing list if you feel it is appropriate. An example might be a pointer to a particularly helpful on-line manual describing an API used in one of the projects.

    Course Methodology

    The course will be taught through lectures, with class participation expected and encouraged. There will be frequent reading assignments to supplement the lectures.

    For now, Professor Clifton will not have regular office hours. Feel free to drop by anytime, or send email with some suggested times to schedule an appointment. You can also try H.323/T.120 desktop videoconferencing (e.g., SunForum, Microsoft NetMeeting). You can try opening an H.323 connection to blitz.cs.purdue.edu; send email if there is no response.

    Prerequisites

    The official requirement is CS 503 (Operating systems), with CS 542 (Distributed Database systems) recommended. The practical requirement is a solid undergraduate background in computer science including some database and operating systems theory, and substantial programming experience. If you don't have 503, but feel you have sufficient background, please send me an explanation of why you feel you are prepared, along with a number/times for me to call and discuss approving your registration.

    The following is recommended (it will be a useful reference for much of the lab work in the course):

    Internetworking with TCP/IP Vol.III: Client-Server Programming and Applications, D. E. Comer and D. Stevens, Prentice Hall, (choose appropriate version for your favorite platform), 0-13-032071-4

    The following have been recommended in the past, and may provide useful background reading. However, none are required.

    • Distributed Systems, Sape Mullender, Prentice Hall, 1993, ISBN 0-201-62427-3
    • Distributed Algorithms, Nancy Lynch, Morgan Kaufmann, 1997, ISBN 1-55860-348-4
    • Distributed Operating Systems, Tanenbaum, Prentice Hall, 1995, ISBN 0-13-219908-4

    Evaluation/Grading:

    Evaluation will be a subjective process; however, it will be based primarily on your understanding of the material as evidenced in:

    • Midterm Exam (25%)
    • Final Exam (35%)
    • Projects (4-5) (40%)

    Exams will be open note / open book. To avoid a disparity between resources available to different students, electronic aids are not permitted. (If everyone has a notebook with wireless connection and all agree they want to use them in the exams, I could relax this.)

    I will evaluate projects on a five point scale:

    A substantial portion of your education in this course will come through performing programming projects: building components of a distributed system. Some examples of what projects might involve are:

    • Building a server capable of handling multiple simultaneous TCP/IP connections using the Socket API. The server itself would be trivial (e.g., calculate the square of the input and return the result after a five second delay); the key effort would be the API.
    • Implementing an application that connects to a (provided) CORBA server.
    • Implementing a clock synchronization protocol.
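
The first project example above (a trivial server that handles several TCP connections at once) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the course's reference solution: it uses Python's standard socketserver module rather than the raw Socket API, it assumes a newline-delimited text protocol, and the names SquareServer, SquareHandler, square_reply, and ask are hypothetical, chosen only for illustration.

```python
import socket
import socketserver
import threading
import time

DELAY_SECONDS = 5.0  # the syllabus example uses a five second delay


def square_reply(line: bytes) -> bytes:
    """Parse one line as an integer and return its square as a line of text."""
    try:
        value = int(line.strip())
    except ValueError:
        return b"error: expected an integer\n"
    return f"{value * value}\n".encode("ascii")


class SquareHandler(socketserver.StreamRequestHandler):
    """Serves one client connection; each connection runs in its own thread."""

    def handle(self) -> None:
        for line in self.rfile:  # one request per line until the client closes
            time.sleep(self.server.delay)  # artificial delay before replying
            self.wfile.write(square_reply(line))


class SquareServer(socketserver.ThreadingTCPServer):
    """Threaded TCP server (assumed design: one thread per connection)."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, delay=DELAY_SECONDS):
        super().__init__(address, SquareHandler)
        self.delay = delay


def ask(address, number: int) -> int:
    """Tiny client: send one number, read back one line."""
    with socket.create_connection(address) as conn:
        conn.sendall(f"{number}\n".encode("ascii"))
        return int(conn.makefile("rb").readline())


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; a short delay keeps the demo quick.
    with SquareServer(("127.0.0.1", 0), delay=0.1) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        print(ask(server.server_address, 12))
        server.shutdown()
```

Because every connection gets its own thread, a slow reply to one client does not hold up the others, which is the point of the exercise. A solution written directly against the Socket API would replace the socketserver classes with explicit socket, bind, listen, and accept calls.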

    My current expectation is that all projects will be done individually, as it is probable that some of the CEE students will not be collocated with other students in the course.

    Note on Network Access: If you will be doing your project work for the course at a site that is behind a firewall, let me know as soon as possible. Some of the projects will involve connecting to an on-campus server, and if that will involve a firewall on your end I need to know so I can ensure that the ports used are not blocked.

    Policy on Intellectual Honesty

    Please read the above link to the policy written by Professor Spafford. This will be followed unless I provide written documentation of exceptions.

    Late work will be penalized except in case of documented emergency (e.g., medical emergency), or by prior arrangement if doing the work in advance is impossible due to fault of the instructor (e.g., you are going to a conference and ask to start the project early, but I don't have it ready yet.)

    The penalty for late work is 1 point (of the possible 5) if turned in after the deadline, and one additional point for each week late.

    Syllabus (numbers correspond to week):

    Project start/due dates are tentative!

    • Course overview, Components of a distributed system
    • Message Passing
    • Stream-oriented communications
    • Remote Procedure Call
    • Remote Method Invocation
    • DCE RPC (reading)
    • Java RMI (reading)
    • SOAP (Reading: SOAP 1.1 spec, XML Protocol Working Group, Apache SOAP)
    • Active Directory (reading)
    • What is clock synchronization? Leslie Lamport, "Time, clocks, and the ordering of events in a distributed system", Communications of the ACM 21(7) (July 1978).
    • Possibility and impossibility: Lundelius, J. and Lynch, N., "An Upper and Lower Bound for Clock Synchronization," Information and Control, Vol. 62, Nos. 2/3, pp. 190-204, 1984. Danny Dolev, Joe Halpern, and H. Raymond Strong, "On the possibility and impossibility of achieving clock synchronization", Journal of Computer and System Sciences 32(3) 230-250, April 1986. Michael J. Fischer, Nancy A. Lynch, and Michael Merritt, "Easy impossibility proofs for distributed consensus problems", Proceedings of the fourth annual symposium on Principles of distributed computing, 1985, Minaki, Ontario, Canada.
    • Practical solution: NTP (Reading)

    Other Reading: Leslie Lamport and P. M. Melliar-Smith, "Synchronizing clocks in the presence of faults", Journal of the ACM 32(1) (January 1985). Jennifer Lundelius and Nancy Lynch, "A new fault-tolerant algorithm for clock synchronization", Proceedings of the third annual ACM symposium on Principles of distributed computing, 1984, Vancouver, British Columbia, Canada.
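
As a companion to the Lamport (1978) reading listed above, the logical clock rules from that paper can be sketched in a few lines. This is an illustrative sketch only: it orders events logically and does not synchronize physical clocks (that is what the other readings and NTP address), and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock for one process, following the rules in Lamport (1978)."""

    time: int = 0

    def tick(self) -> int:
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """On receipt, move past both the local time and the message timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


if __name__ == "__main__":
    p, q = LamportClock(), LamportClock()
    p.tick()                 # p: 1
    stamp = p.send()         # p: 2
    q.tick()                 # q: 1
    got = q.receive(stamp)   # q: max(1, 2) + 1 = 3
    print(stamp, got)        # prints "2 3"
```

The receive rule guarantees that a message's receipt always carries a larger timestamp than its sending, so timestamps respect the happened-before relation even though the two processes never compare wall-clock time.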

    • Overview: Global State, Mutual Exclusion. Leslie Lamport, "The Mutual Exclusion Problem", Journal of the ACM 33(2) (April 1986). Read Part II section 2 - the rest is optional. Leslie Lamport, "1983 Invited address: Solved problems, unsolved problems and non-problems in concurrency", Proceedings of the third annual ACM symposium on Principles of distributed computing, 1984, Vancouver, British Columbia, Canada. Optional - Global State: K. Mani Chandy and Leslie Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems 3(1) (February 1985) 63-75.
    • Fault Tolerant Solutions: Michael J. Fischer, Nancy A. Lynch, James E. Burns and Allan Borodin, "Distributed FIFO allocation of identical resources using small shared space", ACM Transactions on Programming Languages and Systems 11(1) (1989) pp. 90-114.
    • Multiple resources requirements. Please don't check these out - others may want to read them. Dijkstra, E., "Hierarchical Ordering of Sequential Processes", Acta Informatica 1 (1971), 115-138. M. Rabin and D. Lehmann, "On the Advantages of Free Choice: A Symmetric and Fully Distributed Solution to the Dining Philosophers Problem", Proceedings of the 8th Symposium on Principles of Programming Languages (1981) pp. 133-138.
    • 2-Phase Commit
    • Formal Models for failure and recovery
    • 3-Phase Commit
    • Basics. Reading: Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman, Concurrency Control and Recovery in Database Systems, Chapter 8: Replicated Data, Addison Wesley, 1987.
    • Example: Replication in Oracle
    • Advanced Techniques: Quasi-Copies. Reading: Rafael Alonso, Daniel Barbará, and Hector Garcia-Molina, "Data caching issues in an information retrieval system", ACM Transactions on Database Systems (TODS) 15(3), September 1990.
    • Mid-Semester Review. March 8, in class: Midterm on material from weeks 1-7. Please advise if this is a problem.
    • Threads vs. Processes, Code migration basics
    • Mobile Agents
    • Mobile Agents example: D'Agents. Reading: D'Agents web site, position paper.
    • Distributed Object systems: CORBA (OMG). Reading: CORBA Overview from The Common Object Request Broker: Architecture and Specification, OMG group, 2001. CORBA Security Service (reading). Third project due April 3, fourth project starts.
    • DCOM. Reading: DCOM vs. .NET
    • Distributed Coordination: Jini. Further reading: Jan Newmarch's Guide to JINI Technologies.
    • Failure models. Reading: Dr. Flaviu Cristian, Understanding Fault-Tolerant Distributed Systems, Communications of the ACM 34(2), February 1991.
    • Fault Tolerance. Reading: Felix C. Gärtner, Fundamentals of Fault-Tolerant Distributed Computing in Asynchronous Environments, ACM Computing Surveys 31(1), March 1999.
    • Reliable communication
    • Recovery. Optional reading: Richard Golding and Elizabeth Borowsky, Fault-Tolerant Replication Management in Large-Scale Distributed Storage Systems, in Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems, 18-21 October 1999, Lausanne, Switzerland. Hector Garcia-Molina, Christos A. Polyzois and Robert B. Hagmann, Two Epoch Algorithms for Disaster Recovery, in Proceedings of the 1990 conference on Very Large Data Bases, Brisbane, Australia, August 13-16, 1990.

    Final exam Thursday, May 2, 2002 from 1:00pm to 3:00pm in RHPH 164.

        CSC 591D: Special Topics on Distributed Systems

    Spring 2008
    Credits: 3
    Meeting Times: Tuesday/Thursday, 3:50 - 5:05pm
    Meeting Location: EBII 1220
    Wolfware Course Web

    Instructor Information:

    Instructor: Xiaohui (Helen) Gu, EB-II 3296, [email protected]