Automatic Synchronization and Distribution of Biological Databases over Low-Bandwidth Networks among Developing Countries

Grant awarded in November 2005 to the Centre for Genomics and Bioinformatics Research, Prince of Songkhla University, Thailand, to research, implement, and test a next-generation network, based on Peer-to-Peer technology, for automatically distributing and synchronizing biological software, courseware, and databases among developing countries in the Asia-Pacific region with low-bandwidth Internet links.

Project Leader: Amornrat Phongdara
Recipient Institution: Centre for Genomics and Bioinformatics Research, Faculty of Science, Prince of Songkhla University, Thailand 
Amount: USD 29,800
Duration: 18 Months
Commencement Date: 2006

Project Abstract

Bioinformatics involves the collection, organization and analysis of large amounts of biological data, using networks of computers and databases. Publicly available biological databases, such as GenBank, are currently nearly doubling in size every year; GenBank itself was already about 15 GB as of September 2005.
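To make the growth rate concrete, the following minimal sketch projects the size of a repository under the simplifying assumption of exact annual doubling from the 15 GB figure quoted above. The function name and the exact-doubling model are illustrative assumptions, not measurements.

```python
def projected_size_gb(start_gb: float, years: float,
                      doublings_per_year: float = 1.0) -> float:
    """Project a database size assuming steady exponential growth.

    doublings_per_year=1.0 models exact annual doubling, which slightly
    overstates the "nearly doubling" rate described in the text.
    """
    return start_gb * 2 ** (years * doublings_per_year)


# Illustrative projection starting from about 15 GB in 2005.
for offset in range(4):
    print(f"{2005 + offset}: about {projected_size_gb(15, offset):.0f} GB")
```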

Bioinformatics Centres have to regularly update their database repositories with the latest releases. This is normally done by a file transfer over FTP, but the large and growing sizes of these databases mean that large network bandwidth is required. Developing countries in the Asia-Pacific region are just moving into this new field of bioinformatics, but the computational infrastructure and network bandwidths available in those countries are still limited compared with those in more developed countries. These countries often have links of only 512 kbps to 1 Mbps, compared with a 155 Mbps link to Internet / Internet 2 available in an advanced country such as Singapore.
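The gap between these link speeds can be illustrated with a back-of-the-envelope calculation. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 kbps = 1,000 bits per second), a fully utilized link, and no protocol overhead, so real transfers would take longer.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Ideal transfer time: payload bits divided by link capacity."""
    return size_bytes * 8 / link_bits_per_second


GENBANK_BYTES = 15 * 10**9  # about 15 GB, the figure quoted for September 2005

LINKS = {
    "512 kbps": 512_000,
    "1 Mbps": 1_000_000,
    "155 Mbps": 155_000_000,
}

for label, bps in LINKS.items():
    hours = transfer_seconds(GENBANK_BYTES, bps) / 3600
    print(f"{label:>8}: {hours:6.1f} hours")
```

Under these assumptions a single full copy needs roughly 65 hours at 512 kbps and about 33 hours at 1 Mbps, against about 13 minutes at 155 Mbps.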

In the late 1990s, the Internet community witnessed the start of a major revolution in the way people share files: Peer-to-Peer (P2P) file exchange was introduced with the wildly popular Napster in 1999. P2P technology continued to evolve, and in 2001 the 3rd generation of P2P technology was introduced with BitTorrent.

This 3rd generation P2P technology was a major advance over previous P2P protocols. With it, a large file to be distributed is broken up into smaller fragments, typically around a quarter of a megabyte each. These fragments are distributed to each peer, and amongst peers, in a random manner, and are reassembled at the requesting machine.
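The fragment-and-reassemble idea can be sketched in a few lines. This is an illustration only, not the BitTorrent wire protocol: the function names are hypothetical, the 256 KiB piece size follows the "quarter of a megabyte" figure above, and per-piece SHA-1 digests are used for verification as BitTorrent does.

```python
import hashlib
import random

PIECE_SIZE = 256 * 1024  # about a quarter of a megabyte


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Cut the payload into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: list[bytes]) -> list[bytes]:
    """One SHA-1 digest per piece, published alongside the file."""
    return [hashlib.sha1(piece).digest() for piece in pieces]


def reassemble(received, expected_hashes: list[bytes]) -> bytes:
    """Rebuild the file from (index, piece) pairs arriving in any order."""
    slots = [None] * len(expected_hashes)
    for index, piece in received:
        if hashlib.sha1(piece).digest() != expected_hashes[index]:
            continue  # corrupted piece: ignore it so it can be requested again
        slots[index] = piece
    if any(slot is None for slot in slots):
        raise ValueError("download incomplete: some pieces are missing")
    return b"".join(slots)


# Simulate pieces arriving from different peers in random order.
original = bytes(range(256)) * 3000
pieces = split_into_pieces(original)
arrivals = list(enumerate(pieces))
random.shuffle(arrivals)
assert reassemble(arrivals, piece_hashes(pieces)) == original
```

Because each piece is verified independently, the order of arrival and the peer that supplied it do not matter.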

With this architecture, the more peers there are, the more nodes are available to distribute fragments of the file. High demand will actually lead to greater throughput as more bandwidth from additional nodes becomes available to the group. In the event of a disconnection, downloads can also recover and continue.
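Recovery after a dropped connection follows from the same per-fragment bookkeeping: a client only needs to work out which fragments it still lacks. The sketch below assumes a hypothetical in-memory mapping of piece index to bytes and SHA-1 digests for verification; a real client would read pieces from disk.

```python
import hashlib


def missing_piece_indices(local_pieces: dict[int, bytes],
                          expected_hashes: list[bytes]) -> list[int]:
    """Return the indices that must still be fetched after a disconnection.

    A piece counts as missing if it is absent or fails its hash check.
    """
    missing = []
    for index, expected in enumerate(expected_hashes):
        piece = local_pieces.get(index)
        if piece is None or hashlib.sha1(piece).digest() != expected:
            missing.append(index)
    return missing


# Example: piece 1 never arrived and piece 2 was cut off mid-transfer.
pieces = [b"a" * 10, b"b" * 10, b"c" * 10]
hashes = [hashlib.sha1(p).digest() for p in pieces]
on_disk = {0: pieces[0], 2: b"c" * 4}
print(missing_piece_indices(on_disk, hashes))
```

Only the listed pieces are requested again, from whichever peers are reachable at the time, so work completed before the interruption is not repeated.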

If a 3rd generation P2P protocol is used to distribute and synchronize biological databases across the Asia-Pacific region, it could simultaneously address the two major problems plaguing developing countries: low international bandwidth, and unreliable connections that may drop at any time.

The project aims to:

  1. Develop a client application based on 3rd generation P2P protocols, or extend an existing open-source one, for use in the distribution of biological software, courseware, and databases.
  2. Set up and test the performance of this P2P network for distributing biological software, courseware, and databases, with nodes in countries in the Asia-Pacific region, starting with Singapore and Thailand and later extending to other countries. These tests will include:
     • Benchmarking performance against the more traditional rsync and FTP techniques
     • Assessing the effect of bandwidth saturation when using P2P
     • Identifying the P2P architecture and topology variations best suited to distributing datasets of different sizes
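A benchmark of this kind could be built around a small timing harness such as the one sketched below. It is an assumption about how such a comparison might be structured, not the project's actual test plan: `measure_throughput` and `compare_methods` are hypothetical names, and each transfer method is represented by a caller-supplied function that performs one complete transfer (for example by invoking an FTP, rsync, or P2P client).

```python
import time
from typing import Callable, Dict


def measure_throughput(transfer: Callable[[], None], size_bytes: int,
                       clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Time one complete transfer and report its effective throughput."""
    start = clock()
    transfer()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return {"seconds": elapsed, "mbit_per_s": size_bytes * 8 / elapsed / 1e6}


def compare_methods(methods: Dict[str, Callable[[], None]],
                    size_bytes: int) -> Dict[str, Dict[str, float]]:
    """Run every method against the same dataset size and collect results."""
    return {name: measure_throughput(fn, size_bytes) for name, fn in methods.items()}
```

Running each method several times on datasets of different sizes, at different times of day, would help separate the effect of the transfer technique from normal variation in link conditions.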

 



Last modified 2006-05-12 03:52 PM


 
 
