Show simple item record

dc.contributor.author: Ahmed, Munib (en_US)
dc.date.accessioned: 2012-04-11T20:55:19Z
dc.date.available: 2012-04-11T20:55:19Z
dc.date.issued: 2012-04-11
dc.date.submitted: January 2011 (en_US)
dc.identifier.other: DISS-11365 (en_US)
dc.identifier.uri: http://hdl.handle.net/10106/9546
dc.description.abstract: Bioinformatics is an emerging branch of science where issues pertaining to molecular biology are evaluated and resolved by leveraging the techniques and algorithms devised in the field of computer science. Most of these issues are due to the enormous amount of data and the computational complexity involved in generating expeditious and qualitatively viable solutions. This poses a challenge to algorithm developers, who must strive to achieve multiple conflicting objectives: processing very large datasets with the highest accuracy possible while keeping the execution time to a minimum. Genome assembly is one such problem in bioinformatics, where a DNA sequence is reconstructed from millions of small fragments of DNA that are produced in the laboratory by the sequencing process. When examined purely as data, these fragments are small in size (< 10^3 characters long) but large in number, contain repetitive regions that exacerbate the complexity of the reconstruction algorithms, and contain erroneous data due to imperfect laboratory procedures. This dissertation takes a holistic approach to resolving these issues by first presenting a comprehensive study of contemporary work, highlighting its strengths and weaknesses while proposing improvements wherever needed, followed by the design and implementation of a new parallel framework. With the extra processing power available in a parallel computing environment, this framework enhances the accuracy of the solution by correcting errors in the low-quality data regions, and improves the speedup by dynamically balancing the load among multiple processors and by utilizing innovative data structures along with a hashing technique that require less memory than other contemporary programs.
One of the chief objectives of this work is to carve out an important and sizeable piece of the DNA sequence assembly process, devise a new parallel algorithm, and provide its modular implementation in order to facilitate the scalability analysis and parametric study of various characteristics and interdependencies of multiple conflicting objectives such as speedup, accuracy, and data size. A comparison between experimental and theoretical statistics of the system explains similarities or deviations and their causes and effects. This research work and the underlying approach can be easily extended to other related areas of bioinformatics, including multiple sequence alignment and phylogenetics, using parallel computing. (en_US)
dc.description.sponsorship: Ahmad, Ishfaq (en_US)
dc.language.iso: en (en_US)
dc.publisher: Computer Science & Engineering (en_US)
dc.title: Scalable Parallel Processing Of Multi-objective Optimized DNA Sequence Assembly (en_US)
dc.type: Ph.D. (en_US)
dc.contributor.committeeChair: Ahmad, Ishfaq (en_US)
dc.degree.department: Computer Science & Engineering (en_US)
dc.degree.discipline: Computer Science & Engineering (en_US)
dc.degree.grantor: University of Texas at Arlington (en_US)
dc.degree.level: doctoral (en_US)
dc.degree.name: Ph.D. (en_US)

