Scalable Parallel Processing Of Multi-objective Optimized DNA Sequence Assembly

Ahmed, Munib

dc.contributor.author	Ahmed, Munib	en_US
dc.date.accessioned	2012-04-11T20:55:19Z
dc.date.available	2012-04-11T20:55:19Z
dc.date.issued	2012-04-11
dc.date.submitted	January 2011	en_US
dc.identifier.other	DISS-11365	en_US
dc.identifier.uri	http://hdl.handle.net/10106/9546
dc.description.abstract	Bioinformatics is an emerging branch of science in which problems in molecular biology are evaluated and resolved using techniques and algorithms from computer science. Most of these problems stem from the enormous amount of data and the computational complexity involved in producing fast, high-quality solutions. This challenges algorithm developers, who must balance multiple conflicting objectives: processing very large datasets with the highest possible accuracy while keeping execution time to a minimum. Genome assembly is one such problem: a DNA sequence is reconstructed from millions of small DNA fragments produced in the laboratory by the sequencing process. Examined purely as data, these fragments are small (< 10^3 characters long) but numerous, have repetitive regions that exacerbate the complexity of the reconstruction algorithms, and contain erroneous data due to imperfect laboratory procedures. This dissertation takes a holistic approach to these issues by first presenting a comprehensive study of contemporary work, highlighting its strengths and weaknesses and proposing improvements where needed, followed by the design and implementation of a new parallel framework. With the extra processing power available in a parallel computing environment, this framework enhances the accuracy of the solution by correcting errors in low-quality data regions, and improves speedup by dynamically balancing the load among multiple processors and by using innovative data structures together with a hashing technique that require less memory than other contemporary programs.
One of the chief objectives of this work is to carve out an important and sizeable piece of the DNA sequence assembly process, devise a new parallel algorithm, and provide its modular implementation in order to facilitate scalability analysis and a parametric study of the characteristics and interdependencies of multiple conflicting objectives such as speedup, accuracy, and data size. A comparison between the experimental and theoretical statistics of the system explains the similarities and deviations between them, along with their causes and effects. This research work and the underlying approach can be easily extended to other related areas of bioinformatics, including multiple sequence alignment and phylogenetics, using parallel computing.	en_US
dc.description.sponsorship	Ahmad, Ishfaq	en_US
dc.language.iso	en	en_US
dc.publisher	Computer Science & Engineering	en_US
dc.title	Scalable Parallel Processing Of Multi-objective Optimized DNA Sequence Assembly	en_US
dc.type	Ph.D.	en_US
dc.contributor.committeeChair	Ahmad, Ishfaq	en_US
dc.degree.department	Computer Science & Engineering	en_US
dc.degree.discipline	Computer Science & Engineering	en_US
dc.degree.grantor	University of Texas at Arlington	en_US
dc.degree.level	doctoral	en_US
dc.degree.name	Ph.D.	en_US

Files in this item

Name: Ahmed_uta_2502D_11365.pdf
Size: 3.361Mb
Format: PDF
