Show simple item record

dc.contributor.authorJayantilal, Tushar Kumaren_US
dc.date.accessioned2007-08-23T01:56:36Z
dc.date.available2007-08-23T01:56:36Z
dc.date.issued2007-08-23T01:56:36Z
dc.date.submittedAugust 2006en_US
dc.identifier.otherDISS-1492en_US
dc.identifier.urihttp://hdl.handle.net/10106/393
dc.description.abstractBioinformatics is a relatively new scientific field concerned with providing computational means and support to research in molecular biology and genetics. It draws from many different areas of computer science, including database theory, algorithm design and analysis, and artificial intelligence, to name just a few. In many applications, such as one described in this thesis, a biologist is interested in locating a particular pattern, or sequence motif, in a given string or set of strings over the four-letter DNA alphabet. In this thesis we present an efficient approach to locating promoter and other regulatory sequences in entire genomes or in specific target areas. Promoters are short conserved sequences located upstream of the genes they regulate, and they have an essential role in driving the expression of the genes. However, in higher organisms promoter sequences are very diverse, and their motifs can feature substantial sequence variation, including character (chemical base) substitutions, insertions and deletions. We were thus interested in designing methods for the efficient search for approximate matches of the target sequences to putative promoter consensus. To achieve this goal we have used a combination of classic pattern-matching algorithms and high performance computing, parallelizing the search over a number of processors. We have used these methods in the genome-wide search for a 14-base promoter element in the genome of the fruit fly Drosophila melanogaster, postulated to have a role in the testis-specific expression of the genes it controls. The list of testis-specific genes has been provided by our collaborator from the UTA Department of Biology, and we have obtained the raw sequence and other genetic data from Flybase, a public database maintained by a consortium of Drosophila researchers. Although the number of genes we were interested in was moderate, 791, the number of exact patterns approximately matching the 14 base consensus was very large, approximately 38,000. This problem has thus guided the design of the methods we describe in this thesis.en_US
dc.description.sponsorshipStojanovic, Nikolaen_US
dc.language.isoENen_US
dc.publisherComputer Science & Engineeringen_US
dc.titleAn Application Of Parallel And Distributed Computing Methods To Approximate Pattern Matching Of Genetic Regulatory Motifsen_US
dc.typeM.S.en_US
dc.contributor.committeeChairStojanovic, Nikolaen_US
dc.degree.departmentComputer Science & Engineeringen_US
dc.degree.disciplineComputer Science & Engineeringen_US
dc.degree.grantorUniversity of Texas at Arlingtonen_US
dc.degree.levelmastersen_US
dc.degree.nameM.S.en_US
dc.identifier.externalLinkhttps://www.uta.edu/ra/real/editprofile.php?onlyview=1&pid=210
dc.identifier.externalLinkDescriptionLink to Research Profiles


Files in this item

Thumbnail


This item appears in the following Collection(s)

Show simple item record