Minimum probe length for unique identification of all open reading frames in a microbial genome (open access)

Minimum probe length for unique identification of all open reading frames in a microbial genome

In this paper, we determine the minimum hybridization probe length to uniquely identify at least 95% of the open reading frame (ORF) in an organism. We analyze the whole genome sequences of 17 species, 11 bacteria, 4 archaea, and 2 eukaryotes. We also present a mathematical model for minimum probe length based on assuming that all ORFs are random, of constant length, and contain an equal distribution of bases. The model accurately predicts the minimum probe length for all species, but it incorrectly predicts that all ORFs may be uniquely identified. However, a probe length of just 9 bases is adequate to identify over 95% of the ORFs for all 15 prokaryotic species we studied. Using a minimum probe length, while accepting that some ORFs may not be identified and that data will be lost due to hybridization error, may result in significant savings in microarray and oligonucleotide probe design.
Date: March 5, 2000
Creator: Sokhansanj, B. A.; Ng, J. & Fitch, J. P.
System: The UNT Digital Library