FGENESH
Softberry Inc, http://www.softberry.com/
Overview
FGENESH is an HMM (Hidden Markov Model) based program that predicts genes in genomic DNA sequences. It is reported to be faster and more accurate than other well-known gene prediction programs. In the rice genome sequencing projects it was described as "the most successful (gene finding) program" (Yu et al. (2002) Science 296:79) and was reported to have produced 87% of all predicted genes (Goff et al. (2002) Science 296:92). FGENESH also runs 50 to 100 times faster than Genscan, one of the most widely used gene prediction programs. To improve accuracy, FGENESH supplies trained data sets for several taxonomic groups (human, mouse, Drosophila, Anopheles, C. elegans, S. pombe, Plasmodium, Neurospora, Arabidopsis, tobacco, and monocot plants).
Features
- Results of comparing FGENESH with other gene prediction programs on rice genes, as a function of gene position
- This test confirmed that FGENESH was the most accurate of the five programs.
- Source: Yu et al. (2002) Science 296:79-92
- Results of comparing three popular gene prediction programs on 42 semi-artificial genomic sequences containing 178 known human gene sequences (900 exons)
- Sensitivity is the percentage of actual exons that were predicted exactly.
- Specificity is the percentage of predicted exons that are correct.
- Source: Yada et al., 2002 Cold Spring Harbor Genome Sequencing and Biology Meeting, May 7-11, 2002
- This test shows that FGENESH was the most accurate of the three programs.
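The two measures defined above can be illustrated with a minimal sketch. It assumes that an exon counts as correct only when both its start and end coordinates match an annotated exon; the coordinates and the function name `exon_accuracy` are hypothetical and are not taken from the cited benchmarks.

```python
def exon_accuracy(actual, predicted):
    """Return (sensitivity, specificity) in percent for exact exon matches.

    Each exon is a (start, end) tuple. An exon is counted as correct only
    when both boundaries match (an assumption made for this sketch).
    """
    actual_set = set(actual)
    predicted_set = set(predicted)
    correct = actual_set & predicted_set
    # Sensitivity: share of actual exons that were predicted exactly.
    sensitivity = 100.0 * len(correct) / len(actual_set) if actual_set else 0.0
    # Specificity: share of predicted exons that are correct.
    specificity = 100.0 * len(correct) / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Hypothetical coordinates, for illustration only.
actual = [(100, 200), (300, 400), (500, 600), (700, 800)]
predicted = [(100, 200), (300, 410), (500, 600)]

sn, sp = exon_accuracy(actual, predicted)
print(f"Sensitivity = {sn:.1f}%, Specificity = {sp:.1f}%")
```

Here two of the four actual exons are recovered exactly, and two of the three predictions are correct, so a program can score differently on the two measures for the same sequence.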
For detailed usage, see the Quick Reference.
Reference
Yu et al. (2002) Science 296:79-92. A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica).
Salamov and Solovyev (2000) Genome Res. 10: 516-522. Ab initio Gene Finding in Drosophila Genomic DNA.
Paine et al. (2005) Nature Biotechnology. Improving the nutritional value of Golden Rice through increased pro-vitamin A content. ...Arabidopsis thaliana psy and rice psy (AY024351) genes identified genomic sequences of similarity in which genes were predicted using FGENESH algorithm with ...
Hashimoto et al. (2004) Nature Biotechnology 22, 1146 - 1149 5'-end SAGE for the analysis of transcriptional start sites ...from expressed sequence tag (EST) maps, analysis of full-length cDNAs and computational annotation by Genscan, Genie, Fgenes and other programs...
Biology analysis group et al. (2004) Science 306: 1937-1940. A Draft Sequence for the Genome of the Domesticated Silkworm (Bombyx mori). ...Usage of FGENESH for gene finding...
Ane et al. (2004) Science 303: 1364-1367. Medicago truncatula DMI1 Required for Bacterial and Fungal Symbioses in Legumes. ...Usage of FGENESH for gene finding...
FGENESH output
Glossary
- G - predicted gene number, starting from start of sequence
- Str - DNA strand (+ for direct or - for complementary)
- Feature - type of coding sequence
- CDSf - first coding segment (starting with start codon)
- CDSi - internal coding segment (internal exon)
- CDSl - last coding segment (ending with stop codon)
- TSS - Position of transcription start (TATA-box position and score)
- Start and End - Position of the Feature
- Weight - Log likelihood*10 score for the feature (listed under Score in the sample output)
- ORF - start/end positions where the first complete codon starts and the last codon ends
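The relationship between the Start/End and ORF columns can be checked with simple arithmetic. The sketch below uses the coordinates of the third exon in the sample output on this page; the helper name `orf_summary` is hypothetical and is not part of FGENESH.

```python
def orf_summary(exon_start, exon_end, orf_start, orf_end):
    """Describe how the ORF (complete codons) sits inside an exon.

    Coordinates are 1-based and inclusive, as in the FGENESH table.
    """
    orf_len = orf_end - orf_start + 1
    return {
        "exon_len": exon_end - exon_start + 1,
        "orf_len": orf_len,
        # Bases before the first complete codon (they finish a codon
        # that started in the previous exon).
        "leading_bases": orf_start - exon_start,
        # Bases after the last complete codon (they start a codon
        # that continues in the next exon).
        "trailing_bases": exon_end - orf_end,
        "whole_codons": orf_len % 3 == 0,
    }


# Exon 3 of the sample: Start/End 2577 - 2690, ORF 2579 - 2689.
info = orf_summary(2577, 2690, 2579, 2689)
print(info)
```

For this exon the ORF is 111 bases (37 complete codons), with 2 bases before it and 1 base after it that belong to codons split across exon boundaries.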
Regular output
FGENESH 1.1 Prediction of potential genes in Homo_sapiens genomic DNA
Time : Fri Mar 29 18:55:31 2002
Seq name: >Adh_and_cact.1 (2919020 bases) 848501 853000
Length of sequence: 4500
Number of predicted genes 1 in +chain 1 in -chain 0
Number of predicted exons 6 in +chain 6 in -chain 0
Positions of predicted genes and exons:
 G Str  Feature    Start      End   Score      ORF           Len
 1 +    1 CDSf         3 -    194    3.73        3 -    194   192
 1 +    2 CDSi      2213 -   2339    2.48     2213 -   2338   126
 1 +    3 CDSi      2577 -   2690   13.20     2579 -   2689   111
 1 +    4 CDSi      2756 -   2936   17.20     2758 -   2934   177
 1 +    5 CDSi      2991 -   3173    7.47     2992 -   3171   180
 1 +    6 CDSl      3242 -   3419    6.66     3243 -   3419   177
 1 +      PolA      3968             1.12
Predicted protein(s):
>FGENESH: 1 6 exon (s) 3 - 3419 324 aa, chain +
MLVQTPGISKSWMSSICLRESTFFMSCDRFRRSVSHCEGDTHELTAWQRVYLATHIWHRL
AGAQLAGKQTRSAVQTQAGLKKKYRGQFEKGEQNVVSTQNKLMQRLGPNMTAAPYNYNYI
FKYIIIGDMGVGKSCLLHQFTEKKFMANCPHTIGVEFGTRIIEVDDKKIKLQIWDTAGQE
RFRAVTRSYYRGAAGALMVYDITRRSTYNHLSSWLTDTRNLTNPSTVIFLIGNKSDLEST
REVTYEEAKEFADENGLMFLEASAMTGQNVEEAFLETARKIYQNIQEGRLDLNASESGVQ
HRPSQPSRTSLSSEATGAKDQCSC
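The exon table in this output is regular enough to be read by a script. The following is a minimal sketch, not an official parser: the regular expression and the names `EXON_RE` and `parse_exons` are assumptions based only on the sample shown here, and other FGENESH versions may lay the table out differently.

```python
import re

# Matches rows such as: "1 + 2 CDSi 2213 - 2339 2.48 2213 - 2338 126".
# The layout is inferred from the sample output only (an assumption).
EXON_RE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDSf|CDSi|CDSl)\s+"
    r"(\d+)\s+-\s+(\d+)\s+(-?\d+\.\d+)\s+(\d+)\s+-\s+(\d+)\s+(\d+)"
)

SAMPLE = """\
 G Str  Feature    Start      End   Score      ORF           Len
 1 +    1 CDSf         3 -    194    3.73        3 -    194   192
 1 +    2 CDSi      2213 -   2339    2.48     2213 -   2338   126
 1 +    3 CDSi      2577 -   2690   13.20     2579 -   2689   111
 1 +    4 CDSi      2756 -   2936   17.20     2758 -   2934   177
 1 +    5 CDSi      2991 -   3173    7.47     2992 -   3171   180
 1 +    6 CDSl      3242 -   3419    6.66     3243 -   3419   177
 1 +      PolA      3968             1.12
"""


def parse_exons(text):
    """Return one dict per coding-segment row; other rows are skipped."""
    exons = []
    for line in text.splitlines():
        m = EXON_RE.match(line)
        if m is None:
            continue  # header, PolA/TSS rows, and anything unrecognized
        exons.append({
            "gene": int(m.group(1)),
            "strand": m.group(2),
            "exon": int(m.group(3)),
            "feature": m.group(4),
            "start": int(m.group(5)),
            "end": int(m.group(6)),
            "score": float(m.group(7)),
        })
    return exons


exons = parse_exons(SAMPLE)
coding_len = sum(e["end"] - e["start"] + 1 for e in exons)
# The last segment ends with a stop codon, which is not translated.
protein_len = coding_len // 3 - 1
print(len(exons), coding_len, protein_len)
```

Summing the Start/End spans of the six coding segments gives 975 bases, i.e. 325 codons including the stop codon, which agrees with the 324 aa reported on the predicted protein line.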
FGENESH+
- FGENESH+ predicts multiple genes in genomic DNA sequences by combining the HMM gene model with similarity to a known protein.
- Comparison of sequence processing speed across three gene prediction programs
- Comparison of prediction accuracy
- Reference
Galagan et al. (2003) Nature 422:859-868. The genome sequence of the filamentous fungus Neurospora crassa. ...Neurospora genome annotation based on FGENESH and FGENESH+...
- FGENESH+ output
- Glossary
  - G - predicted gene number, starting from start of sequence
  - Str - DNA strand (+ for direct or - for complementary)
  - Feature - type of coding sequence
  - CDSf - first coding segment (starting with start codon)
  - CDSi - internal coding segment (internal exon)
  - CDSl - last coding segment (ending with stop codon)
  - TSS - Position of transcription start (TATA-box position and score)
  - Start and End - Position of the Feature
  - Weight - Log likelihood*10 score for the feature
  - ORF - start/end positions where the first complete codon starts and the last codon ends
  - Last three values - Length of exon, positions in protein, percent of similarity with target protein
- output
FGENESH+ Prediction of potential genes in Human genomic DNA
Time: Tue Nov 7 15:56:51 2000 31
Seq name: Adh_and_cact.1 (2919020 bases) 848501 853000
Protein - gi|2313041|gnl|PID|d1022564 Length 215 Sim: 90
Length of sequence: 4500 GC content: 40 Zone: 1
Number of predicted genes 1 in +chain 1 in -chain 0
Number of predicted exons 4 in +chain 4 in -chain 0
Positions of predicted genes and exons:
 G Str  Feature    Start      End   Score      ORF           Len
 1 +      TSS       1455            -9.70
 1 +    1 CDSf      2585 -   2690  199.20     2585 -   2689   105     1 -  35  100
 1 +    2 CDSi      2756 -   2936  324.68     2758 -   2934   177    37 -  95  100
 1 +    3 CDSi      2991 -   3173  315.30     2992 -   3171   180    97 - 156  100
 1 +    4 CDSl      3242 -   3419  298.40     3243 -   3419   177   158 - 215  100
Predicted protein(s):
>FGENESH+ 1 4 exon (s) 2585 - 3419 215 aa, chain +
MTAAPYNYNYIFKYIIIGDMGVGKSCLLHQFTEKKFMANCPHTIGVEFGTRIIEVDDKKI
KLQIWDTAGQERFRAVTRSYYRGAAGALMVYDITRRSTYNHLSSWLTDTRNLTNPSTVIF
LIGNKSDLESTREVTYEEAKEFADENGLMFLEASAMTGQNVEEAFLETARKIYQNIQEGR
LDLNASESGVQHRPSQPSRTSLSSEATGAKDQCSC
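The three extra values on each FGENESH+ exon row (length, positions in protein, percent similarity) can be read in the same way. This is a sketch under the assumption that rows follow the layout of the sample above; `PLUS_RE` and `parse_plus_rows` are hypothetical names, not part of FGENESH+.

```python
import re

# Exon row followed by: length, protein start - protein end, similarity.
# The layout is inferred from the sample output only (an assumption).
PLUS_RE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDSf|CDSi|CDSl)\s+"
    r"(\d+)\s+-\s+(\d+)\s+(-?\d+\.\d+)\s+(\d+)\s+-\s+(\d+)\s+(\d+)"
    r"\s+(\d+)\s+-\s+(\d+)\s+(\d+)\s*$"
)

SAMPLE_ROWS = """\
 1 +      TSS       1455            -9.70
 1 +    1 CDSf      2585 -   2690  199.20     2585 -   2689   105     1 -  35  100
 1 +    2 CDSi      2756 -   2936  324.68     2758 -   2934   177    37 -  95  100
 1 +    3 CDSi      2991 -   3173  315.30     2992 -   3171   180    97 - 156  100
 1 +    4 CDSl      3242 -   3419  298.40     3243 -   3419   177   158 - 215  100
"""


def parse_plus_rows(text):
    """Return the protein-alignment columns of each coding-segment row."""
    rows = []
    for line in text.splitlines():
        m = PLUS_RE.match(line)
        if m is None:
            continue  # TSS rows carry no protein columns
        rows.append({
            "feature": m.group(4),
            "length": int(m.group(10)),
            "prot_start": int(m.group(11)),
            "prot_end": int(m.group(12)),
            "similarity": int(m.group(13)),
        })
    return rows


rows = parse_plus_rows(SAMPLE_ROWS)
for row in rows:
    residues = row["prot_end"] - row["prot_start"] + 1
    # The last segment includes the stop codon, which codes for no residue.
    stop = 1 if row["feature"] == "CDSl" else 0
    codons = row["length"] // 3 - stop
    print(row["feature"], residues, codons, row["similarity"])
```

In this sample the number of aligned residues on each row equals the number of complete codons in that exon. Residues 36, 96, and 157 are absent from the ranges because their codons are split across exon boundaries.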
- Glossary