Nucleotide sequence of the proviral genome of a novel type-C retrovirus (MmRV), taken from positions 112341 to 121005 of Mus musculus PAC clone 657p21 (Accession: AC005743). Total length: 8665 bp. This sequence has been deposited in GenBank (Accession number: XXXXXXX). The 9 bp imperfect repeats that define the LTRs are marked in green; note that 2 bp are deleted from the terminal copies of these repeats upon insertion. The CAT box is not clearly identifiable (see Ref 5), although a candidate sequence appears upstream of the TATA box on the opposite strand. The repeat region spanning U3 and R is marked in italics, and the following features are identified: terminal 9 bp repeats (blue), pol-like motif (bold), 12 bp inverse palindrome (underlined) and polyadenylation signal (underlined). Upstream of the gag ATG start codon, an in-frame CTG start codon marks the beginning of the 99 bp (33-codon) extension that encodes the N-terminus of glycosylated Gag. The untranslated region between U5 and the start of glycosylated gag contains a variable number (two to six) of perfect 35 bp repeats; this provirus carries two (Rpt1 and Rpt2).
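The coordinate arithmetic and the glyco-gag reading frame described above can be checked directly against the listing. The sketch below assumes the CTG at position 943 and the gag ATG at position 1042 (both read off the annotated sequence, not stated explicitly in the legend) and translates the intervening 102 bp with the standard genetic code; `translate` and the constant names are illustrative only.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an uppercase DNA string in frame 1; '*' would mark a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Provirus coordinates within AC005743 (1-based, inclusive), from the legend.
START, END = 112341, 121005
provirus_length = END - START + 1

# Assumed positions in the listing's numbering: glyco-gag CTG and gag ATG.
CTG_POS, ATG_POS = 943, 1042
extension_bp = ATG_POS - CTG_POS

# Positions 943-1044 copied from the listing (glyco-gag CTG through the gag ATG).
GLYCO_REGION = (
    "CTGAGTTGCTTGTGGTCG"
    "ACGCGAAGTCGCCGCCGCTTTTGGTTTCTTTTTTGTCTTAGTCTCGTGTTCGCTCTTGTT"
    "GTGTCTACTATTGTTCTGGAAATG"
)

print(provirus_length)                 # 8665
print(extension_bp, extension_bp % 3)  # 99 0
print(translate(GLYCO_REGION))         # LSCLWSTRSRRRFWFLFCLSLVFALVVSTIVLEM
```

The standard table reads the CTG initiator as Leu, so the output matches the annotated residues (LeuSerCysLeuTrpSer ... ValSerThrIleValLeuGluMet); the absence of any stop codon and the 99 bp spacing (a multiple of 3) are consistent with an open extension in frame with Gag.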
|U3->
1 TGAAGAATAAAAGTTACTGAACTCTTCCTCACCCCAGAACTCGTTCCCAGAACACTCCTG
61 AACTCTTCACCCTAGAATGCATTCCTGAACTCCTCACCCTAGAGTTCGAACCCTCCCAAC
121 TAAAGACTGTTCCAAGAACATTTTTGAGATAAGGGCCTCCTGGAACAACCTCAGAATGAA
181 CCGGGTACATTGCCAAATAATAGGACATGACCCCTTAGTTACGTAGAATCCCTTGGCAGA
241 ACCCCTTGTCCCTTGGCAGAACCCCTTAGTTATGTAAACTTGTACTTTCCCTACCCCGCT
promoter <-U3||R->
301 CTCCCCCCTTGAGTTTTCCTATATAAGCCTGTGAAAAATTTGGCTGGTCGTCGATTCTCC
inverse palindrome repeat region
361 TCTACACCACTAGGTGTATGAGTTTCGACCCCAGAGCTCTGGTCTATGTGCTTTCATGCT
pol motif
polyA signal <-R||U5->
421 GCTGCTTTATTAAATCTTGCCTTCAACATTTTGAGTTCGGTCTCAGTGTCTTCTTGGGTC
<-U5||pbs->
481 CGCGGCTGTCCCGAGGCTTGAGTGAGGGTCTCCCTTCGGGGGTCTTTCATTTGGGGGCTC
primer binding site splice donor site
541 GTCCGGGATCAGTGCGACCACCCAGAGACCCTAGACCCACTTTAAGGTAAGATTCTTTGA
601 CCTGTCTTGGTTTGGTGTCTCTGTTCTGTTTCTAAGTTTGGTGCGATCGCAGTTTCGGTT
661 TTGCGGACGCTCAGTGAGACTGCGCTCCGAGAGGGAACGCGGGGTGGATAAGGATAGACG
721 TGTCCAGGTGTCCGCCGTCCGTTCGCCCTGGGAGACGTCCCAGGAGGAACAGGGGAGGAC
[ Rpt1
781 CAGGGACGCCTGGTGGACCCCTTTGGAGGCCAAGAGACCATCTGGGGTTGCGAGATCGTG
] [ Rpt2 ]
841 GGTTCGAGTCCCACCTCGTGCCTTGTTGCGAGATCGTGGGTTCGAGTCCCACCTCGTGCA
|glyco-gag->
901 GAGGGTCTCAATCGGCCGGCCTTAGAAAAGCCATCTGATTCTCTGAGTTGCTTGTGGTCG
LeuSerCysLeuTrpSer
961 ACGCGAAGTCGCCGCCGCTTTTGGTTTCTTTTTTGTCTTAGTCTCGTGTTCGCTCTTGTT
ThrArgSerArgArgArgPheTrpPheLeuPheCysLeuSerLeuValPheAlaLeuVal
|gag->
1021 GTGTCTACTATTGTTCTGGAAATGGGACAATCTGTGTCCACTCCCCTTTCTCTAACTCTG
ValSerThrIleValLeuGluMetGlyGlnSerValSerThrProLeuSerLeuThrLeu
1081 GAGCATTGGAAGGAGGTGCGGGTCAGAGCTCACAACCAGTCGGTGGAGGTCAGAAAGGGT
GluHisTrpLysGluValArgValArgAlaHisAsnGlnSerValGluValArgLysGly
1141 CCGTGGCAGACCTTTTGCACCTCCGAGTGGCCGACGTTTGGAGTGGGCTGGCCACCAGAA
ProTrpGlnThrPheCysThrSerGluTrpProThrPheGlyValGlyTrpProProGlu
1201 GGTGCTTTTGACTTATCACTAATCGCCGCCGTCAGGCGAATTGTTTTTCAGGAGGAAGGG
GlyAlaPheAspLeuSerLeuIleAlaAlaValArgArgIleValPheGlnGluGluGly
1261 GGTCACCCTGATCAGATCCCCTACATTGTGACCTGGCAGAATCTCGTCCAATTCCCACCT
GlyHisProAspGlnIleProTyrIleValThrTrpGlnAsnLeuValGlnPheProPro
1321 CCGTGGGTCAAGCCTTGGACCCCAAATTCTTCGAAACTGACGGTCGCGGTTGCCCAGTCT
ProTrpValLysProTrpThrProAsnSerSerLysLeuThrValAlaValAlaGlnSer
1381 GATGCAGCCGGAAAGTCCAGCCCGTCAGCTCCCCCCAAGATTTATCCAGAGATTGACGAC
AspAlaAlaGlyLysSerSerProSerAlaProProLysIleTyrProGluIleAspAsp
1441 CTCCTCTGGATGGACTCCCAACCTCCCCCTTATCCCCTGCCCCAGCAGCCACCTGCAGCC
LeuLeuTrpMetAspSerGlnProProProTyrProLeuProGlnGlnProProAlaAla
1501 GCCCCACCACAGGGACCAATAGCGAGAGGGGCTCAGGGACCGGCGGGGGGGACTCGGAGC
AlaProProGlnGlyProIleAlaArgGlyAlaGlnGlyProAlaGlyGlyThrArgSer
1561 CGACGAGGCCGAAGCCCCGGGGAGGAAGGGGGGCCGGATTCAACAGTTGCCTTACCACTT
ArgArgGlyArgSerProGlyGluGluGlyGlyProAspSerThrValAlaLeuProLeu
1621 AGAGCACATGTGGGAGGGCCAGCGCCAGGACCCAATGATCTCATTCCTTTACAGTACTGG
ArgAlaHisValGlyGlyProAlaProGlyProAsnAspLeuIleProLeuGlnTyrTrp
1681 TCTTTTTCCTCTTCTGATTTATATAATTGGAAAACTAACCACCCTCCTTTCTCAGAGAAC
SerPheSerSerSerAspLeuTyrAsnTrpLysThrAsnHisProProPheSerGluAsn
1741 CCCTCTGGGCTTACTGGGCTCCTTGAGTCACTTATGTTCTCCCATCAACCCACTTGGGAT
ProSerGlyLeuThrGlyLeuLeuGluSerLeuMetPheSerHisGlnProThrTrpAsp
1801 GATTGTCAGCAGCTTTTGCAGGTTCTTTTTACCACAGAGGAAAGAGAAAGAATCCTGATG
AspCysGlnGlnLeuLeuGlnValLeuPheThrThrGluGluArgGluArgIleLeuMet
1861 GAGGCGAGAAAAAATGTTCTGGGAGAGGACGGCACACCCACTGCCCTCCCTAACCTCGTG
GluAlaArgLysAsnValLeuGlyGluAspGlyThrProThrAlaLeuProAsnLeuVal
1921 GACGAGGCTTTCCCCTTGAACCGCCCCAACTGGGACTACAACACCGCAGAAGGTAGGGGA
AspGluAlaPheProLeuAsnArgProAsnTrpAspTyrAsnThrAlaGluGlyArgGly
1981 CGCCTCCTTGTCTATCGCCAGACTCTAGTGGCAGGTCTCAGAGGAGCCGCTAGACGGCCC
ArgLeuLeuValTyrArgGlnThrLeuValAlaGlyLeuArgGlyAlaAlaArgArgPro
2041 ACCAATTTGGCTAAGGTAAGAGAGGTCTTGCAGGGGCAGACTGAACCACCCTCAGTCTTC
ThrAsnLeuAlaLysValArgGluValLeuGlnGlyGlnThrGluProProSerValPhe
2101 CTTGAGCGTCTAATGGAGGCATATAGGAGGTACACCCCTTTTGACCCCTTGTCAGAGGGG
LeuGluArgLeuMetGluAlaTyrArgArgTyrThrProPheAspProLeuSerGluGly
2161 CAGAGAGCCGCTGTAGCCATGGCCTTCATTGGTCAGTCCGTTCCCGACATTAAGAAAAAG
GlnArgAlaAlaValAlaMetAlaPheIleGlyGlnSerValProAspIleLysLysLys
2221 CTGCAAAGGCTGGAGGGGCTCCAAGATCATACGCTCCAAGATTTAGTAAAAGAAGCAGAG
LeuGlnArgLeuGluGlyLeuGlnAspHisThrLeuGlnAspLeuValLysGluAlaGlu
2281 AAAGTCTATCATAAGAGGGAAACAGAAGAAGAGAGGCAGGAGAGAGAGAAGAAAGAAATG
LysValTyrHisLysArgGluThrGluGluGluArgGlnGluArgGluLysLysGluMet
2341 GAGGAGAGGGAAAATAGACGGGGATTTCAGGAGAGAAATTTGAGTAAAATTTTGGCCGCA
GluGluArgGluAsnArgArgGlyPheGlnGluArgAsnLeuSerLysIleLeuAlaAla
2401 GTTGTAAATGATAGACAGTCAGGAAAAGGTAAAATAGGGCTCCTGGGCAACAGGGCAGTG
ValValAsnAspArgGlnSerGlyLysGlyLysIleGlyLeuLeuGlyAsnArgAlaVal
2461 AAACCGCCAGGTGGCAGAAAGATACCACTGGAAAAAGACCAATGCACCTATTGCAAAGAG
LysProProGlyGlyArgLysIleProLeuGluLysAspGlnCysThrTyrCysLysGlu
2521 AAAGGACACTGGGCTAGAGATTGCCCTAAAAAACGGGAGCGATCCAAGGTCCTGACCCTA
LysGlyHisTrpAlaArgAspCysProLysLysArgGluArgSerLysValLeuThrLeu
<-gag | pro-pol->
2581 GAAGATGATTAGGGAAGTCGGGGCTCAGACCCCCTCCCTGAGCCTAGGGTAACTTTGTCC
GluAspAspEndGlySerArgGlySerAspProLeuProGluProArgValThrLeuSer
2641 GTGGAGGGGACTCCCGTCAACTTCCTGATAGACACCGGAGCAGAGCATTCAGTACTCACT
ValGluGlyThrProValAsnPheLeuIleAspThrGlyAlaGluHisSerValLeuThr
2701 AGCCCCCTAGGCAAGCTAGGCTCTAAAAAGACCATGGTGATTGGAGCCACTGGTAGTAAA
SerProLeuGlyLysLeuGlySerLysLysThrMetValIleGlyAlaThrGlySerLys
2761 TTTTACCCCTGGACGACCGAACGAGCCCTACAGATAAACAAGAACATAGTGACTCATTCC
PheTyrProTrpThrThrGluArgAlaLeuGlnIleAsnLysAsnIleValThrHisSer
2821 TTCCTGGTGATACCTGAGTGTCCTGCTCCCCTCTTGGGGCGCGATCTGCTAACCAAACTA
PheLeuValIleProGluCysProAlaProLeuLeuGlyArgAspLeuLeuThrLysLeu
2881 AAGGCTCAAGTCCAATTTACTTCAGAAGGCCCACAAGTAAGCTGGGGAAAAGCCCCCGTT
LysAlaGlnValGlnPheThrSerGluGlyProGlnValSerTrpGlyLysAlaProVal
2941 GCCTGCCTTGTCCTCAACACAGAGGAAGAATATCGGTTGCATGAAGAGCAACCCAAAAAT
AlaCysLeuValLeuAsnThrGluGluGluTyrArgLeuHisGluGluGlnProLysAsn
3001 GCAGTCTCTTCAGGCTGGCTAACTGCGTTCCCCAATGTCTGGGCAGAACAAGCAGGAATG
AlaValSerSerGlyTrpLeuThrAlaPheProAsnValTrpAlaGluGlnAlaGlyMet
3061 GGGTTGGCTAAACAAGTGCCTCCGGTTGTGGTAGAACTTAAAGCTGATGCCACCCCCATC
GlyLeuAlaLysGlnValProProValValValGluLeuLysAlaAspAlaThrProIle
3121 TCGGTAAGACAATACCCCATGAGCAAGGAAGCTAGGGAGGGCATCCGGCCTCATATCCAG
SerValArgGlnTyrProMetSerLysGluAlaArgGluGlyIleArgProHisIleGln
3181 AGGTTGCTAGACCAAGGAGTTTTAGTGGCCTGTCAGTCCCCCTGGAATACACCACTTCTG
ArgLeuLeuAspGlnGlyValLeuValAlaCysGlnSerProTrpAsnThrProLeuLeu
3241 CCGGTTCGAAAACCAGGGACCAATGACTATCGCCCAGTGCAAGACCTCCGGGAAGTTAAC
ProValArgLysProGlyThrAsnAspTyrArgProValGlnAspLeuArgGluValAsn
3301 AAAAGGGTCCTGGACATTCACCCCACAGTCCCGAACCCATACAATTTATTAAGCTCTCTC
LysArgValLeuAspIleHisProThrValProAsnProTyrAsnLeuLeuSerSerLeu
3361 CCACCTGAGAGAACATGGTATACAGTCTTGGACTTAAAAGATGCCTTCTTTTGCCTGCGC
ProProGluArgThrTrpTyrThrValLeuAspLeuLysAspAlaPhePheCysLeuArg
3421 TTGCACCCTAAGAGTCAGCTCCTGTTTGCCTTTGAATGGAGGGACCCAGAGGGCGGACAG
LeuHisProLysSerGlnLeuLeuPheAlaPheGluTrpArgAspProGluGlyGlyGln
3481 ACTGGTCAACTAACCTGGACTAGGCTACCACAGGGGTTCAAAAATTCCCCCACCCTGTTT
ThrGlyGlnLeuThrTrpThrArgLeuProGlnGlyPheLysAsnSerProThrLeuPhe
3541 GACGAGGCCCTCCATCGGGATCTCGCGCCTTTTCGTGCTCGAAACCCTCAGCTTACCCTA
AspGluAlaLeuHisArgAspLeuAlaProPheArgAlaArgAsnProGlnLeuThrLeu
3601 CTACAGTATGTGGATGATCTCTTGGTCGCGGCGGCCTCGAAGGAGCTGTGTCACCAGGGA
LeuGlnTyrValAspAspLeuLeuValAlaAlaAlaSerLysGluLeuCysHisGlnGly
3661 ACTGAGAGGCTCCTTGCAGAACTGAGTGACTTGGGGTATCGAGTTTCGGCTAAGAAGGCA
ThrGluArgLeuLeuAlaGluLeuSerAspLeuGlyTyrArgValSerAlaLysLysAla
3721 CAAATTTGTCAAACTGAGGTAACCTACCTGGGGTATACCCTCCGAGGGGGCAAAAGATGG
GlnIleCysGlnThrGluValThrTyrLeuGlyTyrThrLeuArgGlyGlyLysArgTrp
3781 CTCACAGAGGCCCGGAAGAAGACTGTTATGATGATCCCATCGCCAACTACCCCACGGCAG
LeuThrGluAlaArgLysLysThrValMetMetIleProSerProThrThrProArgGln
3841 GTACGTGAGTTTCTGGGGACTGCTGGCTTTTGTAGACTCTGGATTCCAGGCTTTGCAACC
ValArgGluPheLeuGlyThrAlaGlyPheCysArgLeuTrpIleProGlyPheAlaThr
3901 CTAGCAGCACCTCTATATCCTTTGACTAAGGAAGGGTTTCCTTTTGAGTGGAAAGAAGAG
LeuAlaAlaProLeuTyrProLeuThrLysGluGlyPheProPheGluTrpLysGluGlu
3961 CACCAAAGAGCTTTTGAGGCTATCAAGTCGTCTCTAATGACTGCCCCCGCGCTAGCATTA
HisGlnArgAlaPheGluAlaIleLysSerSerLeuMetThrAlaProAlaLeuAlaLeu
4021 CCAGACTTGACTAAGCCTTTCGTCCTATATGTGGACGAGAGAGCGGGTGTAGCCAGGGGA
ProAspLeuThrLysProPheValLeuTyrValAspGluArgAlaGlyValAlaArgGly
4081 GTGTTGACACAAGCACTGGGACCCTGGAAGAGACCTGTAGCCTATTTGTCAAAGAAATTA
ValLeuThrGlnAlaLeuGlyProTrpLysArgProValAlaTyrLeuSerLysLysLeu
4141 GATCCCGTTGCTAGTGGATGGCCCACATGCCTGAAAGCTATTGCGGCAATGGCCCTGCTG
AspProValAlaSerGlyTrpProThrCysLeuLysAlaIleAlaAlaMetAlaLeuLeu
4201 ATCAAAGATGCTGACAAATTGACAATGGGACAACAGGTGACTGTTGTGGCCCCTCATGCC
IleLysAspAlaAspLysLeuThrMetGlyGlnGlnValThrValValAlaProHisAla
4261 TTGGAAAGTATCGTGCGGCAGCCACCTGACAGATGGATGACAAATGCCCGAATGACACAC
LeuGluSerIleValArgGlnProProAspArgTrpMetThrAsnAlaArgMetThrHis
4321 TATCAGAGCTTGCTGCTAAATGAGCGTGTAACCTTTGCGCCCCCTGCCATCCTCAACCCA
TyrGlnSerLeuLeuLeuAsnGluArgValThrPheAlaProProAlaIleLeuAsnPro
4381 GCTACCCTTCTCCCTCTAACAAATGATTCCGTCCCAGTACATCAATGTACAGACATCCTC
AlaThrLeuLeuProLeuThrAsnAspSerValProValHisGlnCysThrAspIleLeu
4441 GCTGAAGAGACTGGGACCAGAAGAGACCTGACTGACCAACCCTGGCCTGGAGCTCCCAGT
AlaGluGluThrGlyThrArgArgAspLeuThrAspGlnProTrpProGlyAlaProSer
4501 TGGTATACGGATGGCAGCAGTTTCCTGATAGAGGGGAAGCGAAAGGCTGGAGCTGCGGTG
TrpTyrThrAspGlySerSerPheLeuIleGluGlyLysArgLysAlaGlyAlaAlaVal
4561 GTGGACGGGAAAAAGGTAATTTGGGCAAGCGCTTTGCCTGAAGGAACGTCGGCACAAAAG
ValAspGlyLysLysValIleTrpAlaSerAlaLeuProGluGlyThrSerAlaGlnLys
4621 GCTGAACTTATAGCACTTATACAAGCCCTCCGAGAGGCTAAAGGTAAGATCGTTAACATC
AlaGluLeuIleAlaLeuIleGlnAlaLeuArgGluAlaLysGlyLysIleValAsnIle
4681 TACACTGACAGCCGCTATGCTTTTGCTACCGCACACATCCATGGGGCCATCTACAGGCAG
TyrThrAspSerArgTyrAlaPheAlaThrAlaHisIleHisGlyAlaIleTyrArgGln
4741 CGAGGGCTATTGACTTCGGCTGGTAAAGACATTAAAAACAAAGAAGAAATTCTGGCCCTG
ArgGlyLeuLeuThrSerAlaGlyLysAspIleLysAsnLysGluGluIleLeuAlaLeu
4801 TTGGAAGCCATACATGCACCTAAGAAGGTAGCCATCATCCACTGCCCCGGCCACCAAAGA
LeuGluAlaIleHisAlaProLysLysValAlaIleIleHisCysProGlyHisGlnArg
4861 GGAGAAGACTTGGTGGCCAAGGGCAACCGAATGGCAGACTCAGTGGCAAAACAAGTTGCT
GlyGluAspLeuValAlaLysGlyAsnArgMetAlaAspSerValAlaLysGlnValAla
4921 CAAGGGGCCATGATCTTAACTGAAAAAGGTGATCCACCCAAAAGCCCTGAGGATGAGAGG
GlnGlyAlaMetIleLeuThrGluLysGlyAspProProLysSerProGluAspGluArg
4981 TATAACATAAAAGAGCTATTGTGGACCAGTGATCCCCTCCCATACTTTTTTGAAGGGAAA
TyrAsnIleLysGluLeuLeuTrpThrSerAspProLeuProTyrPhePheGluGlyLys
5041 ATAGAATTGACTCCCGAAGAAGGAATAAAATTTGTGAAAGGACTACACCAATTCACCCAC
IleGluLeuThrProGluGluGlyIleLysPheValLysGlyLeuHisGlnPheThrHis
5101 CTGGGAGTTGAAAAAATGATGAGACTAATTAAGAATTCCCGATACCAAGTCCCCAACCTG
LeuGlyValGluLysMetMetArgLeuIleLysAsnSerArgTyrGlnValProAsnLeu
5161 AAGTCAGTGGCTCAAAAGATTATAGACTCCTGCAAACCATGTGCATTCACTAATGCGACT
LysSerValAlaGlnLysIleIleAspSerCysLysProCysAlaPheThrAsnAlaThr
5221 AAAGCCTACAAAGAACCTGGAAAGAGACAACGGGGAGACCGTCCTGGAGTGTATTGGGAG
LysAlaTyrLysGluProGlyLysArgGlnArgGlyAspArgProGlyValTyrTrpGlu
5281 GTAGATTTTACTGAAGTTAAACCTGGAATGTATGGTAACAAGTATCTGTTAGTATTTGTA
ValAspPheThrGluValLysProGlyMetTyrGlyAsnLysTyrLeuLeuValPheVal
5341 GACACTTTTTCAGGATGGGTTGAGGCGTTTCCCACTAAAACTGAGACTGCCCAGATTGTG
AspThrPheSerGlyTrpValGluAlaPheProThrLysThrGluThrAlaGlnIleVal
5401 GCCAAGAAGATCCTTGAAGAAATCCTGCCAAGATTTGGAATCCCTAAGGTAATCGGGTCC
AlaLysLysIleLeuGluGluIleLeuProArgPheGlyIleProLysValIleGlySer
5461 GATAATGGACCAGCCTTTGTTGCCCAGGTAAGTCAGGGCTTGGCCACTCAGTTGGGCATC
AspAsnGlyProAlaPheValAlaGlnValSerGlnGlyLeuAlaThrGlnLeuGlyIle
5521 GATTGGAAATTACACTGTGCTTACCGCCCTCAAAGCTCAGGACAGGTAGAGAGGATGAAT
AspTrpLysLeuHisCysAlaTyrArgProGlnSerSerGlyGlnValGluArgMetAsn
5581 AGGACCTTAAAAGAGACCTTGACTAAATTAGCCATTGAGACCGGCGGGAAAGACTGGGTG
ArgThrLeuLysGluThrLeuThrLysLeuAlaIleGluThrGlyGlyLysAspTrpVal
5641 GCTCTCCTTCCTCTTGCGCTCTTCCGAGCCCGAAACACCCCTGGACGTTTCGGGCTCACT
AlaLeuLeuProLeuAlaLeuPheArgAlaArgAsnThrProGlyArgPheGlyLeuThr
5701 CCTTTTGAAGTTCTGTATGGAGGACCTCCCCCTTTAATGGAAGCTGGTGGAACATTGGTT
ProPheGluValLeuTyrGlyGlyProProProLeuMetGluAlaGlyGlyThrLeuVal
splice acceptor site |
5761 TCCGGCTCTGACCCTGTCTTACCCTCCTCTTTGCTTATTCATTTAAAGGCCCTAGAAGTG
SerGlySerAspProValLeuProSerSerLeuLeuIleHisLeuLysAlaLeuGluVal
5821 ATTAGGACCCAGATTTGGGACCAACTGAAGGCAGCCTATACCCCAGGGACCACCGCAGTA
IleArgThrGlnIleTrpAspGlnLeuLysAlaAlaTyrThrProGlyThrThrAlaVal
5881 CCCCACGGGTTCCGAGTTGGAGATAAAGTCTTGGTCAGACGGCATCGAACCGGCAGCCTC
ProHisGlyPheArgValGlyAspLysValLeuValArgArgHisArgThrGlySerLeu
5941 GAGCCACAGTGGAAGGGACCCTATTTGGTGTTACTGACAACCCCTACTGCGGTAAAAGTC
GluProGlnTrpLysGlyProTyrLeuValLeuLeuThrThrProThrAlaValLysVal
|env->
6001 GACGGGATTGCCTCCTGGATCCACGCCTCCCACGTCAAGAGGGCCGCAAGTCAAGATGAA
AspGlyIleAlaSerTrpIleHisAlaSerHisValLysArgAlaAlaSerGlnAspGlu
MetLys
6061 GAAAACCATGAAGACAATTGGACAGTGGCAGCCACTGACAATCCTCTTAAGCTTCGTTTG
GluAsnHisGluAspAsnTrpThrValAlaAlaThrAspAsnProLeuLysLeuArgLeu
LysThrMetLysThrIleGlyGlnTrpGlnProLeuThrIleLeuLeuSerPheValCys
<-pol|
6121 TGCCGCAGGCGCCACCCTGAGCCTAGGGAACCATAACCCTCATGCTCCAATTCAACAGTC
CysArgArgArgHisProGluProArgGluProEnd
AlaAlaGlyAlaThrLeuSerLeuGlyAsnHisAsnProHisAlaProIleGlnGlnSer
6181 TTGGGAAGTGCTTAATGAGGAGGGAAACATTGTGTGGGCAACCACTGCAGTCCATCCCCT
TrpGluValLeuAsnGluGluGlyAsnIleValTrpAlaThrThrAlaValHisProLeu
6241 CTGGACTTGGTGGCCTGATCTCACACCTGACATCTGTAAGTTAGTGGCAGGATCCACCAA
TrpThrTrpTrpProAspLeuThrProAspIleCysLysLeuValAlaGlySerThrLys
6301 ATGGGACCTCCCTGATCATACCGATCTTAGTAACCCACCCCCTGAAGAGCGGTGTGTCCC
TrpAspLeuProAspHisThrAspLeuSerAsnProProProGluGluArgCysValPro
6361 AAACGGGATAGGGAGCACATATTGGTGTTCGGGGCAGTTTTACCGAGCTAATCTTAGAGC
AsnGlyIleGlySerThrTyrTrpCysSerGlyGlnPheTyrArgAlaAsnLeuArgAla
6421 TGCACAATTTTATGTTTGCCCTGGTCAGGGTCAGAGCAAAAGGCTTCAACGAGAATGTGG
AlaGlnPheTyrValCysProGlyGlnGlyGlnSerLysArgLeuGlnArgGluCysGly
6481 AGGGGCATCAGATTACTTTTGTGGTAAATGGACATGTGAAACGACAGGGGAAGCTTACTG
GlyAlaSerAspTyrPheCysGlyLysTrpThrCysGluThrThrGlyGluAlaTyrTrp
6541 GAAGCCCTCCTCTGACTGGGACCTAATCACGGTAAAACGAGGAAGTGGCTATGATAGGTC
LysProSerSerAspTrpAspLeuIleThrValLysArgGlySerGlyTyrAspArgSer
6601 AAACGAAGGAGAAAGAAACCCCTATAAATATCCAGAGAATGGGTGCGCTTTTAAAAACAG
AsnGluGlyGluArgAsnProTyrLysTyrProGluAsnGlyCysAlaPheLysAsnSer
6661 CCCCCCAGGACCATGCAAAGGTAAATACTGCAACCCCCTACTTATAAAGTTCACCGAGAA
ProProGlyProCysLysGlyLysTyrCysAsnProLeuLeuIleLysPheThrGluLys
6721 AGGGAAACAACACCGTCTAAGTTGGCTTAAAGGAAATAGGTGGGGTTGGCGAGTATACCT
GlyLysGlnHisArgLeuSerTrpLeuLysGlyAsnArgTrpGlyTrpArgValTyrLeu
6781 TCCACTAAGAGATCCTGGGTTCATTTTCACGATCAGGCTGACAGTGAGAGACCTGGCGGT
ProLeuArgAspProGlyPheIlePheThrIleArgLeuThrValArgAspLeuAlaVal
6841 GACACCTGTTGGGCCCAACAAGGTCCTTATAGAACAGGGCCCCCCAGTCGTACCGGCTCC
ThrProValGlyProAsnLysValLeuIleGluGlnGlyProProValValProAlaPro
6901 CCCAAAGGTCCCAGCCGTACCAGCTCCACCAACTCCACAGCCCAACATAGTGGTACCCTC
ProLysValProAlaValProAlaProProThrProGlnProAsnIleValValProSer
6961 CCTAGGGACTAATACTCCCCTCATAAAGCCTACCTTGGCTTCCCCACCGCCCCTAGGTAC
LeuGlyThrAsnThrProLeuIleLysProThrLeuAlaSerProProProLeuGlyThr
7021 AGAGGACCGTCTGGTCAGTCTACTCCAGGGAGCTTTTTTAGCTTTAAATAGAACTAACCC
GluAspArgLeuValSerLeuLeuGlnGlyAlaPheLeuAlaLeuAsnArgThrAsnPro
7081 TAATATGACTCAATCATGCTGGTTATGCTATACCTCTAGCCCCCCTTATTATGAAGGAAT
AsnMetThrGlnSerCysTrpLeuCysTyrThrSerSerProProTyrTyrGluGlyIle
7141 AGCTCAGATCAGGACTTATAATATTACTTCAGATCATTCTCAATGTCTTTGGGGAGAAAA
AlaGlnIleArgThrTyrAsnIleThrSerAspHisSerGlnCysLeuTrpGlyGluAsn
7201 CAGAAAGTTGACTCTGGCAGCAGTTTCAGGAAGAGGGCTTTGTTTGGGCCAGGTACCTCA
ArgLysLeuThrLeuAlaAlaValSerGlyArgGlyLeuCysLeuGlyGlnValProGln
7261 GGATAAAGGGCACCTCTGTAATCAGACCCAGAACATCCAGTCTAGCAAAAGTGGTCAGTA
AspLysGlyHisLeuCysAsnGlnThrGlnAsnIleGlnSerSerLysSerGlyGlnTyr
7321 TCTAGTGCCCCCCTTAGACACAGTATGGGCTTGCAATACCGGTCTCACTCCTTGTGTGTC
LeuValProProLeuAspThrValTrpAlaCysAsnThrGlyLeuThrProCysValSer
7381 TATGTCTGTTTTTAATAGTTCCAAAGATTTCTGCATTTTGGTTCAGCTTATTCCTAGACT
MetSerValPheAsnSerSerLysAspPheCysIleLeuValGlnLeuIleProArgLeu
7441 CCTGTATCATGATGATAGCTCCTTTTTAGACAAATTTGAGCGTTGGGTCCGCTGGAGAAG
LeuTyrHisAspAspSerSerPheLeuAspLysPheGluArgTrpValArgTrpArgArg
7501 AGAGCCCGTTACCCTAACTTTGGCAGTTCTATTAGGATTAGGAGTAGCGGCTGGAGTAGG
GluProValThrLeuThrLeuAlaValLeuLeuGlyLeuGlyValAlaAlaGlyValGly
7561 TACAGGAACCGCTGCCTTAATTAAGACCCCCCAATACTATGAAGAACTACGTGCAGCTAT
ThrGlyThrAlaAlaLeuIleLysThrProGlnTyrTyrGluGluLeuArgAlaAlaMet
7621 GGATGTTGATCTTAGAACTATAGAACAGTCTATAACCAAATTAGAAGAATCTTTAACTTC
AspValAspLeuArgThrIleGluGlnSerIleThrLysLeuGluGluSerLeuThrSer
7681 CCTGTCCGAAGTGGTGCTACAGAAAGGAAGGGGATTAGACTTATTATTCCTTAAAGAAGG
LeuSerGluValValLeuGlnLysGlyArgGlyLeuAspLeuLeuPheLeuLysGluGly
7741 AGGACTCTGTGCTGCCCTAAAAGAAGAATGTTGTTTTTATGTTGACCATTCAGGAGTAAT
GlyLeuCysAlaAlaLeuLysGluGluCysCysPheTyrValAspHisSerGlyValIle
7801 CAAAGATTCTATGGCCAAACTTAGAGAACGCCTAGATATACGTAAAAGAGAAAGAGAAAG
LysAspSerMetAlaLysLeuArgGluArgLeuAspIleArgLysArgGluArgGluSer
7861 CCAACAAGGATGGTTCGAAAGCTGGTTTAATAAGTCCCCTTGGCTCACCACTCTCCTCTC
GlnGlnGlyTrpPheGluSerTrpPheAsnLysSerProTrpLeuThrThrLeuLeuSer
7921 CACCATAGCAGGACCTTTGATTACTCTTATGCTTTTGCTTACTTTTGGCCCCTGCATCCT
ThrIleAlaGlyProLeuIleThrLeuMetLeuLeuLeuThrPheGlyProCysIleLeu
7981 TAATAAGTTAGTAGCTTTTATTAGAGAAAGGATAAATGCAGTACAGGTTATGGTACTAAG
AsnLysLeuValAlaPheIleArgGluArgIleAsnAlaValGlnValMetValLeuArg
<-env|
8041 GCAACAATATCGGGTCCTTCAGGAGGTTGAAAACTCGCTCTAAGATTAGAGCTATTTCCT
GlnGlnTyrArgValLeuGlnGluValGluAsnSerLeuEnd
PPT |U3->
8101 AAAAAGAGTGGGGAATGAAGAATAAAAGTTACTGAACTCTTCCTCACCCCAGAACTCGAC
8161 CCCTTCCATCTAGAGAGTGTTCCCAGAACACTCCTGAACTCTTCACCCTAGAATGCATTC
8221 CTGAACTCCTCACCCTAGAGTTCGAACCCTCCCAACTAAAGACTGTTCCAAGAACATTTT
8281 TGAGATAAGGGCCTCCTGGAACAACCTCAGAATAAACCGGGTACATTGCCAAATAATAGG
8341 ACATGACCCCTTAGTTACGTAGAATCCCTTGGCAGAACCCCTTGTCCCTTGGCAGAACCC
promoter
8401 CTTAGTTATGTAAACTTGTACTTTCCCTACCCCGCTCTCCCGCCTTGAGTTTTCCTATAT
<-U3||R->
8461 AAGCCTGTGAAAAATTTGGCTGGTCGTCGATTCTCCTCTACACCACTAGGTGTATGAGTT
inverse palindrome polyA signal
8521 TCGACCCCAGAGCTCTGGTCTATGTGCTTTCATGCTGCTGCTTTATTAAATCTTGCCTTC
pol motif
<-R||U5->
8581 AACATTTTGAGTTCGGTCTCAGTGTCTTCTTGGGTCCGCGGCTGTCCCGAGGCTTGAGTG
<-U5|
8641 AGGGTCTCCCTTCGGGGGTCTTTCA
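The 12 bp inverse palindrome in R and the copies of the 35 bp repeat in the untranslated region can be located from the listing with a few lines of Python. This is a minimal sketch under stated assumptions: the excerpts are copied from the 5' part of the sequence above, the 35 bp unit is inferred from the two copies in this provirus, only perfect even-length inverted repeats are searched for, and the function names are hypothetical. Positions follow the listing's numbering.

```python
# Excerpts copied from the listing; numbers in the names are 1-based listing positions.
R_361_420 = "TCTACACCACTAGGTGTATGAGTTTCGACCCCAGAGCTCTGGTCTATGTGCTTTCATGCT"
UTR_781_900 = (
    "CAGGGACGCCTGGTGGACCCCTTTGGAGGCCAAGAGACCATCTGGGGTTGCGAGATCGTG"
    "GGTTCGAGTCCCACCTCGTGCCTTGTTGCGAGATCGTGGGTTCGAGTCCCACCTCGTGCA"
)
# Assumption: the 35 bp unit as it reads in both copies (Rpt1, Rpt2) of this provirus.
REPEAT_UNIT = "GTTGCGAGATCGTGGGTTCGAGTCCCACCTCGTGC"

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return "".join(PAIR[base] for base in reversed(seq))


def inverted_palindromes(seq: str, min_len: int, offset: int = 1) -> list[tuple[int, str]]:
    """Maximal even-length sites that equal their own reverse complement.

    Every boundary between adjacent bases is tried as a centre and extended
    outwards while the flanking bases are complementary. `offset` is the
    listing position of seq[0].
    """
    hits = []
    for centre in range(1, len(seq)):
        left, right = centre - 1, centre
        while left >= 0 and right < len(seq) and PAIR[seq[left]] == seq[right]:
            left -= 1
            right += 1
        if right - left - 1 >= min_len:
            hits.append((offset + left + 1, seq[left + 1:right]))
    return hits


def repeat_positions(seq: str, unit: str, offset: int = 1) -> list[int]:
    """Listing positions of every exact (possibly overlapping) copy of `unit`."""
    positions, start = [], seq.find(unit)
    while start != -1:
        positions.append(offset + start)
        start = seq.find(unit, start + 1)
    return positions


print(inverted_palindromes(R_361_420, 12, offset=361))         # [(391, 'CCAGAGCTCTGG')]
print(repeat_positions(UTR_781_900, REPEAT_UNIT, offset=781))  # [827, 865]
```

Within positions 361-420 the only perfect inverted palindrome of at least 12 bp is CCAGAGCTCTGG (391-402), which presumably corresponds to the underlined site; the two repeat copies start at 827 and 865 and are separated by a 3 bp spacer (CTT). Because `str.find` restarts one base after each hit, overlapping copies would also be reported if a variant provirus carried them.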