Explaining by way of example: the flat file of acceptor splice sites, with information about putative polypyrimidine tract (PPT) and U2 Branch Point (BP) sequences, contains entries of the form:

    >IDB1072296.1230
    GB_MAP: IDB1072296 = A06939.1 (1..5322)
    PROD: furin
    AGEZ: 27
    ROI: 1181..1265 -> -50..34
    AG: -50, -47, -29, -2, 3, 35,
    PPT: -42..-33, -23..-5, 5..14,
    U2BP: -31 [5.49], -24 [5.2], 11 [4.58],
    SEQ1: agaaggcaCTCTGTGCCTgacagctgaCCCTACCTTCCCTGTCCCCacag
    SEQ2: tgagCCACTCATATggctacgggcttttggacgcag
    END

Where:

>IDB1072296.1230 gives the altExtron identifier for the gene (IDB1072296), followed by the acceptor (3') splice site position (1230), which is the position of the final nucleotide in the intron.
GB_MAP: IDB1072296 = A06939.1 (1..5322) is the mapping to the GenBank entry from which this gene was derived. "A06939" is the accession, ".1" is the version, and "(1..5322)" is the position. If the entry is on the complement strand in GenBank (it is always sense in altExtron), the mapping is written as: complement( .. ).
PROD: furin is the gene product, as best as we are able to parse it from the GenBank flat files.
AGEZ: 27 is the AG "Exclusion Zone", being the number of nucleotides upstream from the end of the intron to the first AG other than the acceptor splice site itself. This parameter is of interest because it helps to define the region in which we expect to find the branch point and PPT.
ROI: 1181..1265 -> -50..34 is the "Region of Interest", arbitrarily defined as running from 3 AGs upstream of the splice site to 2 AGs downstream. The first set of co-ordinates gives this region in terms of the gene overall, and the second set gives this region relative to the splice site (with no position 0).
AG: -50, -47, -29, -2, 3, 35, are the relative positions of the AG dinucleotides in the ROI, including the splice site itself at -2.
PPT: -42..-33, -23..-5, 5..14, describes the positions of pyrimidine-rich tracts that we take as putative PPTs.
U2BP: -31 [5.49], -24 [5.2], 11 [4.58], are the positions of putative U2 Branch Point sequences. The number in the [] brackets is the bit score generated for this site by a weight matrix analysis.
SEQ1: agaaggcaCTCTGTGCCTgacagctgaCCCTACCTTCCCTGTCCCCacag is the intronic part of the ROI sequence, with the putative PPTs in upper case.
SEQ2: tgagCCACTCATATggctacgggcttttggacgcag is the exonic part of the ROI sequence.
END is a tag, helpful in file parsing, that indicates the end of the record.

Full details of the PPT and BP prediction methods may be found in the associated publications.

In the above example the AGEZ is 27, and as such we can expect the branch point to be no further than about 10-15 nts upstream of this AG; otherwise we would expect this AG to be used for splicing. We see that there is a putative PPT from -23 to -5, then a putative BP at -24, and another at -31. We thus suppose that this PPT is correctly identified, but that there are two candidate BP positions.
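
As an illustration of how such a record might be read, the following is a minimal Python sketch, not part of the altExtron distribution. It assumes that each field sits on its own line in the flat file. The function names (parse_record, slice_relative, candidate_bps) and the slack parameter are hypothetical, introduced only for this example. In particular, candidate_bps is a rough reading of the reasoning in the paragraph above (BP no more than about 10-15 nts upstream of the AG that bounds the exclusion zone, with that AG assumed to start at -(AGEZ + 2)); it is not the published prediction method.

```python
import re

RECORD = """\
>IDB1072296.1230
GB_MAP: IDB1072296 = A06939.1 (1..5322)
PROD: furin
AGEZ: 27
ROI: 1181..1265 -> -50..34
AG: -50, -47, -29, -2, 3, 35,
PPT: -42..-33, -23..-5, 5..14,
U2BP: -31 [5.49], -24 [5.2], 11 [4.58],
SEQ1: agaaggcaCTCTGTGCCTgacagctgaCCCTACCTTCCCTGTCCCCacag
SEQ2: tgagCCACTCATATggctacgggcttttggacgcag
END
"""


def parse_record(text):
    """Parse one record into a dict (assumes one field per line)."""
    rec = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "END":
            continue
        if line.startswith(">"):
            # ">GENE.POS": split on the last dot only.
            gene, _, pos = line[1:].rpartition(".")
            rec["gene"], rec["acceptor"] = gene, int(pos)
            continue
        tag, _, value = line.partition(":")
        rec[tag] = value.strip()
    # Convert the numeric fields; the trailing commas are ignored by the regexes.
    rec["AGEZ"] = int(rec["AGEZ"])
    rec["AG"] = [int(x) for x in re.findall(r"-?\d+", rec["AG"])]
    rec["PPT"] = [(int(a), int(b))
                  for a, b in re.findall(r"(-?\d+)\.\.(-?\d+)", rec["PPT"])]
    rec["U2BP"] = [(int(p), float(s))
                   for p, s in re.findall(r"(-?\d+)\s*\[([\d.]+)\]", rec["U2BP"])]
    return rec


def slice_relative(rec, start, end):
    """Return the sequence for a relative range (inclusive, no position 0)."""
    intron, exon = rec["SEQ1"], rec["SEQ2"]
    if end < 0:      # intronic: -1 is the last nucleotide of SEQ1
        return intron[len(intron) + start:len(intron) + end + 1]
    if start > 0:    # exonic: +1 is the first nucleotide of SEQ2
        return exon[start - 1:end]
    raise ValueError("range must lie entirely in the intron or the exon")


def candidate_bps(rec, slack=15):
    """Keep intronic BPs within `slack` nts upstream of the AGEZ-bounding AG.

    Assumption: that AG starts at -(AGEZ + 2); slack=15 is illustrative.
    """
    limit = -(rec["AGEZ"] + 2 + slack)
    return [(p, s) for p, s in rec["U2BP"] if limit <= p < 0]


if __name__ == "__main__":
    rec = parse_record(RECORD)
    for start, end in rec["PPT"]:
        print(start, end, slice_relative(rec, start, end))
    print(candidate_bps(rec))
```

Run against the example record, each PPT range maps onto one of the upper-case stretches in SEQ1 or SEQ2, and the two intronic BP candidates at -31 and -24 are retained while the exonic one at 11 is dropped.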