Research Programme
Arabidopsis Functional Genomics Resources

Final Report for grant 208/IGF12438

1. Objectives

There were two main objectives in this work:
1. To sequence the insertion sites of dSpm and Ds gene trap transposons in Arabidopsis and make sequences and seeds of lines available to the public.
2. To develop a database to integrate insertion site sequences with genome features and to aid distribution of lines.
2. Progress towards goals

Two transposon populations were sequenced: the dSpm population created and described by Tissier et al (1999), and a new population of Ds gene trap (Ds-GT) lines (Sundaresan et al, 1995) created in the linked grant FGT11378. Both populations offered advantages and together provide a reasonable number of potential insertion sites. The dSpm lines are readily available and widely used. The Ds-GT lines make it possible to determine gene expression patterns at the cellular level, and they are in the Landsberg erecta (Ler) background, the most commonly used ecotype after Columbia, which is the background of most other gene insertion lines.
2.1 dSpm line sequencing
At the start of the project these lines (called the SLAT lines) comprised approximately 48,000 lines in 1200 pools of 50 plants, each plant carrying a single dSpm insertion. We originally intended to use the method described by Tissier et al to identify PCR fragments from individual lines in the pools of 50. In this method, inverse-PCR (I-PCR) products amplified from the ends of the elements are displayed on polyacrylamide gels, cut from the gel, re-amplified and sequenced. Although this works well, it could not be scaled to meet the throughput demands of the project, and tracking samples was difficult because lanes on the gel were distorted. Moreover, the sequence analysis identified only the pool of 50 containing a sequenced insertion, so users would have had to grow approximately 200 individuals and screen them by PCR to identify the single line of interest.

We therefore changed the strategy for this part of the work (see interim progress reports to the GARNET Steering Committee, June 8 2001 and Sept 2001) to increase sequencing efficiency and to provide users with a single line carrying a sequenced insertion. The new strategy, implemented after Sept 2001, involved growing seedlings from each of the 1200 pools and selecting approximately 20-30 single lines per pool by spraying with BASTA and R7402. This removed lines still containing the transposase source, guaranteeing the stability of the dSpm insertion. While prototyping this single seed descent method we noted that 80% of the 1200 pools contained at least one line that had inherited the transposase. We also noted that the complexity of the SLAT population had been reduced during bulking, which further emphasised the need to generate single seed descent lines. The lines resulting from this selection were called SM lines.
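The double spray selection used to derive SM lines can be written as a simple filter. The sketch below is a minimal illustration, assuming that BASTA resistance marks the presence of the dSpm element and that R7402 kills seedlings still carrying the transposase source. The `Seedling` record, the `pick_lines` helper and the default of 25 lines per pool (within the 20-30 range above) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Seedling:
    pool: int               # SLAT pool the seed came from
    has_dspm: bool          # assumed: dSpm present -> survives BASTA
    has_transposase: bool   # assumed: transposase present -> killed by R7402


def survives_selection(s: Seedling) -> bool:
    """A seedling survives both sprays only if it has dSpm and no transposase."""
    return s.has_dspm and not s.has_transposase


def pick_lines(seedlings, per_pool=25, seed=0):
    """Pick up to `per_pool` surviving seedlings per pool (hypothetical helper)."""
    rng = random.Random(seed)
    survivors_by_pool = {}
    for s in seedlings:
        if survives_selection(s):
            survivors_by_pool.setdefault(s.pool, []).append(s)
    chosen = {}
    for pool, survivors in survivors_by_pool.items():
        rng.shuffle(survivors)          # random choice among survivors
        chosen[pool] = survivors[:per_pool]
    return chosen


seedlings = [
    Seedling(1, True, False),
    Seedling(1, True, True),    # still carries transposase: removed by R7402
    Seedling(1, False, False),  # no dSpm: removed by BASTA
    Seedling(2, True, False),
]
lines = pick_lines(seedlings)
print({pool: len(v) for pool, v in lines.items()})  # {1: 1, 2: 1}
```

Each surviving seedling then founds one single seed descent line, so a pool contributes at most `per_pool` SM lines however many transposase carriers it contains.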
Table 1. Progress in sequencing dSpm lines
The insertion sites in the SM lines, which arose primarily from two T-DNA inserts on chromosome 1, were distributed fairly evenly across the euchromatic regions of the 5 chromosomes (see Figure 3), showing that dSpm insertions are essentially random with respect to chromosomal location. At the sequence level, dSpm insertions show a marked preference for G+C-rich regions, resulting in about 60% of insertions being in exonic sequences. Apart from the initial difficulties in characterising single insertion lines, the main problems encountered were the low efficiency (about 65%) of generating sequence from the amplified I-PCR products, and the redundancy in the lines caused by pooling and amplification.

To achieve the required sequencing throughput, all steps were performed in 96-well format. DNA was extracted from inflorescence tissue ground in a mixer mill and purified with Qiagen kits. Different strategies were tried to increase the efficiency of the key steps: PCR amplification of insertion sites and sequencing of the PCR products. Various enzymes were used to cleave DNA before ligation, but none worked better than the original combination of MseI and BfaI. Purification of I-PCR products prior to sequencing proved difficult to implement on a large scale; we used a combination of shrimp alkaline phosphatase and exonuclease (SAP-EXO) to degrade residual primers and nucleotides in the PCR products before sequencing. Different sequencing primers were tested to obtain better quality sequence. Once optimised, each step was integrated into a standard procedure that was then deployed on a large scale. The procedures often broke down; for example, PCR cleanup did not give consistent results, and the efficiency of sequencing itself varied. A final reduction in overall efficiency occurred when matching flanking sequences to the genome sequence: about 15% of the sequences were of too low quality to yield sufficient high-scoring pairs (HSPs).
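The choice of MseI and BfaI determines the size of the I-PCR template around each element end. As a rough illustration (a hypothetical helper, not part of the project pipeline), an in silico double digest finds every MseI (T^TAA) and BfaI (C^TAG) site in a sequence and reports the resulting fragment lengths. Both recognition sites are palindromic, so scanning one strand finds every site; both enzymes also leave the same 5'-TA overhang, so fragments cut by either enzyme have compatible ends.

```python
import re

# Recognition site and top-strand cut offset: MseI T^TAA, BfaI C^TAG.
ENZYMES = {"MseI": ("TTAA", 1), "BfaI": ("CTAG", 1)}


def cut_positions(seq, enzymes=ENZYMES):
    """Sorted top-strand cut positions (0-based index of the first base after the cut)."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        # A zero-width lookahead also finds overlapping occurrences.
        for m in re.finditer(f"(?={site})", seq):
            cuts.add(m.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq, enzymes=ENZYMES):
    """Fragment lengths after a complete digest of a linear sequence."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


example = "GGTTAAGGGCTAGAA"
print(cut_positions(example))     # [3, 10]
print(fragment_lengths(example))  # [3, 7, 5]
```

Run over the genomic sequence flanking a mapped insertion, a helper like this gives a rough idea of whether a site is likely to yield an I-PCR product of amplifiable size.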
Analysis of sequences is described in section 2.3. If time and funding had permitted, a second round of amplification from lines that did not yield usable sequence would likely have improved results. However, because of the redundancy in the pools, nearly 16,000 usable sequences yielded only approximately 7,000 unique insertions, suggesting that another approach might be to identify the pools with the most unique sequences and re-sample these. This option was not pursued so that effort could go into completing the sequencing of the Ds populations, which do not suffer from redundancy problems.
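The gap between nearly 16,000 usable sequences and about 7,000 unique insertions comes from collapsing redundant hits. A minimal sketch of that collapse, and of ranking pools by how many distinct insertions they contain, is shown below. It assumes each sequence has already been mapped to a chromosome, position and strand; the 5 bp merging window and the function names are hypothetical choices, not the procedure used in this project.

```python
from collections import defaultdict


def unique_insertions(hits, window=5):
    """Collapse (pool, chrom, pos, strand) hits into unique insertion sites.

    Hits on the same chromosome and strand are sorted by position and merged
    when each lies within `window` bp of the previous one (single linkage).
    Returns a list of (chrom, strand, first_pos, pools) tuples.
    """
    by_locus = defaultdict(list)
    for pool, chrom, pos, strand in hits:
        by_locus[(chrom, strand)].append((pos, pool))
    sites = []
    for (chrom, strand), items in sorted(by_locus.items()):
        items.sort()
        current = None
        for pos, pool in items:
            if current is not None and pos - current["last"] <= window:
                current["last"] = pos
                current["pools"].add(pool)
            else:
                current = {"chrom": chrom, "strand": strand, "start": pos,
                           "last": pos, "pools": {pool}}
                sites.append(current)
    return [(s["chrom"], s["strand"], s["start"], s["pools"]) for s in sites]


def rank_pools(sites):
    """Rank pools by the number of distinct insertion sites they contain."""
    counts = defaultdict(int)
    for _chrom, _strand, _pos, pools in sites:
        for pool in pools:
            counts[pool] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


hits = [
    (1, "Chr1", 1000, "+"),
    (1, "Chr1", 1002, "+"),  # same insertion, slightly different mapping
    (1, "Chr1", 1002, "+"),  # sibling line from the same pool: redundant
    (2, "Chr1", 5000, "-"),
    (3, "Chr1", 5001, "-"),  # same insertion recovered from another pool
    (2, "Chr3", 200, "+"),
]
sites = unique_insertions(hits)
print(len(hits), "sequences ->", len(sites), "unique insertions")  # 6 sequences -> 3 unique insertions
print(rank_pools(sites))  # [(2, 2), (1, 1), (3, 1)]
```

Pools at the top of such a ranking would be the natural candidates for re-sampling, since their lines are least dominated by siblings sharing one insertion.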
2.2 Ds line sequencing

The outputs of the sister project FGT11378 included a new population of 26,000 gene trap lines in ecotype Ler. The transposon insertion sites in these lines were sequenced in this project. Progress towards the overall objective of sequencing 30,000 lines is shown in Table 2.
Table 2. Progress in sequencing Ds lines
As described in the final report of grant FGT11378, production of Ds-GT lines was slower than expected because the procedure for generating transposants had to be changed to a less efficient but proven one. This delayed the start of sequencing, and, as shown in Table 2, we still need to sequence about 40 96-well plates of GT DNA and to complete about 20 96-well DNA preparations and sequence these. Note that we sequence only one end (usually the 5' end) of the Ds element. This work is still in progress and should be completed by May 2004; we expect it to bring the total to approximately 12,000 verified insertions.

We encountered persistent difficulties at two stages of the process, reflected in the current overall success rate of 40% for generating verified insertions. The first problem was "crashes" of the TAIL-PCR procedure, in which we suddenly failed to generate discrete PCR products. These failures, which usually destroyed batches of 384 PCR products, were traced to problems with the synthesis and storage of the Adaptor Primers, which are highly degenerate. Primers from different suppliers were tested, and problems were encountered with each from time to time. Another cause of failed TAIL-PCR was trace contamination of water stored in glass bottles previously used for bacterial cultures. The second major problem was inconsistent results from the SAP-EXO cleanup of TAIL-PCR products, which caused an unacceptably low sequencing success rate. This has been overcome by using 96-well spin columns. Other problems affecting overall performance were occasional failures of PCR machines and sequencing runs. We still have problems obtaining reasonable TAIL-PCR results, but we can now achieve sequencing success rates of up to 70%. In total, about 60 96-well plates remain to be sequenced to reach our original objectives; this work is continuing, although at a slower rate. Overall we hope to generate Ds insertions in approximately 7,000 genes.
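TAIL-PCR adaptor primers are degenerate: each is a mixture of every sequence consistent with its IUPAC ambiguity codes, so any one specific oligonucleotide makes up only a small fraction of the synthesis. The sketch below computes this degeneracy. The primer in the example is hypothetical, chosen only for illustration, and the calculation assumes equimolar incorporation at each degenerate position; it does not diagnose the synthesis and storage failures described above.

```python
from math import prod

# IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def share_of_each_oligo(primer):
    """Fraction of the mix made up by any one specific sequence,
    assuming equimolar incorporation at every degenerate position."""
    return 1 / degeneracy(primer)


primer = "NGTCGASWGANAWGAA"  # hypothetical illustrative primer
print(degeneracy(primer))           # 128
print(share_of_each_oligo(primer))  # 0.0078125
```

With 128 variants, any bias in how the degenerate positions are synthesised shifts which primers dominate the mix, which is one reason degenerate primers can behave differently from batch to batch.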
2.3 Database development

The second objective of this project was to develop a database to integrate insertion site sequences with genome features and to aid distribution of lines. This work, carried out in collaboration with Dr Lincoln Stein at CSHL and partly funded by the EC project PlaNET, has resulted in the ATIDB database (www.atidb.org). ATIDB runs from BITS at Harpenden on a new server (funded by PlaNET) specified to cope with increased demand, with fast processors, 2 GB of RAM, and redundant disks and power supplies. The database uses the Generic Genome Browser from the GMOD project with an underlying MySQL database. Insertion sequences are retrieved from public resources using bespoke Perl scripts. Insertion sequences were mapped to the Arabidopsis genome (TIGR v4 annotation) using a new algorithm that uses multiple HSPs from BLASTN; this algorithm is described at http://atidb.org/README.html. It determines the most likely insertion point on the Arabidopsis genome, assigns a hit quality score, and also determines the orientation of the insert. Genome features displayed include alternative transcripts, predicted ORFs and annotation. Links to NASC are provided for lines that have been bulked and submitted to NASC.
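The ATIDB mapping algorithm itself is documented at http://atidb.org/README.html and is not reproduced here. As a simplified sketch of the general idea, assuming BLAST tabular-style HSPs for a flanking sequence read outward from the transposon end (so query position 1 is the base adjacent to the insertion), one can pick the chromosome and strand with the highest total bit score, extrapolate from the HSP nearest the query start to the first flanking base, take the strand as the orientation, and use the fraction of the query covered by HSPs as a stand-in quality score. The scoring rule, the `HSP` record and the GFF3 output (source `sketch`, type `insertion_site`) are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HSP:
    chrom: str
    q_start: int      # 1-based query start (q_start <= q_end)
    q_end: int
    s_start: int      # 1-based subject coordinates, as in BLAST tabular output:
    s_end: int        # s_start > s_end for a minus-strand hit
    bitscore: float

    @property
    def strand(self):
        return "+" if self.s_end >= self.s_start else "-"


def locate_insertion(hsps, query_len):
    """Return (chrom, pos, strand, quality) for the best-supported locus.

    pos is the genome coordinate of query base 1, i.e. the first flanking
    base next to the transposon end; quality is the fraction of the query
    covered by HSPs at that locus (a stand-in score, not ATIDB's).
    """
    groups = defaultdict(list)
    for h in hsps:
        groups[(h.chrom, h.strand)].append(h)
    (chrom, strand), best = max(
        groups.items(), key=lambda kv: sum(h.bitscore for h in kv[1]))
    anchor = min(best, key=lambda h: h.q_start)  # HSP nearest the element end
    offset = anchor.q_start - 1                  # unaligned bases before it
    pos = anchor.s_start - offset if strand == "+" else anchor.s_start + offset
    covered = set()
    for h in best:
        covered.update(range(h.q_start, h.q_end + 1))
    return chrom, pos, strand, len(covered) / query_len


def to_gff3(line_id, chrom, pos, strand, quality):
    """One GFF3 feature line for a single-base insertion site."""
    return "\t".join([chrom, "sketch", "insertion_site", str(pos), str(pos),
                      f"{quality:.2f}", strand, ".", f"ID={line_id}"])


hsps = [
    HSP("Chr2", 3, 120, 10500, 10383, 210.0),
    HSP("Chr2", 118, 180, 10385, 10323, 95.0),
    HSP("Chr5", 40, 90, 777, 827, 60.0),
]
chrom, pos, strand, quality = locate_insertion(hsps, query_len=200)
print(to_gff3("example_line", chrom, pos, strand, quality))
```

Grouping only by chromosome and strand is a simplification; a fuller version would also require the HSPs at a locus to be collinear and close together, and would lower the score when competing loci score similarly.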
Line availability

All lines are available from NASC and can be accessed through ATIDB.

Delivery record: approximately 80,000 lines (a Volvo-full) were delivered to NASC on 4th May 2004.