Accelerating Pine Genomics
Mississippi Genome Exploration Laboratory

102 million base pairs of publicly available sequence have been generated through APG!

ABI 3730xl Capillary Sequencing

As part of DBI-0421717, we generated random genomic libraries and Cot-filtered moderately repetitive (M) and single/low-copy (S) sequence libraries. A high-copy Cot (H) library was also constructed, but sequencing of this library proved unfruitful. Clones were sequenced on ABI 3730xl DNA analyzers. Sequences are available through the National Center for Biotechnology Information (NCBI) and can be retrieved from GenBank by searching with the accession strings listed in the table below.

Alternatively, FASTA files containing the sequences can be downloaded via the links in the table below.
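Once a FASTA file has been downloaded, the summary columns in the table below (number of reads, mean read length, total base pairs) can be recomputed from it. The Python below is a minimal, hypothetical sketch rather than the procedure used to build the table; the function names are our own, and it assumes a plain FASTA file in which each record starts with a ">" header line followed by one or more sequence lines.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def summarize(handle: TextIO) -> tuple[int, float, int]:
    """Return (number of reads, mean read length, total base pairs)."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    total = sum(lengths)
    mean = total / len(lengths) if lengths else 0.0
    return len(lengths), mean, total


# Tiny in-memory example: reads of 10 bp and 4 bp.
example = io.StringIO(">read1\nACGTACGT\nAC\n>read2\nGGCC\n")
n, mean, total = summarize(example)
print(n, round(mean), total)  # 2 7 14
```

For a downloaded file, pass an open handle instead, e.g. `with open("reads.fasta") as fh: print(summarize(fh))`, where `reads.fasta` is a placeholder name.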

DNA source | Number of reads | Mean read length (bp) [a] | Total base pairs | GenBank accession string | Downloadable data folder [b]
Genomic (random) [c] | 1,007 | 876 | 881,730 | ET181630:ET182636 [ACCN] |
Cot-filtered moderately repetitive (M) | 266 | 442 | 117,631 | ET182637:ET182902 [ACCN] |
Cot-filtered single/low-copy (S) | 2,328 | 213 | 495,667 | ET182903:ET185230 [ACCN] |
TOTAL | 3,601 | | 1,495,028 | ET181630:ET185230 [ACCN] |

[a] After trimming.
[b] Each folder contains all trimmed reads in a single FASTA file.
[c] Paired-end reads with significant overlap (i.e., representing a continuous sequence) were assigned a single GenBank accession. Consequently, although there were 1,007 genomic reads, there are only 600 GenBank entries.
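The accession strings in the table use Entrez search syntax: the colon denotes an accession range and `[ACCN]` restricts the search to the accession field. For scripted retrieval, one common pattern (assumed here, not taken from this page) is NCBI's E-utilities: `esearch.fcgi` with `usehistory=y` stores the matching records on NCBI's History server and returns `WebEnv` and `QueryKey` values, which `efetch.fcgi` can then use to download the records as FASTA. The sketch below only builds the request URLs and makes no network calls; the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Base URL of NCBI's Entrez E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def accession_range_term(first: str, last: str) -> str:
    """Entrez term for an accession range, e.g. 'ET182637:ET182902[ACCN]'."""
    return f"{first}:{last}[ACCN]"


def esearch_url(term: str) -> str:
    """esearch URL on the nucleotide database.

    usehistory=y keeps the result set on the History server so that
    efetch can retrieve it via the returned WebEnv and QueryKey.
    """
    params = {"db": "nucleotide", "term": term, "usehistory": "y"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(webenv: str, query_key: str) -> str:
    """efetch URL that downloads a stored result set as FASTA text."""
    params = {
        "db": "nucleotide",
        "query_key": query_key,
        "WebEnv": webenv,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


# Cot-filtered moderately repetitive (M) library, ABI reads.
print(esearch_url(accession_range_term("ET182637", "ET182902")))
```

Comparing the `Count` element of the esearch response with the table is a quick sanity check; for the genomic library, footnote c explains why fewer entries than reads are expected. Scripts should also follow NCBI's E-utilities usage guidelines, including its request-rate limits.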

454/Roche Pyrosequencing

A collaboration with John E. Carlson (Penn State) allowed us to sequence uncloned kinetic components and genomic DNA on the 454/Roche Applied Sciences GS20 platform. The 454 sequencing, performed with the permission of the NSF and requiring only minor re-budgeting, has produced more than 100 Mb of pine genomic sequence, i.e., 18 times more sequence than proposed in the original funded version of DBI-0421717. This bonus sequence has provided many new opportunities and challenges. To facilitate timely characterization of these sequences, we developed an automated "Sequence Read Classification Pipeline" (see Publications and Bioinformatics Tools). All 454 sequence data have been archived in the NCBI Short Read Archive and can also be downloaded from the table below.

DNA source | Number of reads | Mean read length (bp) [a] | Total base pairs | Short Read Archive accession | Downloadable data folder [b]
Genomic (random) | 275,038 | 102 | 28,038,360 | SRX001948 |
Cot-filtered highly repetitive (H) | 216,921 | 97 | 21,029,350 | SRX001949 |
Cot-filtered moderately repetitive (M) | 206,402 | 97 | 20,017,474 | SRX001950 |
Cot-filtered single/low-copy (S) | 102,708 | 93 | 9,544,980 | SRX001951 |
Cot-filtered theoretical single-copy (T) [c] | 215,387 | 101 | 21,801,502 | SRX001952 |
TOTAL | 1,016,456 | | 100,431,666 | |

[a] After trimming.
[b] Each folder contains sequence files in FASTA format and their corresponding quality files.
[c] T sequences were isolated from DNA that remained single-stranded at 0.1 × the theoretical Cot value for single-copy DNA, as predicted from genome size (see Sequence Names for further explanation).
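Why DNA that is still single-stranded at this low Cot is enriched for single-copy sequence can be illustrated with ideal second-order reassociation kinetics. There, the fraction still single-stranded is C/C0 = 1 / (1 + Cot/Cot1/2), and Cot1/2 is inversely proportional to a sequence's copy number. The Python sketch below assumes that "theoretical Cot value for single-copy DNA" means the Cot1/2 of single-copy sequence and uses an arbitrary placeholder for it; it illustrates the principle and is not the laboratory's protocol. Real reassociation deviates from the ideal curve, so the numbers are indicative only.

```python
def cot(c0: float, seconds: float) -> float:
    """Cot (mol*s/L): initial nucleotide concentration C0 (mol/L) times time (s)."""
    return c0 * seconds


def seconds_to_reach(target_cot: float, c0: float) -> float:
    """Incubation time needed to reach a target Cot at concentration c0."""
    return target_cot / c0


def fraction_single_stranded(cot_value: float, cot_half: float) -> float:
    """Ideal second-order reassociation: C/C0 = 1 / (1 + Cot/Cot1/2)."""
    return 1.0 / (1.0 + cot_value / cot_half)


# Placeholder value (assumption): theoretical Cot1/2 of single-copy DNA.
SINGLE_COPY_COT_HALF = 1000.0
t_cutoff = 0.1 * SINGLE_COPY_COT_HALF  # the 0.1 x theoretical Cot used for T

for copies in (1, 10, 100):
    # A family present in `copies` copies reaches Cot1/2 `copies` times sooner.
    cot_half = SINGLE_COPY_COT_HALF / copies
    print(copies, round(fraction_single_stranded(t_cutoff, cot_half), 3))
# 1 0.909
# 10 0.5
# 100 0.091
```

Under these assumptions, about 91% of single-copy DNA is still single-stranded at the T cutoff, compared with half of a 10-copy family and about 9% of a 100-copy family.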

The sequence data from both ABI 3730xl and 454 sequencing-by-synthesis should facilitate many aspects of pine genomics, including detailed characterization of pine repeat sequences (e.g., see the GenomeWeb News article).

Please feel free to contact us if you have any questions.

*This material is based upon work supported by the National Science Foundation under Grant No. DBI-0421717.  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.