DNA Sequences - Accelerating Pine Genomics - APG - NSF DBI-0421717 - Daniel G. Peterson - Pinus taeda - loblolly pine - pine genome - National Science Foundation

LOBLOLLY PINE DNA SEQUENCES

102 million base pairs of publicly available sequence have been generated through APG!

ABI 3730xl Capillary Sequencing

As part of DBI-0421717, we generated random genomic, Cot-filtered moderately repetitive (M), and Cot-filtered single/low-copy (S) sequence libraries. A Cot-filtered highly repetitive (H) library was also constructed, but sequencing it proved unfruitful. Clones were sequenced on ABI 3730xl DNA analyzers. The sequences are available through the National Center for Biotechnology Information (NCBI) and can be retrieved from NCBI's GenBank using the following steps:

1. From the table below, select and copy the GenBank Accession String for the sequences you want, e.g., ET181630:ET182636 [ACCN].
2. Go to the GenBank Genome Survey Sequence Database (dbGSS) homepage.
3. Paste the GenBank Accession String into the "Search for" box at the top of the dbGSS homepage.
4. Click the "Go" button. The matching sequence records should appear.
5. Select a display style from the "Display" pull-down menu. You can sort the records and download the sequence data using the "Sort by" and "Send to" pull-down menus, respectively.
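The same accession-string query can also be run from a script through NCBI's E-utilities instead of the web form. The sketch below is illustrative, not an APG tool: it assumes the GSS records are searchable in the `nuccore` Entrez database and that E-utilities accepts the same `[ACCN]` range syntax as the web search box. The helper names (`esearch_url`, `efetch_url`, `fetch_fasta`) and the batch size are hypothetical choices.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL of NCBI's public E-utilities API.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, db: str = "nuccore") -> str:
    """ESearch URL for a query such as 'ET181630:ET182636 [ACCN]'.

    usehistory=y asks NCBI to keep the hit list on its history server so
    the records can then be fetched in batches with EFetch.
    """
    params = {"db": db, "term": term, "usehistory": "y"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(webenv: str, query_key: str, retstart: int = 0,
               retmax: int = 500, db: str = "nuccore") -> str:
    """EFetch URL returning one batch of the stored records as FASTA text."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def fetch_fasta(term: str, batch: int = 500) -> str:
    """Run the search, then download every matching record as FASTA (needs network)."""
    with urllib.request.urlopen(esearch_url(term)) as resp:
        root = ET.fromstring(resp.read())
    count = int(root.findtext("Count", "0"))  # total number of hits
    webenv = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")
    chunks = []
    for start in range(0, count, batch):
        time.sleep(0.4)  # stay under NCBI's request-rate limit
        with urllib.request.urlopen(efetch_url(webenv, query_key, start, batch)) as resp:
            chunks.append(resp.read().decode())
    return "".join(chunks)
```

For example, `fetch_fasta("ET182637:ET182902 [ACCN]")` should return the moderately repetitive (M) reads as one FASTA string. NCBI's usage guidelines ask scripts to identify themselves (e.g., with `tool` and `email` parameters) and to keep to about three requests per second without an API key, which is why the loop pauses between batches.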

Alternatively, FASTA files containing the sequences can be downloaded via the links in the table below.

DNA source	Number of reads	Mean read length (bp)^a	Total base pairs	GenBank Accession String	Downloadable data folders^b
Genomic (random)^c	1,007	876	881,730	ET181630:ET182636 [ACCN]	PT_7GC.zip
Cot-filtered moderately repetitive (M)	266	442	117,631	ET182637:ET182902 [ACCN]	PT_7MC.zip
Cot-filtered single/low-copy (S)	2,328	213	495,667	ET182903:ET185230 [ACCN]	PT_7SC.zip
TOTAL	3,601		1,495,028	ET181630:ET185230 [ACCN]

^a After trimming
^b Each folder contains all trimmed reads in a single FASTA file.
^c Paired-end reads with significant overlap (i.e., representing a contiguous sequence) were assigned a single GenBank accession. Consequently, although there were 1,007 genomic reads, there are only 600 GenBank entries.
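The "Number of reads", "Total base pairs", and "Mean read length" columns can be recomputed from a downloaded folder, since each folder holds the trimmed reads as FASTA. The following is a minimal sketch under that assumption; it does not rely on any particular file name inside the zip (none is given here), and the function names are hypothetical.

```python
import io
import zipfile
from typing import Iterable


def read_lengths(lines: Iterable[str]) -> list[int]:
    """Return the length (bp) of every record in FASTA-formatted lines."""
    lengths: list[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths: list[int]) -> dict:
    """Number of reads, total base pairs and (rounded) mean read length."""
    total = sum(lengths)
    n = len(lengths)
    return {"reads": n, "total_bp": total, "mean_bp": round(total / n) if n else 0}


def summarize_zip(source) -> dict:
    """Summarize all FASTA data in a folder such as PT_7GC.zip.

    `source` is a path or a binary file object. Directory entries and
    macOS '__MACOSX/' metadata entries are skipped; every other member is
    assumed to be FASTA.
    """
    lengths: list[int] = []
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if name.endswith("/") or name.startswith("__MACOSX/"):
                continue
            with zf.open(name) as fh:
                text = io.TextIOWrapper(fh, encoding="ascii", errors="replace")
                lengths += read_lengths(text)
    return summarize(lengths)
```

As a cross-check on the table itself, the genomic row's mean follows from its totals: 881,730 bp / 1,007 reads ≈ 875.6, which rounds to the listed 876 bp.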

454/Roche Pyrosequencing

A collaboration with John E. Carlson (Penn State) allowed us to sequence uncloned kinetic components and genomic DNA on the 454/Roche Applied Science GS20 platform. The 454 sequencing, performed with NSF permission and requiring only minor rebudgeting, has produced more than 100 Mb of pine genomic sequence, i.e., 18 times more sequence than proposed in the original funded version of DBI-0421717. This bonus sequence has created many new opportunities and challenges. To characterize these sequences promptly, we developed an automated "Sequence Read Classification Pipeline" (see Publications and Bioinformatics Tools). All 454 sequence data have been archived in the NCBI Short Read Archive and can also be downloaded from the table below.

DNA source	Number of reads	Mean read length (bp)^a	Total base pairs	Short Read Archive Accession	Downloadable data folders^b
Genomic (random)	275,038	102	28,038,360	SRX001948	PT_7G4.zip
Cot-filtered highly repetitive (H)	216,921	97	21,029,350	SRX001949	PT_7H4.zip
Cot-filtered moderately repetitive (M)	206,402	97	20,017,474	SRX001950	PT_7M4.zip
Cot-filtered single/low-copy (S)	102,708	93	9,544,980	SRX001951	PT_7S4.zip
Cot-filtered theoretical single-copy (T)^c	215,387	101	21,801,502	SRX001952	PT_7T4.zip
TOTAL	1,016,456		100,431,666

^a After trimming
^b Each folder contains sequence files in FASTA format and their corresponding quality files.
^c A "T" sequence is derived from DNA that remains single-stranded at one-tenth of the theoretical Cot value for single-copy DNA predicted from genome size (see Sequence Names for further explanation).
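Each 454 folder pairs FASTA sequence files with quality files. A common first step is to match each read with its quality scores and confirm there is one score per base. The sketch below assumes the quality files use the usual 454 layout (a `>` header followed by whitespace-separated integer Phred scores, possibly wrapped over several lines) and that the first header token is the same read ID in both files; these are assumptions, not documented properties of the APG folders, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (read_id, body_lines) for each '>' record in FASTA or quality text.

    The read ID is assumed to be the first whitespace-delimited token of
    the header line.
    """
    name = None
    body: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, body
            parts = line[1:].split()
            name, body = (parts[0] if parts else ""), []
        elif name is not None and line:
            body.append(line)
    if name is not None:
        yield name, body


def load_pairs(fasta_lines, qual_lines):
    """Return {read_id: sequence} and {read_id: [Phred scores]}."""
    seqs = {n: "".join(b) for n, b in parse_records(fasta_lines)}
    quals = {n: [int(q) for q in " ".join(b).split()]
             for n, b in parse_records(qual_lines)}
    return seqs, quals


def mean_qualities(seqs, quals) -> Dict[str, float]:
    """Mean Phred score per read, after checking that every read has exactly
    one score per base; raises ValueError on a missing or mismatched record."""
    means = {}
    for name, seq in seqs.items():
        q = quals.get(name)
        if q is None:
            raise ValueError(f"no quality record for {name}")
        if len(q) != len(seq):
            raise ValueError(f"{name}: {len(seq)} bases but {len(q)} scores")
        means[name] = sum(q) / len(q) if q else 0.0
    return means
```

To use it on an unzipped folder, open the FASTA file and its matching quality file and pass both file objects to `load_pairs`, then hand the two dictionaries to `mean_qualities`. The exact member names inside the zip folders are not listed here, so any names you plug in are placeholders.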

The sequence data from both ABI 3730xl sequencing and 454 pyrosequencing should facilitate many aspects of pine genomics, including detailed characterization of pine repeat sequences (e.g., see the GenomeWeb News article).

Please feel free to contact us if you have any questions.
