From Noelle.Cockett usu.edu Tue Jan 29 18:58:56 2013
From: Noelle Cockett <Noelle.Cockett usu.edu>
(by way of Jill Maddox <jillian.maddox alumni.unimelb.edu.au>)
To: Multiple Recipients of <sheepmodels animalgenome.org>
Subject: ISGC - Minutes of the January 14, 2013 meeting
Date: Tue, 29 Jan 2013 18:58:56 -0600
Dear all,
Thank you for your participation at the ISGC meeting held on January 14,
2013 in San Diego. Here are the minutes of the meeting, prepared by Brian
Dalrymple, John McEwan and myself.
Best wishes, Noelle
International Sheep Genome Consortium January 14, 2013 - San Diego, CA
(Plant and Animal Genome XXIth Meeting)
Welcome: Noelle Cockett (Utah State University) and Brian Dalrymple (CSIRO
Animal, Food and Health Sciences) welcomed around 40 people in attendance.
Ovine Parentage SNP Chips: John McEwan (AgResearch) reported that three
parentage chips are being developed (AgResearch, CSIRO, USDA/ARS) and all
have common SNPs selected from the Illumina Ovine SNP50 BeadChip.
Mike Heaton (USDA/ARS) indicated that he had selected 168 SNP with MAF >
0.30 from the SNP50 chip for inclusion on the USDA/ARS parentage chip. He
also used the resequencing dataset of 75 animals to annotate sequence
surrounding those SNP.
A motion was made and approved that the ISGC supports the development of
this parentage chip, which will be "branded" as an ISGC product.
Ovine Custom SNP Chips: John reported on the ovine LD chip that was
produced in 2011 by Illumina as a custom chip. A two-stage selection of the
SNP resulted in 91-96% accuracy for imputation to the SNP50 chip tested
across 5,000 animals. He also reported on the Bovine LD+6K ovine "MooBaa"
chip, which was produced in 2012 and should work across sheep and cattle.
The ovine content is the same as the previous ovine LD chip. He indicated
that 30% of all rams born in New Zealand in 2012 will have come from flocks
using SNP technology, through either direct genotyping or imputation.
Typically only 5-10% of candidate rams are tested. He indicated that the
designs of these chips are freely available on request.
Ovine HD SNP Chip: John reported that development of the HD SNP chip, which
is an ISGC project, is almost completed. The bulk of the SNP were
identified from the 75 sheep that have been sequenced to 10X coverage at
BCM-HGSC (see Table 1 for breeds of sheep included in the resequencing).
From these data, over 32.1 million SNP were identified using the SNP
pipelines of BCM-HGSC and DPI. In addition, DPI examined the .bam files and
found good concordance with the SNP50 chip genotypes on the resequenced
animals (all have been typed with the SNP50 chip).
Over 18.7 million SNP had MAF > 0.1, with the minor allele in at least two
animals and of those, approximately 6 million SNP have sequence on at least
one side of the SNP. This last group of SNP was sent to Illumina for
screening for the HD chip. The HD chip will also include 30K functional
SNP, 50K from the SNP50 chip, 1K other functional SNP (mutations), and 20K
that match a proposed GBS protocol. The SNP selected from the 6 million SNP
will be equally spaced and will have a range of MAF, which is a slightly
different strategy than what was used on the cattle chip which selected SNP
based on MAF in various subspecies and breeds.
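The two filters and the spacing idea described above can be sketched as
follows. This is only an illustration under stated assumptions, not the
pipeline used by BCM-HGSC, DPI or Illumina: genotypes are assumed to be
coded as the count of the alternate allele (0, 1 or 2, with None for a
missing call), and the greedy spacing rule and all function names are
hypothetical.

```python
def minor_allele_stats(genotypes):
    """Return (maf, carriers) for one biallelic SNP.

    genotypes: per-animal alternate-allele counts (0, 1 or 2);
    None marks a missing call (assumed coding, not from the minutes).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0, 0
    alt_freq = sum(called) / (2 * len(called))
    if alt_freq <= 0.5:
        # Alternate allele is the minor allele.
        maf = alt_freq
        carriers = sum(1 for g in called if g > 0)
    else:
        # Reference allele is the minor allele.
        maf = 1 - alt_freq
        carriers = sum(1 for g in called if g < 2)
    return maf, carriers


def passes_filter(genotypes, min_maf=0.1, min_carriers=2):
    """MAF > 0.1 and minor allele seen in at least two animals."""
    maf, carriers = minor_allele_stats(genotypes)
    return maf > min_maf and carriers >= min_carriers


def pick_evenly_spaced(positions, spacing):
    """Greedy spacing (hypothetical rule): walk along the chromosome and
    keep a SNP when it lies at least `spacing` bp past the last one kept."""
    chosen = []
    for pos in sorted(positions):
        if not chosen or pos - chosen[-1] >= spacing:
            chosen.append(pos)
    return chosen
```

A SNP carried as a homozygote by a single animal can exceed MAF 0.1 in a
small panel yet still fail the two-animal rule, which is why the two
conditions are checked separately.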
Cindy Lawley (Illumina) said that the ovine HD SNP chip will be referred to
as the "Illumina sheep HD SNP chip". The bulk of the SNP have been selected
from the 6 million SNP sent by John, and Illumina is now ready to start
production.
January 31 is the deadline for John to submit the list of other SNP, and
Illumina expects to ship the HD chip in early May. For orders at the
consortium price, Illumina first extended the September 2012 deadline to
December 31, 2012 because it expected to have the SNP list by January 1,
2013. Because construction of the HD SNP chip is slightly delayed, Illumina
will extend the deadline to order at the consortium price to January 24,
2013.
Sequences from the 72 animals are in the short read archives of NCBI. The
.bam and .vcf files are available to members of the consortium by
contacting James Kijas (james.kijas csiro.au). The files are not to be
distributed beyond the initial recipients until the full release of the
data. The .vcf files have been filtered but not "heavily" and include a
full set of SNP, a subset of the SNP that have been agreed upon to have
further analysis and a subset of the SNP that are being considered for the
Illumina HD SNP chip. At the time of the release of the annotation by
Ensembl (probably mid-September, 2013), the release will include the
individual animal .bam/.vcf files as tracks.
Update on Ovine SNP50 BeadChip: John reported that around 40,000-50,000
animals have been genotyped with the Illumina Ovine SNP50 BeadChip to date.
A survey of the literature indicated about 22-25 publications using data
from the SNP50 chip.
Cindy indicated that Illumina has sufficient bead pool for another run of
the SNP50 chip using the 12-bead pool format. Therefore, they have no plans
to re-synthesize the oligos or change to a different format.
It was noted that the SNP50 chip has ovine Oar v1.0 SNP coordinates and
position assignments. Illumina is willing to update the position locations
to Oar v3.1 but recommends that the SNP names remain the same. Brian will
send the Oar v3.1 coordinates for the SNP on the SNP50 chip to Illumina.
Ovine whole genome assembly: Brian Dalrymple provided a summary of Oar v3.1
whole genome assembly. As presented last year, there were gaps in GC-rich
regions in Oar v2.0 so strategies for sequencing across these regions were
implemented at BCM-HGSC and Roslin using the male Texel animal. These
efforts have improved the GC-rich regions but there may still be some
issues with methylation analyses. An updated version of the assembly
(Oar v4.0) won't be released until late 2014. Patches will be released in
the meantime, but they won't include these more "global" changes.
Brian said Jiang Yu has analyzed the 72 resequenced animals for CNV and
deletions. Jiang used five contiguous 200 bp windows and required presence
of the CNV or deletion in at least five animals. The results of this
analysis will contribute to the refinement of the assembly of CNV regions
in Oar v4.0.
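The two thresholds mentioned above (five 200 bp windows in a row, seen in
at least five animals) can be sketched as follows. The minutes do not say
which signal was used to flag a window or how calls were combined across
animals, so this is only an illustration: the per-window True/False flags
are assumed to come from an upstream step such as a read-depth test, and
the function names are hypothetical.

```python
def runs_of_windows(flags, min_run=5, window_bp=200):
    """Return (start_bp, end_bp) intervals covering runs of at least
    `min_run` consecutive flagged windows.

    flags[i] is True when window i, covering
    [i * window_bp, (i + 1) * window_bp), looks abnormal.
    """
    regions = []
    run_start = None
    # A trailing False acts as a sentinel that closes a run at the end.
    for i, flagged in enumerate(list(flags) + [False]):
        if flagged and run_start is None:
            run_start = i
        elif not flagged and run_start is not None:
            if i - run_start >= min_run:
                regions.append((run_start * window_bp, i * window_bp))
            run_start = None
    return regions


def shared_windows(flags_by_animal, min_animals=5, min_run=5):
    """Window indices lying inside a qualifying run in at least
    `min_animals` animals (one assumed way to combine animals)."""
    support = {}
    for flags in flags_by_animal.values():
        # window_bp=1 makes the returned intervals window indices.
        for start, end in runs_of_windows(flags, min_run, window_bp=1):
            for w in range(start, end):
                support[w] = support.get(w, 0) + 1
    return sorted(w for w, n in support.items() if n >= min_animals)
```

Requiring a run of windows guards against single noisy windows, and
requiring several animals guards against artefacts in one library.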
Jiang also mapped all contigs in Oar v3.1 to the bovine assembly, then
looked at every place where the two species differed. It is not yet clear
whether these are true differences between cow and sheep or problems with
the assemblies, so users will need to decide for themselves which assembly
is at fault. Almost 1/3 of the differences occur on the X chromosome, but
that chromosome has been an issue for both assemblies.
A call has gone out for RNA seq data that will be used for annotation. The
cut-off date to send these data to Brian is January 31, 2013 and the files
are best transferred by FTP. Ensembl will add in sequences that are
publicly available but Brian needs to let them know where these sequences
are located.
Brian will send all collected RNA data from CSIRO to Ensembl on a hard
drive. Ensembl believes it will take six months to finish annotation (so it
should be completed around September 2013). Thibaut Hourlier (Ensembl,
Sanger Institute) said that Ensembl will develop gene models using pooled
data across all sequences within a tissue and then have a single "tissue"
track which will include the number of tracks that contributed to that
variant. Ensembl can return .bam files to submitters, and contributors can
request that .bam files be released or not released.
PacBio project: Kim Worley (BCM-HGSC) reported that most of the gaps in the
ovine whole genome assembly are small (40%, or 50,000, are in the first
peak) and are most often found in repeats at scaffold ends. She will be
using PBJelly (developed by BCM-HGSC) to fill captured gaps; the tool can
work with reads from any high-throughput sequencing platform, although the
approach is designed for PacBio sequencing. Improvements in the process
will ensure longer and more reads. A grant proposal has been submitted to
the USDA/AFRI Tools and Resources program area. ISGC has committed another
$100,000 to this project. If completed, the resulting data will have 7X
coverage, the contig N50 will be doubled or tripled, and the number of gaps
in the assembly will be reduced by half.
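For readers less familiar with the contig N50 statistic quoted above, a
minimal sketch of how it is usually computed is given below. The contig
lengths in the usage note are made up for illustration and are not taken
from the sheep assembly; the function name is hypothetical.

```python
def n50(lengths):
    """Contig N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy lengths only: closing the gap between the 30 and 40 unit contigs
# merges them into one contig of 70 (filled gap bases ignored here).
before = [10, 20, 30, 40]
after = [10, 20, 70]
print(n50(before), n50(after))
```

Closing a gap joins two contigs into one longer contig, which is why gap
filling raises the contig N50 even when few new bases are added.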
Epigenomic analysis: Chris Couldrey (AgResearch) described an epigenomic
analysis which requires both whole genome sequence (provided by the ovine
assembly) and gene expression (RNA seq) data, as well as miRNA and DNA
methylation (representational bisulfite sequencing) data. When finished,
the information will be added to the annotation of the whole genome
assembly and would allow a look at methylation levels at individual CpG
sites. A major consideration is how to put the results across multiple
tissues/animals together and how to annotate them. Ensembl has ways to
visualize the data, probably as a "track", but to be included in Ensembl
the data will need to be publicly available.
Contribution of 3SR to the whole genome assembly: Huw Jones (3SR project)
indicated that the 3SR project is scheduled to end in October, 2013. The
contribution of 3SR to the whole genome assembly has been revised from the
original proposal. The revised plan is separated into three areas. First,
samples from CNV animals will be selected by James Kijas (CSIRO) and sent
to INRA to include in their analysis of CNV funded under the CNV work plan.
Second, 3SR funds will be used for targeted BAC sequencing of ~500 BACs in
the next 3-4 months. Brian will select the BACs based on gaps/comparative
analysis and have the BAC clones sent directly from the CHORI 234 library
to Roslin, which will do the sequencing. A test run using a 96-well plate
will be done first and, based on those results, additional BACs will be
sequenced.
Selected clones will include regions that are functionally interesting to
3SR as well as problems in the assembly. Third, Roslin Institute will
obtain RNA seq from about 20-30 tissues across a Texel ram, a Texel ewe, a
Texel ewe-lamb and a whole 16-d embryo (see attached Table 2). Some pooling
of tissues or animals in a lane will be done utilizing bar coding.
Expression profiles will be made available through the http://biogps.org
site. There is some concern on how to do the assembly of the transcripts
from the RNA seq data since the annotation isn't done. It was noted by
Garth Brown, the NCBI representative, that NCBI can also assemble from the
RNA seq reads.
Analysis of the ovine X chromosome: Wan-Sheng Liu (Penn State University)
presented results on BTAY and BTAX (the bovine Y and X chromosomes)
obtained in his lab under a USDA/AFRI grant, as well as a proposed pathway
for developing information for the X and Y chromosomes in sheep.
Sheep Genomes Project: Noelle Cockett described a USDA/AFRI Tools and
Resources grant proposal that will result in a project to develop a
resequencing database that includes extensive annotation of variants. The
database will include sequence data from the 75 animals already sequenced
at BCM-HGSC and an additional 25 animals that will be sequenced using ISGC
funding at BCM-HGSC, as well as exome sequencing by BCM-HGSC of 145
animals (at current pricing; funding requested in the proposal). Sequence
and variant data will be available through NCBI.
Publication of the whole genome assembly paper: Brian led a discussion on
the assembly paper, with a central question on whether to publish now or
hold off for more annotation and biological stories to add with it. There
is some concern with a delay in publishing because Ensembl requires a
"released" sequence. Also, there may be hesitation for people to use the
assembly because it's not published. The bar is going up for "strong"
publications on whole genome sequences and therefore the scope of the sheep
paper will need to be high. Unfortunately, key individuals haven't
coordinated a large effort to get this "big story" organized. Ensembl says
there will be some statistics on the assembly that could be added in a few
weeks.
It was decided that biologically interesting stories, such as mothering
ability, reproductive function (which seems to differ from cattle and
goat), wool, X chromosome, litter size, etc., would be added to the
publication. Brian/Jiang will circulate the draft paper in late
January/early February. People should indicate their interest in
contributing by the end of February. A deadline for submitting sections was
set as the end of March. Ensembl will "share" the results before the full
annotation and then the details can be updated closer to the date of
publication (as was done for swine). The provisional submission date for
the manuscript to the journal is August 2013.
Alan indicated that companion papers will be able to "link" to the paper up
to 6 months after publication of the main paper. Genome Biology might be a
good target for the main paper.
ENCODE effort: Alan Archibald (Roslin) indicated that an ENCODE project for
farmed/companion animals has been proposed. The project would combine gene
biology and variation information. The results would be put in the
ENCODE browser and linked to the whole genome assemblies of each species.
Alan suggested that the focus be on target tissues (e.g. musculoskeletal,
immune tissues) and limited assays (e.g. DNaseI, FAIRE-seq, histone
markers, methylation, etc.). Samples would be shared across participants
and would
likely be cells (transformed, primary cells, iPS cells) and then the raw
data would be shared across the group. A good way to visualize the data
will be important.
Plans are to 1) generate a white paper on the project, 2) develop a data
management strategy, 3) review/promote ENCODE experimental protocols, 4)
develop/review cell line resources, 5) develop a communications strategy,
and 6) establish an EU-US Biotechnology Working Party (close to PAG XXII).
Other projects: Jennifer Thompson (Montana State University) reported that
she had access to sheep reproduction lines that have been closed since
1988. She is developing a proposal to perform selective sweep analyses across the
population and conduct RNA seq on reproductive tissues from selected
animals. Jennifer is seeking collaborators.
The meeting ended at 3:00 p.m.
Table 1 - Sheep included in the resequencing
Animal Identifier    Breed                      Animal_ID  Contributor
BGE2                 Bangladeshi                BGE2       Faruque Mdomar
BGE4                 Bangladeshi                BGE4       Faruque Mdomar
GAR14                Garole                     GAR14      Faruque Mdomar
40                   Indian Garole              GAR4       Vidya Gupta
CHA02                Changthangi                CHA02      Jorn Bennenwitz
CHA05                Changthangi                CHA05      Jorn Bennenwitz
IXX.3178             Garut                      GUR4       Herman Raadsma
IXX.3530             Garut                      GUR5       Herman Raadsma
I99.1574             Sumatra                    SUM2       Herman Raadsma
I99.1595             Sumatra                    SUM7       Herman Raadsma
ZB08                 Northern Tibetan           ZB08       Kui Li
ZD11                 Eastern Tibetan            ZD11       Kui Li
NA_02_NO TAG         Namaqua Afrikaner          NQA11      James Kijas
RDA_99_007           Ronderib Afrikaner         RDA2       James Kijas
RDA_017034           Ronderib Afrikaner         RDA4       James Kijas
WD_032101            African White Dorper       AWD1       Miika Tapio
WD_032122            African White Dorper       AWD3       Miika Tapio
M01                  Ethiopian Menz             EMZ1       Miika Tapio
KR4                  Karya                      KR4        Ibrahim Cemal
CC50                 Cine Capari                CC50       Ibrahim Cemal
SZ3                  Sakiz                      SKZ1       Ibrahim Cemal
SZ6                  Sakiz                      SKZ4       Ibrahim Cemal
NZ1                  Norduz                     NDZ1       Ibrahim Cemal
NZ4                  Norduz                     NDZ4       Ibrahim Cemal
1i                   Turkish Awassi             AWT1       Ibrahim Cemal
3i                   Turkish Awassi             AWT2       Ibrahim Cemal
Afsh-032             Afshari                    AFS32      Henner Simianer
Afsh-033             Afshari                    AFS33      Henner Simianer
KK3                  Karakas                    KRS3       Ibrahim Cemal
KK7                  Karakas                    KRS5       Ibrahim Cemal
T7                   Cheviot                    CHVA1      Steve Bishop
896                  Cheviot                    CHVC1      Steve Bishop
Bl                   Salz                       SALA1      Luis V. Monteagudo Ibáñez
Neg                  Salz                       SALA2      Luis V. Monteagudo Ibáñez
40                   Salz                       SALC1      Luis V. Monteagudo Ibáñez
BSI3                 Santa Inês                 BSI3       Samuel Paiva
BSI4                 Santa Inês                 BSI4       Samuel Paiva
BMN3                 Morada Nova                BMN3       Samuel Paiva
BMN4                 Morada Nova                BMN4       Samuel Paiva
GCN4                 Gulf Coast native          GCN4       Noelle Cockett
GCN5                 Gulf Coast native          GCN5       Noelle Cockett
BCS1                 Brazilian Creole           BCS1       Samuel Paiva
BCS3                 Brazilian Creole           BCS3       Samuel Paiva
131-5g               Ovis canadensis            OCAN1      Dave Coltman
131-6g               Ovis canadensis            OCAN2      Dave Coltman
26783                Ovis dalli                 ODAL1      Dave Coltman
26761                Ovis dalli                 ODAL2      Dave Coltman
FIN1                 Finnsheep                  FIN1       Juha Kantanen
FIN4                 Finnsheep                  FIN4       Juha Kantanen
CHU01                Churra                     CHU1       Juan Jose Arranz
CHU02                Churra                     CHU2       Juan Jose Arranz
64                   Ovis canadensis            OCAN3      Stefan Hiendleder
CAS 01_ULE           Castellana                 CAS1       Juan Jose Arranz
CAS 03_ULE           Castellana                 CAS3       Juan Jose Arranz
OJA04                Ojalada                    OJA4       Juan Jose Arranz
OJA05                Ojalada                    OJA5       Juan Jose Arranz
SWA03                Swiss White Alpine         SWAN3      Cord Droegemueller
SWA04                Swiss White Alpine         SWAN4      Cord Droegemueller
SWA 27               Swiss White Alpine         SWAA27     Cord Droegemueller
SWA 29               Swiss White Alpine         SWAA29     Cord Droegemueller
VBS02                Valais Blacknose           VBS2       Cord Droegemueller
SMS02                Swiss Mirror               SMS2       Cord Droegemueller
604343               Meat Lacaune               LAC1       Carole Moreno
604299               Milk Lacaune               LAC84      Carole Moreno
1560                 Merino                     MERC1      Kristen Nowak
398                  Merino                     MERA1      Kristen Nowak
SNP1_040129          Poll Dorset                PD454      James Kijas
SNP2_951 0000488312  Merino                     MER454     James Kijas
SNP3_9009            Awassi                     AW454      Herman Raadsma
SNP4_113/05          Texel                      TEX454     John McEwan
SNP5_180/05          Romney                     ROM454     John McEwan
SNP6_590771          Scottish Blackface         SBF454     Steve Bishop
1                    Dolgellau Welsh Mountain   DWM1       Denis Larkin
10                   Welsh Hardy Speckled Face  WHSF1      Denis Larkin
20                   Tregaron Welsh Mountain    TWM1       Denis Larkin
Table 2 - full table not included as xlsx attachment
Tissues are:
Ram
Brain cerebellum
Brain cerebrum
Brain frontal lobe
Brain stem
Hypothalamus
Pituitary gland
Cornea
Lens
Optic nerve
Retina
Sclera
Abomasum mucosa
Caecum
Colon
Duodenum
Omentum
Rectum
Rumen
Larynx
Oesophagus
Pharynx
Salivary gland parotid
Thyroid gland
Tongue dermis
Tongue muscle
Tonsil
Aorta
Atrium
Ventricle
Adrenal gland
Kidney cortex
Kidney medulla
Liver
Lung
Lymph node mesenteric
Lymph node prescapular
Testes
Testes epididymis
Bladder
Diaphragm
Pancreas
Fat subcutaneous
Fat kidney
Muscle long dorsal
Muscle biceps
Muscle intercostal
Skin back
Spleen
Packed blood cells
Ewe dam
Tissue
Cerebellum
Cerebrum
Frontal lobe
Brain stem
Hypothalamus
Pituitary
Cornea
Lens
Optic nerve
Retina
Sclera
Cervix
Mammary gland
Corpus luteum
Ovary
Placenta + membranes
Uterus
WHOLE EMBRYO
Abomasum
Caecum
Colon
Duodenum
Omentum
Peyer's patch
Rectum
Rumen
Larynx
Oesophagus
Pharynx
Salivary gland
Thyroid gland
Tongue dermis
Tonsil
Aorta
Heart atrium
Heart ventricle
Adrenal gland
Kidney cortex
Kidney medulla
Liver
Lung
Lymph node mediastinal
Lymph node mesenteric
Lymph node prescapular
Bladder
Diaphragm
Pancreas
Fat subcutaneous
Fat kidney
Tongue muscle
Muscle long dorsal
Muscle biceps
Muscle intercostal
Skin side
Spleen
Packed blood cells
Lamb
Tissue
Cerebellum
Cerebrum
Frontal lobe
Brain stem
Hypothalamus
Pituitary gland
Cornea
Lens
Optic nerve
Retina
Eye membrane (not sclera)
Cervix
Mammary gland
Ovarian follicles
Ovary
Uterus
Abomasum
Caecum
Colon
Duodenum
Omentum
Peyer's patch
Rectum
Rumen
Pharynx
Oesophagus
Larynx
Salivary gland parotid
Thyroid gland
Tongue dermis
Tongue muscle
Tonsil
Aorta
Heart atrium
Ventricle
Adrenal Gland
Kidney cortex
Kidney medulla
Liver
Lung
Lymph node mediastinal
Lymph node mesenteric
Lymph node prescapular
Bladder
Diaphragm
Pancreas
Fat kidney
Fat subcutaneous
Muscle long dorsal
Muscle biceps
Muscle intercostal
Skin back
Packed blood cells
Spleen