
This page lists data sets generated by our research projects that can be viewed and downloaded. Some of the data have been published; others have not yet. If you are interested in using them for further global analysis, please contact us first [Dr. Jianbing Yan]. We will be pleased to share the data for any specific gene or region analysis.

*****We also keep a synchronized copy of all these resources on [Baidu Cloud], in case the connection fails during download*****

 

Genetic Resources


  • Maize is an ideal crop for association mapping because of its great genetic diversity and rapid linkage disequilibrium (LD) decay. Successful association mapping in a species first requires the creation of a suitable germplasm collection that reflects the genetic diversity, extent of LD decay and genetic relatedness in a population, which together determine mapping resolution and power. Generally, germplasm collections need to encompass enough genetic diversity to cover most of the variation for the traits of interest.

    • We have assembled a global germplasm collection of more than 1,000 elite maize inbred lines released from the major temperate and tropical/subtropical breeding programs of China, CIMMYT and the Germplasm Enhancement of Maize (GEM) project in the US. In total, 527 inbred lines assayed with genome-wide single nucleotide polymorphism (SNP) markers are listed below, and adaptation data from field experiments are also available for them. RNA sequencing was performed on 368 of these 527 lines [368 diversity inbred lines and pop-structure][0.1 MB] using kernels harvested 15 days after pollination. We also provide 7 SSRs [SSR information and related PCR results] to identify all the AMP lines.


Related papers:
Characterization of a global germplasm collection and its potential utilization for analysis of complex quantitative traits in maize.

Genetic analysis and characterization of a new maize association mapping panel for quantitative trait loci dissection.


MODEM: multi-omics data envelopment and mining in maize (2016).

In addition, we have developed 11 RIL populations and one BC2F6 population, each containing 200 families. All families were genotyped with the 50K Maize SNP array [50K SNPs for 12 populations download][18 MB] and phenotyped in at least 6 locations. More RIL populations are under development and will be phenotyped in more locations.

Linkage maps [maps download] for the above 12 populations were constructed from the SNPs on the 50K array [raw data] with the software CARTHAGENE, and the recombination dynamics along the maize genome were investigated. [download for software, code and guide]


Related papers:
Genome-wide dissection of the maize ear genetic architecture using multiple populations (2016).

Genome-wide recombination dynamics are associated with phenotypic variation in maize (2016).

Genotypic Data:


  • The RNA-seq project (in collaboration with CAAS, BGI and CAU) has generated more than 3.6 million SNPs for the 368 diverse inbred lines. The genotypic data for each line are released below and contain high-quality SNPs (missing rate less than 0.6) combined with SNPs from the MaizeSNP50 BeadChip (more than 1.06 million in total) [Download].
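For users who want to apply a comparable missing-rate filter to their own subset of the data, the following is a minimal sketch. It is not the pipeline used to produce the released files. The missing-call symbol `N`, the dictionary layout and the function names are assumptions made for illustration only.

```python
MISSING = "N"  # assumed symbol for a missing genotype call


def missing_rate(calls):
    """Return the fraction of lines without a genotype call at one SNP."""
    return sum(1 for call in calls if call == MISSING) / len(calls)


def filter_snps(genotypes, max_missing=0.6):
    """Keep SNPs whose missing rate is strictly below max_missing."""
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if missing_rate(calls) < max_missing
    }


# Toy genotype table: SNP id -> one call per inbred line (hypothetical data).
genotypes = {
    "snp1": ["A", "A", "G", "N", "A"],  # 1 of 5 missing, kept
    "snp2": ["N", "N", "N", "C", "N"],  # 4 of 5 missing, dropped
    "snp3": ["T", "N", "N", "T", "C"],  # 2 of 5 missing, kept
}
kept = filter_snps(genotypes)
print(sorted(kept))
```

The threshold is passed as a parameter so that a stricter cut-off can be tried without changing the code.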


Related papers:
Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels (2013).

RNA sequencing reveals the complex regulatory network in the maize kernel (2013).

To increase the power of association analysis, we imputed high-density genotypes [MAF<5% filtered, 66.3MB] for the whole panel of 513 lines using an integrated IBD and KNN model.


Related paper:
Genome Wide Association Studies Using a New Nonparametric Model Reveal the Genetic Architecture of 17 Agronomic Traits in an Enlarged Maize Association Panel (2014).

Recently, we created a new integrated variant map with much higher density (1.25M SNPs with MAF≥0.05) and an enlarged panel size (n=540) by combining genotypes from the previous RNA sequencing and 50K array with genotypes newly identified using a high-density array (600K) and GBS technology. This dataset was applied to re-map the eQTL landscape of the maize kernel and should provide a valuable resource for future genetic studies. Because the related files are large, the final merged genotype set (in hapmap format), together with the separate raw sets genotyped by the different strategies, is available Here [1.25M with 540 size].
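As an illustration of how a MAF threshold can be applied to a hapmap-format file, here is a minimal sketch. It assumes tab-separated records with the 11 standard HapMap metadata columns followed by one column per line, and it assumes two-letter genotype codes with `NN` for missing calls; the released files may use a different encoding. The function names and the toy records are hypothetical.

```python
from collections import Counter

N_META = 11  # HapMap records carry 11 metadata columns before the samples


def minor_allele_frequency(calls):
    """Frequency of the second most common allele among called genotypes."""
    counts = Counter()
    for call in calls:
        if "N" in call:  # skip missing calls such as "NN" (assumed encoding)
            continue
        counts.update(call)  # count both allele characters, e.g. "AG"
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no calls, or a monomorphic site
    return counts.most_common()[1][1] / total


def filter_hapmap(lines, min_maf=0.05):
    """Return the header plus every record whose MAF is at least min_maf."""
    header, *records = lines
    kept = [header]
    for record in records:
        fields = record.rstrip("\n").split("\t")
        if minor_allele_frequency(fields[N_META:]) >= min_maf:
            kept.append(record)
    return kept


meta = ["rs#", "alleles", "chrom", "pos", "strand", "assembly#",
        "center", "protLSID", "assayLSID", "panelLSID", "QCcode"]
lines = [
    "\t".join(meta + ["L1", "L2", "L3", "L4"]),
    "\t".join(["snpA", "A/G", "1", "100"] + ["NA"] * 7 + ["AA", "AA", "GG", "AG"]),
    "\t".join(["snpB", "C/T", "1", "200"] + ["NA"] * 7 + ["CC", "CC", "CC", "NN"]),
]
kept = filter_hapmap(lines)
print(len(kept) - 1, "SNP(s) retained")
```

In this toy input, snpA has a minor allele frequency of 3/8 and is retained, while snpB is monomorphic among the called lines and is dropped.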


Related paper:
Distant eQTLs and non-coding sequences play critical roles in regulating gene expression and quantitative trait variation in maize (2016).

We also genotyped some RILs (only those close to publication are listed here):


Zong3 X 87-1:

[261 SSR markers for the 294 RILs][156KB].

[261 SSR markers for the 441 crosses][232KB].

[Genotypes and map information of the 3,184 bins for the 256 RILs][1.64MB].


Related papers:
Genetic basis of grain yield heterosis in an “immortalized F2s” maize population (2014).

Performance prediction of F1 hybrids between recombinant inbred lines derived from two elite maize inbred lines (2012).

Expression data of 368 lines:

  • Expression levels in DAP15 kernels of the 368-line association panel were quantified by RNA-seq. Read counts for each gene were calculated and scaled to RPKM. After RPKM normalization, all genes with a median expression level larger than zero for each sample were included, and the overall distribution of expression levels among the 368 lines for each gene was normalized using a normal quantile transformation.

[Normalized(final) expression data of whole 368 panel][19.0 MB].

[Normalized(rpkm) expression of Genes][49.7 MB].

[Normalized(rpkm) expression of Transcript][78.1 MB].
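The relation between the RPKM files and the final normalized file can be sketched as follows. This is a minimal illustration, not the code used to produce the files. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads), reads the median filter as a median taken across lines for each gene, and uses the rank offset (r - 0.5)/n for the normal quantile transformation; the last two points are assumptions, as are all function names and the toy values.

```python
from statistics import NormalDist, median


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def normal_quantile_transform(values):
    """Map values to standard normal quantiles by rank (ties share a rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Toy RPKM table: gene -> one value per line (hypothetical data).
expression = {
    "geneA": [3.0, 1.0, 2.0, 5.0, 4.0],
    "geneB": [0.0, 0.0, 0.0, 1.2, 0.0],  # median is zero, filtered out
}
expressed = {g: v for g, v in expression.items() if median(v) > 0}
transformed = {g: normal_quantile_transform(v) for g, v in expressed.items()}
print(rpkm(500, 2000, 10_000_000))
```

The transformation preserves the ranking of lines within each gene and only changes the shape of the distribution.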


Related papers:
RNA sequencing reveals the complex regulatory network in the maize kernel (2013).


Distant eQTLs and non-coding sequences play critical roles in regulating gene expression and quantitative trait variation in maize (2016).


MODEM: multi-omics data envelopment and mining in maize (2016).

PAN-transcriptome related:

  • The deep RNA-seq of the 368 inbred lines (DAP15 kernel) is described above. Novel transcripts were assembled de novo using a modified “assemble-then-align” strategy. After filtering and clustering steps, 2355 reference novel genes (transcripts) were obtained for the 368 panel; these were used in further association mapping and in estimating the maize pan-size. A paper has been submitted, and the novel sequences (fasta format) and their annotation are available below:

[novel gene sequences (2355)][570 KB].

[Annotation of Novel sequences][185 KB].

[novel genes variation (2355)][167 KB].

  • Additionally, we investigated extreme variation at the transcript level by analyzing the above RNA-seq data. We identified almost one-third (13,443) of nuclear genes as showing expression presence and absence variation (ePAV) in maize. The ePAV genes (dispensable transcriptome) were further shown to differ from core expression genes in genetic mechanism and regulatory role: they tend to be regulated much more by distant eQTLs and are likely to act as regulators. We therefore believe these newly identified "markers" may be useful in your own studies; they are available below.

[ePAV variation (13443)][1.48 MB].
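To show what a presence/absence marker matrix of this kind looks like, here is a minimal sketch. The calling rule used here (a gene is "present" in a line when its RPKM exceeds a threshold, and is an ePAV gene when it is present in some lines and absent in others) is an illustrative assumption and not necessarily the criterion used to build the released file. The function name, threshold and toy values are hypothetical.

```python
def epav_matrix(expression, threshold=0.0):
    """Return 0/1 presence calls for genes that vary between lines.

    expression maps a gene id to one RPKM value per line. Genes present
    in every line (core) or in no line are left out of the result.
    """
    matrix = {}
    for gene, values in expression.items():
        calls = [1 if value > threshold else 0 for value in values]
        if 0 < sum(calls) < len(calls):  # present in some lines only
            matrix[gene] = calls
    return matrix


# Toy RPKM table: gene -> one value per line (hypothetical data).
expression = {
    "gene1": [2.1, 0.0, 3.4],  # absent in the second line: ePAV
    "gene2": [1.0, 2.0, 3.0],  # expressed in all lines: core
    "gene3": [0.0, 0.0, 0.0],  # never expressed
}
epav = epav_matrix(expression)
print(epav)
```

A 0/1 matrix of this form can be used as a marker set in association analysis in the same way as biallelic genotypes.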

  • Further, the kernel metabolome (616 metabolic traits) and 17 agronomic phenotypes were used to explore the genetic architecture of the pan-transcriptome and thus to give a more general evaluation of the phenotypic contribution of dispensably expressed genes and novel genes. Importantly, the ePAV states, rather than genomic variation, proved to be effective and easily interpreted markers that complement SNP-GWAS studies in understanding genome regulatory complexity and in applications such as quantitative trait loci (QTL) cloning. All the association mapping results are provided here:

[PAN GWAS details][436 KB].


Related paper:
Maize pan-transcriptome provides novel insights into genome complexity and quantitative trait variation (2016).

Phenotypic Data:

  • The association panel and RILs were planted in multiple locations as follows: Honghe Autonomous Prefecture, Yunnan province; Sanya, Hainan province; Wuhan, Hubei province; and Ya’an, Sichuan province in 2010; and Chongqing; Hebi, Henan province; Nanning, Guangxi province; and Kunming, Yunnan province in 2011. The sites range from 18 to 35 degrees north latitude and from 102 to 114 degrees east longitude. The phenotypic data (including agronomic, metabolic and grain quality traits) are listed below.

[Agronomic traits(blup) of association panel][117 KB].

[Amino acids traits of association panel][102 KB].

[Metabolic traits (Experiment-1 of association panel)][380 KB].

[Metabolic traits (Experiment-2 of association panel)][526 KB].

[Metabolic traits (Experiment-3 of association panel)][508 KB].

[Metabolic traits (RIL-B73 X BY804)][249 KB].

[Metabolic traits (RIL-Zong3 X 87-1)][237 KB].


Related papers:
Genome Wide Association Studies Using a New Nonparametric Model Reveal the Genetic Architecture of 17 Agronomic Traits in an Enlarged Maize Association Panel (2014).


Metabolome-based genome-wide association study of maize kernel leads to novel biochemical insights (2014).


Genomic, Transcriptomic, and Phenomic Variation Reveals the Complex Adaptation of Modern Maize Breeding (2015).


Maize pan-transcriptome provides novel insights into genome complexity and quantitative trait variation (2016).


MODEM: multi-omics data envelopment and mining in maize (2016).

We also phenotyped some RILs (only those close to publication are listed here):


By804 X B73:

A maize recombinant inbred line population (By804/B73), derived from a cross between the normal line B73 and the high-oil line By804, was planted at the Huazhong Agricultural University field experiment station (Wuhan, E 109°51', N 18°25') in 2013. Phenotypes including seven agronomic traits (plant height (PH), ear height (EH), length of ear leaf (LL), width of ear leaf (LW), tassel length (TL), tassel branch number (TBN) and fresh shoot biomass) and 79 metabolic traits (profiled from three tissue types by GC-MS) were measured, and the dataset is provided below. Relevant analyses and results based on the data were integrated into a manuscript that was submitted recently.

[79 metabolic traits (from three tissues) & 7 agronomic traits][421KB].


Related paper:
Genetic Determinants of the Network of Primary Metabolism and Their Relationships to Plant Performance in a Maize Recombinant Inbred Line Population (2015).


Zong3 X 87-1:

[Phenotypes of the 294 RILs and 441 crosses][178KB].

Ear traits on ROAM population (10 RILS):


A maize ROAM (Random-open-parent Association Mapping) population is a set of 10 independent recombinant inbred line (RIL) populations, i.e., B73×BY804, KUI3×B77, K22×CI7, DAN340×K22, ZHENG58×SK, YU87-1×BK, ZONG3×YU87-1, DE3×BY815, K22×BY815 and BY815×KUI3. These ten RIL populations were planted in eight trials during the summer and winter of 2011 and 2012 in five locations in China with one random-block replication per location. Four ear traits, ear length (EL), ear row number (ERN), ear weight (EW) and cob weight (CW), were measured, and the best linear unbiased prediction values are provided below. Relevant analyses and results based on these data can be found below.

[4 ear traits from ROAM population][124 KB].


Related paper:
Genome-wide dissection of the maize ear genetic architecture using multiple populations (2016).

Software and packages:


  • Anderson-Darling (A-D) test


A new method based on the A-D test was developed for genome-wide association studies (GWAS). An open R package [including manual and sample data] makes it easy to run the A-D test for GWAS and is available here:

[R package for ADGWAS (Windows)][391 KB].

[R package for ADGWAS (Linux)][379 KB].


Related paper:
Genome Wide Association Studies Using a New Nonparametric Model Reveal the Genetic Architecture of 17 Agronomic Traits in an Enlarged Maize Association Panel (2014).

  • Random-Open-parents Association Mapping (ROAM)


These R scripts are designed for ROAM, a newly proposed multi-parental population design for large-scale genetic analysis, and cover genotype imputation, projection, bin extraction, kinship calculation, joint linkage mapping (JLM) and genome-wide association study (GWAS) analyses. The related paper has been submitted.

[R source scripts][10.8 MB].


Related paper:
Genome-wide dissection of the maize ear genetic architecture using multiple populations (2016).

  • Genetic Map Construction Software (HDGenMap)


    HDGenMap is designed for genetic map construction based on CARTHAGENE.

    [HDGenMap.zip][3.14 MB].


    Related paper:
    Genome-wide recombination dynamics are associated with phenotypic variation in maize (2016).

  • Protocols from Maizego lab:


    • Single Tetrad-stage Microspore Sequencing in Maize


    The protocol, from isolating single tetrad-stage microspores to single-cell sequencing, is available here:

    [maize single tetrad-stage microspore sequencing protocol][296 KB].


    Related paper:
    Dissecting meiotic recombination based on tetrad analysis by single-microspore sequencing in maize (2015).