It is the biggest advance in genetic knowledge since 2000, when the book of DNA was opened with the publication of the sequence of three billion letters that make up the alphabet of life, the so-called “Human Genome Project”. Now a substantial update has been added to this encyclopaedia of the human being. For the past few hours, a map of the gene activity that orchestrates the onset of disease has been online, freely accessible to all in 24 articles in journals of the “Nature” group. The diseases range from diabetes to hypertension, from arthritis to Alzheimer’s, from cancer to autoimmune disorders. All the information comes from the research programme called “Epigenome Roadmap”, led by the US National Institutes of Health.
Geneticists have recently come to understand that an individual is not simply the algebraic sum of his or her genes. It is not true that genes can explain the workings of living organisms and the differences between species. Not on their own, at least. There are other levels of control, other languages, not necessarily represented by DNA sequences. Scientists have begun to decode them, and have named this new vocabulary the “epigenome”. A true “symphony in the cells of the organism”, as Nature puts it.
In practice, just as a conductor harmonises the individual musicians to create a perfectly integrated whole, cells need mechanisms that control the activity of genes. Genes must work in groups, in close collaboration, continuously exchanging information, which often differs according to the type of cell or to whether a cell is healthy or diseased.
“Almost all the cells of the human body have the same DNA sequence and the same genes. So what,” writes Nature, “distinguishes a heart cell from a brain cell? A healthy cell from a defective one? Cells use their genetic code in different ways, depending on the function they perform in the organism. Just like an orchestra,” the British journal notes, “which can perform a piece of music in many different ways. The combination of changes in gene expression in a cell is what is defined as its epigenome.” The score, then, is always the same, but the notes and the way they are played change from cell to cell. For example, as the function of a cell type varies, the same gene can be switched on or off from one moment to the next.
“An international team of researchers,” writes Nature, “has put together 111 different epigenomes from different types of human cells, including all the major organs, cells of the immune system, embryonic stem cells and so-called induced pluripotent stem cells (adult cells that have been rejuvenated to make them capable of turning into the different parts of the body, ed.).”
In particular, the geneticists studied the protein spools around which the DNA double helix is wound so that it can be packed inside the cell nucleus. If all human DNA were unrolled and stretched out, it would reach a length of two metres. These spools, made of proteins called histones, act like switches that, depending on the type of cell, can determine whether and when certain genes are turned on or off.
“The study of the epigenomes of healthy and diseased human tissues,” reads a Nature editorial, “can provide crucial information for linking genetic variation to disease. Tackling disease using only the information provided by the genome has so far been like working with one hand tied behind the back. Taken as a whole,” Nature concludes, “this research programme shows how complex and beautifully arranged a cell’s epigenome is. Just like a Beethoven symphony.”
Davide Patitucci, f.q., 18 February 2015
Epigenomics: Roadmap for regulation
Epigenomics is the study of the key functional elements that regulate gene expression in a cell.
Epigenomes provide information about the patterns in which structures such as methyl groups tag DNA and histones (the proteins around which DNA is packaged to form chromatin), and about interactions between distant sections of chromatin.
They also contain information about regulatory elements in DNA itself: both those that lie in the promoter region immediately upstream of where a gene’s transcription begins, and those in distant enhancer sequences.
The ENCODE Project1 aimed to catalogue the regulatory elements in human cells, studying the epigenomic signatures of cells grown in culture. The Roadmap Epigenomics Project2, 3, 4, 5, 6, 7, 8, 9 builds on this by analysing samples taken directly from human tissues and cells: embryonic and adult, diseased and healthy (Fig. 1).
The researchers have linked these epigenomic data to the corresponding genetic information, producing reference epigenomes for 127 tissue and cell types.
The result is a representation of how epigenomic elements regulate gene expression in the human body.
Casey E. Romanoski & Christopher K. Glass
All the cells in the body contain essentially the same genome, and arise from the progeny of a single fertilized egg. How does each cell type interpret this common set of instructions to achieve its specific identity? The Roadmap Epigenomics Project has tackled this question by defining the epigenomic signatures of a broad spectrum of human tissues and cells undergoing crucial developmental transitions (for an overview2, see page 317). Collectively, these papers and the associated data sets provide an unprecedented resource for understanding relationships between cells and tissues, and for delineating how cell-specific programs of gene expression are achieved.
Only about half of the approximately 25,000 protein-coding genes that make up the mammalian genome are expressed in any given cell type. Although many of these genes are required for general functions and are ubiquitously expressed, others are active in only one or a few cell types, or exhibit different patterns of regulation from cell to cell. A remarkable achievement of the ENCODE Project was the use of epigenomic signatures to infer the existence of hundreds of thousands of enhancer-like regions in the mammalian genome that regulate gene expression at long range. From this vast palette, each cell type is regulated by a subset of perhaps 20,000–40,000 enhancers, which determine its particular gene-expression profile.
Enhancers are activated through interactions with transcription factors, which recognize and bind to specific DNA sequences within the enhancer region. Bound transcription factors recruit co-regulators, many of which deposit or remove modifications on histones. The way in which each cell type interprets genomic information is therefore closely linked to the organization of its DNA regulatory elements. Enhancers that are active in cell-type-specific epigenomic signatures are typically highly enriched in DNA sequences to which lineage-determining and signal-dependent transcription factors bind. Therefore, the delineation of a particular cell’s active enhancer repertoire provides a powerful means of predicting the transcription factors required for that cell’s identity. By extension, changes in epigenomic signatures during developmental transitions reflect activation or inhibition of such factors.
Four of the papers in this issue2, 3, 4, 5 exploit these relationships to identify combinations of transcription factors that might define different cell types during development. Ziller et al.4 (page 355) modelled neuronal development in vitro, by generating six lineages of neuronal progenitors from embryonic stem (ES) cells, which give rise to almost every cell type of the body. The authors developed computational models to predict the transcription factors that bind to core neural-differentiation enhancers, as well as those that bind enhancers of distinct neural lineages only.
Tsankov et al.5 (page 344) studied the sets of transcription factors that bind to promoters and enhancers in the first three cell lineages that differentiate from ES cells. Sequences bound by transcription factors in one of the three lineages exhibited molecular modifications that promote gene expression, such as loss of DNA methylation. By contrast, the same DNA regions exhibited repressive modifications in the other two cell types. Both Ziller et al. and Tsankov et al. found that regulatory elements controlling genes that are essential for cellular identity are often also epigenetically modified in parental cells, highlighting the importance of existing regulatory landscapes and stage-specific expression of transcription factors for defining the developmental potential of cells.
Some major caveats should be noted. These studies are based on analysis of cell populations, and therefore miss potentially crucial aspects of cellular variability within populations. When tissues are examined, enhancer landscapes represent the composite of the cell types that make up that tissue, not a pure cell population. Studies10, 11 of different populations of white blood cells called macrophages suggest that the tissue environment can shape enhancer landscapes, emphasizing the value of studying purified cell populations from in vivo sources. Finally, although the DNA sequences found in cell-specific enhancers provide clues to the identities of the transcription factors that regulate enhancer activation, functional roles must be validated experimentally. The Roadmap Epigenomics Project has made some efforts along these lines, but the large number of hypotheses generated by the current papers means that this step is largely left for future work.
Hendrik G. Stunnenberg
For decades, biomedical science has focused on ways of identifying the genes that contribute to a particular trait, or phenotype. Approaches such as genome-wide association studies12 (GWAS) identify locations in the human genome at which variations in DNA sequence are linked to specific phenotypes, but if the variant is located in a region of DNA that does not encode a protein, such studies rarely provide insights into the regulatory mechanisms underlying the association. In these cases, comprehensive epigenomic analyses can provide the missing link between genomic variation and cellular phenotype.
The various consortia, including the Roadmap Epigenomics Program, that are gathered under the umbrella of the International Human Epigenome Consortium (www.ihec-epigenomes.org) have taken up the challenge of deciphering hundreds of cell-type-specific epigenomes using human cells and tissues from healthy donors and people with disease. In this issue, the Roadmap Epigenomics Project presents a wealth of epigenomes, a resource that provides a plethora of new hypotheses to be tested in relation to human health and disease. Given that epigenomes are cell-type specific, it makes sense to analyse disease-associated variants identified by GWAS in the context of the epigenome of the disease cell type. Indeed, previous groundbreaking observations13 revealed that non-protein-coding genetic variants that are associated with phenotypic changes are often located in tissue-specific regulatory regions. The current papers use innovative analytical approaches to deepen and extend this knowledge.
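The idea of reading GWAS variants against tissue-specific regulatory regions can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the analysis used in the Roadmap papers: the tissue names, interval coordinates, variant positions and the helper names `count_overlaps` and `fold_enrichment` are all hypothetical, and everything sits on a single toy chromosome of length 1,000.

```python
from bisect import bisect_right


def count_overlaps(variant_positions, intervals):
    """Count variants inside any half-open [start, end) interval.

    Assumes intervals are sorted by start and do not overlap.
    """
    starts = [start for start, _ in intervals]
    hits = 0
    for pos in variant_positions:
        # Index of the last interval whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits


def fold_enrichment(variant_positions, intervals, region_length):
    """Observed fraction of variants in intervals over the fraction expected by chance."""
    observed = count_overlaps(variant_positions, intervals) / len(variant_positions)
    expected = sum(end - start for start, end in intervals) / region_length
    return observed / expected


# Hypothetical toy data: enhancer intervals per tissue and disease-associated variants.
enhancers_by_tissue = {
    "immune": [(100, 200), (500, 600)],
    "brain": [(300, 400), (800, 900)],
}
variants = [120, 150, 550, 700]
REGION_LENGTH = 1000

enrichment = {
    tissue: fold_enrichment(variants, intervals, REGION_LENGTH)
    for tissue, intervals in enhancers_by_tissue.items()
}
for tissue, value in enrichment.items():
    print(f"{tissue}: {value:.2f}-fold")
```

In this toy setup the variants cluster in the "immune" intervals, so that tissue shows an enrichment above 1 while "brain" shows none. A real analysis would also need a proper null model and multiple-testing correction.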
Gjoneska et al.6 (page 365) made use of a mouse model of neurodegeneration that mimics Alzheimer’s disease. They found that disease-related changes in gene expression in the hippocampus of the mouse brain correlate with those in post-mortem brain samples taken from people with Alzheimer’s disease, but not with those from people without the disease. Subsequent detailed analyses revealed an upregulation of genes and regulatory regions linked to immune responses seen in Alzheimer’s disease. Genetic variants associated with the condition seemed to be enriched within evolutionarily conserved regulatory elements that control immune pathways, but not in neuronal pathways, providing fresh entry points for treatment.
Farh et al.7 (page 337) developed an algorithm to identify non-protein-coding genetic variants that might underlie autoimmune disease. The authors found that these variants are often located in or near enhancers or promoters. However, only a small fraction of the variants cause a change in a sequence at which transcription factors are known to bind. This suggests that there is more to an enhancer than a ‘simple’ collection of sites of transcription-factor binding embedded in the composition of its DNA sequence. For example, flanking sequences might have a topological role affecting chromatin packaging and, consequently, DNA accessibility.
Polak et al.8 (page 360) investigated the distribution of cancer-associated genetic mutations in a set of diverse cancers, and correlated them with cell-type-specific epigenomic features. They found that the mutation profile of each cancer could often be predicted from the epigenomic signature of the cell type from which that cancer was most likely to have originated. Remarkably, the epigenomic signatures of cancer-cell lines (which are often used to study disease) were poor predictors of this profile. The authors conclude that the density and distribution of cancer mutations are strongly linked to a cell-type-specific epigenomic signature.
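A rough way to picture what it means for an epigenomic signature to predict a mutation profile is to correlate the two across genomic windows. The text does not say which statistical model Polak et al. used, so the sketch below substitutes a plain Pearson correlation as a stand-in. All numbers are invented toy values and the variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five genomic windows (illustration only).
mutation_density = [12.0, 9.0, 7.0, 3.0, 2.0]
signal_cell_of_origin = [1.0, 2.0, 3.0, 5.0, 6.0]  # epigenomic signal, presumed cell of origin
signal_cell_line = [4.0, 1.0, 5.0, 2.0, 3.0]       # same mark measured in a cancer-cell line

r_origin = pearson(mutation_density, signal_cell_of_origin)
r_line = pearson(mutation_density, signal_cell_line)
print(f"cell of origin: r = {r_origin:.2f}")
print(f"cell line:      r = {r_line:.2f}")
```

With these toy values the signal from the presumed cell of origin tracks mutation density far more tightly than the cell-line signal does, which mirrors the qualitative finding described above.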
What comes next? The Roadmap Epigenomics Project has reached a major milestone, but the epigenomes of 127 cell types are just the beginning of the road to a comprehensive epigenome encyclopaedia. The International Human Epigenome Consortium plans to determine the epigenomes of every cell type in the human body, estimated to be several hundred to a thousand. Furthermore, each cell type must be analysed in many individuals, to assess the effect of genetic variation on personal cell-type-specific epigenomes. Finally, studies monitoring the epigenomic changes that arise as a result of ageing and of changes in environmental factors such as nutrients and metabolites will also be interesting. The epigenomics project has taught us that analysis and comparison of the genome and epigenome of healthy and diseased cells is essential for detecting and understanding the drivers of multifactorial diseases and traits.
Laurence Wilson & Genevieve Almouzni
Chromatin is the complex of DNA, RNA and proteins that packages DNA within the cell. At the core of chromatin is an eight-subunit protein complex composed of histones. Molecular modifications to either DNA or histones can affect the structure and function of chromatin. For example, some modifications promote chromatin compaction, affecting how easily DNA can be accessed by transcription factors, whereas others act as signals that modulate gene expression. A case in point is modification of the amino-acid residue lysine 27 (K27) on histone H3 in chromatin. Addition of an acetyl group (a modification known as H3K27ac) correlates with transcription of the corresponding region of DNA, whereas trimethylation (H3K27me3) is linked to transcriptional repression.
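The two K27 marks just described can be written down as a small lookup. This is a deliberately simplified sketch: the dictionary records only the two correlations stated in the paragraph above, and the function name `classify_region` and its output labels are hypothetical.

```python
# Associations stated in the text: acetylation of H3K27 correlates with
# transcription, trimethylation of H3K27 with transcriptional repression.
MARK_ASSOCIATION = {
    "H3K27ac": "active",
    "H3K27me3": "repressed",
}


def classify_region(marks):
    """Summarise a region from the histone marks observed on it.

    Marks not listed in MARK_ASSOCIATION are ignored in this toy model.
    """
    states = {MARK_ASSOCIATION[m] for m in marks if m in MARK_ASSOCIATION}
    if not states:
        return "uninformative"
    if states == {"active"}:
        return "likely transcribed"
    if states == {"repressed"}:
        return "likely repressed"
    return "mixed signals"


print(classify_region(["H3K27ac"]))
print(classify_region(["H3K27me3"]))
print(classify_region(["H3K27ac", "H3K27me3"]))
```

These are correlations rather than rules, so a real annotation would combine many marks and quantitative signal levels instead of a presence-or-absence lookup.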
Several papers published by the Roadmap Epigenomics Project investigate histone modifications, and provide insights into the relationship between histone signatures and gene expression throughout development and adult life. For instance, three studies investigate the histone modifications associated with disease6, 7, 8. Focusing on normal development, Tsankov et al.5 and Ziller et al.4 have mapped histone modifications that occur during the differentiation of embryonic cells (specifically, H3K4me1, H3K4me3, H3K27ac and H3K27me3 modifications), alongside patterns of transcription-factor binding and DNA methylation. They describe chromatin remodelling events that alter the accessibility of DNA sequences to which combinations of key regulatory transcription factors bind. These events correlate with the changes in gene expression that occur as cells differentiate.
In addition to the linear viewpoint of chromatin alterations presented through histone modifications, long-range chromatin interactions can also modulate gene expression, for instance by bringing distant enhancers into contact with promoters that regulate the same gene. Dixon et al.9 (page 331) investigated this phenomenon, charting changes in three-dimensional (3D) chromatin organization during stem-cell differentiation. Human cells contain two copies, or alleles, of each gene, which can vary in terms of DNA sequence, resulting in differences in transcriptional activity (allele-restricted transcription). The allelic complement of a cell is known as its haplotype. Strikingly, Dixon and colleagues report that different haplotypes display different histone modifications and 3D chromatin organization, correlating with their allele-restricted transcription.
Leung et al.3 (page 350) confirmed this observation, reporting haplotype-specific differences in histone modifications and chromatin architecture that correlate with allele-restricted transcription across many tissues. Notably, these differences also correlate with mutations that disrupt sites of either transcription-factor binding or long-range chromatin interactions. However, the functional relevance of these imbalances remains to be deciphered.
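Allele-restricted transcription can be pictured with read counts per allele at a single site. The sketch below is a toy illustration and not the procedure used by Dixon et al. or Leung et al., which the text does not describe. It assumes a simple two-sided binomial test against an even 50:50 split, and the function names and read counts are hypothetical.

```python
from math import comb


def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads that come from the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def binomial_p_value(ref_reads, alt_reads):
    """Two-sided p-value for departure from a 50:50 allelic split.

    Uses the symmetry of the binomial distribution with p = 0.5:
    doubles the probability of the smaller tail, capped at 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical read counts at two heterozygous sites.
ratio_skewed = allelic_ratio(90, 10)
p_skewed = binomial_p_value(90, 10)
p_balanced = binomial_p_value(50, 50)
print(f"skewed site: ratio = {ratio_skewed:.2f}, p = {p_skewed:.2e}")
print(f"balanced site: p = {p_balanced:.2f}")
```

A strongly skewed site yields a very small p-value, whereas a balanced site does not. Real allele-specific analyses also have to handle mapping bias and overdispersion, which this sketch ignores.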
These eight studies showcase the use of the first large-scale reference epigenome database, taking advantage of the statistical power afforded by large sample sizes to formulate hypotheses about the relationships between the epigenome and the genome in different biological processes. They strengthen the link between chromatin modifications and gene expression in development and disease, defining core regulatory circuits that act in different tissues and at different developmental stages. This provides the community with a powerful reference tool, allowing researchers to compare the epigenome in their tissue of choice with snapshots from the database.
It is, however, still early days. Future work should try to address the changing relationship between the epigenome and genome over the lifespan of the cell, in different phases of the cell cycle and across cellular generations. Other factors that modulate chromatin organization also remain to be investigated: the proteins responsible for chromatin remodelling, for example, and the chaperone proteins associated with histone variants that control assembly and disassembly of chromatin14.
Defining the mechanisms that underlie chromatin-based regulation of gene expression will require integration of the observations made by the Roadmap Epigenomics Project with other approaches that directly test for function. For instance, model organisms will remain essential for comparative epigenomics and for garnering evolutionary information. Cutting-edge techniques, such as high-resolution microscopy, will allow live imaging of chromatin architecture and a means of studying its dynamics in space and time.
Above all, approaches and technologies that draw from different disciplines must be integrated in future epigenomic projects. This multidisciplinary approach is being catalysed by collaborations such as the EpiGeneSys network (www.epigenesys.eu), which bridges epigenetics and systems biology. Combining such efforts will be essential for understanding the functional link between the epigenome and the genome.
Integrative analysis of 111 reference human epigenomes
While the primary sequence of the human genome is largely preserved in all human cell types, the epigenomic landscape of each cell can vary considerably, contributing to distinct gene expression programs and biological functions1, 2, 3, 4. Epigenomic information, such as covalent histone modifications, DNA accessibility and DNA methylation can be interrogated in each cell and tissue type using high-throughput molecular assays2, 5, 6, 7, 8. The resulting maps have been instrumental for annotating cis-regulatory elements and other non-exonic genomic features with characteristic epigenomic signatures9, 10, and for dissecting gene regulatory programs in development and disease7, 9, 11, 12, 13, 14. Despite these technological advances, we still lack a systematic understanding of how the epigenomic landscape contributes to cellular circuitry, lineage specification, and the onset and progression of human disease.
To facilitate and spearhead these efforts, the NIH Roadmap Epigenomics Program was established with the goal of elucidating how epigenetic processes contribute to human biology and disease. One of the major components of this programme consists of the Reference Epigenome Mapping Centers (REMCs)15, which systematically characterized the epigenomic landscapes of representative primary human tissues and cells. We used a diversity of assays, including chromatin immunoprecipitation (ChIP)9, 10, 16, 17, DNA digestion by DNase I (DNase)7, 18, bisulfite treatment1, 2, 19, 20, methylated DNA immunoprecipitation (MeDIP)21, methylation-sensitive restriction enzyme digestion (MRE)22, and RNA profiling8, each followed by massively parallel short-read sequencing (-seq). The resulting data sets were assembled into publicly accessible websites and databases, which serve as a broadly useful resource for the scientific and biomedical community. Here we report the integrative analysis of 111 reference epigenomes (Fig. 1 and Extended Data Fig. 1a–d), which we analyse jointly with an additional 16 epigenomes previously reported by the Encyclopedia of DNA Elements (ENCODE) project9, 23.
We integrate information about histone marks, DNA methylation, DNA accessibility and RNA expression to infer high-resolution maps of regulatory elements annotated jointly across a total of 127 reference epigenomes spanning diverse cell and tissue types. We use these annotations to recognize epigenome differences that arise during lineage specification and cellular differentiation, to recognize modules of regulatory regions with coordinated activity across cell types, and to identify key regulators of these modules based on motif enrichments and regulator expression. In addition, we study the role of regulatory regions in human disease by relating our epigenomic annotations to genetic variants associated with common traits and disorders. These analyses demonstrate the importance and wide applicability of our data resource, and lead to important insights into epigenomics, differentiation and disease. Specific highlights of our findings are given below.
• Histone mark combinations show distinct levels of DNA methylation and accessibility, and predict differences in RNA expression levels that are not reflected in either accessibility or methylation.
• Megabase-scale regions with distinct epigenomic signatures show strong differences in activity, gene density and nuclear lamina associations, suggesting distinct chromosomal domains.
• Approximately 5% of each reference epigenome shows enhancer and promoter signatures, which are twofold enriched for evolutionarily conserved non-exonic elements on average.
• Epigenomic data sets can be imputed at high resolution from existing data, completing missing marks in additional cell types, and providing a more robust signal even for observed data sets.
• Dynamics of epigenomic marks in their relevant chromatin states allow a data-driven approach to learn biologically meaningful relationships between cell types, tissues and lineages.
• Enhancers with coordinated activity patterns across tissues are enriched for common gene functions and human phenotypes, suggesting that they represent coordinately regulated modules.
• Regulatory motifs are enriched in tissue-specific enhancers, enhancer modules and DNA accessibility footprints, providing an important resource for gene-regulatory studies.
• Genetic variants associated with diverse traits show epigenomic enrichments in trait-relevant tissues, providing an important resource for understanding the molecular basis of human disease.
Source: Nature