Abstract
Vitamin D3 insufficiency is associated with a number of diseases, such as cancer and autoimmune disorders. This important medical problem leads to the question, whether an insight into the genome-wide actions of the transcription factor vitamin D receptor (VDR) and its high affinity ligand 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3) can help in a more global appreciation of the physiological impact of vitamin D3. Chromatin immunoprecipitation sequencing (ChIP-seq) studies in 6 human cell culture models demonstrated 1,000 to 10,000 genomic VDR binding sites per cell type that sum-up to more than 23,000 non-overlapping loci of the receptor. After ligand stimulation VDR associates with many new binding loci, of which the most important have a higher rate of DR3-type VDR binding sequences than average sites. On the majority of latter VDR interacts directly or indirectly with genomic DNA in a presently uncharacterized fashion. Formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq) monitors the dynamically opening chromatin regions after 1,25(OH)2D3 stimulation. The integration of ChIP-seq and FAIRE-seq data combined with a screening for DR3-type sequences facilitates the identification of key VDR binding sites and primary 1,25(OH)2D3 target genes. Recent results of the FANTOM5 project strongly suggest a shift from in vitro cell culture experiments to primary human cells stimulated in vivo. First results suggest that both the number of genome-wide VDR binding sites and the expression of VDR target genes correlate with vitamin D status of the studied human individuals. In conclusion, a genome-wide overview provides a broader basis for addressing vitamin D's role in health and disease.
Abbreviations: 1,25(OH)2D3: 1α,25-dihydroxyvitamin D3; 25(OH)D3: 25-hydroxyvitamin D3; ASAP2: ArfGAP with SH3 domain, ankyrin repeat and PH domain 2; CBS: cystathionine β-synthase; CCNC: cyclin C; CD14: CD14 molecule; CDKN1A: cyclin-dependent kinase inhibitor 1A; ChIA-PET: chromatin interaction analysis by paired-end tag sequencing; ChIP: chromatin immunoprecipitation; ChIP-seq: ChIP sequencing; CTCF: CCCTC-binding factor; CYP: cytochrome P450; DR3: direct repeat spaced by 3 nucleotides; DUSP10: dual specificity phosphatase 10; FAIRE-seq: formaldehyde-assisted isolation of regulatory elements sequencing; FANTOM: functional annotation of the mammalian genome HOMER: hypergeometric optimization of motif enrichment IGV: Integrative Genomics Viewer; LPS: lipopolysaccharide; LRP5: low density lipoprotein receptor-related protein 5; MACS: model-based analysis of ChIP-seq data; MYC: v-myc avian myelocytomatosis viral oncogene homolog; NRIP1: nuclear receptor interacting protein 1; PBMC: peripheral blood mononuclear cell; RXR: retinoid X receptor; TBP: TATA box binding protein; TNFSF11: tumor necrosis factor (ligand) superfamily, member 11; TRPV6: transient receptor potential cation channel, subfamily V, member 6; TSS: transcription start site; VDR: vitamin D receptor.
Vitamin D3 is a pleiotropic signaling molecule, which is involved in the regulation of a large number of physiological processes, such as bone formation, immune function and cellular growth and differentiation (6, 8, 26, 54). Via binding to and activation of the transcription factor VDR the biologically active form of vitamin D3, 1,25(OH)2D3, has a direct effect on the transcriptional regulation of the genome. Since VDR is the only protein that binds 1,25(OH)2D3 with high-affinity (17), the genomic effects of vitamin D are a subset of those exerted by its nuclear receptor. VDR is one of some 1,900 human transcription factors (55), but within these it plays a special role, since it belongs to the few dozens proteins that are specifically activated by lipophilic molecules directly reaching the cell nucleus (4). Therefore, the signal transduction by vitamin D3/1,25(OH)2D3 is in contrast to that of hydrophilic signaling molecules, such as peptide hormones, growth factors and cytokines, a rather straightforward process. In addition, VDR is expressed in most human tissues and cell types (57). This suggests that understanding of the actions of VDR on a genome-wide level will lead to a comprehensive insight on the genomic effects of vitamin D in health and disease.
Number of VDR ChIP-seq peaks in human cellular models. The raw data of six publically available VDR ChIP-seq experiments had been re-analyzed under identical settings (49). The resulting number of significant VDR peaks is reported in bold, while the originally reported peaks is indicated in brackets. ND, Not determined.
This review discusses how the understanding of vitamin D signaling has changed and improved during the last years based on the knowledge of the genome-wide locations of VDR in a number of key human cell types.
Monitoring Genomic VDR Loci
ChIP is a method that is used since more than 10 years (33), in order to monitor the genomic location of transcription factors. The core of the technique is a physical cross-linking between nuclear proteins and genomic DNA and a sonication of chromatin into small fragments in the size of 200-400 bp. A precipitation step with an antibody specific for a nuclear protein of choice (for example against VDR) and a reverse cross-linking allows the enrichment for those regions of the genome that, at the time of the crosslinking, had been in complex with the chosen nuclear protein (27).
The specific enrichment of a chosen genomic region in reference to a negative control region can be monitored by quantitative PCR (ChIP-qPCR). This approach had been successfully used in the past for the study of the regulatory regions of known primary VDR target genes, such as CYP24A1 (52), CYP27B1 (50), CCNC (46) and CDKN1A (40, 41). For a short while also the use of tiled microarrays, so-called “chips”, was popular for the identification of enriched chromatin fragments (ChIP-chip) and allowed for identification of VDR binding sites of the mouse genes Vdr (60), Trpv6 (30), Lrp5 (12), Tnfsf11 (also known as Rankl) (24), Cyp24a1 (28) and Cbs (25). However, PCR and microarrays are based on nucleic acid hybridization, which could cause a bias for certain genomic regions, since the efficiency of these methods varies with the investigated sequence.
This possible bias does not apply to the so-called “next generation sequencing” methods, such as ChIP-seq. These techniques often provide only small stretches of 35-50 nucleotides, so-called “sequence tags”, of the investigated genomic regions. However, the length of these tags is sufficient to locate them back to their origin in the genome of the species, in which the experiment had been originally performed. The accumulation of sequence tags to a genomic region looks like a peak, when it is visualized via a genome browser, such as the Integrative Genome Viewer (IGV). So-called “peak calling software”, such as MACS, is then applied to identify those regions within the genome, where the peaks are significantly above the signal of a control, such as the input of the ChIP reaction (Figure 1). The specific peaks mark in this way the genomic loci of the investigated nuclear protein (14, 35). It should be noted that ChIP-seq peaks represent in most cases the average of a signal from millions of cells. Therefore, the ChIP signal of an individual cell can differ from the often heterogeneous average.
VDR belongs to those nuclear receptors, for which the complex formation with genomic DNA depends on the absence or presence of their specific ligand. Therefore, the VDR ChIP-seq signal from 1,25(OH)2D3-treated cells differs from those that were not stimulated. Figure 1 illustrates three scenarios, where 1,25(OH)2D3 stimulation either leads to i) a strong induction of VDR binding, such as illustrated for the locus of the known VDR target gene ASAP2 (45) (Figure 1A), ii) no significant differences, such as those demonstrated for the transcription start site (TSS) of the TBP gene (Figure 1B) or iii) a down-regulation of VDR binding, as shown for the VDR site of the MYC gene (Figure 1C). The latter example demonstrates that VDR also acts in the absence of its ligand and, in fact, the MYC gene is down-regulated by 1,25(OH)2D3 (48).
At present (August 2014) VDR ChIP-seq data are publically available for 6 human cellular models. These are the immortalized lymphoblastoid cell lines GM10855 and GM10861 (36), THP-1 monocyte-like cells (19), lipopolysaccharide (LPS)-polarized THP-1 macrophage-like cells (49), LS180 colorectal cancer cells (29) and LX2 hepatic stellate cells (9). For a direct comparison the raw datasets were re-analyzed under identical settings (49) (Table I). The original publications report between 1,820 and 6,281 VDR binding sites in stimulated samples and 262 to 1,161 VDR peaks in unstimulated cells, i.e. they agree on that ligand activation of VDR increases the number of its genomic binding sites. The harmonized re-analysis of the datasets came to the same general conclusion, but reported for the B cells higher and for the other cellular models lower VDR peak numbers ranging from 774 to 12,448 for stimulated cells and from 165 to 4,072 for unstimulated samples.
The overlap between the VDR peaks sets of ligand-treated and untreated cells varied between 31% and 77% (49). This suggests that a reasonable number of VDR loci are occupied in the presence and absence of ligand (Figure 1B and C). In total 23,409 non-overlapping VDR binding sites were identified when allowing a distance of 100 bp between the peak summits (49). Interestingly, some 70% of these VDR loci are unique for one of the six cellular models. This indicates that the genomic binding of VDR is largely cell-specific. In contrast, within related cellular models, such as between B cells or between monocytes and macrophages, 53 to 73% of the VDR peaks were the same (49). However, within all six VDR ChIP-seq datasets only 43 loci were identical. This number increases to 60, when larger overlap distances were allowed (49). Nevertheless, only at a very limited number of genomic loci VDR is found in all cellular systems.
In summary, ChIP-seq studies demonstrated that there are 1,000 to 10,000 genomic VDR binding loci per human cell type. Most of these sites are cell-specific, but the closer cells are developmentally related, the higher is their overlap in VDR binding. Ligand stimulation changes the genome-wide VDR binding profile: the receptor disappears from some sites, stays on approximately half of all loci, but in particular finds new binding locations.
Mechanistic Implications of Genome-wide VDR Binding
The genome-wide view on VDR binding suggests some reconsideration of the models of vitamin D signaling. In the past only a few dozens VDR binding sites were known, which were located in rather close distance to the TSS regions of physiologically relevant 1,25(OH)2D3 target genes (18). In contrast, today one is facing thousands of VDR binding sites in the vicinity of genes, out of which most had not been related to vitamin D before. This may suggest that far greater number of genes are vitamin D targets than previously assumed. In fact, transcriptome-wide studies suggest that in every cellular system hundreds of genes are up- or down-regulated, when stimulated with 1,25(OH)2D3 (1). For example, in THP-1 cells 408 genes were found to be statistically significantly up-regulated after 4-h treatment with 1,25(OH)2D3 (19). However, only 67 of these genes were more than 1.5-fold induced. Interestingly, only a few of the latter genes are close to a conserved VDR binding site, i.e. these sites may have a more general function than controlling genes in their vicinity.
Different types of VDR binding scenarios. The Integrative Genomics Viewer (IGV) browser (38) was used to visualize a ligand-inducible VDR binding site within the ASAP2 gene (A, 54 kb downstream of the TSS), a ligand-independent VDR site at the TSS of the TBP gene (B) and a ligand-repressed VDR locus within the MYC gene (C, 2.7 kb downstream of the TSS). The peak tracks display data from input controls (grey) or VDR ChIP-seq datasets (red) from THP-1 human monocytic leukemia cells (19). The gene structures are shown in blue and the sequence of the DR3-type VDR binding motifs is indicated below the respective peaks.
Chromatin domain containing VDR binding sites. The linear view (A) and the looping view (B) of a hypothetical 1,25(OH)2D3 target gene is shown. The chromatin domain of the gene is defined by upstream and downstream CTCF binding sites (BS). VDR BS 1 and 2 are within the chromatin domain and therefore able to regulate the gene, while VDR BS 3 is located outside of the loop and therefore not implicated in the regulation.
In this context it is important to define the term “vicinity” more precisely. Chromatin is organized into loops of in average a few hundred kb in size (23). These loops are mostly separated by insulator regions (53), which are often characterized by the binding of the highly conserved transcription factor CCCTC-binding factor (CTCF) (42). Genes of a given chromatin loop, which is often also referred as chromatin domain, are preferentially regulated by transcription factors of the same loop (Figure 2). This means that a primary 1,25(OH)2D3 target gene should have a VDR binding site within the borders of CTCF binding loci. The recently developed method chromatin interaction analysis by paired end-tag sequencing (ChIA-PET) (13) is able to monitor genome-wide CTCF-mediated chromatin loops. Since the genomic binding of CTCF is exceptionally highly conserved, ChIA-PET data, which the ENCODE project delivered for K562 monocytic leukemia cells and MCF-7 breast cancer cells (10), allows a good estimation of chromatin domain borders (Figure 2). By extrapolating K562 CTCF ChIA-PET data to THP-1 cells, some 1,600 chromatin domains were defined that contain at least one VDR binding site (43). The average size of these loops was some 200 kb, but some chromatin domains are larger than 1 Mb. Similar numbers can be assumed for other 1,25(OH)2D3 responding tissues and cell types.
Since more than 20 years VDR is known to form heterodimeric complexes with the retinoid X receptor (RXR) on sequences that are direct repeats of hexameric motifs with three intervening nucleotides (DR3) (2, 51). By performing so-called “de novo motif searches” all VDR ChIP-seq studies confirmed the preferential occurrence of DR3-type sequences below the summits (±100 bp) of VDR peaks (Figure 3A). However, by far not below all VDR peaks a DR3-type motif could be identified. Transcription factor binding site screening algorithms, such as HOMER (20), allow different threshold settings, so-called “scores”, for detecting more or less deviations from the consensus sequence. The screening of the total set of 23,409 non-overlapping VDR binding sites with HOMER scores between 4 and 9 (Figure 3B) demonstrates that even with a score of 4, which represents rather degenerated sequence motifs (Figure 3C), only some 40% of all VDR peaks have DR3-type sequence below their summit. A more detailed analysis of the VDR binding sites of six cellular models and a differentiation between 1,25(OH)2D3-treated cells and unstimulated samples (49) showed that ligand-stimulated samples have a higher rate of DR3-type sequences than that of non-treated cells. Moreover, cellular models, such as monocytes, for which a lower total number of VDR peaks have been identified, show a higher DR3 percentage than B cells, for which a large number of peaks have been reported. Importantly, the top 200 VDR sites of all six ChIP-seq datasets have a DR3 rate of more than 60%. This suggests that the most prominent VDR binding sites, which are often also the most ligand responsive loci (Figure 1A), play an important role in the response of the respective cellular system to vitamin D. Moreover, this implies that the total number of VDR peaks is less important than the quality of the most responsive sites.
DR3-type VDR binding motifs. All ChIP-seq studies identified by de novo motif screening DR3-type binding sites as the predominant sequence motif below the summits of VDR peaks (A). The transcription factor motif screening software HOMER was used in different settings (referred to as “scores”), in order to determine the percentage of DR3-type motifs in the total set of 23,409 non-overlapping VDR binding sites (B). Random examples of DR3-type motifs demonstrate that lower HOMER scores represent more degenerate sequences (C).
Vitamin D effects on chromatin. The IGV browser was used to visualize 1,25(OH)2D3-dependent chromatin opening determined by FAIRE-seq (44) (grey for solvent-treated (−) and light blue for 1,25(OH)2D3-stimulated (+) samples) and ligand-dependent VDR binding measured by ChIP-seq (19) (red) at a locus 10 kb downstream of the TSS of the long non-coding RNA gene LINC00634 in THP-1 cells. The gene structure is shown in blue and the sequence of the DR3-type VDR binding motif is indicated below the peak.
The results of the transcription factor binding site screening (Figure 3B) suggest that at least 60% of all presently known VDR binding sites do not contain a DR3-type sequence. This indicates that at these loci VDR uses a different mode of interaction with genomic DNA and implies that it partners with an alternative protein at these sites. Together with these partners VDR may recognize different binding sequences or it may even bind “backpack” of a DNA-binding transcription factor (3). No prominent alternative VDR partner has yet been characterized, but HOMER searches indicate that the receptor may use a divergent group of proteins, such as the transcription factors PU.1 (also called SPI1), ESRRB (also called NR3B2) and GABPA (49). A partnership of VDR with the well-known pioneer factor PU.1 (59) has already been suggested in the context of monocytic differentiation (32).
Taken together, in each vitamin D-responsive tissue or cell type there are more than 1,000 chromatin domains that combine VDR with its target genes. The more prominent and ligand-responsive VDR binding sites have a higher rate of DR3-type sequences. In contrast, on the majority of its binding loci VDR interacts directly or indirectly with genomic DNA in a presently uncharacterized fashion.
Chromatin Effects of Vitamin D
Due to the intrinsic repressive nature of chromatin, accessible genomic DNA is an essential condition for the binding of a regular, so-called “settler” transcription factor, such as VDR (37). The recently developed method FAIRE-seq is a rather simple and straightforward approach to detect genome-wide accessible, i.e. not protein-bound, DNA within chromatin (15). Like in ChIP-seq, the technique involves crosslinking and sonication of chromatin but replaces the antibody immunoprecipitation by a simple phenol extraction. A FAIRE-seq peak represents open chromatin, while regions without a peak are within non-accessible closed chromatin. In average a cell has some 50,000 to 100,000 FAIRE-seq peaks, i.e. this number of accessible chromatin regions (10, 44).
FAIRE-seq was used to monitor the dynamic response of chromatin after a stimulation of THP-1 cells with 1,25(OH)2D3 through measurements every 20 min over a time period of 2 h (43, 44). The vast majority of all VDR binding sites (87% in THP-1 cells) are associated with open chromatin but only at some 200 genomic regions a significant increase of accessible chromatin was detected. Figure 4 displays the example of a ligand-inducible VDR binding locus (comparable to that of the ASAP2 gene shown in Figure 1A) located 10 kb downstream of the TSS of the long non-coding RNA gene LINC00634, at which significant opening of chromatin can be observed. Interestingly, VDR binding sites at dynamically opening chromatin regions contain a far higher rate of DR3-type sequences below the peak summits (66% in THP-1 cells) as an average site (20% in THP-1 cells) (44). This suggests that the association with 1,25(OH)2D3-triggered chromatin opening is another indication a prominent VDR binding site.
In summary, FAIRE-seq is a technically simpler but more comprehensive approach than performing VDR ChIP-seq. The method allows for identification of the most prominent VDR binding loci via monitoring dynamically opening chromatin regions after 1,25(OH)2D3 stimulation.
Shifting from Cultured Cells to Primary Human Tissues and Cell Types
Experiments in cultured cells of the past 25 years discovered many of the principles of gene regulation by 1,25(OH)2D3. In particular, the above described genome-wide insight from ChIP-seq, ChIA-PET and FAIRE-seq assays provided a major advance in understanding. However, it should be noted that most of these experiments were performed with cancer cells under conditions that do not reflect the reality of vitamin D endocrinology. For example, in most assays the culture medium was depleted from lipophilic molecules, such as vitamin D, and a pharmacologically high dose of 1,25(OH)2D3 (10 to 100 nM) was used, in order to activate VDR within a few hours. However, in reality the levels of vitamin D3 and its metabolites 25-hydroxyvitamin D3 (25(OH)D3) and 1,25(OH)2D3 do not undergo rapid changes (8, 31). Serum 25(OH)D3 concentrations, which are the widely accepted indicator of the vitamin D3 status of the human body (21), change only in the order of weeks and months due to seasonal variations in sun exposure (56). Nevertheless, the vitamin D status of human individuals varies widely due to differences in diet, sun exposure, age, level of adiposity and (epi)genetic polymorphisms (11, 34, 47). As a consequence, worldwide billions of people have a serum 25(OH)D3 concentrations below 50 nM and are vitamin D-deficient (22).
The genome-wide impact of different serum 25(OH)D3 levels was investigated first by VDR ChIP-seq with primary T-cells, which were isolated from nine human individuals showing a variant vitamin D status (16). Interestingly, the number of observed VDR peaks ranged from 200 for a vitamin D-deficient individual to more than 7,000 for a person with high circulating 25(OH)D3 concentrations. Unfortunately, the raw data of this study is not available, in order to perform a harmonized re-analysis in comparison with the six VDR ChIP-seq dataset from cell culture models. Nevertheless, for the sum of the nine individuals 14,044 unique VDR peaks were reported, from which HOMER analysis identified only 442 (3.1%) to carry a DR3-type sequence within its summit area. This represents a 6.7-times lower DR3 rate than observed with cultured cells under identical settings (HOMER score 7) (Figure 3B). This may be due to the fact that in vivo ChIP-seq is technically more challenging and may have led to a sub-optimal data quality of this study (16). However, it could also mean that gene regulation by vitamin D is in vivo less complex than in vitro. The latter possibility was impressively demonstrated by the FANTOM5 project (7) that compared gene expression in 750 primary human samples with that of 250 cell lines. In general, the FANTOM5 data suggest that the transcriptome-wide gene expression profile in primary human tissues and cell types differs significantly from their cell culture surrogates. This also implies that in future experiments should be preferably performed in vivo, since the culture of primary cells in vitro is inducing changes in their epigenome,
Human in vivo experiments are an ethically-difficult issue. However, vitamin D3 is a save micronutrient that is taken daily as a supplement by millions of people and endogenously produced by everyone after sun exposure. This provides numerous possibilities for the investigation of gene regulatory effects of vitamin D3. One of these was a study that used samples donated by 71 elderly, pre-diabetic individuals that participated in a 5-month vitamin D3 intervention trial (VitDmet) during Finnish winter (5). Peripheral blood mononuclear cells (PBMCs) (5, 58) and adipose tissue biopsies (39) were obtained both at the start and the end of the study and were investigated for changes in mRNA expression of a series of primary VDR target genes, such as CD14, NRIP1 and DUSP10. These gene expression changes were correlated with alterations in serum 25(OH)D3 levels during the 5-month intervention and allowed a classification of human individuals based on their responsiveness to vitamin D3. Interestingly, only for some 60% of the individuals a significant correlation between gene expression and vitamin D status could be obtained. This suggests that, at least on the level of gene expression, vitamin D3 supplementation was unnecessary for the remaining studied subjects (5).
Taken together, results of the FANTOM5 project strongly suggest shifting the study of gene regulation by vitamin D from in vitro cell culture experiments to primary cells stimulated in vivo. First results suggest that both the number of genome-wide VDR binding sites and the expression of VDR target genes correlate with the vitamin D status of the studied individuals.
Conclusion
The genome-wide view on VDR locations via ChIP-seq or on 1,25(OH)2D3-induced chromatin opening via FAIRE-seq have significantly broadened the understanding of gene regulation by vitamin D. Some previous knowledge, such as the preferential binding of VDR to DR3-type sequences, had been confirmed but also challenged at the same time, since only a minority of all VDR loci contain these sequences. The large number of 23,000 non-overlapping VDR sites known so far from six human cellular models or the 14,000 VDR loci identified in primary T-cells from nine human individuals is initially overwhelming. However, not all these VDR sites seem to be equally important. In cell culture models only some 10% to 30% of the VDR loci carry a DR3-type sequence (translating to 300 to 1,000 sites), some 200 VDR sites are at 1,25(OH)2D3-triggered chromatin regions and only at some 50 genomic positions sites VDR is found in all tested cellular models. This suggests that in a given cell type only a few hundred VDR sites play a critical role, while many of the other VDR loci may rather represent “noise”, than having a specific function. Interestingly, this number is in the same order as the sum of primary 1,25(OH)2D3 target genes. Therefore, a combination of genome-wide assessment of VDR loci by ChIP-seq, monitoring of 1,25(OH)2D3-inducible chromatin sites and screening for DR3-type motifs below the peak summits may be presently the best experimental approach to understand the effects of vitamin D on the genome of a given cell type. Applying these methods to primary human tissues and cell types that have been stimulated in vivo may be the best way to evaluate the responsiveness to, and needs for, vitamin D of a human individual on the molecular level.
Acknowledgements
The Author thanks the Academy of Finland and the Juselius Foundation for their support.
- Received August 30, 2014.
- Revision received October 2, 2014.
- Accepted October 9, 2014.
- Copyright© 2015 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved