Abstract
Background/Aim: In the past decade, diffuse intrinsic pontine glioma (DIPG), the most common childhood brainstem glioma, has benefitted from an increase in tissue-based research because of improved biopsy collection techniques. However, the adaptive immune receptor (IR) features represented by tumor material and tumor infiltrating lymphocytes have remained poorly understood. Materials and Methods: Herein, we characterized the adaptive immune parameters of DIPG through the recovery of IR recombination reads from RNAseq files representing initial and progressive DIPG samples. Results: An elevated level of immunoglobulin gene expression in the progressive DIPG sample files and a reduced number of bacterial sequencing read recoveries in comparison to RNAseq files representing the initial form of DIPG, was found. Furthermore, the RNAseq files representing both initial and progressive DIPG samples had significant numbers of reads representing Cutibacterium acnes, a bacterium previously linked to prostate cancer development. Results also indicated an opportunity to distinguish overall survival probabilities based on IGL complementarity determining region-3 amino acid sequence physicochemical parameters. Conclusion: Genomics analyses allow for a better understanding of adaptive IR features and bacterial infections in the DIPG setting.
- Diffuse intrinsic pontine glioma
- immunoglobulin recombination reads
- bacterial DNA sequences
- progressive disease
- Cutibacterium acnes
Diffuse intrinsic pontine glioma (DIPG) is a highly malignant pediatric brainstem tumor with some of the worst outcomes compared to other childhood cancers. Representing about 15% of childhood brain tumors (1), DIPG has a median age of diagnosis in children of about 6 years, with 90% loss of life within two years of diagnosis (2, 3). Current standard of care consists of radiation therapy, given that the tumor location is generally not amenable to resection and that chemotherapy has not consistently led to improved outcomes (3, 4). However, DIPG-related clinical trials have been increasing in number in the past decade (5), offering some hope for improved patient outcomes with new pharmacological treatments.
Although often not resected, advances in biopsy safety have allowed for increased availability of tumor tissue samples for study. Consequently, important finds related to cancer genetics and epigenetics followed; for example, the identification of the Histone-3 mutant (H3-K27M), which has been estimated to represent about 80% of DIPG cases (6), led to considerations of treatment based on molecular classifications (7).
To further advance the immunogenomic understanding of DIPG, in this report, we mined the tumor RNAseq files of patients diagnosed with DIPG for adaptive immune receptor (IR) recombination reads. An extensive body of work has demonstrated how the recovery of such adaptive IR reads can reveal multiple features of the immune response, a connection that has been benchmarked in a variety of ways. As just one of many examples, high levels of recovery of TCR recombination reads from metastatic melanoma genomics files correlates with an increased level of expression of genes associated with activated T-cells and better outcomes (8, 9).
In this study, we conducted an analysis of the adaptive immune receptor recombinations present in two subtypes of diffuse intrinsic pontine glioma (DIPG) samples: initial and progressive tumors. We examined RNA sequencing data to identify and compare the immune receptor sequences found in the DIPG subtypes, quantified gene markers associated with B cell immune function, and searched the tumor samples for sequences representing bacterial signatures. Additionally, we analyzed the amino acid (AA) patterns within the complementarity-determining region 3 (CDR3) regions of the adaptive immune receptors detected in the DIPG samples, and then correlated the CDR3 patterns to better and worse patient survival. Our approach explored the potential roles of tumor-resident bacteria and B-cells in either promoting or hindering tumor growth in DIPG. Ultimately, the objective of this study was to leverage bioinformatics techniques to elucidate the role of the adaptive immune system in DIPG, a rare central nervous system tumor whose tumor microenvironment remains poorly characterized.
Materials and Methods
Recovery and characterization of adaptive IR recombination reads from DIPG RNAseq files. We obtained 51 RNAseq files representing a DIPG diagnosis from the Children’s Brain Tumor Network (CBTN) (10), using the Cavatica platform (https://cavatica.sbgenomics.com) and the seven-bridges command-line client (https://docs.sevenbridges.com/docs/command-line-interface). These CBTN files were accessed via approval reference number CBTN_D0135. The CBTN RNAseq files representing both initial and progressive DIPG tumor states were processed for recovery of IR recombination reads using previously and extensively described software (11-13) (https://github.com/bchobrut-USF/blanck_group; https://github.com/kcios/2021). Importantly, the approach used for the extraction of the IR recombination reads requires a verifiable V- and J-gene segment on one read, with therefore highly reliable, intervening CDR3 amino acid (AA) sequences. The CBTN RNAseq file datasets examined for this study were represented by three different read alignment protocols: “Aligned and Sorted”, “Original BAM”, and “Local Transcript” (The “Aligned and Sorted” RNAseq files represented the mapping protocol with the highest number of read recoveries). Two additional (non-CBTN) datasets representing RNAseq files of DIPG tumor samples (phs000900, phs001526) (14, 15) were also mined for adaptive IR read recoveries. These additional studies were accessed via National Institutes of Health, database of genotypes and phenotypes project approval number 27984. Note, for these latter studies, the term “primary” is used to describe the tumor, however, for this report, the “initial” and “primary” terms are synonymous. The DIPG tumors from the two above indicated phs datasets are referred to as initial tumors (Tables S1-S5).
Single value physicochemical assessments of the complementarity determining region-3 (CDR3) amino acid (AA) sequences represented by the IR recombination reads. The adaptive IR CDR3 AA sequences representing IG recombination read recoveries for progressive DIPG were quantitatively assessed for physicochemical characteristics using previously described software by Pappu and colleagues (16, 17). The physicochemical values based on the CDR3 AA sequences in turn represented by the IG read recoveries of “Aligned and Sorted” and “Original BAM” files are provided in Table S6. Overall survival (OS) differences were assessed for the upper and lower 50th percentiles for the case-averaged physicochemical parameter value indicated, for each IG receptor subtype (i.e., IGH, IGK, IGL). The average physicochemical parameter value for each case ID was computed using both mapped and unmapped IG recoveries for a given alignment protocol and receptor subtype (i.e., both mapped and unmapped IGL read recoveries from the files representing progressive DIPG samples represented by the “Aligned and Sorted” and “Original BAM” mapping protocols, respectively, were used in Results). The Kaplan-Meier (18) analyses were conducted with GraphPad Prism version 8. Input and output for the KM analyses are provided in Table S7.
Assessment of AA k-mer and chemical sequence motif k-mer distribution differences between DIPG initial and progressive tumors. The CDR3 AA sequences of adaptive IR recombinations were fragmented into AA sequences of variable lengths (termed “AA k-mers”). AA k-mer frequency distribution was compared between DIPG initial and progressive tumor samples using a two-population proportion test to assess whether certain AA k-mers were over- or under-represented in either the initial or progressive collections of CDR3s based on the recovery of the IR recombination reads from the RNAseq files. The DIPG CDR3 AA sequences were also converted to chemical sequence motifs. To do so, each AA was replaced with a letter corresponding to its individual AA, R-group biochemical profile (hydrophobic, aromatic, charged positive, charged negative), to replicate the approach of Chobrutskiy et al. (19). The chemical sequence motifs were then fragmented into variable length sequences (termed “chemical sequence motif k-mers”), and the distribution was compared for initial and progressive forms of DIPG using a two-population proportion test. Note that in our assessment of both AA sequence and chemical sequence motif k-mers, the CDR3 AA sequences from both mapped and unmapped portions of “Aligned and Sorted” RNAseq files were used for analyses.
Quantification of B-cell and T-cell markers. Gene expression values for B- and T-cell markers were established by counting the sequencing reads within the “Aligned and Sorted” RNAseq file slices for both initial and progressive DIPG. Each indicated gene was normalized to the read counts for the gamma-actin gene. Gene expression was compared between the two DIPG tumor states using a student t-test (While this approach will not allow a comparison of the expression of one gene to another, it does provide for an accurate comparison of the same gene in different samples).
Bacterial sequencing read recoveries from the DIPG RNAseq files. The unmapped regions of the “Aligned and Sorted” RNAseq files corresponding to both initial and progressive DIPG samples were first converted from BAM to FASTQ formats using the SAMtools package (20). The FASTQ files were then parsed with the Kraken metagenomic alignment tools to identify reads corresponding to bacterial sequences (21). Taxonomic data were imported from the report files generated via the Kraken software, into Pavian, a web-based application that allows for visualization of taxonomic data (22). The Pavian reports were then saved as Excel files for subsequent analysis. Using the Pavian generated taxonomic Excel files (22), each case ID was evaluated for bacterial sequencing read recoveries expressed as a percentage of the total collection of unmapped reads. Comparison of percent recoveries of bacterial sequences in DIPG initial and progressive samples was performed using a Student’s t-test. Bacterial genera and bacterial species were quantified for the initial and progressive DIPG samples (Table S8).
Results
Increased IG gene expression with a transition to progressive DIPG. We quantified the adaptive IR recombination reads from the CBTN dataset, including six initial and eleven progressive DIPG samples, representing 15 cases total (i.e., two cases represented both initial and progressive samples) (Figure 1). An initial inspection of the data suggested high levels of IGH, IGK and IGL recombination reads in both initial and progressive DIPG (Table I). To compare these two tumor states, we normalized the total number of IGH, IGK and IGL recombination reads to the number of TRB recombination reads, which indicated a substantial increase in IGH, IGK, and IGL recombination reads among the progressive samples of DIPG (Figure 2). The RNAseq file datasets examined for this study were represented by three different read alignment protocols. Results were similar for all three alignment protocols (with files aligned using the “Local Transcript” protocol showing the least number of total IR recombination read recoveries). Results for the “Aligned and Sorted” protocol are shown in Figure 2. Results for the “Original BAM” and “Local Transcript” alignment protocols are in Table S5 (Methods).
Overview of analyses of DIPG tumor sample gene expression, adaptive immune receptors, and bacterial sequences.
Summary of the adaptive IR recovery counts, from RNAseq files, representing DIPG tumor samples of CBTN-DIPG, phs000900, and phs001526 studies.
Bar graph comparing IG recombination read recovery counts, normalized to TRB recombination read recoveries, for DIPG and control RNAseq datasets. Datasets representing IG recombination read recoveries for DIPG included CBTN initial/progressive tumor samples (“Aligned and Sorted” RNAseq), phs000900 tumor samples, and phs001526 tumor samples. The DIPG IR recovery counts were compared to IR recombination read recoveries from RNAseq datasets representing TARGET-NBL (neuroblastoma; tumor samples) and TARGET-WT (Wilms tumor; tumor samples). IG: Immunoglobulin.
We also mined the adaptive IR recombination reads from RNAseq files of two other DIPG studies (phs000900, phs001526), which represented only initial tumor samples. To determine whether the IGH, IGK, and IGL levels were consistent with the levels in the initial samples for the above CBTN dataset, we normalized the IGH, IGK and IGL recombination reads counts to the TRB recombination read counts. Results indicated consistency with the initial samples for CBTN as well as consistency with the level of IGH, IGK, and IGL recoveries from RNAseq files representing primary (initial) tumor samples for neuroblastoma (NBL) and Wilms tumor (WT) datasets (Figure 2).
We also normalized recoveries of the TRA, TRG, and TRD recombination reads to TRB recombination read recoveries with results indicating no difference among any of the datasets under study (CBTN DIPG progressive, CBTN DIPG initial, phs000900 DIPG initial, phs001526 DIPG initial, WT primary, and NBL primary) (Figure S1).
Physicochemical features of the IGL CDR3s representing progressive DIPG samples correlate with OS probability differences. Given the significant increase in the IG recombination reads in progressive DIPG, we hypothesized that certain attributes of the CDR3 AA sequences would correlate with increased overall survival (OS) probabilities. Cases representing the upper 50th percentile for IGL CDR3 AA sequence aromaticity represented a better OS probability, in comparison to cases representing the lower 50th percentile (Figure 3A; median OS: 13 months versus 4 months; Log-rank p=0.0044). That initial test of the hypothesis (Figure 3A) represented IGL recombination reads recovered from the “Aligned and Sorted” RNAseq file collection. We, thus, repeated the OS probability assessments using IGL recombination reads obtained from the Original BAM (Methods) RNAseq files, with results indicating that cases representing the upper 50th percentile of the aromaticity feature also represented a better OS probability than did the cases in the lower 50th percentile for the IGH, IGK, and IGL CDR3 aromaticity value (Figure 3B; median OS: 11 months versus 3.5 months; Log-rank p=0.026). The upper 50th percentile of cases for the secondary structure-helix feature, for the IGL CDR3 AA sequences, represented a better OS probability than did the lower 50th percentile for the secondary structure-helix feature (Figure 3C; median OS: 13 months, 4 months; Log-rank p=0.015), based on IGL recombination read recoveries from the “Aligned and Sorted” RNAseq files. Likewise, the upper 50th percentile of cases for the secondary structure-helix feature had a better OS probability than did the lower 50th percentile for the secondary structure-helix feature (Figure 3D; median OS: 11 months, 3.5 months; Log-rank p=0.0067) based on IGL recombination read recoveries from the “Original BAM” RNAseq files.
Kaplan-Meier analyses comparing OS of cases with IGL recombination reads recovered from RNAseq files: survival analyses based on case-averaged CDR3 physicochemical values. (A) “Aligned and Sorted” RNAseq files; upper (black) and lower (grey) 50th percentile CDR3 aromaticity values. Cases representing the upper 50th percentile of the aromaticity values (median OS, 13 months) had a higher OS probability than cases representing the lower 50th percentile (median OS, 4 months). Log-rank p=0.0044. (B) “Original BAM” RNAseq files; upper (black) and lower (grey) 50th percentile CDR3 aromaticity values. Cases representing the upper 50th percentile of the aromaticity values (median OS, 11 months) had a higher OS probability than cases representing the lower 50th percentile (median OS, 3.5 months). Log-rank p=0.0266. (C) “Aligned and Sorted” RNAseq files; upper (black) and lower (grey) 50th percentile CDR3 secondary structure – helix values. Cases representing the upper 50th percentile of the secondary structure – helix values (median OS, 13 months) had a higher OS probability than cases representing the lower 50th percentile (median OS, 4 months). Log-rank p=0.0153. (D) “Original BAM” RNAseq files; upper (black) and lower (grey) 50th percentile CDR3 secondary structure – helix values. Cases representing the upper 50th percentile of the secondary structure – helix values (median OS, 11 months) had a higher OS probability than cases representing the lower 50th percentile (median OS, 3.5 months). Log-rank p=0.0067. OS: Overall survival.
Higher CD20 expression in the progressive DIPG samples. We assessed the expression of two T-cell and two B-cell markers in the “Aligned and Sorted” RNAseq files of both initial and progressive DIPG samples (Table II). The B-cell marker, CD20, showed significantly higher expression in the progressive versus initial DIPG tumor samples (p=0.02), with a mean normalized CD20 expression about 3-fold higher in the progressive samples, consistent with the increased level of IG gene expression in the progressive DIPG samples.
B- and T-cell gene expression markers, based on the “Aligned and Sorted” RNAseq files, for initial and progressive DIPG.
Specific IG CDR3 AA k-mers with increased frequency in progressive DIPG. The IGH, IGK, IGL CDR3 AA sequences were programmatically fragmented to AA sequences of variable lengths, generating “AA k-mers”, and the k-mer distributions were assessed and compared for initial versus the progressive forms of DIPG. Table III lists the AA k-mers with the greatest differences in distribution for IGH, IGK, and IGL, when comparing DIPG initial and progressive samples.
Comparison of the distribution of IG CDR3 AA k-mers, for initial and progressive CBTN-DIPG samples, with the CDR3s based on IG recombination reads recovered from both the mapped and unmapped sections of the “Aligned and Sorted” RNAseq files.
IG CDR3 chemical sequence motif k-mers over-represented in the progressive form of DIPG. We assessed the distributions of fragments (k-mers) of IG CDR3 chemical sequence motifs (Methods) among the initial and progressive tumor samples. Table IV lists the chemical sequence motif k-mers of length five with the highest distribution differences for the initial versus progressive forms of DIPG.
Comparison of the distribution of IG CDR3 chemical sequence motif k-mers, for initial and progressive CBTN-DIPG samples, with the CDR3s based on IG recombination reads recovered from both the mapped and unmapped sections of the “Aligned and Sorted” RNAseq files.
Reduced recovery of bacterial sequences from the RNAseq files representing the progressive DIPG samples. We compared the number of unmapped reads corresponding to bacteria in the initial and progressive DIPG samples for “Aligned and Sorted” RNAseq files (Methods). The initial DIPG samples had a mean recovery of 1.63% reads representing bacteria, whereas the progressive DIPG samples had a mean recovery of 0.51% of unmapped reads representing bacteria (p-value=0.046) (Table V). The top genera and species are in Table VI. Of note, the bacterial genus, Cutibacterium, and the species, Cutibacterium acnes, was the only bacteria found in all samples (RNAseq files) for both initial (representing 6 cases) and progressive DIPG (representing a total of 11 cases) (Table S8, Methods).
Comparison of bacterial sequencing read recoveries in the unmapped portions of “Aligned and Sorted” RNAseq files. Note, each row represents a separate RNAseq file.
Report showing top bacterial genera and top bacterial species, based on sequencing reads recovered from the unmapped regions of “Aligned and Sorted” RNAseq files (see also Table S8).
Discussion
Reports of DIPG’s immunological landscape have remained limited (23, 24), and to our knowledge, this is the first report describing adaptive IR features in DIPG tumor samples. The DIPG tumor samples included in this study revealed a significant expression of antibody genes, presumably from a B-cell infiltration. Importantly, the transition from initial to progressive DIPG also had a significant increase in the expression of IG genes. This transition was accompanied by a decrease in the level of bacteria in the DIPG progressive samples, compared to the initial tumor samples. Pediatric central nervous system (CNS) tumors of high-grade have been documented as having, in general, a relatively low immune infiltrate (25). For DIPG, previous reports have shown tumor-associated macrophages form the larger portion of the immune infiltrate, with T-cells representing comparatively lower proportions of the immune infiltrate. Such macrophages have had a documented role in the low inflammatory milieu of DIPG, with relatively low expression of inflammatory cytokines (26). While it is understood that DIPG does not represent a strong inflammatory environment, there remains controversy as to whether the environment is actively immunosuppressive (27), leaving questions about the basic foundation for immunotherapy unresolved.
It is apparent that questions about immune status of the DIPG tumor environment should apply separately to initial and progressive samples, with the progressive samples certainly distinguished by dramatically elevated levels of immunoglobulin gene expression. Of note, one previous report has indicated increased B-cell levels in the peripheral blood of DIPG patients at their initial presentation with the disease (28), but no previous studies have reported increased B cells in the progressive form of DIPG.
Given the rapid recurrence of disease in patients post-radiation and their poor survival after disease recurrence (29), it is important to consider whether the increase in IG gene expression, and the possible increase in B-cell numbers, as indicated by increased CD20 expression, observed with the progressive form of the disease, could be part of a pro-tumor environment. Specific B-cell subtypes have been shown to lead to a reduction of the anti-tumor effects of T-cells (30, 31). Furthermore, the B-cell microenvironment can reduce the apoptotic effect of interferon-gamma on tumor cells (32), which may link the basic correlation of IG recombination reads and worse outcomes seen in certain tumor types (33).
Our study indicated a potential opportunity to distinguish cases with better OS probabilities based on IGL CDR3 physicochemical features. Furthermore, we observed a potential opportunity to distinguish initial and progressive DIPG based on IG CDR3 AA sequence subsets, i.e., k-mers (Table III and Table IV). These data raise the question of whether the transition from initial to progressive DIPG leads to alterations in tumor antigens, with the shift in k-mers representing antibody targeting of the tumor?
Every sample queried for both initial and progressive DIPG had a substantial number of sequencing read recoveries for the genus and species Cutibacterium and Cutibacterium acnes respectively. While the bacterial read recovery approach has its advantages, particularly in a big data or computational setting, it would be important to follow up such results with 16S rRNA or histological assessments for the presences of specific bacterial species in DIPG tumor samples.
Cutibacterium acnes has been linked to psoriasis (34) and to prostate and thyroid cancer (35, 36) and has been thought to play an immunosuppressive role in prostate cancer through increased T-regs recruitment (37). Thus, the role of this bacterium in DIPG merits further investigation. And although the above data do not represent a direct connection linking antibodies to an anti-bacterial response, our results raise the question of whether the presence of bacteria in the tumor stimulated a B-cell response, and whether that B-cell response was responsible for the reduction in the levels of bacteria in the progressive samples? Furthermore, the question is raised, is there overlap between an anti-bacterial antibody response and an anti-tumor antibody response that may impact the survival odds of patients with DIPG?
Acknowledgements
The Authors thank USF research computing; Ms. Corinne Walters for extensive administrative support in obtaining database approvals; and the taxpayers of the State of Florida. Dedicated to John, Bonnie, Lucy.
Footnotes
Authors’ Contributions
ECG: Conceptualization; Formal analysis; Methodology; Visualization; Writing – review & editing; TIH: Formal analysis; Methodology; Software; JUQ: Formal analysis; Resources; Conceptualization; Methodology; Software; MV: Formal analysis; Conceptualization; Methodology; Software; JED: Resources; Software; GB: Formal analysis; Conceptualization; Project administration; Resources; Supervision; Writing – review & editing.
Supplementary Material
The supporting online material figure and tables can be downloaded from: https://usf.box.com/s/2btqkd3plj7zs24cszj0n8n9cw471s3v
Conflicts of Interest
The Authors have nothing to declare.
- Received March 5, 2024.
- Revision received April 19, 2024.
- Accepted April 29, 2024.
- Copyright © 2024 The Author(s). Published by the International Institute of Anticancer Research.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).









