Abstract
Background/Aim: While genomic data mining approaches for T-cell receptor (TcR) recombinations have developed rapidly, less emphasis has been placed on B-cell receptor (BcR) recombinations. Materials and Methods: We obtained lung cancer exome files from The Cancer Genome Atlas (TCGA) and mined the files for TcR and BcR recombination reads. Results: There was a robust detection of BcR light chain recombination reads in lung adenocarcinoma (TCGA-LUAD) samples, and the detection of light chain recombination reads correlated with a more favorable outcome. This result was supported by analyses of the expression of B-cell markers as indicated by LUAD RNASeq files. Conclusion: BcR and TcR recombination reads recovered from LUAD whole exome sequence (WXS) files, either alone or in combination with the human leukocyte antigen (HLA) type, are likely to have prognostic value.
- BcR gene recombinations
- IGK
- lung adenocarcinoma
- tumor specimen exomes
- CD19
- CD20
- CD79 B-cell markers
- survival rates
Tumor immunoscoring has become important for both prognosis and therapy decisions. For example, in numerous instances, high lymphocyte counts in the tumor micro-environment have been associated with a better prognosis (1-4). Immunoscoring can also facilitate decisions regarding the expansion of tumor infiltrating lymphocytes in culture for re-administration to the patient (5, 6), and regarding the use of immune checkpoint blockade therapies (7-10). The use of genomics technologies for immunoscoring may be more precise and cost-efficient. For example, in many settings, a whole exome sequence (WXS) is obtained from a surgically removed tumor specimen for identifying mutations for targeted therapies. Thus, the simultaneous opportunity to use WXS files for immunoscoring, i.e., the recovery of immune receptor recombination reads, could be more efficient, and in some cases potentially more accurate, than using other methods (11, 12). In such cases, there is the presumption that the use of the genomics files for obtaining expression levels of immune cell markers, or for obtaining indications of immune receptor recombinations, reflects immune activity. This presumption has been extensively substantiated by correlative data (13-20), although occasional situations of tumor cell expression of immune cell markers, and even recombinant immune function genes in tumor cells, could be an issue. In short, high quality benchmarking of genomics-based immunoscoring remains an important goal.
Thus, the future development of genomics-based immunoscoring is dependent on continued indications that genomics files can be processed in a way to acquire important information about patients and patient tumor growth. As noted, a substantial effort has been made along these lines for TcR recombinations and RNASeq based immune biomarkers (13-20). However, BcR gene recombinations, detected by genomics approaches, have been less well-studied. Past work has focused on such recombinations in tumors with a B-cell origin, and in two recent cases, breast cancer WXS indicated a relatively high level of BcR V(D)J recombinations (14, 19), compared to several other solid cancers. Here we extend the assessment of BcR recombination reads present in a solid tumor, namely lung adenocarcinoma, with clear indications that these assessments have potential for prognoses, consistent with prognosis information related to BcR VJ recombinations for a recent pancreatic cancer study (17).
Materials and Methods
Processing exome files for reads representing V(D)J recombinations. The nucleotide sequences for both V and J regions of human BcR were collected from NCBI, and additional allele sequences were obtained from IMGT/GENE-DB. Example V and J sequences, used by the search script, are listed in Table S1 of the supplementary material. Whole exome sequences (WXS) of lung adenocarcinoma were collected from the Genomic Data Commons (TCGA; LUAD), using a key file provided for dbGaP approved project number 6300. The download-manifests for the LUAD WXS files are shown in Table S2. A Module_Search_IgTcR shell script (Table S3) was written to execute individual search scripts that identify candidate V and J region sequences specific for each B-cell receptor gene. The search script contains a list of unduplicated, 10-nucleotide-long sequences close to the 3’ end of the V regions. The V region sequences used were 3, 5, 7, 9, and 11 nucleotides away from the 3’ end of the V sequence to avoid losses in the search due to N-region diversity. Similarly, 10-nucleotide-long sequences from 3, 5, 7, 9, and 11 nucleotides away from the 5’ end of the J sequence were used (Table S1). The search script compared each read in the chromosomal segments of the TcR gene regions with the set of V and J sequences, and the reads containing a match for both V and J regions were deposited into a TSV file, which then served as the input for the script, IMGTSearchTCR.php. This PHP process queried the ImMunoGeneTics (IMGT) tool to determine whether the read was productive or unproductive and to determine how many nucleotides were an exact match to V and J regions, as well as the quality of the match. Complete match lists for detections of productive IGK and productive IGL recombination reads are shown in Tables S4 and S5. For this project, only recombinations containing more than 20 matching nucleotides for both V and J regions were accepted as a productive or unproductive read. This relatively high standard was used to ensure the validity of the recombinations.
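The probe-based search described above can be sketched as follows. This is a minimal illustration in Python, not the authors' shell and PHP scripts (those are in Table S3). The function names, the toy sequences, and the choice to ignore reverse-complement reads are all assumptions made for the example; only the probe length (10), the offsets (3, 5, 7, 9, 11), and the threshold of more than 20 matching nucleotides come from the text.

```python
# Illustrative sketch only; names and toy sequences are hypothetical.
OFFSETS = (3, 5, 7, 9, 11)  # distances from the recombination-side end (from the Methods)
PROBE_LEN = 10              # probe length in nucleotides (from the Methods)
MIN_MATCH = 20              # reads need MORE than this many matching nt for V and for J


def v_probes(v_seq, offsets=OFFSETS, k=PROBE_LEN):
    """Probes of length k ending `off` nucleotides before the 3' end of a V sequence."""
    probes = set()  # a set keeps the probe list unduplicated
    for off in offsets:
        end = len(v_seq) - off
        if end - k >= 0:
            probes.add(v_seq[end - k:end])
    return probes


def j_probes(j_seq, offsets=OFFSETS, k=PROBE_LEN):
    """Probes of length k starting `off` nucleotides after the 5' end of a J sequence."""
    probes = set()
    for off in offsets:
        if off + k <= len(j_seq):
            probes.add(j_seq[off:off + k])
    return probes


def is_candidate(read, v_set, j_set):
    """A read is a candidate when it contains at least one V probe and one J probe."""
    return any(p in read for p in v_set) and any(p in read for p in j_set)


def is_accepted(v_match_len, j_match_len, min_match=MIN_MATCH):
    """Apply the acceptance rule to match lengths reported by a downstream tool."""
    return v_match_len > min_match and j_match_len > min_match


# Toy (made-up) V and J sequences and a read that joins them with two extra nucleotides.
toy_v = "CAGTCTCCATCCTCCCTGTCTGCATCTGTAGGAGACAGAGTCACCATCACTTGC"
toy_j = "GTGGACGTTCGGCCAAGGGACCAAGGTGGAAATCAAAC"
read = toy_v[-30:-4] + "GG" + toy_j[2:]
print(is_candidate(read, v_probes(toy_v), j_probes(toy_j)))
```

Using several staggered probes means a read can still be found when the last few germline nucleotides at the junction were trimmed or replaced during recombination.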
RNASeq data. The RNASeq Version 2 RNA-Seq by Expectation Maximization (RNASeq V2 RSEM) scores for the genes indicated in the Results section were downloaded directly from cBioPortal.org (21, 22) (Table S6). The LUAD barcodes (samples) output by the above processing steps for recovering V-J recombination reads were grouped (using Microsoft Excel) into the following categories: (i) all barcodes (samples), (ii) barcodes with productive IGL, (iii) barcodes with productive IGK, (iv) all barcodes excluding barcodes with productive IGK, (v) barcodes with co-detection of productive IGL and IGK, (vi) all barcodes excluding barcodes with co-detection of IGL and IGK. No group was established representing barcodes without IGL because the IGL barcode group, when compared with all remaining barcodes in preliminary analyses, did not yield significant survival rate distinctions, as further indicated in the Results section. Moreover, the number of barcodes representing recovery of IGL was substantially lower than the number of barcodes representing IGK. Thus, we emphasized IGK for single chain recovery comparisons in this report. The RNASeq V2 RSEM scores for each gene of interest (further discussed in the Results section) were then matched to the barcodes in each recombination read category indicated above, using the VLOOKUP function of Microsoft Excel. A t-test was performed to compare the RNA expression levels of each category, for each gene of interest, to provide the p-values. Box and whisker plots (Figure S1) were created for each gene of interest when comparing RNASeq values for the following four categories, representing recovery of IGK and IGL recombination reads from the WXS files, detailed above: iii, iv, v, and vi. The data for the CD19 gene are provided as an example (Table S7). (See also Tables S8-S11 for additional raw RNASeq data used in the Results section).
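The barcode matching and group comparison described above can be sketched in Python instead of a spreadsheet. This is an illustration under stated assumptions: the barcodes and RSEM values are invented, the function names are hypothetical, and the text does not say which t-test variant was used, so Welch's unequal-variance form is assumed here. Only the test statistic and degrees of freedom are computed; a p-value would then be read from the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def split_by_group(rsem_by_barcode, group_barcodes):
    """Dictionary lookup standing in for VLOOKUP: scores in the group vs. all remaining."""
    in_group = [v for b, v in rsem_by_barcode.items() if b in group_barcodes]
    rest = [v for b, v in rsem_by_barcode.items() if b not in group_barcodes]
    return in_group, rest


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (assumed variant)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented example: RSEM scores for one gene, keyed by hypothetical barcodes.
cd19_rsem = {
    "BC-01": 310.0, "BC-02": 280.0, "BC-03": 355.0,
    "BC-04": 90.0, "BC-05": 120.0, "BC-06": 75.0,
}
productive_igk = {"BC-01", "BC-02", "BC-03"}  # group (iii) in the text

igk_scores, other_scores = split_by_group(cd19_rsem, productive_igk)  # (iii) vs. (iv)
t_stat, dof = welch_t(igk_scores, other_scores)
print(round(t_stat, 2), round(dof, 2))
```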
Survival data. Survival analyses were conducted in one of two ways, either by use of the cBioPortal.org web tool or the IBM Statistical Package for the Social Sciences (SPSS). In the case of the web tool, when needed for analyses, a subset of LUAD barcodes was compared against all remaining samples. When comparing two subsets of barcodes (Tables S8-S11), the IBM SPSS software was used. For the IBM SPSS software, the survival data for the different categories (of RNASeq values, Tables S8-S11, detailed in the previous Methods section) were downloaded from cBioPortal.org and used as SPSS input (Table S12). Note that, in most cases where the web tool was used, the SPSS software was used for verification.
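For readers unfamiliar with what the web tool and SPSS compute, the Kaplan-Meier estimate can be sketched in a few lines of Python. This is not the code used in the study; the function name and the toy follow-up data are assumptions. At each time with at least one death, the running survival probability is multiplied by (1 - deaths / number at risk), and censored patients leave the risk set without changing the estimate.

```python
def kaplan_meier(samples):
    """Kaplan-Meier curve from (time, event) pairs; event 1 = death, 0 = censored.

    Returns a list of (time, survival_probability) at each time with a death.
    """
    samples = sorted(samples)
    at_risk = len(samples)
    surv = 1.0
    curve = []
    i = 0
    while i < len(samples):
        t = samples[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(samples) and samples[i][0] == t:
            deaths += samples[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# Toy data: months of follow-up and whether death was observed.
toy = [(5, 1), (12, 0), (20, 1), (33, 0), (41, 1)]
for month, s in kaplan_meier(toy):
    print(month, round(s, 3))
```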
Obtaining the human leukocyte antigen (HLA) class I types for the LUAD barcodes. The HLA class I alleles for the LUAD barcodes were obtained from WXS files using the OptiType software (23), and results were verified with an independent analysis of several RNASeq files representing matching WXS barcodes.
Results
Results of processing the TCGA-LUAD WXS files for light chain, VJ BcR recombination reads indicated a robust detection of B-cell infiltration in the primary tumor specimen represented by the LUAD barcodes (samples) (Table I). Examples of productive IGK and IGL recombination reads, with the N-region nucleotides and the respective V- and J-gene segment usages are shown in Figure 1. The entire collection of productive recombination reads for IGK and IGL, along with the V- and J-gene segment usages, is available in Tables S4 and S5.
To support the tentative conclusion that the recovery of BcR VJ recombination reads is a measure of B-cell infiltration, RNASeq V2 RSEM values were obtained for the following genes (HUGO symbols), representing B-cell markers: CD19, CD22, CD24, CD38, CD72, CD79A, CD79B, CD83, CD86, CR2, FCGR2, MS4A1, TNFRSF13B, TNFRSF13C, and TNFRSF17 (17, 24, 25). The expression level for each gene was then compared across these barcode groups: (i) all barcodes, (ii) barcodes representing productive IGL recombination reads, (iii) barcodes representing productive IGK recombination reads, (iv) all barcodes excluding those with productive IGK recombination reads, (v) barcodes representing co-detection of productive IGK and IGL recombination reads, and (vi) all barcodes excluding those with co-detection of IGK and IGL recombination reads. Results indicated that RNASeq values in the IGK and IGL groups (ii), (iii), and (v) were significantly higher than the RNASeq values in the corresponding control groups (i), (iv), and (vi), for all of the above B-cell marker genes, except for CD24 and CD83 (Table II, with a subset of comparisons in Figure 2; see also Figure S1). In the case of CD24, RNASeq value averages were higher in the group of barcodes representing co-detection of IGL and IGK recombination reads and in the barcode group representing recovery of productive IGK reads. However, there was no statistically significant difference indicated for these two comparisons. In the case of CD83, although the group of barcodes representing co-detection of IGK and IGL recombination reads had a higher RNA expression average, the difference was not statistically significant. For the group of barcodes representing recovery of IGK productive recombination reads, there was a significantly higher CD83 expression level, in comparison to the remaining barcodes (p<0.033) (Table II, Figure 2, Figure S1).
To determine whether the co-detection of productive IGK and IGL recombination reads correlated with a better or worse survival outcome, we employed the cBioPortal.org web tool allowing a comparison of this co-detection barcode group (v) against all remaining LUAD barcodes, group (vi), using a Kaplan–Meier (KM) analysis. The results indicated that the IGK/IGL recombination read, co-detection group (v) had a significantly better overall survival rate (p<0.015), compared to all remaining barcodes (Figures S2, S3). To understand the drivers for the positive survival trend, barcodes containing productive IGK were plotted similarly against the remaining population in a KM curve, also using the above cBioPortal web tool, and a significantly better survival rate was observed for this IGK barcode group (p<0.018). To verify the results obtained from the cBioPortal tool, the above survival rate determinations were obtained independently using the IBM Statistical Package for the Social Sciences (SPSS) (Figures S3, S4).
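The p-values quoted for such two-group Kaplan-Meier comparisons typically come from a log-rank test. The sketch below shows the standard two-sample calculation in plain Python, assuming that is what the tools report; it is not the cBioPortal or SPSS implementation, and the function name and toy data are hypothetical. At every death time, the deaths observed in one group are compared with the number expected if both groups shared the same hazard, and the summed difference is referred to a chi-square distribution with one degree of freedom.

```python
from math import erfc, sqrt


def logrank(group_a, group_b):
    """Two-sample log-rank test on (time, event) pairs; event 1 = death, 0 = censored.

    Returns (chi_square, p_value). Assumes at least one informative death time.
    """
    group_a, group_b = list(group_a), list(group_b)
    death_times = sorted({t for t, e in group_a + group_b if e})
    observed_a = expected_a = var = 0.0
    for t in death_times:
        n_a = sum(1 for u, _ in group_a if u >= t)  # at risk in A just before t
        n_b = sum(1 for u, _ in group_b if u >= t)
        d_a = sum(1 for u, e in group_a if u == t and e)
        d_b = sum(1 for u, e in group_b if u == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance is undefined when one patient remains
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / var
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 degree of freedom
    return chi2, p


# Toy data (months, death observed): a co-detection group vs. all remaining barcodes.
co_detection = [(40, 1), (55, 0), (62, 1), (70, 0)]
remaining = [(8, 1), (15, 1), (22, 1), (30, 0), (35, 1)]
chi2, p = logrank(co_detection, remaining)
print(round(chi2, 3), round(p, 4))
```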
To confirm the survival rate increases observed for the barcode groups representing the recovery of productive BcR VJ recombination reads, survival rates were obtained for barcode groups representing RNASeq values for the B-cell markers, CD19, CD20, CD79A, and CD79B. For each gene, the 20 percent of barcodes with the highest level of RNASeq values were plotted against the remaining population in a KM curve. The KM curve analyses were repeated with the 20 percent of barcodes representing the lowest level of expression for the B-cell markers, whereby these barcodes were plotted against all remaining LUAD barcodes (Figures S5-S8). The results indicated that the barcodes representing the top 20% of RNASeq values corresponded to significantly higher overall survival rates in the case of all four genes. Conversely, the barcodes representing the bottom 20% of the RNASeq values, for the B-cell marker genes, indicated significantly reduced overall survival rates, except in the case of CD79B (Table III). KM curves were then constructed to directly compare the barcode group associated with the top 20% to the group representing the bottom 20% of RNA expression for each of the four B-cell marker genes (Figure 3). A statistically significant difference in survival between the groups was found for all of the genes: The barcodes with high B-cell marker expression had better survival rates than the barcodes with low expression.
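The selection of the top and bottom 20% of barcodes by expression can be sketched as below. The function name, barcodes, and scores are invented for illustration, and the text does not state how ties or non-integer group sizes were handled, so truncating the group size toward zero is an assumption.

```python
def top_and_bottom(rsem_by_barcode, fraction=0.20):
    """Return (top, bottom) sets holding the highest and lowest `fraction` of barcodes."""
    ranked = sorted(rsem_by_barcode, key=rsem_by_barcode.get)  # ascending by score
    k = int(len(ranked) * fraction)  # assumed rounding rule: truncate
    if k == 0:
        return set(), set()
    return set(ranked[-k:]), set(ranked[:k])


# Invented RSEM scores for one B-cell marker across ten hypothetical barcodes.
scores = {f"BC-{i:02d}": float(v) for i, v in enumerate(
    [12, 480, 95, 33, 260, 7, 150, 610, 41, 88])}
top, bottom = top_and_bottom(scores)
print(sorted(top), sorted(bottom))
```

Each set can then be compared against all remaining barcodes, or the two sets against each other, in a Kaplan-Meier analysis.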
To determine whether additional factors are associated with better survival for LUAD patients with productive IGK detection in the WXS files, we focused on a barcode set representing an HLA-C*702 allele and TcR-β-J2-gene segment usage, as determined by recovery of TcR-β-J2 recombination reads from the LUAD WXS files. We considered this approach because of related projects whereby significant survival differences were associated with a particular HLA class I allele in combination with either J1 or J2 usage for TcR-β (13, 15, 16, 26, 27). In particular, patients with the HLA-C*702 allele/TcR-β-J2 usage combination have been shown to have significantly higher survival than the remaining patients (p<0.031; Figure S9) for LUAD. Barcodes representing both productive IGK and HLA-C*702 allele/TcR-β-J2 usage were identified, and KM analyses results indicated that this combination group had a significantly higher survival (Figure S9). Average survival time for this population increased compared to the barcodes with HLA-C*702 allele/TcR-β-J2 usage detection in the absence of IGK detection. Average overall survival in HLA-C*702 allele/TcR-β-J2 usage samples without IGK detection in the WXS files was about 30 months, and overall survival represented by barcodes with productive IGK, when the barcodes representing HLA-C*702 allele/TcR-β-J2 usage were removed, was about 34 months. Samples with recovery of IGK V-J recombination reads and the HLA-C*702 allele/TcR-β-J2 usage combination represented an average overall survival of about 50 months, i.e., a statistically significant difference when compared to either of the other two groups indicated above (Tables IV, S13).
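The three barcode groups compared above are set combinations of two detections, which can be sketched as follows. All barcodes and survival times here are invented, the function names are hypothetical, and a plain mean of survival months ignores censoring, so this only illustrates how the groups are formed, not how the reported averages or significance values were obtained.

```python
from statistics import mean


def combination_groups(igk, hla_tcr):
    """Partition barcodes by productive IGK detection and HLA allele/TcR-J usage."""
    return {
        "IGK with HLA/TcR combination": igk & hla_tcr,
        "HLA/TcR combination without IGK": hla_tcr - igk,
        "IGK without HLA/TcR combination": igk - hla_tcr,
    }


def mean_survival(groups, months_by_barcode):
    """Naive average survival per group; empty groups and unknown barcodes are skipped."""
    result = {}
    for name, members in groups.items():
        months = [months_by_barcode[b] for b in members if b in months_by_barcode]
        if months:
            result[name] = mean(months)
    return result


# Invented example data.
igk_barcodes = {"BC-01", "BC-02", "BC-03", "BC-04"}
hla_tcr_barcodes = {"BC-03", "BC-04", "BC-05", "BC-06"}
months = {"BC-01": 30, "BC-02": 38, "BC-03": 52, "BC-04": 48, "BC-05": 28, "BC-06": 32}

groups = combination_groups(igk_barcodes, hla_tcr_barcodes)
print(mean_survival(groups, months))
```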
Given that the cytotoxic T-cell response is more commonly associated with an active immune response component for certain cancers, KM analyses were employed to compare barcodes representing productive TcR-α, productive TcR-β, and unproductive TcR-β to the remaining population. No significant differences were found for any of these comparisons (Figures S10, S11).
To further confirm and extend the basic findings above, we re-searched the LUAD tumor WXS files with a revised algorithm for identification of the recombination reads. In this revised, scripted algorithm, we allowed the assessment of a V-gene segment match to make use of any available germline nucleotides 3’ of the second conserved cysteine of the V-gene segment. This is in contrast to the first approach used above (Methods), whereby all V-gene segment matches were made only with the available nucleotides, within the candidate recombination read, on the 5’ side of the conserved cysteine. This second approach was more robust and allowed more recovery of recombination reads representing IGH, where the read must have room for verifiable V- and J-gene segments, as well as the IGH D-gene segment and the remainder of the junctional, non-germline amino acid sequences (28). Thus, we assessed survival rates represented by the LUAD barcodes representing recovery of all three BcR gene recombination reads in comparison to the survival rate represented by all remaining barcodes. Results indicated that the barcodes representing the recovery of IGH, IGL or IGK recombination reads, as a group, also represented a statistically significantly greater survival rate (log rank p=0.030, Figure S12).
Given the robustness of this second BcR read recovery algorithm, we next searched the LUAD blood WXS files for BcR recombination reads and compared the survival rate for barcodes representing BcR recombination read recovery with that for all remaining barcodes. The results indicated that blood sample WXS files yielding IGH recombination reads also represented LUAD patients with a higher survival rate (KM log rank p=0.030, Figure S13).
Discussion
Immunoscoring, being both cost-effective and timely, has demonstrated its value in diagnosis, prognosis, and treatment of cancer. With increasing prevalence of tumor exome sequencing in both lab and clinical settings, the data available for genomics-based immunoscoring is growing rapidly. This provides further opportunities to discover useful functions of immunoscoring in general, as well as to better identify patient sub-populations based on parameters uniquely available from genomics approaches.
This study reports increased algorithm efficiency and result output, as well as correlations of specific B-cell receptor recombinations with statistically significant increases in survival. The B-cell receptor recombination extraction software utilized in this research was a modification of the method used in previous studies (11, 12). The current software is capable of detecting more recombinations by increasing V and J sequence searches near the N-region diversity section of the recombination (28, 29).
The above WXS file searches yielded recovery of productive IGK recombination reads for nearly 25% of the lung adenocarcinoma barcodes, and over 22% of the WXS files contained unproductive IGK recombination reads, in addition to productive and unproductive IGL recombination reads. The presence of the IGK and IGL recombinations was consistent with the high level of RNA expression of B-cell specific genes, such as CD19, CD20, and CD79. Overall, there was a significantly higher level of B-cell specific gene expression for the barcodes representing recovery of IGK and IGL recombination reads.
The B-cell RNASeq data provided survival correlations that mirrored the survival correlations with the recovery of BcR IGK/IGL recombination reads, when the barcodes were distinguished by upper and lower RNASeq quintiles. This is in contrast to a recent case of pancreatic cancer, where BcR IGK/IGL recombination reads strongly correlated with a worse survival outcome and where B-cell marker RNASeq data did not reveal statistically significant associations with worse outcome (17). In this latter case there was the tentative conclusion that the recovery of the BcR recombination reads represented a higher standard for B-cell presence in the tumor and thereby provided the opportunity for the statistically significant survival correlation. In this current case of LUAD, the recovery of BcR recombination reads represents an absolute standard, whereas the comparison to RNASeq involved the arbitrary selection of the barcodes at the upper and lower quintiles. Thus, in both cases, there is an apparent extra value in using the recovery of the BcR recombination reads for the patient-survival assessments. In addition, the WXS provides a simultaneous opportunity to obtain TcR recombination reads and HLA types, which in this case, when overlapped with BcR recombination read recovery data, indicated yet another distinct, survival rate group (Figure S9).
The association between the better survival rates and recovery of the BcR recombination reads in LUAD is consistent with a previous study whereby IGKC (constant region) expression in non-small cell lung cancer also correlated with a better outcome (30). Indeed, the association between better outcomes and B-cell genomics markers may not be limited to lung cancer. Previous work has indicated that breast cancer exome files are significantly enriched in BcR recombination reads (19), and subsequent work has correlated recovery of BcR recombination reads from breast cancer exome files with better survival rates (14). Another report has also indicated the association of IGKC with better outcomes in breast cancer (31). Overall, these results raise the question of what might distinguish cancers where the B-cell infiltrate represents a better outcome from cancers where the B-cell infiltrate represents a worse outcome. Also, what antigen(s) might a BcR be targeting in the lung adenocarcinoma setting?
Finally, the above results also indicated the potential for blood exome files to be used for recovery of BcR recombination reads, followed by the correlation of such recoveries with clinical features (Figure S13). This result is reported for the first time, and raises questions about the links between the detection of the lymphocyte BcR recombination in blood exome files and the course of disease. For example, does detection of the BcR recombination reads simply represent higher numbers of B-cells present in the blood samples of certain patients, such that the preparation of a typical exome has higher likelihood of including BcR recombination DNA? And if so, what are the mechanistic connections between the higher numbers of B-lymphocytes and disease course? Regardless, these initial approaches using blood exome files indicate that, at least when conducting a sufficiently large study, it is possible to generate statistically significant correlative data, i.e., with immune receptor recombinations and clinical features.
Acknowledgements
The Authors wish to thank USF research computing and the taxpayers of the State of Florida. Blake M. Callahan was a recipient of a Bonati research stipend.
Footnotes
Authors' Contributions
Yaping N. Tu conceived the plan, conducted most of the basic analyses, and wrote several early drafts of the manuscript. Wei Lue Tong, Blake M. Callahan, and Boris I. Chobrutskiy wrote and applied software, and provided some analyses of data. George Blanck supervised the project, assisted in the data analyses, and finalized the manuscript draft.
This article is freely accessible online.
Supplementary Material
All supplementary material (Tables S1-13 and Figures S1-13) is freely available at: http://www.universityseminarassociates.com/media/Tu%20et%20al%20supporting%20online%20material%202020.pdf
Conflicts of Interest
The Authors declare that they have no conflicts of interest regarding this study.
- Received January 27, 2020.
- Revision received February 24, 2020.
- Accepted March 4, 2020.
- Copyright© 2020, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved