Original Article Open Access
Copyright ©2011 Baishideng Publishing Group Co., Limited. All rights reserved.
World J Gastroenterol. Apr 28, 2011; 17(16): 2096-2103
Published online Apr 28, 2011. doi: 10.3748/wjg.v17.i16.2096
Chemometrics of differentially expressed proteins from colorectal cancer patients
Lay-Chin Yeoh, Saravanan Dharmaraj, Boon-Hui Gooi, Manjit Singh, Lay-Harn Gam
Lay-Chin Yeoh, Lay-Harn Gam, School of Pharmaceutical Sciences, Universiti Sains Malaysia, Penang, 11800, Malaysia
Saravanan Dharmaraj, Centre for Drug Research, Universiti Sains Malaysia, Penang, 11800, Malaysia
Boon-Hui Gooi, Manjit Singh, Department of Surgery, Penang General Hospital, Penang, 10990, Malaysia
Author contributions: Gam LH conceived the design of the study and edited the manuscript; Yeoh LC carried out the experimental work and manuscript writing; Dharmaraj S carried out the statistical analyses; Gooi BH and Singh M provided the colorectal cancer specimens and patient information.
Supported by Research Universiti Grant, Grant No. 1001/PFARMASI/815007
Correspondence to: Lay-Harn Gam, PhD, School of Pharmaceutical Sciences, Universiti Sains Malaysia, Penang, 11800, Malaysia. layharn@usm.my
Telephone: +60-4-6533888 Fax: +60-4-6570017
Received: August 13, 2010
Revised: September 18, 2010
Accepted: September 25, 2010
Published online: April 28, 2011

Abstract

AIM: To evaluate the usefulness of differentially expressed proteins from colorectal cancer (CRC) tissues for differentiating cancer and normal tissues.

METHODS: A proteomic approach was used to identify proteins that were differentially expressed between CRC and normal tissues. Proteins were extracted sequentially with Tris buffer and thiourea lysis buffer (TLB) to obtain aqueous soluble and membrane-associated proteins, respectively. Chemometrics, namely principal component analysis (PCA) and linear discriminant analysis (LDA), were used to assess the usefulness of these proteins for identifying the cancerous state of tissues.

RESULTS: The differentially expressed proteins identified comprised 37 aqueous soluble proteins in Tris extracts and 24 membrane-associated proteins in TLB extracts. Based on the protein spot intensities on 2D gel images, PCA with an eigenvalue criterion of > 1 reduced the data to 12 principal components (PCs) for Tris extracts and seven PCs for TLB extracts; six PCs from each extract were subsequently used for LDA. For the Tris extract, LDA correctly classified 82.7% of the original samples and 82.7% of the cross-validated samples. For the TLB extract, LDA correctly classified 78.8% of the original samples and 71.2% of the cross-validated samples.

CONCLUSION: The classification of CRC tissues by PCA and LDA provided a promising distinction between normal and cancer types. These methods can possibly be used for identification of potential biomarkers among the differentially expressed proteins identified.

Key Words: Colorectal cancer, Proteomics, Marker protein, Principal component analysis, Linear discriminant analysis



INTRODUCTION

Proteomic research has made great achievements in biomarker discovery, especially when combined with high-throughput analytical tools and technology, for example 2D-PAGE and LC-MS/MS[1]. Two-dimensional gel electrophoresis is a fundamental tool for protein analysis that detects alterations in protein expression between control and disease states of cells, which can lead to the discovery of various biomarkers that contribute to pathogenesis or carcinogenesis[2]. Biomarkers can serve as discriminating variables for subsequent classification of normal and diseased groups[3]. The complexity of the variables generated by mass spectra, microarray and immunohistochemistry often requires advanced statistical techniques or chemometrics to evaluate their clinical value.

Multivariate analyses, including the dimension reduction method known as principal component analysis (PCA) and classification methods such as linear discriminant analysis (LDA), are often employed in proteomic studies. PCA reduces the number of variables for further data analysis and interpretation while identifying the variables that retain most of the data variance[4]. A principal component (PC) is a new variable that explains the maximum amount of variance in the original data and corresponds to a linear combination of the original variables. PCs are orthogonal to each other, which provides a more effective representation of the data than the original variables[2]. LDA is a multivariate technique for classifying observations into groups or categories. LDA forms new variables from the original data and identifies the variables that provide the best discrimination between the groups[5].
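The two definitions above can be restated compactly. The notation below is standard textbook notation and does not appear in the article: $X$ is assumed to be the mean-centered $n \times p$ matrix of spot intensities, $S$ its sample covariance matrix, and $S_W$ the pooled within-group covariance matrix. It is a sketch of the general techniques, not a description of the software settings used in this study.

```latex
\begin{align}
S &= \frac{1}{n-1} X^{\top} X, &
S\,w_k &= \lambda_k w_k, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 \\
t_k &= X w_k, &
\text{fraction of variance explained by PC } k &= \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j} \\
w_{\mathrm{LDA}} &\propto S_W^{-1}\left(\bar{x}_{\mathrm{cancer}} - \bar{x}_{\mathrm{normal}}\right)
\end{align}
```

Here $t_k$ holds the scores of the $k$-th PC, and the last line is the two-group Fisher discriminant direction. The orthogonality of the PCs follows because eigenvectors of the symmetric matrix $S$ that belong to distinct eigenvalues are orthogonal.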

Djidja et al[6] have used a novel approach that combines matrix-assisted laser desorption ionization-ion mobility separation-mass spectrometry (MALDI-IMS-MS) and PCA-discriminant analysis (PCA-DA) to generate tumor classification models based on pancreatic cancer protein patterns. Furthermore, Kamath et al[7] have used PCA-based k-nearest neighbor analysis to classify normal and cancerous autofluorescence spectra of colonic mucosal tissues. Zwielly et al[8] have investigated the use of Fourier transform infrared microscopy for colon cancer diagnosis. Their model uses PCA to define spectral changes among normal and cancerous human biopsied colon tissues. Ragazzi et al[9] have reported the use of multivariate techniques on plasma proteins to diagnose colorectal cancer (CRC). The plasma protein profile generated by MALDI-MS is analyzed by PCA and LDA to discriminate ionic species from normal subjects and CRC patients.

In this study, we compared 2D gel images of cancerous and normal colorectal tissues. The differentially expressed proteins from Tris and thiourea lysis buffer (TLB) extractions were each tested in a PCA-LDA model to assess whether protein expression can be used to classify diseased and non-diseased colorectal tissues.

MATERIALS AND METHODS
Tissue specimen collection

Matching pairs of normal colonic mucosa and cancerous colonic tissue (located 10 cm from each other) from 26 CRC patients were collected after surgery at the Penang General Hospital, Penang, Malaysia. The study was approved by the Human Ethical Committee of Universiti Sains Malaysia. Informed written consent was received from all patients before the study was conducted. None of the patients received neoadjuvant chemotherapy or radiotherapy before surgery. The tissues were confirmed as cancerous and normal, respectively, by the hospital’s pathologist. The cancerous tissues were classified using the TNM system. Surgically removed samples were stored at -80°C until use.

Protein analysis

The method of protein analysis was as described in Yeoh et al[10]. Frozen tissue (250 mg) was rinsed in distilled water to remove cell debris and excess blood. The tissues were homogenized in ice-cold Tris buffer (0.5 g tissue/mL buffer) [40 mmol/L Tris and 1 × Protease Inhibitor Cocktail (Sigma, St Louis, MO, USA)] and centrifuged at 12 000 rpm for 15 min at 18°C. The supernatant was recovered and labeled as Tris extract. The pellet was subjected to further extraction using TLB (1 g tissue/1 mL buffer) [8 mol/L urea, 2 mol/L thiourea, 4% (w/v) CHAPS, 0.4% (w/v) carrier ampholytes and 50 mmol/L dithiothreitol] and centrifuged at 12 000 rpm for 15 min at 18°C. The supernatant was recovered and labeled as TLB extract. The extracts were subjected to 2D gel separation on 11 cm ReadyStrip™ IPG strip (linear pH 4-7, Bio-Rad, USA) followed by separation on 10% (w/v) PAGE at a constant voltage of 200 V. The gels were stained with Coomassie Blue. The images obtained were analyzed by PDQuest version 7.3 (Bio-Rad). Comparison of the protein expression levels was carried out between cancerous and normal tissues. Differentially expressed proteins were defined as proteins with a spot intensity that was 1.5-fold higher or lower in cancerous tissues when compared to that in the corresponding normal tissues. A differentially expressed protein was defined as upregulated when it was found at greater intensity in cancerous tissue than in the corresponding normal tissue. The downregulated proteins were detected at greater intensity in normal tissues than in the corresponding CRC cancerous tissues.
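The 1.5-fold criterion above can be illustrated with a short sketch. The function names are hypothetical, the threshold is read as "at least 1.5-fold", and the signed ratio convention (positive for higher intensity in cancerous tissue, negative for higher intensity in normal tissue) is an assumption; PDQuest's own calculation may differ.

```python
def signed_fold_change(cancer: float, normal: float) -> float:
    """Signed intensity ratio (assumed convention, not taken from PDQuest).

    Positive when the cancer spot is at least as intense as the normal spot,
    negative when the normal spot is more intense.
    """
    if cancer <= 0 or normal <= 0:
        raise ValueError("spot intensities must be positive")
    if cancer >= normal:
        return cancer / normal
    return -(normal / cancer)


def classify_spot(cancer: float, normal: float, threshold: float = 1.5) -> str:
    """Label a paired spot using the fold-change threshold."""
    fc = signed_fold_change(cancer, normal)
    if fc >= threshold:
        return "upregulated"
    if fc <= -threshold:
        return "downregulated"
    return "unchanged"


if __name__ == "__main__":
    # Made-up intensities for illustration only.
    for cancer, normal in [(3000.0, 1500.0), (1000.0, 2500.0), (1200.0, 1000.0)]:
        print(cancer, normal, classify_spot(cancer, normal))
```

A spot is labeled per tissue pair here; the article reports an average fold change across all patients, which this sketch does not attempt to reproduce.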

Protein identification

The differentially expressed proteins were excised from the gel and subjected to in-gel digestion using trypsin. The tryptic peptides were analyzed by LC/MS/MS using an electrospray ionization ion trap mass analyzer (Agilent Technologies, Santa Clara, CA, USA). The MS/MS data were submitted to the MASCOT protein database search engine for protein identification. The identities of a few proteins (dependent on the availability of antibodies) were further confirmed using western blotting.

Statistical analysis

The differential expression of the proteins was tested by the paired Student’s t test included in PDQuest to determine statistical significance (P < 0.05). For PCA and LDA, the protein spot intensities were exported from PDQuest and imported into SPSS version 15.0 (Chicago, IL, USA) for multivariate analysis. Protein spot intensities were used as variables.
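For readers who want to see what a paired t test computes, the following is a minimal sketch using only the Python standard library. It is not the PDQuest implementation; the function name and the example intensities are hypothetical.

```python
import math
import statistics


def paired_t_statistic(cancer, normal):
    """Return (t, degrees of freedom) for paired cancer/normal intensities."""
    if len(cancer) != len(normal) or len(cancer) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [c - n for c, n in zip(cancer, normal)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Made-up intensities for five hypothetical tissue pairs.
    t, df = paired_t_statistic([5.0, 7.0, 9.0, 8.0, 6.0], [3.0, 4.0, 8.0, 5.0, 5.0])
    print(f"t = {t:.3f}, df = {df}")
```

The statistic is then compared with the t distribution on n - 1 degrees of freedom. With 26 tissue pairs (25 degrees of freedom), the standard two-sided critical value at P = 0.05 is about 2.06.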

RESULTS

The tissue specimens from each patient were collected in pairs of cancerous and normal tissues. Table 1 shows the details of the tissues used in the analysis. The tissues were subjected to a sequential extraction method to extract aqueous soluble proteins and membrane-associated proteins in two different fractions using Tris and TLB, respectively. Tables 2 and 3 show the 37 and 24 differentially expressed proteins identified in Tris and TLB extracts, respectively. The average fold change indicates the degree of differentiation in expression levels of the protein in cancerous tissues compared to normal tissues in all the patients tested, where a positive sign indicates a greater expression level in cancerous tissues, whereas a negative sign indicates a greater expression level in normal tissues. The MOWSE score refers to the score values given by the MASCOT search. Tables 4 and 5 show the mean intensity of spots and SD, and percentage coefficient of variation (%CV) of spot intensity of differentially expressed proteins in all patients for Tris and TLB extracts, respectively. An example of the differentially expressed protein, as represented by different intensities of protein spots between normal and cancerous tissues for glutathione S-transferase P (GST-P), is shown in Figure 1; the bar chart was plotted according to the intensity of the respective protein spots. GST-P was detected as upregulated in cancerous tissues.
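The %CV reported in Tables 4 and 5 is the SD expressed as a percentage of the mean. A minimal sketch follows; the function names are hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption, since the article does not say which form was used.

```python
import statistics


def percent_cv(intensities):
    """Percentage coefficient of variation from raw spot intensities."""
    mean = statistics.mean(intensities)
    if mean == 0:
        raise ValueError("mean intensity is zero; %CV is undefined")
    return 100.0 * statistics.stdev(intensities) / mean


def percent_cv_from_summary(mean, sd):
    """Percentage coefficient of variation from a reported mean and SD."""
    if mean == 0:
        raise ValueError("mean intensity is zero; %CV is undefined")
    return 100.0 * sd / mean


if __name__ == "__main__":
    # Mean and SD of Tris spot 1 as reported in Table 4.
    print(round(percent_cv_from_summary(2565.84, 2247.86), 1))
```

Values recomputed from the rounded means and SDs in the tables can differ slightly in the last digit from the published %CV.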

Table 1 Clinicopathological features of 26 colorectal cancer patients involved in study.
Patient No. | Age (yr) | Race | Sex | pTNM | Stage | Degree of differentiation | Tumor location
1 | 62 | Malay | Male | pT3N1Mx | IIIB | MD | Sigmoid colon
2 | 79 | Malay | Male | pT2N0M0 | I | MD | Descending colon
3 | 74 | Malay | Male | pT3N0M0 | IIA | MD | Ascending colon
4 | - | Malay | Male | pT3N2Mx | IIIC | MD | Rectum
5 | 37 | Malay | Male | pT3N0M0 | IIA | MD | Transverse colon
6 | 58 | Malay | Female | pT3N0Mx | IIA | MD | Recto-sigmoid
7 | 59 | Malay | Female | pT4N2Mx | IIIC | MD | Ileocecal
8 | 69 | Malay | Male | pT3N0Mx | IIA | MD | Sigmoid colon
9 | 63 | Malay | Female | pT3N0Mx | IIA | MD | Recto-sigmoid
10 | 84 | Chinese | Female | pT4N0M0 | IIB | MD | Rectum
11 | 58 | Chinese | Male | pT3N0Mx | IIA | MD | Recto-sigmoid
Figure 1 Comparison of protein spot intensity between normal and colorectal cancer tissues for glutathione S-transferase P.
Table 2 List of proteins found in 2D gel of Tris extracts.
Spot No. | Protein name | SwissProt No. | MOWSE score | MW (Da) | pI | Sequence coverage (%) | GRAVY | Average fold change
1 | Proteasome subunit β type 6 | P28072 | 134 | 25573 | 4.80 | 16 | 0.034 | -2.967
2 | 14-3-3 protein ζ | P63104 | 336 | 35567 | 6.97 | 40 | -0.744 | 11.659
3 | Tropomyosin α-3C-like protein | A6NL28 | 127 | 27407 | 4.71 | 31 | -0.992 | 44.183
4 | Rho GDP-dissociation inhibitor 1 | P52565 | 167 | 23120 | 5.03 | 29 | -0.700 | -7.607
5 | 14-3-3 protein ζ | P63104 | 282 | 27919 | 4.73 | 16 | -0.621 | 4.127
6 | Tubulin β-2C chain | P68371 | 524 | 50304 | 4.83 | 40 | -0.362 | -52.184
7 | Cathepsin B | P07858 | 74 | 22981 | 5.20 | 18 | -0.433 | 33.149
8 | Rho GDP-dissociation inhibitor 2 | P52566 | 48 | 22901 | 5.10 | 18 | -0.799 | -10.625
9 | SEC13 homolog | P55735 | 78 | 36040 | 5.22 | 9 | -0.372 | 6.873
10 | Hsc70-interacting protein | P50502 | 164 | 28464 | 8.92 | 21 | -0.653 | 20.959
11 | Apolipoprotein A-I | P02647 | 143 | 30777 | 5.56 | 26 | -0.717 | -4.478
12 | Proteasome subunit α type 3 | P25788 | 201 | 15958 | 6.82 | 41 | 0.008 | 4.249
13 | Actin, cytoplasmic 2 | P63261 | 105 | 26169 | 5.65 | 14 | -0.156 | 28.601
14 | 60 kDa heat shock protein | P10809 | 151 | 61348 | 5.70 | 14 | -0.074 | 131.219
15 | Peroxiredoxin-2 | P32119 | 283 | 21935 | 5.67 | 42 | -0.210 | 1.250
16 | Guanine nucleotide binding protein subunit β 2 | P62879 | 112 | 37954 | 5.60 | 11 | -0.183 | -14.442
17 | F-actin-capping protein subunit β | P47756 | 259 | 34187 | 6.02 | 37 | -0.574 | 33.554
18 | GST-P | P09211 | 730 | 23442 | 5.44 | 60 | -0.131 | 4.834
19 | Haptoglobin-related protein | P00739 | 49 | 39529 | 6.42 | 3 | -0.308 | 56.209
20 | Cathepsin Z | Q9UBR2 | 100 | 27787 | 5.48 | 15 | -0.545 | -60.766
21 | F-actin-capping protein subunit β | P47756 | 245 | 21280 | 7.93 | 34 | -0.540 | 13.278
22 | Actin-related protein 3 | P61158 | 148 | 47704 | 5.61 | 27 | -0.271 | 15.881
23 | Abhydrolase domain-containing protein 14B | Q96IU4 | 200 | 25429 | 6.82 | 26 | -0.023 | 0.765
24 | Nucleoside diphosphate kinase A | P15531 | 87 | 19873 | 5.42 | 36 | -0.075 | 73.120
25 | L-lactate dehydrogenase B chain | P07195 | 228 | 36928 | 5.71 | 14 | 0.056 | 3.513
26 | Fibrinogen β chain | P02675 | 151 | 56624 | 8.54 | 22 | -0.758 | 41.329
27 | Leukocyte elastase inhibitor | P30740 | 170 | 42857 | 5.90 | 11 | -0.249 | 10.458
28 | PDI A3 | P30101 | 674 | 57202 | 5.98 | 35 | -0.506 | 7.579
29 | Gelsolin | P06396 | 238 | 86103 | 5.90 | 20 | -0.415 | -11.917
30 | Heat shock 27 kDa protein | P04792 | 256 | 22840 | 5.98 | 47 | -0.567 | -1.508
31 | DJ-1 protein | Q99497 | 122 | 20079 | 6.33 | 54 | 0.004 | 4.981
32 | Fibrinogen β chain | P02675 | 75 | 56624 | 8.54 | 22 | -0.758 | -72.722
33 | Selenium-binding protein 1 | Q13228 | 502 | 52971 | 5.93 | 21 | -0.254 | -26.544
34 | Selenium-binding protein 1 | Q13228 | 592 | 52938 | 5.93 | 30 | -0.254 | 27.403
35 | Selenium-binding protein 1 | Q13228 | 979 | 52938 | 5.93 | 37 | -0.254 | -1.887
36 | Leukotriene A-4 hydrolase | P09960 | 215 | 69792 | 5.80 | 22 | -0.259 | 29.759
37 | Proteasome subunit α type 6 | P60900 | 71 | 20988 | 8.57 | 39 | -0.247 | 0.768
Figure 2 Principal component plot of Tris proteins.
Table 3 List of proteins found in 2D gel in thiourea lysis buffer extracts.
Spot No. | Protein name | SwissProt No. | MOWSE score | MW (Da) | pI | Sequence coverage (%) | GRAVY | Average fold change
1 | Tropomyosin α-4 chain | P67936 | 139 | 28506 | 4.67 | 33 | -1.033 | -51.151
2 | Putative tropomyosin α-3-chain-like protein | A6NL28 | 53 | 27407 | 4.71 | 25 | -0.992 | 4.922
3 | GC1q-R, mitochondrial | Q07021 | 123 | 31768 | 4.74 | 20 | -0.461 | -3.333
4 | Calreticulin | P27797 | 73 | 47092 | 4.30 | 11 | -1.191 | 1.394
5 | Prohibitin | P35232 | 421 | 29890 | 5.57 | 41 | 0.024 | 0.032
6 | Heat shock 70 kDa protein | P11021 | 775 | 72488 | 5.07 | 42 | -0.487 | -32.940
7 | Tubulin β-2C chain | P68371 | 299 | 48142 | 4.70 | 25 | -0.347 | -9.060
8 | PDI | P07237 | 266 | 57510 | 4.82 | 42 | -0.450 | -1.515
9 | ATP synthase subunit β, mitochondrial | P06576 | 1096 | 56559 | 5.26 | 43 | 0.018 | -15.661
10 | ATP synthase D chain | O75947 | 117 | 18406 | 5.22 | 32 | -0.569 | -5.129
11 | Chloride intracellular channel protein 1 | O00299 | 299 | 27123 | 5.09 | 30 | -0.293 | 20.288
12 | Tubulin α-1 chain | Q71U36 | 61 | 50800 | 4.94 | 6 | -0.229 | -30.291
13 | Apolipoprotein A-I | P02647 | 129 | 28078 | 5.27 | 37 | -0.840 | 78.135
14 | Actin, cytoplasmic 2 | P63261 | 52 | 42009 | 5.31 | 4 | -0.205 | -26.716
15 | Actin, aortic smooth muscle | P62736 | 261 | 42154 | 5.23 | 21 | -0.233 | 46.181
16 | Stomatin-like protein 2 | Q9UJZ1 | 151 | 38644 | 6.88 | 28 | -0.161 | -29.709
17 | 60 kDa heat shock protein, mitochondrial | P10809 | 451 | 61386 | 5.70 | 28 | -0.074 | 14.023
18 | Triosephosphate isomerase | P60174 | 167 | 26828 | 6.51 | 24 | -0.126 | 16.757
19 | Annexin A5 | P08758 | 195 | 35994 | 4.94 | 39 | -0.330 | -2.019
20 | Cytochrome b-c1 complex subunit 1, mitochondrial | P31930 | 96 | 53342 | 5.94 | 18 | -0.141 | 13.151
21 | Annexin A3 | P12429 | 140 | 36396 | 5.63 | 22 | -0.430 | 31.244
22 | Annexin A4 | P09525 | 165 | 35983 | 5.85 | 33 | -0.447 | 11.890
23 | α-enolase | P06733 | 143 | 47385 | 6.99 | 12 | -0.226 | 85.960
24 | Lamin-A/C | P02545 | 198 | 65192 | 6.40 | 25 | -0.947 | -3.378
Figure 3 Scree plot showing principal components and their eigenvalues in Tris extracts.
Table 4 Mean ± SD and percentage coefficient of variation of spot intensities of Tris proteins.
Protein spot No. | Intensity of spots (mean ± SD) | % CV of spot intensity
1 | 2565.84 ± 2247.86 | 87.60
2 | 3865.47 ± 3766.11 | 97.42
3 | 2424.01 ± 1847.71 | 76.23
4 | 4957.17 ± 2923.49 | 58.97
5 | 3901.55 ± 3900.52 | 99.97
6 | 2105.64 ± 2444.14 | 116.08
7 | 2572.91 ± 1765.28 | 68.61
8 | 2959.95 ± 2177.86 | 73.58
9 | 2478.29 ± 1697.98 | 68.51
10 | 1253.48 ± 1472.88 | 117.50
11 | 3373.93 ± 2451.35 | 72.66
12 | 3247.26 ± 2519.26 | 77.58
13 | 9413.58 ± 10 685.11 | 113.51
14 | 2735.49 ± 2665.85 | 97.45
15 | 8354.35 ± 4824.59 | 57.75
16 | 7370.39 ± 7935.67 | 100.34
17 | 14 200.72 ± 16 194.91 | 114.04
18 | 6254.81 ± 5105.54 | 81.63
19 | 14 364.73 ± 10 849.77 | 75.53
20 | 10 753.33 ± 14 509.06 | 134.93
21 | 5171.49 ± 3304.12 | 63.89
22 | 3230.12 ± 1905.24 | 58.98
23 | 2114.69 ± 1164.19 | 55.05
24 | 2331.41 ± 2122.56 | 91.04
25 | 9254.07 ± 4830.01 | 52.19
26 | 9118.41 ± 9336.23 | 102.39
27 | 3750.45 ± 3869.35 | 103.17
28 | 8098.16 ± 5450.79 | 67.31
29 | 3984.55 ± 2658.12 | 66.71
30 | 4236.70 ± 4229.74 | 99.84
31 | 3932.80 ± 2507.88 | 63.77
32 | 1681.49 ± 2019.10 | 120.08
33 | 6600.04 ± 4860.85 | 73.65
34 | 3121.51 ± 2694.58 | 86.32
35 | 8587.77 ± 5871.40 | 68.37
36 | 939.46 ± 1682.25 | 179.07
37 | 3780.67 ± 1967.05 | 52.03
Figure 4 Principal component plot of thiourea lysis buffer proteins.
Table 5 Mean ± SD and percentage coefficient of variation of spot intensities of thiourea lysis buffer proteins.
Protein spot No. | Intensity of spots (mean ± SD) | % CV of spot intensity
1 | 10 918.80 ± 8005.09 | 73.31
2 | 8516.42 ± 7898.33 | 92.74
3 | 3986.45 ± 3471.51 | 87.08
4 | 36 146.18 ± 24 859.84 | 68.78
5 | 13 329.50 ± 7123.20 | 53.44
6 | 4091.51 ± 4636.51 | 113.32
7 | 6512.40 ± 6048.73 | 92.88
8 | 13 401.28 ± 8031.43 | 59.93
9 | 24 196.99 ± 14 907.64 | 61.61
10 | 4861.29 ± 4327.71 | 89.02
11 | 4128.52 ± 3764.18 | 91.18
12 | 3522.46 ± 2821.84 | 80.11
13 | 9624.81 ± 8295.52 | 86.19
14 | 5407.19 ± 5270.17 | 97.47
15 | 4683.89 ± 6994.94 | 149.34
16 | 2633.26 ± 2593.91 | 98.51
17 | 10 104.77 ± 10 369.91 | 102.62
18 | 16 086.82 ± 19 928.39 | 123.88
19 | 6791.99 ± 5063.21 | 74.55
20 | 7596.19 ± 4759.49 | 62.66
21 | 2685.37 ± 3298.54 | 122.84
22 | 5022.01 ± 3735.74 | 74.39
23 | 5957.62 ± 7526.42 | 124.65
24 | 2323.67 ± 2269.62 | 97.67
Figure 5 Scree plot showing principal components and their eigenvalues in thiourea lysis buffer extracts.
Data analysis

The significance of the expression levels of the differentially expressed proteins in both Tris and TLB extracts was analyzed by Student’s t test. After this univariate analysis, the normalized intensities of the 37 differentially expressed protein spots in Tris extracts were subjected to PCA. The PCA reduced the original data to 12 PCs based on an eigenvalue of > 1, and these 12 PCs accounted for 76.43% of the total variance of the Tris extract data. Figure 2 shows the 3D PC plot with the x-, y- and z-axes representing the first, second and third PCs, respectively. The variables that had the highest loadings were those that contributed most to the differentiation of the disease state. Figure 3 shows the scree plot of Tris extracts. Six PCs were chosen, and these components accounted for 53.97% of the total variance of the Tris extract data. Table 6 shows the LDA results for Tris extract proteins, where 22 out of 26 original normal tissues and 21 out of 26 original cancer tissues were correctly classified. In cross-validated samples, 22 out of 26 normal tissues and 21 out of 26 cancer tissues were correctly classified. Both the original and cross-validated samples had an overall correct classification of 82.7%.
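The eigenvalue > 1 rule and the cumulative variance percentages quoted above can be sketched as follows. The function names and the toy eigenvalues are hypothetical; the article does not list its eigenvalues, so none of the numbers below come from the study.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """Return the eigenvalues kept by the 'eigenvalue > threshold' rule, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    return [ev for ev in ordered if ev > threshold]


def cumulative_variance_percent(eigenvalues, n_components):
    """Percentage of total variance carried by the n largest components."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    return 100.0 * sum(ordered[:n_components]) / total


if __name__ == "__main__":
    # Toy eigenvalues of a hypothetical 6-variable correlation matrix (they sum to 6).
    toy = [2.4, 1.5, 1.1, 0.5, 0.3, 0.2]
    kept = kaiser_retained(toy)
    print(len(kept), "components retained")
    print(round(cumulative_variance_percent(toy, len(kept)), 2), "% of variance")
```

If the PCA was run on the correlation matrix (an assumption; the article does not say), the eigenvalues sum to the number of variables, so the 12 retained PCs covering 76.43% of the variance of 37 spots would have eigenvalues summing to roughly 28.3.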

Table 6 Percentage of correct classification of normal and colorectal cancer tissues in Tris extracts using linear discriminant analysis.
Sample set | Actual type (n) | Predicted cancer | Predicted normal | % correct classification (both groups)
Original count | Cancer (26) | 21 | 5 | 82.7
Original count | Normal (26) | 4 | 22 |
Cross-validated count | Cancer (26) | 21 | 5 | 82.7
Cross-validated count | Normal (26) | 4 | 22 |

Figure 4 shows the 3D view of the PC plot for the TLB extract. PCA reduced the original data of the TLB extract to seven PCs based on an eigenvalue of > 1, and the seven PCs accounted for 72.46% of the total data variance. The 3D view indicates that tissues can be grouped according to CRC disease state. Figure 5 shows the scree plot of the TLB extracts. Six PCs were chosen based on the slope of the scree plot, and these accounted for 67.61% of the total data variance of TLB extracts. Table 7 shows the LDA results of TLB extracts, where 22 out of 26 original normal tissues and 19 out of 26 original cancerous tissues were correctly classified. In cross-validated samples, 21 out of 26 normal tissues and 16 out of 26 cancerous tissues were correctly classified. The overall percentages of correct classification for original and cross-validated samples were 78.8% and 71.2%, respectively.

Table 7 Percentage of correct classification of normal and colorectal cancer tissues in thiourea lysis buffer extracts using linear discriminant analysis.
Sample set | Actual type (n) | Predicted cancer | Predicted normal | % correct classification (both groups)
Original count | Cancer (26) | 19 | 7 | 78.8
Original count | Normal (26) | 4 | 22 |
Cross-validated count | Cancer (26) | 16 | 10 | 71.2
Cross-validated count | Normal (26) | 5 | 21 |
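The percentages in Tables 6 and 7 can be checked directly from the classification counts. The sketch below uses a hypothetical function name; sensitivity and specificity are additional quantities derived here for illustration and are not reported in the article.

```python
def classification_summary(cancer_as_cancer, cancer_as_normal,
                           normal_as_cancer, normal_as_normal):
    """Overall accuracy, sensitivity and specificity (all in %) from a 2 x 2 table."""
    total = cancer_as_cancer + cancer_as_normal + normal_as_cancer + normal_as_normal
    return {
        # Fraction of all tissues assigned to their true group.
        "accuracy": 100.0 * (cancer_as_cancer + normal_as_normal) / total,
        # Fraction of cancerous tissues classified as cancer.
        "sensitivity": 100.0 * cancer_as_cancer / (cancer_as_cancer + cancer_as_normal),
        # Fraction of normal tissues classified as normal.
        "specificity": 100.0 * normal_as_normal / (normal_as_cancer + normal_as_normal),
    }


if __name__ == "__main__":
    tables = {
        "Tris, original and cross-validated": (21, 5, 4, 22),
        "TLB, original": (19, 7, 4, 22),
        "TLB, cross-validated": (16, 10, 5, 21),
    }
    for name, counts in tables.items():
        summary = classification_summary(*counts)
        print(name, {key: round(value, 1) for key, value in summary.items()})
```

The overall accuracies round to 82.7%, 78.8% and 71.2%, matching the values in the two tables.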
DISCUSSION

The expression levels of the differentially expressed proteins between colorectal cancerous and normal tissues were analyzed by a multivariate approach based on PCA, to assess their usefulness in classifying colorectal tissues as cancerous or normal. The differentially expressed proteins identified showed good consistency in their expression levels in cancerous and normal tissues. The proteins were extracted in two fractions according to their polarities. In the PCA-LDA model, the selected proteins from the first few PCs were able to discriminate colorectal tissues with and without CRC.

A scree plot was derived by plotting the eigenvalues against the PC number. The shape of the plot was used to evaluate the number of PCs to be retained. In general, the point at which the scree plot straightens out indicates the number of PCs to be extracted[11]. Cross-validation estimates how accurately a classification model would perform on new, future data sets (samples); a classification model is considered incomplete until its prediction error has been estimated[12]. One method is leave-one-out cross-validation, in which one sample is removed from the data set of N samples, the discriminant rule is recalibrated, and a classification model is built from the remaining N - 1 samples. The sample that was left out is then classified by this model, and the process is repeated N times[12].
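The leave-one-out procedure described above can be sketched in a few lines. To keep the example self-contained, a nearest-centroid rule stands in for LDA; this is a deliberate simplification, not the discriminant rule used in the study, and all names and data are hypothetical.

```python
import math


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign the sample to the class whose centroid is closest (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        dist = math.dist(centroid(members), sample)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(x, y):
    """Percentage of samples classified correctly when each is held out in turn."""
    if any(y.count(label) < 2 for label in set(y)):
        raise ValueError("each class needs at least two samples")
    correct = 0
    for i in range(len(x)):
        # Rebuild the rule from the remaining N - 1 samples.
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return 100.0 * correct / len(x)


if __name__ == "__main__":
    # Made-up two-dimensional PC scores for six hypothetical tissues.
    scores = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]]
    labels = ["normal"] * 3 + ["cancer"] * 3
    print(leave_one_out_accuracy(scores, labels))
```

Because every sample is predicted by a model that never saw it, the resulting percentage is usually equal to or lower than the accuracy obtained on the original samples, as seen for the TLB extract.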

PCA and LDA results from the Tris extract indicated that six of the 37 proteins reliably identified tissues with CRC. The proteins comprised five upregulated proteins, namely GST-P, tropomyosin α-3C-like protein, F-actin capping protein subunit β, selenium binding protein 1 and DJ-1 protein, and one downregulated protein, namely proteasome subunit β type 6. DJ-1 protein and GST-P contributed the most to the first PC based on the weight of their loadings. This was followed by the tropomyosin α-3C-like protein and proteasome subunit β type 6, which contributed to the second PC, while F-actin capping protein subunit β and selenium binding protein 1 contributed to the third PC. The initial PCA reduced the original data and therefore enabled LDA to be carried out, because LDA is sensitive to the number of variables. In LDA, the six PCs chosen were shown to be capable of predicting whether the tissues were with or without CRC. Two-way validation using original and cross-validation analyses was applied to validate the state of the tissues, and the cancerous and normal tissues were classified correctly at 82.7% for both original and cross-validated samples.

The two proteins that contributed most to PC1 in the Tris extract were DJ-1 and GST-P. DJ-1 is a putative oncoprotein that is able to transform cells in cooperation with H-Ras[13]. Overexpression of DJ-1 activates protein kinase B, which subsequently increases cell survival. Furthermore, increased DJ-1 expression also activates Nrf2 (nuclear factor erythroid 2-related factor 2), which in turn increases expression of antioxidant enzymes that confer a survival advantage to tumor cells[14]. Upregulation of DJ-1 protein in esophageal squamous cell carcinoma is correlated with lymph node metastasis[15]. Although no role of DJ-1 in CRC has been reported, it was upregulated in the CRC tissues studied here, and we have shown that its expression can be used to discriminate between CRC cancerous and normal tissues.

GST catalyzes the conjugation of reduced glutathione to electrophiles[16]. GST functions to remove peroxides from endogenous compounds such as lipids and DNA[17]. Overexpression of GST-P1 in CRC may be involved in cell proliferation, differentiation and apoptosis[18]. GST-P1 is overexpressed in liver cancer cells[19].

In TLB extract, six of the 24 differentially expressed proteins identified were found to be useful in discriminating CRC cancerous from normal tissues. These proteins were protein disulfide isomerase (PDI), complement component 1 Q subcomponent-binding protein (GC1q-R), chloride intracellular channel protein 1, triosephosphate isomerase, annexin A5 and actin cytoplasmic 2. All the proteins were downregulated in TLB extracts, except chloride intracellular channel protein 1 and triosephosphate isomerase. PDI and GC1q-R contributed the most to the first PC based on the weight of their loadings. This was followed by the chloride intracellular channel protein 1 and triosephosphate isomerase that contributed the most to the second PC, while annexin A5 and actin cytoplasmic 2 contributed most to the third PC. In LDA, the six PCs that explained 67.61% of the total variance were able to distinguish CRC cancerous from normal tissues. The leave-one-out cross-validation obtained 71.2% correct classification of normal and cancerous tissues. The value for original grouped samples was higher with 78.8% correct classification.

Two proteins that contributed most to PC1 in TLB extracts were PDI and GC1q-R. PDI catalyzes the formation and breakage of disulfide bonds between two cysteine residues[20]. PDI regulates cell transformation and intracellular and extracellular redox activities via its reductase activity[21]. PDI regulates STAT3 signaling and proliferation, which is thought to induce malignancy[22]. PDI is upregulated in CRC cell lines and its upregulation is correlated with cancer cell differentiation[23,24].

GC1q-R is a cell surface glycoprotein, which binds to the globular heads of C1q molecules[25]. C1q molecules bind to a variety of cells such as B cells, monocytes, macrophages, endothelial and smooth muscle cells[26]. C1q elicits responses such as phagocytosis in monocytes and activation of tumor cytotoxicity of macrophages[27,28]. GC1q-R is overexpressed in colon cancer cells and may be involved in tumor metastasis. However, PDI and GC1q-R were downregulated when using average fold change to determine their expression levels.

Proteins are the expression components that regulate cell activity. Differential expression of proteins is expected upon transformation of normal cells to cancerous cells. These differentially expressed proteins are useful in diagnosis and prognosis of the disease. In the present study, the specimens used in the analysis comprised tissues from female and male patients who were diagnosed with various stages, grades and locations of CRC. Regardless of the sex of the patients and pathological specification of the tissues, we showed that the differentially expressed proteins identified from 2D protein profiles of cancerous and normal tissues could be used to separate and classify normal and cancerous tissues by combining PCA and LDA. The data reduction technique of PCA was sufficient to provide a classification of tissues according to CRC disease state. These statistical models simplify data management by reducing the dimensionality of the protein spot data from the 2D gel images. Therefore, multivariate analysis of differentially expressed proteins identified from cancerous and normal tissues may be used as a tool for diagnosis and prognosis of CRC disease state.

COMMENTS
Background

Colorectal cancer (CRC) is one of the leading causes of death worldwide. Differentially expressed proteins between cancerous and normal colonic tissues were identified using 2D gel separation followed by LC/MS/MS analysis. The protein spot intensities of the 2D gel images were analyzed using principal component analysis (PCA) and linear discriminant analysis (LDA) for their possible use in classification of disease state.

Research frontiers

Multivariate analyses, including the dimension reduction method known as PCA and classification methods such as LDA, are used in cancer proteomic studies to identify the protein variables that provide the best discrimination between the cancerous and normal tissues.

Innovations and breakthroughs

The authors used sequential protein extraction to extract aqueous soluble and membrane-associated proteins from colorectal tissues. Differentially expressed proteins were analyzed using a combination of PCA and LDA to determine their usability in differentiating normal and cancerous colonic tissues. Using this method, the authors successfully classified the tissues according to their respective types. DJ-1 protein and glutathione S-transferase P1 among the aqueous soluble proteins, and protein disulfide isomerase and complement component 1 Q subcomponent-binding protein among the membrane-associated proteins, gave the best classification of the tissues.

Applications

The identified biomarkers may be used for the diagnosis and prognosis of CRC.

Terminology

Chemometrics deals with the information aspects of complex biological and chemical systems. It uses mathematical, statistical or formal logic-based methods to extract chemical information, which in this case is used for biomarker discovery.

Peer review

This study investigated the use of PCA and LDA of differential protein expression between normal and cancerous tissues for classification of disease state. The method gave good classification of cancerous and normal colonic tissues.

Footnotes

Peer reviewer: Ki-Baik Hahm, MD, PhD, Professor, Gachon Graduate School of Medicine, Department of Gastroenterology, Lee Gil Ya Cancer and Diabetes Institute, Lab of Translational Medicine, 7-45 Songdo-dong, Yeonsu-gu, Incheon, 406-840, South Korea

S- Editor Sun H L- Editor Kerr C E- Editor Zheng XM

References
1.  Cowan ML, Vera J. Proteomics: advances in biomarker discovery. Expert Rev Proteomics. 2008;5:21-23.  [PubMed]  [DOI]  [Cited in This Article: ]
2.  Rodríguez-Piñeiro AM, Rodríguez-Berrocal FJ, Páez de la Cadena M. Improvements in the search for potential biomarkers by proteomics: application of principal component and discriminant analyses for two-dimensional maps evaluation. J Chromatogr B Analyt Technol Biomed Life Sci. 2007;849:251-260.  [PubMed]  [DOI]  [Cited in This Article: ]
3.  Hilario M, Kalousis A. Approaches to dimensionality reduction in proteomic biomarker studies. Brief Bioinform. 2008;9:102-118.  [PubMed]  [DOI]  [Cited in This Article: ]
4.  Karson MJ Multivariate statistical methods: An introduction. Iowa: Iowa State University Press 1982; 159, 191.  [PubMed]  [DOI]  [Cited in This Article: ]
5.  Giri NC Multivariate statistical analysis. New York: Marcel Dekker 1996; 293-294.  [PubMed]  [DOI]  [Cited in This Article: ]
6.  Djidja MC, Claude E, Snel MF, Francese S, Scriven P, Carolan V, Clench MR. Novel molecular tumour classification using MALDI-mass spectrometry imaging of tissue micro-array. Anal Bioanal Chem. 2010;397:587-601.
7.  Kamath SD, Mahato KK. Principal component analysis (PCA)-based k-nearest neighbor (k-NN) analysis of colonic mucosal tissue fluorescence spectra. Photomed Laser Surg. 2009;27:659-668.
8.  Zwielly A, Mordechai S, Sinielnikov I, Salman A, Bogomolny E, Argov S. Advanced statistical techniques applied to comprehensive FTIR spectra on human colonic tissues. Med Phys. 2010;37:1047-1055.
9.  Ragazzi E, Pucciarelli S, Seraglia R, Molin L, Agostini M, Lise M, Traldi P, Nitti D. Multivariate analysis approach to the plasma protein profile of patients with advanced colorectal cancer. J Mass Spectrom. 2006;41:1546-1553.
10.  Yeoh LC, Loh CK, Gooi BH, Singh M, Gam LH. Hydrophobic protein in colorectal cancer in relation to tumor stages and grades. World J Gastroenterol. 2010;16:2754-2763.
11.  McGarigal K, Cushman S, Stafford S. Ordination: Principal component analysis. Multivariate statistics for wildlife and ecology research. New York: Springer-Verlag 2000; 41-42.
12.  Dziuda DM. Biomarker discovery and classification. Data mining for genomics and proteomics: Analysis of gene and protein expression data. Hoboken: John Wiley and Sons 2010; 110-112.
13.  Nagakubo D, Taira T, Kitaura H, Ikeda M, Tamai K, Iguchi-Ariga SM, Ariga H. DJ-1, a novel oncogene which transforms mouse NIH3T3 cells in cooperation with ras. Biochem Biophys Res Commun. 1997;231:509-513.
14.  Clements CM, McNally RS, Conti BJ, Mak TW, Ting JP. DJ-1, a cancer- and Parkinson's disease-associated protein, stabilizes the antioxidant transcriptional master regulator Nrf2. Proc Natl Acad Sci USA. 2006;103:15091-15096.
15.  Yuen HF, Chan YP, Law S, Srivastava G, El-Tanani M, Mak TW, Chan KW. DJ-1 could predict worse prognosis in esophageal squamous cell carcinoma. Cancer Epidemiol Biomarkers Prev. 2008;17:3593-3602.
16.  Mannervik B, Danielson UH. Glutathione transferases--structure and catalytic activity. CRC Crit Rev Biochem. 1988;23:283-337.
17.  Park HJ, Lee KS, Choo SH, Kong KH. Functional studies of cysteine residues in human glutathione S-transferase P1-1 by site-directed mutagenesis. Bull Korean Chem Soc. 2001;22:77-83.
18.  Lo HW, Antoun GR, Ali-Osman F. The human glutathione S-transferase P1 protein is phosphorylated and its metabolic function enhanced by the Ser/Thr protein kinases, cAMP-dependent protein kinase and protein kinase C, in glioblastoma cells. Cancer Res. 2004;64:9131-9138.
19.  Tsuchida S, Sato K. Glutathione transferases and cancer. Crit Rev Biochem Mol Biol. 1992;27:337-384.
20.  Wilkinson B, Gilbert HF. Protein disulfide isomerase. Biochim Biophys Acta. 2004;1699:35-44.
21.  Hirano N, Shibasaki F, Sakai R, Tanaka T, Nishida J, Yazaki Y, Takenawa T, Hirai H. Molecular cloning of the human glucose-regulated protein ERp57/GRP58, a thiol-dependent reductase. Identification of its secretory form and inducible expression by the oncogenic transformation. Eur J Biochem. 1995;234:336-342.
22.  Coe H, Jung J, Groenendyk J, Prins D, Michalak M. ERp57 modulates STAT3 signaling from the lumen of the endoplasmic reticulum. J Biol Chem. 2010;285:6725-6738.
23.  Katayama M, Nakano H, Ishiuchi A, Wu W, Oshima R, Sakurai J, Nishikawa H, Yamaguchi S, Otsubo T. Protein pattern difference in the colon cancer cell lines examined by two-dimensional differential in-gel electrophoresis and mass spectrometry. Surg Today. 2006;36:1085-1093.
24.  Stierum R, Gaspari M, Dommels Y, Ouatas T, Pluk H, Jespersen S, Vogels J, Verhoeckx K, Groten J, van Ommen B. Proteome analysis reveals novel proteins associated with proliferation and differentiation of the colorectal cancer cell line Caco-2. Biochim Biophys Acta. 2003;1650:73-91.
25.  Ghebrehiwet B, Lim BL, Peerschke EI, Willis AC, Reid KB. Isolation, cDNA cloning, and overexpression of a 33-kD cell surface glycoprotein that binds to the globular "heads" of C1q. J Exp Med. 1994;179:1809-1821.
26.  Ghebrehiwet B. Functions associated with the C1q receptor. Behring Inst Mitt. 1989;204-215.
27.  Bobak DA, Frank MM, Tenner AJ. C1q acts synergistically with phorbol dibutyrate to activate CR1-mediated phagocytosis by human mononuclear phagocytes. Eur J Immunol. 1988;18:2001-2007.
28.  Leu RW, Zhou AQ, Shannon BJ, Herriott MJ. Inhibitors of C1q biosynthesis suppress activation of murine macrophages for both antibody-independent and antibody-dependent tumor cytotoxicity. J Immunol. 1990;144:2281-2286.