Letter to the Editor Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Meta-Anal. Jun 18, 2025; 13(2): 104382
Published online Jun 18, 2025. doi: 10.13105/wjma.v13.i2.104382
Importance of data collection and subgroup analyses in research methodology
Sunny Chi Lik Au, Department of Ophthalmology, Tung Wah Eastern Hospital, Hong Kong 999077, China
ORCID number: Sunny Chi Lik Au (0000-0002-5849-3317).
Author contributions: Au SCL designed the research, performed the research, acquired the data, analyzed the data, drafted the manuscript, and revised the manuscript.
Conflict-of-interest statement: All the authors have disclosed no conflicts of interest. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Sunny Chi Lik Au, Chief Physician, Research Fellow, Department of Ophthalmology, Tung Wah Eastern Hospital, 9/F, MO Office, Lo Ka Chow Memorial Ophthalmic Centre, No. 19 Eastern Hospital Road, Causeway Bay, Hong Kong 999077, China. kilihcua@gmail.com
Received: December 19, 2024
Revised: March 29, 2025
Accepted: April 11, 2025
Published online: June 18, 2025
Processing time: 179 Days and 20.1 Hours

Abstract

Data collection serves as the cornerstone in the study of clinical research questions. Two types of data are commonly utilized in medicine: (1) Qualitative; and (2) Quantitative. Several methods are commonly employed to gather data, regardless of whether the study is retrospective or prospective: (1) Interviews; (2) Observational methods; (3) Questionnaires; (4) Investigation parameters; (5) Medical records; and (6) Electronic chart reviews. Each source type has its own advantages and disadvantages in terms of the accuracy and availability of the data to be extracted. We will focus on two important parts of the research methodology: (1) Data collection; and (2) Subgroup analyses. Errors in research can arise from various sources, including investigators, instruments, and subjects, making the validation and reliability of research tools crucial for ensuring the credibility of findings. Subgroup analyses can either be planned in advance (prespecified) or emerge after the data have been examined (post hoc). Subgroup effects should be interpreted with caution, taking into account the interaction between the treatment effect and various patient variables.

Key Words: Data collection; Methodology; Research; Journal; Academic

Core Tip: A variety of methods exist to assess the normality of continuous data. Among these, the Shapiro–Wilk test, Kolmogorov–Smirnov test, skewness, kurtosis, histogram, box plot, P–P plot, Q–Q plot, and mean with standard deviation are commonly employed. Of these, the widely recognized Shapiro–Wilk test is suitable for small sample sizes (n < 50), although it can also accommodate larger sample sizes. Conversely, the Kolmogorov–Smirnov test finds utility when dealing with n ≥ 50, making it another prominent technique for evaluating data normality.



TO THE EDITOR

Data collection serves as the cornerstone of any investigation. Whether primary or secondary, the collected information is pivotal in shaping the outcome of a study. Understanding the methods used for data collection, as well as their implications, is vital to maintain the integrity and validity of the research. When planning which data to collect, researchers commonly draw on two types: (1) Qualitative; and (2) Quantitative[1]. Most studies integrate both types of information to provide comprehensive insights. While quantitative data are relatively straightforward to analyze and are considered reliable, qualitative data offer a more profound description of the sample being studied.

RESEARCH METHODOLOGY FOR DATA COLLECTION

Several methods are commonly employed to gather data, each with its own set of advantages and disadvantages. Interviews, for instance, offer face-to-face interactions that can yield in-depth responses; however, they may be susceptible to interviewer influence and response distortion[2]. Observational methods allow direct insight into behaviors and situations, but the observer's involvement or lack thereof can impact the gathered information[3]. Questionnaires, on the other hand, are simple and cost-effective, although they can be subject to observer bias and confidentiality breaches[4]. Last, medical records or electronic chart reviews provide an unobtrusive means of data collection, but accuracy, authenticity, and availability can pose challenges[5,6].

Once data are collected, systematic organization is needed to facilitate presentation and analysis. A commonly overlooked step is testing the normality of the data. Numerous statistical methodologies rely on assumptions of data normality, including correlation, regression, t tests, and analysis of variance. While the central limit theorem mitigates concerns about normality violation in datasets with 100 or more observations, upholding the assumption of normality remains crucial for substantive conclusions, regardless of sample size. Adherence to a normal distribution allows for the meaningful presentation of data through representative mean values, which are instrumental in comparing groups and calculating significance levels (P values). In cases where the data deviate from a normal distribution, utilizing the resultant mean as a representative value may yield misleading interpretations. Consequently, data normality should be assessed before deciding whether means are appropriate for group comparisons; if the data are not normally distributed, medians and nonparametric methods should be used.
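The point about means misrepresenting skewed data can be illustrated with a minimal sketch. The length-of-stay values below are hypothetical and chosen only for illustration; they are not data from any study cited here.

```python
import statistics

# Hypothetical length-of-stay values (days); one extreme value skews the sample.
length_of_stay = [2, 3, 3, 4, 4, 5, 5, 6, 7, 61]

mean_stay = statistics.mean(length_of_stay)
median_stay = statistics.median(length_of_stay)

# Quartiles give the interquartile range that usually accompanies a median.
q1, _, q3 = statistics.quantiles(length_of_stay, n=4)

print(f"mean   = {mean_stay:.1f} days")
print(f"median = {median_stay:.1f} days (IQR {q1:.2f} to {q3:.2f})")
```

In this sample, nine of the ten values lie below the mean, so the median with its interquartile range describes a typical patient more faithfully than the mean does.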

A variety of methods exist to assess the normality of continuous data. Among these, the Shapiro–Wilk test, Kolmogorov–Smirnov test, skewness, kurtosis, histogram, box plot, P–P plot, Q–Q plot, and mean with standard deviation are commonly employed[7]. Of these, the widely recognized Shapiro–Wilk test is suitable for small sample sizes (n < 50), although it can also accommodate larger sample sizes. Conversely, the Kolmogorov–Smirnov test finds utility when dealing with n ≥ 50, making it another prominent technique for evaluating data normality. Various visualization tools, such as tables, charts, and graphs, aid in summarizing and presenting the data effectively.
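Of the approaches listed, skewness and kurtosis are the simplest to compute by hand. The following is a minimal sketch, assuming the population (moment-based) formulas; the function name and the sample values are hypothetical. Formal tests such as Shapiro–Wilk are available in statistical packages and are not reimplemented here.

```python
import statistics


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments.

    Both quantities are 0 for a perfectly normal distribution.
    """
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are needed")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    if m2 == 0:
        raise ValueError("all observations are identical")
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [2, 3, 3, 4, 4, 5, 5, 6, 7, 61]

for label, sample in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    skew, kurt = skewness_and_excess_kurtosis(sample)
    print(f"{label}: skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

How far from zero these values may lie before normality is doubted is a matter of convention and sample size, so they are best read alongside a histogram or Q–Q plot rather than in isolation.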

Equally important in the research process is the development and validation of research tools. These tools, including observation forms, interview schedules, and questionnaires, lay the foundation for collecting essential information. In particular, questionnaires translated from the original English version into the native language of the study subjects must be validated before their application[8]. Constructing these tools involves conceptual development, specification of concept dimensions, selection of indicators, and the formation of an index. Errors and biases across investigators, instruments, and subjects could weaken the validity and reliability of the study.

SUBGROUP ANALYSES OF THE RESEARCH METHODOLOGY

Subgroup analyses can either be planned in advance (prespecified) or emerge after the data have been examined (post hoc). The former hold more credibility when they are based on prespecified hypotheses, justified directions of overall and subgroup effects, and appropriate statistical testing. Conversely, post hoc analyses are often considered exploratory and are compromised by factors such as data-driven interpretations and reduced statistical power.

Multiplicity issues arising from simultaneous subgroup analyses can inflate the type I error rate and produce spurious results. To address this, it is recommended to prespecify relevant subgroups, use appropriate statistical tests for interactions, and adjust P values for multiple testing[9]. The interaction test examines whether treatment effects differ between subgroups, providing valuable insights into the credibility of subgroup effects.
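Adjusting P values for multiple testing can be done in several ways. The sketch below uses the Holm step-down procedure as one common choice; the letter does not prescribe a specific method, and the function name and example P values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted P values in the same order as the input."""
    m = len(p_values)
    # Indices of the P values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest P value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw P values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw P values from four subgroup comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
for p, p_adj in zip(raw, holm_adjust(raw)):
    print(f"raw P = {p:.2f} -> adjusted P = {p_adj:.2f}")
```

With these inputs, only the first comparison remains below 0.05 after adjustment, although three of the four raw P values were below that threshold.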

Subgroup effects should be interpreted with caution, taking into account the interaction between the treatment effect and various patient variables. Furthermore, the reporting of all conducted subgroup analyses, regardless of their statistical significance and consistency of treatment effects across related outcomes, is vital. The consistency of the subgroup effects across well-designed trials strengthens the validity of these analyses[10]. Additionally, planning subgroups based on the current understanding of biological mechanisms and anticipating heterogeneity are crucial for reliable analyses.
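A simple form of interaction test compares the effect estimates from two independent subgroups directly, rather than comparing their separate P values. The sketch below assumes each subgroup supplies an effect estimate on an additive scale (for example, a log hazard ratio) with its standard error, and uses a normal approximation; the function name and the numbers are hypothetical.

```python
import math


def interaction_z_test(effect_a, se_a, effect_b, se_b):
    """Compare two independent subgroup effect estimates.

    Returns (difference, z statistic, two-sided P value) under a
    normal approximation.
    """
    difference = effect_a - effect_b
    se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    z = difference / se_difference
    # Two-sided P value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return difference, z, p_value


# Hypothetical log hazard ratios (standard errors) in two subgroups.
diff, z, p = interaction_z_test(-0.50, 0.15, -0.10, 0.20)
print(f"difference = {diff:.2f}, z = {z:.2f}, P for interaction = {p:.3f}")
```

In this example the treatment effect is individually significant in the first subgroup and not in the second, yet the interaction test gives no strong evidence that the two effects differ, which is why a difference in subgroup P values alone should not be read as a subgroup effect.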

CONCLUSION

The meticulous collection, organization, and validation of data are fundamental elements in any research endeavor. Upholding the standards of validity, reliability, and practicality ensures that the research tools and data collected are robust and dependable, ultimately contributing to the credibility and effectiveness of the study's outcomes.

Footnotes

Provenance and peer review: Invited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Medicine, research and experimental

Country of origin: China

Peer-review report’s classification

Scientific Quality: Grade B

Novelty: Grade B

Creativity or Innovation: Grade B

Scientific Significance: Grade B

P-Reviewer: Nwabo Kamdje AH S-Editor: Luo ML L-Editor: A P-Editor: Yu HG

References
1.  Noyes J, Booth A, Moore G, Flemming K, Tunçalp Ö, Shakibazadeh E. Synthesising quantitative and qualitative evidence to inform guidelines on complex interventions: clarifying the purposes, designs and outlining some methods. BMJ Glob Health. 2019;4:e000893.
2.  Jamshed S. Qualitative research method-interviewing and observation. J Basic Clin Pharm. 2014;5:87-88.
3.  Busetto L, Wick W, Gumbinger C. How to use and assess qualitative research methods. Neurol Res Pract. 2020;2:14.
4.  Kishore K, Jaswal V, Kulkarni V, De D. Practical Guidelines to Develop and Evaluate a Questionnaire. Indian Dermatol Online J. 2021;12:266-275.
5.  Chen W, Xie F, Mccarthy DP, Reynolds KL, Lee M, Coleman KJ, Getahun D, Koebnick C, Jacobsen SJ. Research data warehouse: using electronic health records to conduct population-based observational studies. JAMIA Open. 2023;6:ooad039.
6.  Vassar M, Holzmann M. The retrospective chart review: important methodological considerations. J Educ Eval Health Prof. 2013;10:12.
7.  Mishra P, Pandey CM, Singh U, Gupta A, Sahu C, Keshri A. Descriptive statistics and normality tests for statistical data. Ann Card Anaesth. 2019;22:67-72.
8.  Ranganathan P, Caduff C. Designing and validating a research questionnaire - Part 1. Perspect Clin Res. 2023;14:152-155.
9.  Dijkman B, Kooistra B, Bhandari M; Evidence-Based Surgery Working Group. How to work with a subgroup analysis. Can J Surg. 2009;52:515-522.
10.  Wang X, Piantadosi S, Le-Rademacher J, Mandrekar SJ. Statistical Considerations for Subgroup Analyses. J Thorac Oncol. 2021;16:375-380.