Evaluation and refinement of Self-Directed Learning Readiness Scale for medical students
Abstract
Purpose
This study evaluated the underlying subdomain structure of the Self-Directed Learning Readiness Scale (SDLRS) for medical students and refined the instrument to measure those subdomains, thereby providing evidence for construct validity. Developing self-directed learners is a well-recognized goal among medical educators. The SDLRS has been used frequently; however, the lack of construct validity evidence makes its results difficult to interpret.
Methods
To identify the valid subdomains of the SDLRS, items were calibrated with the graded response model (GRM), and the results were used to construct a 30-item short form. Short-form validity was evaluated by examining the correspondence between individual students' total scores on the short form and on the original instrument.
Results
A five-subdomain model explained the SDLRS item response data reasonably well. The subdomains were: (1) initiative and independence in learning, (2) self-concept as an effective learner, (3) openness to learning opportunities, (4) love of learning, and (5) acceptance of one's own learning. A unidimensional GRM for each subdomain fit the data better than multidimensional models. Total scores from the refined short form and the original form correlated at 0.98, with a mean difference of 1.33, providing evidence of validity. Nearly 91% of the 179 respondents were classified identically into the low, average, and high readiness groups by the two forms.
Conclusion
Sufficient evidence was obtained for the validity and reliability of the refined 30-item short form targeting five subdomains to measure medical students' readiness to engage in self-directed learning.
Introduction
Self-directed learning (SDL) has long been considered a cornerstone of adult education [1] and has become a guiding principle for all levels of medical training [2,3]. The self-directed learner is able to formulate learning goals, plan how to accomplish them, assess success, and modulate learning styles to meet personal and contextual demands [4], qualities that are essential for the safe and effective delivery of healthcare [2,3]. As described by Grow's staged SDL model [1], student proficiency in SDL is heterogeneous, ranging from students who are quite dependent on their instructors' guidance to those who are independent learners. The ability to locate a student within this spectrum is valuable for the learner, for faculty who design instructional approaches, and for those who evaluate curricular innovations meant to promote self-direction. This has inspired the development of a number of instruments constructed to measure attributes considered to be associated with the capacity to be a self-directed learner [5,6].
The Self-Directed Learning Readiness Scale (SDLRS) by Guglielmino [5] is one such instrument and has been used by medical schools to assess students' potential to function as self-directed learners [7]. This 58-item instrument was introduced in the late 1970s following a Delphi study that sought to establish a consensus view of the attributes of a self-directed learner. This inductive approach generated a survey that was piloted on 307 US high school students, college undergraduates, and adults enrolled in continuing education. Principal component analysis with varimax rotation led to the development of a 48-item instrument [5]. The instrument was later expanded by 10 additional items and reportedly measures eight factors: (1) openness to learning opportunities, (2) self-concept as an effective learner, (3) initiative and independence in learning, (4) acceptance of responsibility for one's own learning, (5) love of learning, (6) creativity, (7) positive orientation to the future, and (8) ability to use basic study and problem-solving skills [8]. Upon completing the survey, each SDLRS respondent receives a single score between 58 and 290. Scores have been shown to follow a normal distribution with a mean of 214 and a standard deviation of 25.59 [9].
The SDLRS has not been without controversy but has nonetheless been used in over 260 studies across a variety of educational settings [9]. Its construct validity has been repeatedly challenged (Table 1), in most cases through principal component analysis and exploratory factor analysis (EFA) [7,10-12]. Hoban et al. [13] explored the construct validity of the SDLRS by performing EFA with promax rotation, in which factors are allowed to correlate, followed by confirmatory factor analysis on the responses of 975 medical students. Like others, they concluded that the instrument lacks construct validity. They proposed a four-factor model retaining 41 of the 58 items, measuring (1) learning is a tool for life, (2) self-confidence in the abilities and skills for learning, (3) responsibility for one's own learning, and (4) curiosity. However, using different medical student responses, we were unable to reproduce consistent results when fitting the four-factor model of Hoban et al. [13] (chi-square [773]=1,469.27, p<0.001, root mean squared error of approximation [RMSEA]=0.07, non-normed fit index [NNFI]=0.68, comparative fit index [CFI]=0.69, standardized root mean squared residual [SRMR]=0.08; see the Analyses section for the interpretation of these fit indices). Such a lack of construct validity can lead to the misinterpretation and misuse of SDLRS scores.
The importance of SDL in medical education has been articulated by thought leaders [2] and has been added to the elements required for the accreditation of allopathic medical schools [3]. This reflects the recognized societal need for physicians to practice lifelong learning in a landscape of fast-paced biomedical and technological progress [2]. To ensure that medical school graduates are prepared to take ownership of their career-long educational journey, medical educators have employed a number of instruments to assess the readiness [5] and attributes [6] of SDL among their students. Given the easy accessibility and wide use of the SDLRS among all types of learners [9], including medical students [14], we sought to determine whether validity evidence for the SDLRS could be obtained in the context of medical education, where medical students represent a unique population of adult learners selected, in part, for demonstrated high academic achievement. As in previous studies, the purpose of this study was to explore the underlying subdomain (i.e., factor) structure of the SDLRS in medical education.
This study evaluated the underlying subdomain structure of the SDLRS [5] for medical students and refined the instrument to measure the subdomains, thereby providing evidence for construct validity. We applied an item response theory (IRT) model, which, unlike factor analysis, does not depend on the particular data one collects in determining subdomains; rather, it is independent of responses and explores the unidimensionality of the items themselves. Specifically, we employed Samejima's graded response model (GRM) [15], which facilitates the analysis of Likert-scale responses, to refine the SDLRS.
Methods
1. Participants and instrument administration
The SDLRS [5] was administered as a mandatory assessment to first- and second-year medical students from 2014 to 2016. A total of 179 students completed the SDLRS, which features 5-point Likert-scale items ranging from "almost never true of me: I hardly ever feel this way" (=1) to "almost always true of me: there are very few times when I don't feel this way" (=5). Seventeen negatively worded items were included, and their scores were reversed in the analysis.
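This reverse-scoring step is straightforward to reproduce. Below is a minimal sketch, assuming the responses sit in a pandas DataFrame; the variable and column names are hypothetical, not those of the original analysis.

```python
import pandas as pd

def reverse_score(df: pd.DataFrame, neg_items: list[str]) -> pd.DataFrame:
    """Reverse-score negatively worded 5-point Likert items (1<->5, 2<->4).

    df: hypothetical DataFrame of raw 1-5 responses, one column per item.
    neg_items: hypothetical list naming the 17 negatively worded columns.
    """
    out = df.copy()
    out[neg_items] = 6 - out[neg_items]  # 6 - x maps {1,...,5} onto {5,...,1}
    return out
```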
Among respondents, 47.49% (n=85) were male and 50.84% (n=91) were female. The majority of students possessed only a bachelor's degree (84.36%, n=151); some had a master's degree (7.82%, n=14), and a few had earned a PhD (2.23%, n=4). The majority of students were under 25 years old (79.89%, n=143), some were in the range of 24 to 35 years (17.88%, n=32), and one student was in the range of 36 to 45 years (0.56%, n=1). Three students did not reply to all demographic questions. This study was deemed exempt by the Institutional Review Board of Hofstra University.
2. Analyses
IRT was applied in part because sample sizes as small as 100 are often adequate for estimating stable one-parameter IRT model parameters [16]. For more complex models, the required sample size is less clear. More specifically, Orlando and Marshall [17] and Thissen et al. [18] suggest that sample sizes of 200 or fewer can be adequate, and Thissen [19] indicates that a smaller sample size suffices when the item response data satisfy the IRT model assumptions.
One of the assumptions of IRT is unidimensionality, that is, the presence of valid subdomains. Guglielmino [5] developed each item of the SDLRS to measure only a single factor (i.e., each item was assigned to a single subdomain). We evaluated this assumption by EFA with varimax rotation. We assessed fit with five indices: (1) the model chi-square (p-value >0.05 for a good fit), (2) the RMSEA (<0.08 for a good fit), (3) the NNFI (≥0.95 for a good fit), (4) the CFI (≥0.90 for a good fit), and (5) the SRMR (<0.08 for a good fit) [20]. Results of our EFA of the 58-item scale supported the unidimensionality of each item, meaning that each item response probability is a function of a single dominant factor. Nonetheless, 10 redundant item pairs, with inter-item correlations between 0.93 and 1.00, were observed.
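As one concrete example of these indices, the RMSEA can be computed directly from a model chi-square, its degrees of freedom, and the sample size. A minimal sketch, checked against the four-factor fit statistics quoted in the Introduction:

```python
import math

def rmsea(chi_square: float, df: int, n: int) -> float:
    """Root mean squared error of approximation; values <0.08 read as good fit."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Four-factor model cited in the Introduction: chi-square(773)=1,469.27, n=179.
print(round(rmsea(1469.27, 773, 179), 2))  # 0.07, matching the reported value
```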
As the next step, we calibrated the subdomain items using the two-parameter logistic GRM [15], an item response model for ordinal item responses such as those on a Likert scale. Marginal maximum likelihood estimation was used to estimate a slope parameter (item discrimination), "a," and four location parameters (item difficulty), "b," for each 5-point Likert item. We also evaluated model fit based on the G² (p-value >0.05 for a good fit), the RMSEA, the CFI, the Tucker-Lewis index (>0.95 for a good fit), and the SRMR, as well as item fit based on the chi-square statistic (p-value >0.05 for a good fit). GRM-based marginal reliability (r) was estimated for the subscales. We used information from the GRM calibration to identify a shortened scale that maintained adequate coverage of SDL levels with maximum precision. We sought evidence of short-form validity by comparing individual total scores on the original and short SDLRS forms.
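For reference, Samejima's GRM models the probability that a respondent with latent SDL readiness θ answers above category k of a 5-point item through a logistic boundary curve governed by the slope a and the four location parameters b_1, ..., b_4:

```latex
P_k^{*}(\theta) = \Pr(X > k \mid \theta) = \frac{1}{1 + \exp\{-a(\theta - b_k)\}}, \qquad k = 1, \dots, 4,
\Pr(X = k \mid \theta) = P_{k-1}^{*}(\theta) - P_k^{*}(\theta), \qquad P_0^{*}(\theta) \equiv 1, \; P_5^{*}(\theta) \equiv 0.
```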
Results
1. Evaluating graded response model assumptions
Unlike the eight factors originally reported by Guglielmino [5], the scree plot of eigenvalues from our EFA suggested five factors. The varimax-rotated five-factor solution initially extracted 9, 17, 13, 10, and 2 items into the five factors, respectively. However, the correlation between the residuals of the items in the fifth factor, "responsibility for one's own learning" (items 15 and 50), was inadequate at 0.28 (<0.1 for a good fit). We removed the more problematic item 15, after which the varimax-rotated five-factor solution extracted 12, 12, 12, 10, and 9 items into the five factors, respectively. Item 15 was not included in further analyses. Item factor loadings, with absolute values ranging from 0.35 to 0.83, were all positive except for seven items (23, 48, and 53 in the first factor; 9, 19, and 33 in the second; and 6 in the third).
The residual correlation matrix was inspected visually for evidence of local dependence, indicated by the presence of additional factors. The fit measures improved to within good-fit ranges after removing problematic items whose residuals were related to more than one item (items 9, 13, 14, 16, 17, 18, 20, 22, 23, 28, 34, 35, 36, 37, and 53). Based on these results, we determined that the five subdomains shown in Table 2, consisting of the remaining 43 items (8, 10, 10, 9, and 6 items for the five factors, respectively), were sufficiently unidimensional under the five-factor model for further analysis (chi-square [661]=715.61, p=0.07, RMSEA=0.04, NNFI=0.91, CFI=0.94, SRMR=0.04). Solutions with up to 16 factors were examined following the Kaiser-Guttman criterion; none yielded superior fit relative to the five-factor model with correlated errors.
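This local-dependence screen can be sketched as follows, assuming the observed inter-item correlation matrix, the estimated loading matrix, and the factor correlation matrix are available as NumPy arrays; all names here are hypothetical.

```python
import numpy as np

def flag_local_dependence(observed_corr, loadings, factor_corr, threshold=0.1):
    """Flag item pairs whose residual correlation (observed minus model-implied)
    exceeds the threshold, the criterion for local dependence used above."""
    implied = loadings @ factor_corr @ loadings.T   # model-implied correlations
    np.fill_diagonal(implied, 1.0)                  # unique variances fill the diagonal
    residual = observed_corr - implied
    flagged = np.argwhere(np.triu(np.abs(residual) > threshold, k=1))
    return [(int(i), int(j), float(residual[i, j])) for i, j in flagged]
```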
2. Graded response model calibration
IRT differs from classical test theory (i.e., factor analysis) in that item characteristics in IRT are not group-dependent, which means that parameters of an item are invariant across groups of respondents [21]. Item parameter invariance is desirable when results may be obtained from medical students at different institutions. Hence, it would be possible to compare different groups’ readiness to engage in SDL on a set of items comprising a single scale.
While many IRT models can be applied to rating data [21], the GRM is a widely accepted approach for calibrating ordinal items within subdomains [22]. To evaluate and refine the items measuring each factor, we applied five separate GRMs (each with its own item characteristics), one per subscale. Because the EFA revealed negative and low correlations between factors, ranging from -0.51 to 0.48 (Appendix 1), each subdomain was treated as conceptually separate in the following analyses.
Two items (31 and 47) did not fit the unidimensional GRM well, so they were removed from further analysis. Table 2 presents the model fit indices and reliability coefficient r for each subscale; the models fit each subscale well. We also examined fit at the item level using the polytomous extension of the S-chi-square statistic, as shown in Appendix 2; none of the items were identified as misfitting. Fig. 1A shows item characteristic curves (ICCs) for items 27 and 21. These items illustrate how ICCs vary with the discrimination parameter (moderate for item 27, a=1.44; low for item 21, a=0.88) and with the difficulty parameters: item 27 is endorsed at relatively higher levels of SDL readiness, whereas the response categories of item 21 measure relatively lower levels. For example, "usually true of me" peaks near an SDL level (θ) of 0.5 for item 27 but near -0.5 for item 21; furthermore, for item 21, "almost never true of me" did not differentiate sufficiently (Table 2, Fig. 1).
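Curves of the kind shown in Fig. 1A can be traced from the estimated parameters. In the sketch below, the slopes are the values reported above for items 27 and 21, but the boundary locations are illustrative placeholders only (the actual estimates appear in Appendix 2).

```python
import numpy as np

def grm_category_probs(theta, a, bs):
    """Category response probabilities for a 5-point GRM item.

    theta: array of latent trait values; a: slope; bs: 4 ordered boundaries.
    Returns a (5, len(theta)) array whose rows are categories 1..5."""
    theta = np.asarray(theta, dtype=float)
    p_star = np.stack([np.ones_like(theta)]                               # P(X >= 1) = 1
                      + [1 / (1 + np.exp(-a * (theta - b))) for b in bs]  # P(X > k)
                      + [np.zeros_like(theta)])                           # P(X > 5) = 0
    return p_star[:-1] - p_star[1:]                                       # P(X = k)

theta = np.linspace(-4, 4, 161)
# Slopes from the text; the boundary values below are hypothetical illustrations.
probs_27 = grm_category_probs(theta, a=1.44, bs=[-2.0, -0.5, 0.5, 2.0])
probs_21 = grm_category_probs(theta, a=0.88, bs=[-3.0, -1.5, -0.5, 1.0])
```

Plotting each row of `probs_27` and `probs_21` against θ reproduces ICC panels of the kind shown in Fig. 1A.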
Seven items had negative slope parameter estimates (items 6, 10, 19, 33, 39, 48, and 56); four of these were reverse-scored. To evaluate the assumption of local item independence (zero correlation between items after conditioning on the measured factor score), the residual correlation matrix was again inspected visually for evidence of local dependence in the form of a second factor. Three problematic items (10, 19, and 48), whose residual correlations exceeded 0.1, were detected and dropped from the final shortened version.
3. Instrument short version selection and evidence for validation
A shortened, 30-item version of the scale was developed based on the results of the GRM calibration and validation procedures. More specifically, the item information functions in Fig. 1B demonstrate how the results were used to construct the short form. Items 25 and 41 were both initially selected for measuring the second factor, self-concept as an effective learner. We then selected item 41 over item 25 because item 41 reliably measured SDL levels from -3 to +2, a range that contains the -3 to +1.5 measured by item 25. In addition, item 41 produced more information, I(θ), increasing the reliability r. To maintain balanced subscale coverage and to reduce possible bias from an uneven number of items per domain, the same number of items (six) was included in each subscale.
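Comparisons of the kind made in Fig. 1B rest on the item information function, which for the GRM sums, over the five categories, the squared derivative of each category probability divided by that probability. A minimal sketch (the parameter values would come from the calibration; the function name is ours):

```python
import numpy as np

def grm_item_information(theta, a, bs):
    """Fisher information I(theta) for a 5-point GRM item:
    sum over categories of (dP_k/dtheta)^2 / P_k."""
    theta = np.asarray(theta, dtype=float)
    p_star = np.stack([np.ones_like(theta)]
                      + [1 / (1 + np.exp(-a * (theta - b))) for b in bs]
                      + [np.zeros_like(theta)])
    d_star = a * p_star * (1 - p_star)   # derivatives of the boundary curves
    p_cat = p_star[:-1] - p_star[1:]     # category probabilities
    d_cat = d_star[:-1] - d_star[1:]     # their derivatives
    return np.sum(d_cat**2 / np.clip(p_cat, 1e-12, None), axis=0)
```

Comparing the height and θ coverage of two items' information curves, as done here for items 25 and 41, identifies which item contributes more precision across the targeted range.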
Table 2 presents the model fit indices and reliability coefficient r of the short-version scale, which measures five factors. The five factors were labeled, according to the themes of their items (Tables 2, 3): (1) initiative and independence in learning, (2) self-concept as an effective learner, (3) openness to learning opportunities, (4) love of learning, and (5) acceptance of one's own learning. These five subscales fit unidimensional GRMs very well, although a slight decrease in the reliability coefficients was observed (e.g., for the first factor, initiative and independence in learning, r decreased from 0.75 to 0.71). The parameter estimates are listed in Appendix 2. The slope estimates, a, ranged from 0.4 to 3.62 (except for item 56, a=-1.80, for which the response category "almost always true of me" was missing), indicating the desired variation in item discrimination. The negatively worded item 56 ("learning does not make any difference in my life") had a negative slope parameter, meaning that respondents with increasing levels of positive SDL readiness were less likely to endorse the more severe response options. Such an item could be problematic, especially among knowledge items; however, it was retained because, in this scale, it properly discriminated between levels of SDL readiness. The location parameters for the 30 items reflect a wide range of underlying SDL readiness (range, -5.78 to 4.71), but the majority of item response categories were endorsed only by medical students with higher-than-average levels of SDL readiness, implying that the item set as a whole is most useful for discriminating individuals at the high end of the SDL readiness continuum.
We examined the validity of the short scale by comparing the short and long forms. Each student's total scores on the 58- and 30-item scales were generated; the correlation between the two scores was excellent at 0.98. We applied a linear transformation to convert each student's 30-item total score range (82-148) to the 58-item total score range (162-282). The mean difference in scores was 1.33 (30-item scale mean=237.69; 58-item scale mean=236.36), which was not significant (t[355.98]=-0.57, p=0.57), lending support to the validity of the short form. Based on the scoring and cut scores of the 58-item scale, we also determined cut scores on the 30-item five-category response scale (maximum observed total score of 150) corresponding to three SDL levels: below average, average, and above average; the cut scores for the 30-item short form, maintaining a similar percentage of the total score, are given in Table 3. The cross-classification of respondents above and below the three cut scores is displayed in Table 3 and indicates that 90.5% of the sample was classified in the same way regardless of the form used. The kappa statistic for this concordance was 0.824 (95% confidence interval, 0.06 to 0.83; p<0.001). The extent of classification correspondence between the two versions lends additional support for the use of the 30-item form, especially in applications where researchers wish to decrease respondent burden (Table 3).
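The two quantitative checks described above, the score rescaling and the classification concordance, can be sketched as follows; the score arrays and level labels are hypothetical stand-ins for the study data.

```python
import numpy as np

def linear_rescale(x, src=(82, 148), dst=(162, 282)):
    """Map a 30-item total score onto the 58-item total score range."""
    x = np.asarray(x, dtype=float)
    return dst[0] + (x - src[0]) * (dst[1] - dst[0]) / (src[1] - src[0])

def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two classifications of the same respondents."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    cats = np.unique(np.concatenate([a, b]))
    n = len(a)
    table = np.array([[np.sum((a == i) & (b == j)) for j in cats] for i in cats])
    p_obs = np.trace(table) / n                                    # observed agreement
    p_exp = np.sum(table.sum(axis=1) * table.sum(axis=0)) / n**2   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)
```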
Discussion
The importance of SDL in medical education has stimulated the development of multiple instruments, each designed to assess different aspects of SDL. For example, the Self-Regulation of Learning Self-Report Scale of Lucieer et al. [23] and the Scale on Self-Regulation in Learning of Erdogan and Senemoglu [24] use the framework of self-regulated learning skills (planning, monitoring, assessment, reflection, and self-efficacy) to define subscales. Other scales focus on the metacognitive aspects of SDL [25]. Finally, some, such as the Self-Rating Scale of Self-Directed Learning [6], combine Self-Regulation of Learning skills-based subscales with those that probe metacognition. When first introduced by Knowles [4], SDL was described as a series of traits that a student might develop along their educational journey. These traits form the subscales in the original SDLRS [5]. Thus, medical educators interested in assessing their learners’ capacity for SDL need to first identify which framework, and therefore which type of instrument, is most relevant. Indeed, there may be value in assessing learners’ traits, skills, and capacity for metacognition. This study was designed to optimize a scale for learners’ traits in the setting of medical education.
Construct validity is fundamental to scale development, as it provides evidence that the scale successfully measures the target constructs [21]. Without construct validity evidence, the generalizability of the SDLRS score is severely limited in the sense that it is difficult to link SDLRS scores to the eight attributes that Guglielmino [5] intended to measure [8]. This study examined whether the SDLRS indeed measures the intended constructs for medical student SDL readiness.
Based on our analyses, we developed a shorter version of the SDLRS. Its development followed the steps of construct validation using an IRT model (the GRM). Factor analysis and internal consistency indices, which are traditionally used to assess the performance of items, can be misleading [21]; an IRT-based approach more readily reveals the effect of adding or deleting an item or set of items, by examining the resulting change in the shape of the item and test information functions and comparing it to the desired performance curve [21,26]. Overall, these results support the psychometric construct validity and reliability of the short version of the SDLRS.
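As a sketch of this comparison (the GRM information function is restated from the Results section; the item parameter list is hypothetical):

```python
import numpy as np

def grm_info(theta, a, bs):
    """Fisher information of one 5-category GRM item at trait values theta."""
    ps = np.stack([np.ones_like(theta)]
                  + [1 / (1 + np.exp(-a * (theta - b))) for b in bs]
                  + [np.zeros_like(theta)])
    ds = a * ps * (1 - ps)                       # boundary-curve derivatives
    p, d = ps[:-1] - ps[1:], ds[:-1] - ds[1:]    # category probs and derivatives
    return np.sum(d**2 / np.clip(p, 1e-12, None), axis=0)

def test_information(theta, items):
    """Sum of item informations; re-plotting this curve after deleting an item
    shows how the scale's precision profile changes."""
    return sum(grm_info(theta, a, bs) for a, bs in items)

theta = np.linspace(-4, 4, 161)
# items: hypothetical list of (a, [b1, b2, b3, b4]) estimates from calibration.
```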
The SDLRS as originally developed by Guglielmino [5] has been re-evaluated by multiple investigators using a variety of approaches [7,10-13]. As shown in Table 1, only the "love of learning" subdomain is supported by all of the studies that used adult participants. The method of analysis used by Hoban et al. [13] is most similar to this work; their study reduced the number of items from 58 to 41 within four subdomains. We used IRT, specifically GRM calibration, to establish a 30-item scale. We found that these items are unidimensional within five of the eight subdomains proposed by Guglielmino [5] and Bonham [8]. Thus, the refined SDLRS subdomains include: (1) initiative and independence in learning, (2) self-concept as an effective learner, (3) openness to learning opportunities, (4) love of learning as a life tool, and (5) acceptance of one's own learning. Constructs not measured include: (1) creativity, (2) positive orientation to the future, and (3) ability to use basic study and problem-solving skills (Table 1).
The implications of a shorter SDLRS for medical researchers and educators are twofold. More precise and reliable measurement of the intended constructs will lead to more accurate research on SDL readiness in medical education, which will in turn produce more relevant recommendations and practical guidance to support students' SDL. In addition, an improved SDLRS survey may resolve some of the ambiguities of previous research that examined the instrument itself [7,10-13], as well as research that applied it to measure SDL in learners [14].
This study is not without limitations. First, our analysis was based on students from a single medical school. Most student scores were average or above average, so responses typical of a below-average self-directed learner were underrepresented. Second, some item response categories had very low or zero response counts, and the estimates of the corresponding item threshold parameters were reversed, which decreases the reliability coefficient estimates [27]. Third, the same data were used both to identify the subfactors and to construct and validate the short version of the SDLRS. To mitigate this limitation, we used various models to develop the short version; nonetheless, to establish additional validity evidence, the short version needs to be re-evaluated with different samples in which participants have a wider range of self-directed readiness. Finally, like Field [7], we found that negatively worded items were problematic: some were positively correlated with positively worded items. A subsequent qualitative study would be required to examine whether students understand these items correctly.
Assessment in medical education is dynamic [28], with an emerging recognition that not all assessment is of learning; rather, assessment for learning may contribute to student success [29]. A validated and accessible instrument for measuring SDL readiness should help students identify areas for personal growth, and it can be used by faculty for their own development as well as to assess curricula designed to foster learning independence.
Acknowledgements
None.
Notes
Funding
No financial support was received for this study.
Conflicts of interest
No potential conflict of interest relevant to this article was reported.
Author contributions
Dr. Lim conceptualized and designed the study, conducted data analysis, and played a key role in shaping the research methodology. Dr. Willey contributed significantly by collecting the dataset. The manuscript was collaboratively written by both authors, reflecting their shared insights and expertise in the field.