Data management software applications designed specifically for the clinical research environment are increasingly available from commercial vendors and open-source communities; however, general-purpose spreadsheets remain widely used in clinical research data management (CRDM).
Spreadsheet suitability for this use is controversial, and no formal comparative usability evaluations have been performed. We report on an application of the UFuRT (User, Function, Representation, and Task) analysis methodology to create a domain-specific process for usability evaluation. Through this domain-specific operationalization of UFuRT, we identified usability differences and quantified task and cost differences, while distinguishing these from socio-technical aspects.
UFuRT can be generalized to other domains in a similar way. Clinical and translational researchers are increasingly adopting formal approaches to data collection and management. Many use specialized systems, while others turn to more general tools, such as ubiquitous spreadsheet programs, to manage clinical research data. Debate continues unabated over the relative merits of the two approaches: the perceived rigor of specialized clinical data management systems (CDMSs) versus the affordability of general spreadsheets.
There are no definitive studies in the published literature that might resolve the argument or guide institutional decision-making regarding the provision of data collection and management software. To identify and clarify differences, we need a method for comparing competing products for a particular research setting.
We report on the development of an operational process for the User, Function, Representation, and Task (UFuRT) analysis methodology [1, 2] and demonstrate this process in the context of a usability comparison of two software packages used to manage clinical research data: Microsoft Excel and Phase Forward's Clintrial.
The application of usability evaluations to the domain of clinical and translational research has been limited.
Schmier and colleagues argue for applying usability theory to the clinical research domain [4] and describe how the framework of Constantine and Lockwood [5] might be applied to clinical data management systems (CDMSs).
This framework was designed to elucidate context and potential usability issues and consists of six categories: environment, device constraints, information, interaction, incumbent, and operational risk profile [5].
To date, however, there are no published reports of applications of formal usability methods in the CRDM domain, and the operationalization of the UFuRT analysis in this domain therefore represents a significant contribution to the field.
UFuRT was chosen because (1) it encompasses functionality evaluation, commonly applied when selecting software; (2) the framework is grounded in work-centered design (WCD); and (3) UFuRT can provide quantitative assessments of the necessary investments of both time and financial resources.
Thus, UFuRT establishes both a qualitative and a quantitative context that supports the decisions clinical and translational researchers will make when evaluating and selecting systems. UFuRT enables characterization and direct comparison of users, functionality, representation, and tasks of software applications.
In the operationalization framework, user roles were identified according to domain expertise and characterized by research-related responsibilities, work environment, level of expertise, and education (Table 1). Research characteristics that affected roles were identified from ClinicalTrials.gov.
Functions (activities) and their associated objects were identified for the CRDM domain and added to the work domain ontology. Terms and definitions from existing standards were used where available [11-14]. Where no formal definitions existed, domain expertise was used to define terms, ensuring domain coverage. Fifty-five functions from the ontology were identified as critical to the domain.
Software products were rated against this list using exhaustive and mutually exclusive categories by four domain-expert raters (Table 2). Software was evaluated as initially purchased, without modification. Wherever programming was required to add functionality, the function was rated as Not Supported. Indirectly Supported functions were those that could be accomplished with the software through the addition of manual steps or procedural controls.
Most applications employed in the CRDM domain support intensive data capture via key entry. Thus, representation at both the form (page) level and the field level is important for such applications.
A typical page from the case report form (CRF) of an example clinical trial (Table 3) was selected for our analysis. Representation of data elements was assessed at both the field and form levels. The data collection structure and field formats of the applications were also compared.
Form-level comparisons included view orientation [16] and spatial proximity. Spatial proximity was measured as distance from the visual center (the average [x, y] coordinates of all fields displayed on the form). In addition, a spatial proximity map was created by measuring the distance from each field on a typical form to every other field on the form.
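These two spatial proximity measures can be sketched as follows. This is a minimal illustration, assuming each field is reduced to a single (x, y) coordinate on the form; the function names and coordinates are ours, not the paper's.

```python
from math import hypot

def visual_center(fields):
    """Average (x, y) of all field coordinates on a form."""
    xs = [x for x, _ in fields]
    ys = [y for _, y in fields]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def proximity_to_center(fields):
    """Distance of each field from the form's visual center."""
    cx, cy = visual_center(fields)
    return [hypot(x - cx, y - cy) for x, y in fields]

def proximity_map(fields):
    """Pairwise distances: from every field to every other field."""
    return {(i, j): hypot(xi - xj, yi - yj)
            for i, (xi, yi) in enumerate(fields)
            for j, (xj, yj) in enumerate(fields) if i != j}
```

For example, four fields at the corners of a unit square have a visual center of (0.5, 0.5) and are all equidistant from it; the proximity map records the distance between each corner pair.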
Mapping was performed between spatial and semantic proximity to compare the software applications. Operationalizing the UFuRT task analysis in the CRDM domain required identifying selected activities from the functional analysis for more detailed evaluation. To operationalize the UFuRT methodology within our domain, the following tasks were performed in both software applications: (1) an example form was entered; (2) a discrepancy identification rule was carried out; (3) a data update was made; (4) a data file was loaded; (5) a term was coded; and (6) a data extract and transfer were performed.
Individual steps required to accomplish these tasks were identified, labeled as internal or external, and counted to quantify the mental and physical steps needed to carry out each function. Software applications could thus be evaluated on the total number of steps and on the number of mental versus physical steps, assessing both the time needed to complete tasks and the relative cognitive load on the user.
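The step-counting scheme described above can be sketched as a simple tally over a labeled task log. The task log below is hypothetical, invented for illustration; only the internal/external (mental/physical) labeling comes from the text.

```python
from collections import Counter

def step_profile(steps):
    """Tally internal (mental) vs. external (physical) steps for one task.

    `steps` is a list of (description, kind) pairs, where kind is
    "mental" or "physical". Counter returns 0 for absent kinds.
    """
    counts = Counter(kind for _, kind in steps)
    return {"total": len(steps),
            "mental": counts["mental"],
            "physical": counts["physical"]}

# Hypothetical log for entering one field of a CRF page:
enter_field = [
    ("locate field on source document", "mental"),
    ("click target cell",               "physical"),
    ("recall coded value",              "mental"),
    ("type value",                      "physical"),
]
```

Comparing two systems then reduces to comparing these profiles: more total steps suggests longer task time, and a higher mental share suggests greater cognitive load.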
For our UFuRT operationalization, differences in the number of steps, together with expert experience, were used to estimate the costs associated with each system.
Our assumptions included 1 minute for a user to read a data validation check and apply its logic to identify data discrepancies, and 5 minutes for a user to review a discrepancy, document it, and communicate it to the clinical investigational site. We estimated an additional 2 minutes per discrepancy to manually track the process, where applicable. This metric, low for double data entry and high for single data entry, includes the time needed to select the file from storage, log into the system, enter data, and return the file to storage.
The coding assumption for both systems was 5 minutes per manually coded term.
Importantly, operationalization of UFuRT in this domain requires that information from the task analysis inform the cost analysis. We accomplished this through a priori definition of the operational metrics described above for the analyzed tasks and subsequent scaling based on the number of steps.
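A minimal sketch of this scaling follows, using the per-task minute metrics stated above. The function name, the hourly rate, and the task volumes are illustrative assumptions; only the minute values are taken from the text.

```python
# Per-task time metrics (minutes), as defined a priori in the text.
MINUTES = {
    "read_check": 1,           # read a validation check and apply its logic
    "review_discrepancy": 5,   # review, document, and communicate a discrepancy
    "manual_tracking": 2,      # extra manual tracking per discrepancy, where applicable
    "code_term": 5,            # manually code one term
}

def task_cost(task, volume, hourly_rate, overhead_minutes=0):
    """Cost in currency units: (metric + overhead) scaled by volume and rate."""
    total_minutes = (MINUTES[task] + overhead_minutes) * volume
    return total_minutes / 60 * hourly_rate
```

For example, reviewing 12 discrepancies at a hypothetical rate of 60 per hour costs 60 units without manual tracking, and 84 units when the 2-minute tracking overhead applies.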
Keystroke-Level Modeling (KLM) or Goals, Operators, Methods, and Selection rules (GOMS) modeling [18, 19], which take into account different times for different types of tasks, would further refine the cost analysis through a more direct coupling of the task analysis to the cost analysis.
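To illustrate how KLM would refine the uniform per-step assumption, the sketch below predicts execution time from a sequence of standard KLM operators. The operator times are the commonly cited Card, Moran, and Newell values; the operator sequence for entering one spreadsheet field is our own illustrative assumption.

```python
# Standard KLM operator times in seconds (Card, Moran & Newell).
KLM_SECONDS = {
    "K": 0.2,   # keystroke (average skilled typist)
    "P": 1.1,   # point at a target with the mouse
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_time(operators):
    """Predicted execution time (seconds) for a sequence of KLM operators."""
    return sum(KLM_SECONDS[op] for op in operators)

# Hypothetical entry of one four-character field in a table-based view:
# think, move hand to mouse, point at the cell, move hand back, type.
table_field = ["M", "H", "P", "H"] + ["K"] * 4
```

Because pointing and homing operators are far more expensive than keystrokes, this kind of model makes the cost of repeated keyboard-to-mouse transitions explicit rather than folding it into a flat per-step count.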
Although Web-based electronic data capture (EDC) is increasingly ubiquitous, we chose a paper-based CDMS because its work processes more closely resemble the data processing model employed by Excel users. Additionally, a comparison of Excel and Clintrial may help characterize the relative advantages and disadvantages of spreadsheets and CDMSs in a manner meaningful to investigators and research teams.
Such comparisons within the CRDM domain require specification of an example research project for which data are to be collected and managed (Table 3). We sought, however, to address a real and current problem within the domain.
Excel is a robust general spreadsheet, rich in functionality and used across many industries to capture, store, and analyze data.
It is easily obtained, costs a few hundred dollars, and can be installed on a desktop computer with a compatible operating system by a novice, without assistance or other infrastructure. Data are stored in individual files with the application-specific extension. The Clintrial system was developed specifically to manage clinical research data. Clintrial is touted as the market leader among CDMSs, with installations throughout the life sciences [20].
Clintrial is a client-server application that uses the Oracle relational database management system (RDBMS) for database transactions. Thus, application and data are separate, with data independence achieved through storage in the Oracle relational database. Clintrial supports multiple schemas, wherein a different set of tables is created for each clinical trial. The system is neither simple nor quick to obtain, and requires RDBMS and network infrastructure to set up and maintain the application server and database.
Products such as Clintrial, along with more recent Web-based EDC software, are mainstays of data management for regulated, industry-sponsored clinical trials, while spreadsheets are often used in small-scale, investigator-initiated clinical research. Because our primary focus was the operationalization of UFuRT in the CRDM domain, as noted above, we present only high-level results of our demonstration, with the intent of exemplifying the type of information that future applications of this method in this domain can expect.
The work domain ontology developed from the UFuRT analysis contains the identified classes and 16 relationships. The user analysis resulted in the identification and classification of ten roles, or classes of users (Table 1).
Fifty-five key functions were identified, defined, and rated for Clintrial and Excel.
Inter-rater reliability was measured as the average percentage agreement among the four senior data managers. Neither product supported optical scanning. The scale representation results provided in Table 5 show that the paper CRF tended to collect data at a lower scale than the inherent scale of the represented data element.
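The average percentage agreement used for inter-rater reliability can be sketched as follows. The function name and the example ratings (four raters each assigning Supported/Indirect/Not Supported labels to four functions) are illustrative assumptions.

```python
from itertools import combinations

def average_pairwise_agreement(ratings):
    """Average percentage agreement across all rater pairs.

    `ratings` is one list of category labels per rater,
    aligned by rated item (here, by function).
    """
    pair_scores = [
        sum(a == b for a, b in zip(r1, r2)) / len(r1)
        for r1, r2 in combinations(ratings, 2)
    ]
    return 100 * sum(pair_scores) / len(pair_scores)

# Hypothetical ratings: S = Supported, I = Indirectly Supported, N = Not Supported.
example_ratings = [
    ["S", "S", "N", "I"],
    ["S", "S", "N", "N"],
    ["S", "I", "N", "I"],
    ["S", "S", "N", "I"],
]
```

With four raters this averages agreement over the six rater pairs; for the example above the result is 75 percent.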
Further, no data reduction was observed between the CRF and the Clintrial or Excel representations, as both applications had sufficient functionality to maintain the scale of the data as represented on the CRF. Field structure, however, differed between the products due to variations in available functionality. In Clintrial, data in drop-down fields can be entered by keying the first letter of the choice, thereby minimizing keystrokes and other motion. There is significant optionality (lack of field structure) in Excel for data entry, including the option of entering everything as text and entering with no field limits.
The Clintrial data entry interface facilitated replication of all three form-level orientations (source, time, and concept), while the Excel data entry interface could not visually represent all three orientations in its two-dimensional worksheet.
For example, time was usually recorded as a record (row) identifier in Excel, whereas in Clintrial, data taken at different time points were represented as different patient visits in the navigation tree and, thus, on different data entry screens. Excel also exhibited a lesser degree of spatial proximity than Clintrial (Table 6). For our demonstration CRF with 20 variables, users were obliged to move their hands from keyboard to mouse several times and visually scan for the correct column, slowing data entry and adding mental steps.
Semantic proximity is mapped to spatial proximity in Fig. Further, as the number of variables on a form increases, spatial proximity decreases more rapidly in the table-based representation, and semantic and spatial proximity become further decoupled.
Thus, whereas spatial and semantic proximity correlate closely in the CRF and Clintrial representations, they do not in the Excel table-based view. The user must therefore compensate by mapping the form view of the CRF to the table view in Excel, increasing the amount of visual scanning and resulting in higher cognitive load [21, 22].
The task analysis results displayed in Table 7 show the number of steps required to process data for the example clinical trial in each system. In addition, for forms with more than 10 fields, the table-based view of Excel forces either multiple keyboard-to-mouse hand movements (physical steps) or visual mapping from the data source to a more normalized spreadsheet structure (mental steps), resulting in a higher cognitive load on the user [21, 22]. Because spatial proximity issues, a greater number of total steps, and a higher proportion of internal steps in Excel would yield slower entry times with that utility, we correspondingly adjusted our data entry metric to 4 minutes as the time required to enter a CRF page in Excel, versus 3.
On the basis of these results, we also added 2 minutes per discrepancy in Excel to account for manually applying the rule and tracking each discrepancy. These task analysis results thus inform an associated cost analysis.
In addition to the data processing metrics described in the Methods section, the time required for programming, user testing, and coordination of data processing was included to yield a more comprehensive cost analysis. Programming, including user testing for the database and data validation checks, was estimated at 1. We assumed 20 hours per month for managing data collection and an additional 10 hours per month for additional administrative tasks (daily back-ups, creation of status reports, and reconciliation of manual tracking with data) needed for the Excel system.
Our model assumed a month enrollment period at a rate of 1 enrollment per month. All of the data management costs here are variable; the particular cost drivers for each of the six categories of data management tasks, however, are different.
Database set-up costs are determined by the number of total and unique CRF pages, as well as the number of data validation checks programmed. Data entry costs are driven by the number of data fields to be entered, often assessed at the page level by assuming a standard number of fields per page.
Data cleaning costs are driven ultimately by the number of queries generated.
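The cost drivers above can be combined into a simple model that scales each category by its own driver. All unit-hour figures below are placeholders for illustration, not measured values from the study; the function and constant names are ours.

```python
# Illustrative unit costs (hours per driver unit) for each category.
UNIT_HOURS = {
    "setup_per_unique_page": 1.5,   # database set-up, per unique CRF page
    "programming_per_check": 0.5,   # validation-check programming, per check
    "entry_per_page": 0.05,         # data entry, per total CRF page entered
    "cleaning_per_query": 0.2,      # data cleaning, per query generated
}

def project_hours(unique_pages, total_pages, checks, queries):
    """Total labor hours: each category scaled by its own cost driver."""
    return (unique_pages * UNIT_HOURS["setup_per_unique_page"]
            + checks * UNIT_HOURS["programming_per_check"]
            + total_pages * UNIT_HOURS["entry_per_page"]
            + queries * UNIT_HOURS["cleaning_per_query"])
```

For instance, a hypothetical study with 10 unique pages, 400 entered pages, 30 validation checks, and 50 queries would require 60 hours under these placeholder rates, and each term of that total responds to a different driver.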