Data Management - an overview
This is a fundamental part of planning any clinical trial. Planning how the trial data will be recorded from the source data, entered onto a database, checked, cleaned and then extracted for analysis is very important. It should therefore be considered alongside all the other set-up activities, and a data management SOP should be written early on.
Ultimately, the outcome of any clinical research needs to be reported, and the quality of the data generated therefore plays an important role in clinical trials.
Clinical Data Management (CDM) is the process of collection, cleaning, and management of subject data in compliance with regulatory standards. The primary objective of CDM processes is to provide high-quality data by keeping the number of errors and missing data as low as possible for data analysis.
The objective of good clinical data management is to ensure that the data supporting the trial's conclusions are complete, accurate and reliable.

The European Clinical Research Infrastructures Network (ECRIN) has published helpful and straightforward guidance on GCP-compliant data management in multinational clinical trials (see the references at the end of this article).

High-quality data should be accurate and suitable for statistical analysis, and should meet the protocol-specified parameters and requirements. This implies that where there is a deviation, or the data do not meet the protocol specifications, the data may be excluded from the final database. High-quality data should have minimal or no missing items, and only an acceptably low level of variation that would not affect the conclusions of the study on statistical analysis. The data should also meet the applicable regulatory requirements for data quality.
It is often helpful to work through a 'data journey' when writing a data management plan. This starts from the first data point that will go into the database and ends at the last. The process determines what the original, or source, data are for each data point, and then plans how this information will be transferred from the source data into the database. Examples of source data are patient diaries, clinic registries, patients' notes and laboratory reports. In some settings there are no existing source data; if this is the case, source data forms will be needed. A simple sketch of such a mapping is shown below.
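For illustration, a data journey map can be as simple as a table listing each data point, its source, and how it reaches the database. The sketch below shows this idea in Python; the field names and routes are hypothetical examples, and a real plan would cover every data point from the first to the last.

```python
# A minimal, hypothetical sketch of a 'data journey' map. A real data
# management plan would list every data point in the database.
data_journey = [
    {"data_point": "date_of_birth", "source": "clinic registry",
     "transfer": "transcribed onto pCRF, then entered into database"},
    {"data_point": "haemoglobin", "source": "laboratory report",
     "transfer": "entered directly into eCRF"},
    {"data_point": "symptom_diary", "source": "patient diary",
     "transfer": "transcribed onto pCRF, then double-entered"},
]

for item in data_journey:
    print(f"{item['data_point']}: {item['source']} -> {item['transfer']}")
```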
Paper CRF (pCRF) vs Electronic CRF (eCRF)
Over the years, the manner in which data are collected has changed in line with technological advancement. Previously, data from clinical studies were mainly collected on paper CRFs (pCRFs). The transition from paper-based data collection to eCRF systems has numerous benefits, including easing the burden of organising paperwork, reducing time and cost, and allowing data to be collected and transmitted in real time where good internet connections exist. This means data can be analysed almost immediately and missing data highlighted in the database, often before it is too late to rectify the issue. More importantly, data queries can be raised and resolved without monitors visiting the site, meaning that the monitor's time can be used more efficiently.
Another aspect of electronic data capture is that data can be collected centrally by specialised teams – for example, patients' randomisation data, electronic patient-reported outcomes (ePRO), radiological data and validated questionnaires. The main advantage of this is that vendors can programme edit checks and reminders into their devices to minimise data errors and missing data.
When using electronic or remote data systems it is important that the following steps are in place:
o Ensure and document that the electronic data processing system(s) conform to the sponsor's established requirements for completeness, accuracy, reliability and consistent intended performance (i.e. validation).
o Maintain SOPs for using these systems.
o Ensure that the systems are designed so that data changes are documented and entered data are never deleted (i.e. maintain an audit trail – see the sketch after this list).
o Maintain a security system that prevents unauthorised access to the data.
o Maintain a list of the individuals who are authorised to make data changes.
o Maintain adequate backup of the data.
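To make the audit trail requirement concrete, here is a minimal sketch of an append-only change log, assuming a simple in-memory store. The record structure and function name are illustrative only; a validated EDC system would persist this securely and tie it to user authentication.

```python
import datetime

# Entries are only ever appended, never edited or deleted, so the
# original value of every changed field is preserved.
audit_trail = []

def record_change(user, field, old_value, new_value, reason):
    """Log a data change with a timestamp; the original value is kept."""
    audit_trail.append({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "field": field,
        "old_value": old_value,
        "new_value": new_value,
        "reason": reason,
    })

record_change("jdoe", "weight_kg", 72, 77, "transcription error corrected")
```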
Regulations, Guidelines and Standards
As with any clinical data, there is always a need to ensure that the data collected are acceptable and meet the standard required by the regulatory authorities. The Society for Clinical Data Management (SCDM) publishes the Good Clinical Data Management Practices (GCDMP) guidelines, which provide the standards for good practice within CDM. The Clinical Data Interchange Standards Consortium (CDISC), a multidisciplinary non-profit organisation, has also developed standards to support the acquisition, exchange, submission and archival of clinical research data. These guidelines also cover entries and changes to the clinical data, the date and time of each entry or change, and details of the changes made.
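As a flavour of what such standards look like, the sketch below shows a simplified record in the style of the CDISC SDTM Adverse Events (AE) domain. The variable names follow SDTM conventions, but the values are invented and a real submission dataset would carry many more variables as specified in the SDTM implementation guide.

```python
# A simplified, illustrative CDISC SDTM-style Adverse Event (AE) record.
ae_record = {
    "STUDYID": "ABC-001",       # study identifier
    "DOMAIN":  "AE",            # SDTM domain code
    "USUBJID": "ABC-001-0042",  # unique subject identifier
    "AESEQ":   1,               # sequence number within subject
    "AETERM":  "headache",      # reported (verbatim) term
    "AEDECOD": "Headache",      # dictionary-derived (coded) term
    "AESTDTC": "2023-04-01",    # start date in ISO 8601 format
}
```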
Electronic records may also have to comply with other regulatory requirements; for example, studies with a US sponsor or collaborator would probably have to comply with the US Code of Federal Regulations (CFR), 21 CFR Part 11. The FDA is primarily concerned with records in electronic format that are created, modified, maintained, archived, retrieved or transmitted. This means that only validated systems with secure, computer-generated, time-stamped audit trails that independently record the date and time of operator entries can be used. Adequate procedures and controls should also be put in place to maintain the integrity, authenticity and confidentiality of the data.
For multinational studies, it is important to note that the local implementation of the rules has to be taken into consideration when dealing with regulatory authorities in different countries.
DM Processes
Since data are not collected until patients are enrolled in a study, it is tempting to leave data management (DM) activities until much later in the planning stages. However, this can be very detrimental to obtaining clean and timely data. Researchers should start to think about data very early in the design process, as this will inform the type of data to be collected, when they will be collected and whether it is relevant to collect them at all.
The data management process should be designed to deliver an error-free, valid and statistically sound database; in other words, the whole process is designed with this deliverable in view.
Data entry - Data entry should be done according to the guidelines specified in the Data Management Plan (DMP). This applies to the transcription of data from paper CRFs into the database, which can be done in two main forms: single entry, where data are entered by one person, or double entry, where data are entered by two people separately. The second entry helps verify that the first is correct. Double data entry is not required by regulations, but it is good practice and helps with the verification and reconciliation of data by identifying transcription errors, as sketched below. Overall, double data entry helps ensure that the data are clean, with few or no errors.
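The sketch below shows the core of double data entry verification under simple assumptions: two independent transcriptions of the same pCRF are compared field by field, and any mismatch is flagged for resolution against the source document. The field names and values are hypothetical.

```python
# Two independent transcriptions of the same paper CRF.
first_entry  = {"subject_id": "0042", "weight_kg": "72", "visit_date": "2023-04-01"}
second_entry = {"subject_id": "0042", "weight_kg": "77", "visit_date": "2023-04-01"}

# Any field where the two entries disagree is a candidate transcription error.
discrepancies = [
    (field, first_entry[field], second_entry[field])
    for field in first_entry
    if first_entry[field] != second_entry[field]
]

for field, v1, v2 in discrepancies:
    print(f"Mismatch in {field}: first entry '{v1}' vs second entry '{v2}'")
```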
Data validation - Data validation is the process of testing the validity of the data against the protocol specifications. The level of quality control applied to the data must be transparent, and any procedure involved in data cleaning needs to be validated by a process defined in the Data Validation Plan (DVP). The DVP also states the types of checks to be performed on the data, which can be either manual or computerised.
Manual checks - These involve a visual check of the CRF with a manual review of the data for illogical or inconsistent entries, e.g. performing a medical consistency check of significantly high or low laboratory data against a corresponding AE, or checking that an AE has a corresponding concomitant medication.
Computerised checks - These are done through edit checks: programs written to identify discrepancies in the data, giving an immediate check during data entry. These checks can also be run in batches, e.g. at the end of a visit module or at the end of the CRF. Edit check programs are initially tested with dummy data containing known discrepancies. If there are any errors in the data, the system will highlight the error, or 'fire' queries.
For example, if the inclusion criteria specify that only patients between the ages of 18 and 65 years should be enrolled, an edit check is written for two conditions: age <18 and age >65. If either condition is true for a patient, a discrepancy is generated (see the sketch below). Discrepant data may arise from inconsistent data, missing data, failed range checks or deviations from the protocol.
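A minimal sketch of this age edit check, assuming the 18–65 range from the example above; the function and identifiers are hypothetical, but the logic is exactly the two-condition check described.

```python
def check_age(subject_id, age):
    """Return a query string if the age violates the inclusion criteria."""
    if age < 18 or age > 65:
        return f"Query for {subject_id}: age {age} outside protocol range 18-65"
    return None  # no discrepancy

# Dummy data of the kind used to test an edit check program.
for subject_id, age in [("0041", 34), ("0042", 17), ("0043", 70)]:
    query = check_age(subject_id, age)
    if query:
        print(query)  # in an eCRF system this would fire at data entry
```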
Resolution of discrepancies depends on the type of CRF. For eCRF-based studies, discrepancies are resolved by investigators after logging into the system, whereas for paper CRFs, data queries are resolved by raising Data Clarification Forms (DCFs) on paper, which are signed by the principal investigator before being sent to the DM team.
Database lock/unlock - This is a controlled procedure to freeze the data and prevent write/edit access for users of the system. Database lock is done once all entered data have been cleaned, with no outstanding queries or discrepancies. All external data, such as the safety laboratory database or other vendor databases, should also have been cleaned and reconciled. Once this is complete, all stakeholders must approve the lock, and no further modification can be made to the data.
In exceptional cases, where critical missing data are received after database lock, the appropriate SOP should be followed; essentially, access should be granted only to privileged users, with proper documentation in the audit trail, including the justification for updating the locked database. Following a successful lock, the data are extracted for statistical analysis. A sketch of such a lock/unlock procedure is shown below.
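The following sketch illustrates the lock/unlock idea under simple assumptions: once locked, edits are refused, and unlocking requires a documented justification. The class and method names are hypothetical; a validated system would enforce this with role-based access controls and a full audit trail.

```python
class StudyDatabase:
    """Toy model of a lockable study database."""

    def __init__(self):
        self.locked = False
        self.unlock_log = []  # every unlock is recorded with a reason

    def lock(self):
        self.locked = True

    def edit(self, field, value):
        if self.locked:
            raise PermissionError("Database is locked; no edits allowed")
        print(f"{field} set to {value}")

    def unlock(self, user, justification):
        # only privileged users should reach this point in a real system
        self.unlock_log.append({"user": user, "justification": justification})
        self.locked = False

db = StudyDatabase()
db.lock()
db.unlock("data_manager", "critical lab results received after lock")
db.edit("haemoglobin", 11.2)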
Data Monitoring & Site Visits
Constant monitoring of trial data and site performance helps to make visits more efficient and purposeful. A CRA is able to pinpoint the exact areas that need to be addressed while on site.
This allows visits to become more strategic in nature, where the CRA and site personnel work together to solve specific problems and set future goals.
Site visits involve resolving all outstanding data management issues, source data verification, understanding the effort involved in data entry, and helping site personnel to improve the quality of data collection.
Medical coding - Medical coding identifies and properly classifies the medical terminology associated with the clinical trial. In a typical trial, adverse events, medical history and concomitant medications are collected and recorded in the CRFs. The data generated will ultimately need to be analysed, particularly in multi-centre studies involving a number of countries and sites. To allow data to be interpreted in a standardised format, medical coding classifies medical terms using a medical dictionary, available online. The most commonly used dictionary is the Medical Dictionary for Regulatory Activities (MedDRA), used for coding adverse events, while the World Health Organization Drug Dictionary Enhanced (WHO-DDE) is used for coding medications. Other medical dictionaries include the Coding Symbols for Thesaurus of Adverse Reaction Terms (COSTART) and the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM).

Most clinical trials run for a long period of time, so it is important that the latest dictionary version is used and that appropriate plans are in place to update coded data when the dictionary is updated.

Coding can be done automatically or manually. Auto coding codes data automatically, and succeeds when the recorded term exactly matches the appropriate term in the medical dictionary. Manual coding is used where auto coding fails; in this case the terms do not match at the appropriate level of the dictionary hierarchy, so the medical coder searches for the appropriate match in the dictionary and assigns the code manually, as sketched below.
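The sketch below shows the auto/manual split in its simplest form: verbatim terms that exactly match a dictionary entry are coded automatically, and anything else is routed to a medical coder. The toy dictionary here is a stand-in for a licensed dictionary such as MedDRA; the terms and function names are illustrative only.

```python
# Toy stand-in for a coding dictionary (verbatim term -> coded term).
coding_dictionary = {
    "headache": "Headache",
    "nausea": "Nausea",
    "pain in stomach": "Abdominal pain",
}

def auto_code(verbatim_term):
    """Return the coded term on an exact match, else None (manual coding)."""
    return coding_dictionary.get(verbatim_term.strip().lower())

for term in ["Headache", "stomach ache"]:
    coded = auto_code(term)
    if coded:
        print(f"'{term}' auto-coded to '{coded}'")
    else:
        print(f"'{term}' needs manual coding, or a query to the site")
```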
Correct coding and classification of adverse events and medications is crucial, as incorrect coding may mask safety issues or highlight the wrong safety concerns related to the drug. Not all terms reported and recorded in the CRF/eCRF can be coded without issues: some terms are unclear, or it is not easy for a coder to find a matching term in the dictionary. Unclear terms with insufficient detail are queried with the site, and the investigator must provide signed updates or details to the data management team. Following this update, the coder reviews the information and codes the term appropriately.
Useful templates are available here: http://globalhealthtrials.tghn.org/articles/data-management/
For information about the guidelines governing data management, and how to code data, you can read the article here: http://globalhealthtrials.tghn.org/articles/data-management-overview-coding-and-guidelines/
And you can also find more information and discuss with experts in The Global Health Network's Data Management member area, ADMIT (the Association for Data Management in the Tropics): http://admit.tghn.org/
References
o Babre D. Medical coding in clinical trials. Perspect Clin Res. 2010;1(1):29–32.
o Krishnankutty B, et al. Data management in clinical research: an overview. Indian J Pharmacol. 2012;44(2):168–172.
o Ohmann C. GCP-compliant data management in multinational clinical trials, version 1. European Clinical Research Infrastructures Network (ECRIN), Transnational Working Groups; 15 September 2008.