Pharma and biotech are waiting for 21st century clinical trial data analytics engines

Date: 08. 26. 2019

As a repository of clinical trial information, few databases are as trusted – or as widely used – as ClinicalTrials.gov. A service of the U.S. National Library of Medicine, ClinicalTrials.gov aims to provide comprehensive information on privately and publicly funded clinical studies conducted around the world. Many drug developers mine the database to inform decisions on trial design, investigator recruitment, and site selection. However, relying on a single source may not yield robust, reliable data: just as researchers must cite multiple sources to support their published findings, so too should industry gather information from multiple databases to make well-informed decisions about clinical trials.

To be fair, ClinicalTrials.gov was not designed for the purposes it’s used for today, which the architects of the Food and Drug Administration Amendments Act (FDAAA) may not have foreseen. Nevertheless, for all its value, ClinicalTrials.gov is inconsistent in the frequency of its updates, as well as in the level of detail and the quality of its uploaded data. It is limited by the fact that it relies almost entirely on trial sponsors (and the CROs that work on their behalf) to populate its database.

That limitation is illustrated by a trial spreadsheet* we recently downloaded from ClinicalTrials.gov. At first glance, the spreadsheet appears to supply important information on individual sites in a multicenter trial: for each site, there is a field for the contact name, telephone number, email address, and status (i.e., recruiting, not yet recruiting, enrollment complete), as well as a field for the date of the most recent change to the site’s record. However, many of these fields are populated with null or otherwise useless information. For example, in the column that purportedly lists each site’s contact person, not one actual name appears; instead, the user is directed to “Please contact the [trial sponsor].” Similarly, the “contact email” column lists a sponsor email address for each trial site, as opposed to the site’s own email address. Additionally, for many sites the “last change” date is 2014 – one goes back to 2013 – suggesting that the site records are outdated.

That spreadsheet, with its many unpopulated fields, may be an extreme case of data deficiency, but it illustrates how some trial sponsors and CROs do not update trial data on ClinicalTrials.gov with the requisite quality or frequency. It’s hard to make sense of such deficient data, much less base important decisions on them.
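The kinds of gaps described above can be screened for programmatically. The following is a minimal sketch, not a description of any actual export format or of Phesi’s tooling: the column names (`contact_name`, `contact_email`, `last_change`), the placeholder phrases, the sponsor domain, and the one-year staleness threshold are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical placeholder phrases; a real export may use different wording.
PLACEHOLDER_PREFIXES = ("please contact", "contact sponsor", "n/a")


def audit_site(row, sponsor_domain, today, max_age_days=365):
    """Return a list of data-quality flags for one site record.

    `row` is a dict with assumed keys: contact_name, contact_email,
    and last_change (an ISO date string such as "2014-03-10").
    """
    flags = []

    # A blank name or a "please contact the sponsor" note is not a site contact.
    name = (row.get("contact_name") or "").strip()
    if not name or name.lower().startswith(PLACEHOLDER_PREFIXES):
        flags.append("no_site_contact_name")

    # An address on the sponsor's domain is not the site's own email.
    email = (row.get("contact_email") or "").strip().lower()
    if not email or email.endswith("@" + sponsor_domain.lower()):
        flags.append("no_site_email")

    # Treat a missing or old "last change" date as a stale record.
    last_change = row.get("last_change")
    if not last_change:
        flags.append("stale_record")
    elif (today - date.fromisoformat(last_change)).days > max_age_days:
        flags.append("stale_record")

    return flags


# Illustrative rows only; these are not taken from the spreadsheet discussed above.
sites = [
    {"site": "Site A", "contact_name": "Please contact the sponsor",
     "contact_email": "trials@sponsor.example", "last_change": "2014-03-10"},
    {"site": "Site B", "contact_name": "Dr. Example Investigator",
     "contact_email": "research@hospital.example", "last_change": "2019-07-01"},
]

as_of = date(2019, 8, 26)
report = {s["site"]: audit_site(s, "sponsor.example", as_of) for s in sites}
print(report)
```

In this made-up example, Site A is flagged on all three checks while Site B passes. A screen like this only detects deficient records; it cannot supply the missing information.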

To a great extent, the phenomenon of data deficiency reflects the largely manual, 20th-century process of filling the information gaps in trial listings. This should no longer be acceptable now that 21st-century solutions exist. While sponsors spend months manually vetting trial sites and spinning their wheels, patients in need are waiting for new therapies.

It is possible to shorten patients’ waiting time through the use of a dynamic, continually updated data analytics engine powered by integrated algorithms that generate richer data than that provided by outmoded processes. For the same trial summarized in the downloaded spreadsheet, we at Phesi can create a spreadsheet that provides real-time data, including the investigator’s name, degree, affiliation, street address, city, state, zip code, and – importantly – the number of patients enrolled at each site.

To make it even easier for sponsors to utilize AI-powered technology, we recently launched ClinSite™, a Software as a Service (SaaS) solution that enables companies to do their own searching and site selection, with precise and accurate results in just a few keystrokes. Powered by our analytics platform, ClinSite™ enables forensic examination of investigator site performance data to identify consistently top-performing sites. It can help sponsors determine key factors to optimize site selection and understand why individual investigators consistently under- or over-perform versus their peers.

Our system cross-compares data points from as many as 80,000 sources, including ClinicalTrials.gov – a feature that enhances data potency. Our vast and growing database, powered by data from 4.2 million physicians, 600,000 investigator sites, and 330,000 clinical trials in 240 countries, enables direct comparisons of similarly sized trials in the same therapeutic area. The database can also reveal individual investigators who specialize in, for example, newer biologic agents (as opposed to chemical agents), a benefit that can allow sponsors to build a pool of highly specialized and efficient research sites.

The limited utility of resources such as ClinicalTrials.gov underscores that we can’t afford to spend a lot of time poring over voluminous and incomplete data. With advanced trial-modeling technology, it is now possible for trial sponsors to identify and activate top-performing sites while closing down under-performing sites, thereby shortening enrollment cycle time and potentially saving millions of dollars. Advanced data analytics allow you to use integrated datasets to your advantage, and to do it right the first time. After all, patients’ lives are at stake. What are you waiting for?

*NOTE to readers: If you would like to see the above-mentioned spreadsheet and a comparison with ClinSite™, please reach out to us.
