Text mining offers business benefits

Written by: Renato Rjavec
Published on: 11 Nov 2021

Since the advent of IDMP, data has become a key asset for life sciences companies. At present, pharma companies are focusing on data collection, but over time, data maintenance is set to become a bigger issue. This can only be tackled by paying attention to the quality and integrity of data, and this is where advanced text-mining technologies can help, says Amplexor’s Renato Rjavec.

The target operating model for EMA regulatory submissions requires that original product data be submitted alongside eCTD dossiers, which makes collating and cleaning up data a key issue for European and global life sciences organisations.

The road to full IDMP compliance does not end with initial registrations. Many marketing authorisation holders (MAHs) are still trying to locate source data, vet its quality and plug any gaps. The information they need may straddle regulatory information management (RIM) systems, Excel spreadsheets and static documents (labelling, CMC documents, and so on), across a range of functions, with each department employing its own formatting and terminology. Extracting and cleaning up all these fragments of data is a huge task.

Yet this is just the tip of the iceberg. The job of maintaining and updating all this information, and keeping it tightly aligned with anything appearing in document form, is ongoing. Under the emerging target operating model (TOM) for regulatory submissions, once IDMP is live and mandatory in the EU, any discrepancies will immediately spark agency queries and set back registration timelines.

It will be essential to keep data and content in sync and up to date, and to ensure that FHIR messages are fully aligned with the content of submitted eCTD sequences. (FHIR, the Fast Healthcare Interoperability Resources standard, defines data formats and API requirements for exchanging electronic health records.)

Harnessing technology

Teams will need to set up processes to ensure that the contents of the dossier match the contents of the IDMP/SPOR dataset for each submission. (Under IDMP, Substances, Products, Organisations and Referentials – SPOR – data services provide the vehicle for implementation of ISO IDMP standards in the regulatory and e-health worlds.)

One option is to pull data from documents as an ongoing operational process, but this approach is likely to be very labour intensive and offers little additional business benefit.

Another option is to leverage well-structured data to generate content. But structured content authoring technology (in which documents are assembled automatically from pre-approved content fragments/data sets) is not yet mature enough to offer a failsafe and simple-to-use solution.

A better approach is to establish parallel processes to prepare documents and data, keeping both in tight alignment and ensuring this is the case as a quality control requirement until the final submission.
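As a rough illustration of what such a quality control step might look like, the sketch below compares values taken from a document with the corresponding structured record and lists every field that disagrees. The field names, the example values and the `find_discrepancies` helper are all hypothetical; a real check would work against the company's own RIM data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Discrepancy:
    field: str
    document_value: Optional[str]
    data_value: Optional[str]


def normalise(value: Optional[str]) -> Optional[str]:
    """Collapse whitespace and case so pure formatting differences are ignored."""
    if value is None:
        return None
    return " ".join(value.split()).casefold()


def find_discrepancies(document_fields: dict, data_record: dict) -> list:
    """Return every field that is missing on one side or differs between the two."""
    issues = []
    for name in sorted(set(document_fields) | set(data_record)):
        doc_value = document_fields.get(name)
        data_value = data_record.get(name)
        if normalise(doc_value) != normalise(data_value):
            issues.append(Discrepancy(name, doc_value, data_value))
    return issues


# Hypothetical values, not taken from any real product.
document_fields = {
    "product_name": "Examplin  200 mg tablets",
    "strength": "200 mg",
    "pharmaceutical_form": "Film-coated tablet",
}
data_record = {
    "product_name": "Examplin 200 mg Tablets",
    "strength": "250 mg",
    "route_of_administration": "Oral use",
}

discrepancies = find_discrepancies(document_fields, data_record)
for issue in discrepancies:
    print(f"{issue.field}: document={issue.document_value!r} data={issue.data_value!r}")
```

Run before each submission, a check of this kind would flag the mismatched strength and the two fields present on only one side, while ignoring the harmless spacing and capitalisation difference in the product name.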

In this context, companies would do well to harness an already proven technology – advanced text mining. This has strong potential, both at a data extraction/quality checking level, and for ongoing data and content maintenance. The accuracy of such tools has now reached around 95 per cent in the context of automated data extraction.

Text-mining technology uses machine learning and natural language processing to help teams detect patterns or data points in existing documents, extract this information, encode it, and flow it into the company’s RIM system for onward processing.
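The detect, extract and encode steps can be pictured with a deliberately simplified sketch. It uses hand-written regular expressions as a stand-in for the machine learning and natural language processing models a real tool would apply, and the label text, the `DOSE_FORM_CODES` vocabulary and the output field names are all invented for illustration.

```python
import re

# Hypothetical label excerpt; real source documents would be far longer.
LABEL_TEXT = """
Examplin 200 mg film-coated tablets
Each tablet contains 200 mg of examplinium chloride.
Marketing authorisation holder: Example Pharma GmbH
"""

# Hypothetical codes standing in for controlled-vocabulary terms.
DOSE_FORM_CODES = {
    "film-coated tablet": "DF-0001",
    "oral solution": "DF-0002",
}

STRENGTH_PATTERN = re.compile(
    r"contains\s+(\d+(?:\.\d+)?)\s*(mg|g|ml)\s+of\s+([a-z ]+?)\.",
    re.IGNORECASE,
)
MAH_PATTERN = re.compile(r"Marketing authorisation holder:\s*(.+)", re.IGNORECASE)


def extract_record(text: str) -> dict:
    """Detect data points in free text and return them as a structured record."""
    record = {}

    # Detect and extract the strength statement.
    strength = STRENGTH_PATTERN.search(text)
    if strength:
        record["strength_value"] = float(strength.group(1))
        record["strength_unit"] = strength.group(2).lower()
        record["substance"] = strength.group(3).strip().lower()

    # Detect and extract the marketing authorisation holder.
    holder = MAH_PATTERN.search(text)
    if holder:
        record["mah"] = holder.group(1).strip()

    # Encode the dose form against the (hypothetical) controlled vocabulary.
    lowered = text.lower()
    for term, code in DOSE_FORM_CODES.items():
        if term in lowered:
            record["dose_form_code"] = code
            break

    return record


record = extract_record(LABEL_TEXT)
print(record)
```

The resulting dictionary is the kind of structured record that could then be passed to a RIM system for review and onward processing; how that hand-off works depends entirely on the system in question.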

The technology is helping to improve the efficiency of IDMP data extraction from a range of different documents, automatically populating RIM data records directly from those static files and providing teams with a good foundation for data enrichment.

Meanwhile, for ongoing data maintenance, advanced text-mining tools support proper data validation and user guidance to ensure that data remains complete, consistent and properly encoded.

Savings potential

The return-on-investment potential of advanced text-mining tools in both data extraction and data maintenance use cases is impressive. With a potential saving of hundreds of euros/dollars per record, companies processing tens of thousands of authorised records per year could see cost savings run into the millions.
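The arithmetic behind that claim is simple to reproduce. The figures below are illustrative assumptions chosen to fall within the ranges mentioned ("hundreds" per record, "tens of thousands" of records); they are not measured results.

```python
def annual_saving(records_per_year: int, saving_per_record: float) -> float:
    """Estimated yearly saving: number of records times saving per record."""
    return records_per_year * saving_per_record


# Illustrative assumptions only, not figures from any real company.
low_estimate = annual_saving(records_per_year=10_000, saving_per_record=100)
high_estimate = annual_saving(records_per_year=50_000, saving_per_record=300)

print(f"Low scenario:  {low_estimate:,.0f} per year")
print(f"High scenario: {high_estimate:,.0f} per year")
```

Even the low scenario reaches one million per year, which is why savings at this scale are plausible for organisations with large product portfolios.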

More than that, advanced text mining has a meaningful role as part of a broader, end-to-end RIM capability: its ability to validate data across the entire lifecycle aids planning, editing and formatting throughout.

Dealing with concerns

Once responsible teams are made aware of text-mining solutions, it usually takes only a small proof-of-concept study to showcase the potential and lay to rest any concerns about the technology’s accuracy and efficacy.

Ideally, text-mining technology should be deployed seamlessly within a broader RIM project, such as an IDMP data migration initiative, as companies press on with data cleaning, structuring and importing ahead of the IDMP go-live date.

But whether or not teams are exposed to the technology directly, text mining is a useful tool in their arsenal as they assess how to complete their projects within their allotted timeframes and budgets.

  • In late October, Amplexor co-hosted a webinar on the potential of advanced text mining in the evolving regulatory environment, with K2 Consulting and Averbis. A recording of the session is available to view or download here.