Spark NLP for Healthcare Data Scientists

Live Online Certification Training

May 13 | 12 - 4 PM EDT

Many critical facts required by healthcare AI applications – like patient risk prediction, cohort selection, automated clinical coding, and clinical decision support – are locked in unstructured free-text data. Recent advances in deep learning have raised the bar on achievable accuracy for tasks like named entity recognition, assertion status detection, entity resolution, and de-identification.

Spark NLP for Healthcare provides production-grade, scalable, and trainable implementations of novel healthcare-specific natural language processing (NLP) algorithms and models. The product is licensed by John Snow Labs, the creator of Spark NLP, and provides data scientists with a library and pre-trained models for the most common medical NLP tasks. This is a hands-on workshop that will enable you to write, edit, and run Python notebooks that use the product’s functionality.

The workshop is organized in three hour-long sessions, each followed by 30 minutes of self-guided coding on Python notebooks relevant to that session. This is a live online workshop, and the instructors will be available during the self-guided sessions to answer questions.

This workshop is part of the recommended preparation for the “Certified Spark NLP Healthcare Data Scientist” certification exam.

Outline:

Part I (90 minutes): Medical Knowledge Extraction

Introduction to Spark NLP for Healthcare
Common medical NLP use cases
Clinical named entity recognition
Using the pre-trained healthcare NLP models & pipelines
Using the pre-trained healthcare embeddings
Cleaning medical text: Normalization, stop-words, clinical POS, spell checking
Training your own healthcare NER model
Assertion status detection
Using the pre-trained assertion status detection models
Training your own assertion status detection model

Part II (90 minutes): Medical Entity Resolution & De-identification

Resolving medical entities to standard terminologies
Using the pre-trained medical entity resolution models
Training your own entity resolution model
Model evaluation
Introduction to healthcare data de-identification
Unstructured data de-identification
Structured data de-identification
Removing, masking, or obfuscating PHI

Part III (60 minutes): Optical Character Recognition (OCR)

Introduction to Spark OCR
Building and configuring a Spark OCR pipeline
Running OCR on PDF with text, PDF with images, and image files
Image pre-processing: binarizer, thresholding, erosion, scaling, skew correction
Image cleaning: noise scorer, remove objects, morphology (erosion + dilation)
Splitting images to segments: LayoutAnalyzer, SplitRegions, DrawRegions
Finding & highlighting the coordinates of extracted text
Unifying Spark OCR & Spark NLP pipelines

PREREQUISITE KNOWLEDGE:

A working knowledge of Python
Familiarity with the basics of machine learning, deep learning, and Apache Spark
Familiarity with basic medical terms, document types, and standard terminologies

MATERIALS OR DOWNLOADS NEEDED IN ADVANCE:

A laptop with the Spark NLP for Healthcare product & license (or trial license) installed
A laptop with the tutorial environment installed (Python, Jupyter, Spark, etc.)
Complete the setup instructions (to be emailed before the workshop)

WHAT YOU'LL LEARN:

Gain hands-on experience building complete healthcare NLP pipelines in Python
Understand the different features and tasks that healthcare NLP pipelines include
Know which pre-trained models, algorithms, and transformers are available with Spark NLP for Healthcare and how to use them
Understand when and how to train your own healthcare NLP models
Understand how to apply state-of-the-art deep learning, transfer learning, and transformers for common healthcare NLP use cases

Register now

John Snow Labs is an award-winning AI and NLP company, accelerating progress in data science by providing state-of-the-art models, data, and platforms. Founded in 2015, the company helps healthcare and life sciences companies – including Roche, Kaiser Permanente, Intel, and UCB – build, scale, deploy, and operate AI products and services. The company is the winner of CIO Review’s AI Solution Provider of the Year in 2018 and CIO Application’s AI Platform of the Year in 2019. It also won the Strata Data Award in 2019 for delivering Spark NLP – the world’s most widely used NLP library in the enterprise. John Snow Labs is a global team of data specialists – a third of the team has a Ph.D. or M.D. degree and 75% hold at least a master’s degree in disciplines covering data science, medicine, data engineering, pharma, data security, and DataOps.
