Spark NLP for Healthcare Data Scientists

Live Online Certification Training
May 13 | 12 - 4 PM EDT

 

Many critical facts required by healthcare AI applications, such as patient risk prediction, cohort selection, automated clinical coding, and clinical decision support, are locked in unstructured free-text data. Recent advances in deep learning have raised the bar on achievable accuracy for tasks such as named entity recognition, assertion status detection, entity resolution, and de-identification.

Spark NLP for Healthcare provides production-grade, scalable, and trainable implementations of novel healthcare-specific natural language processing (NLP) algorithms and models. The product is licensed by John Snow Labs, the creator of Spark NLP, and provides data scientists with a library and pre-trained models for the most common medical NLP tasks. This is a hands-on workshop that will enable you to write, edit, and run Python notebooks that use the product’s functionality.

The workshop is organized in three hour-long sessions, each followed by 30 minutes of self-guided coding in Python notebooks relevant to that session. This is a live online workshop, and the instructors will be available during the self-guided sessions to answer questions.

This workshop is part of the recommended preparation for the “Certified Spark NLP Healthcare Data Scientist” certification exam.

Outline:

Part I (90 minutes): Medical Knowledge Extraction

  • Introduction to Spark NLP for Healthcare
  • Common medical NLP use cases
  • Clinical named entity recognition
  • Using the pre-trained healthcare NLP models & pipelines
  • Using the pre-trained healthcare embeddings
  • Cleaning medical text: Normalization, stop-words, clinical POS, spell checking
  • Training your own healthcare NER model
  • Assertion status detection
  • Using the pre-trained assertion status detection models
  • Training your own assertion status detection model

Part II (90 minutes): Medical Entity Resolution & De-identification

  • Resolving medical entities to standard terminologies
  • Using the pre-trained medical entity resolution models
  • Training your own entity resolution model
  • Model evaluation
  • Introduction to healthcare data de-identification
  • Unstructured data de-identification
  • Structured data de-identification
  • Removing, masking, or obfuscating PHI

Part III (60 minutes): Optical Character Recognition (OCR)

  • Introduction to Spark OCR
  • Building and configuring a Spark OCR pipeline
  • Running OCR on PDF with text, PDF with images, and image files
  • Image pre-processing: binarizer, thresholding, erosion, scaling, skew correction
  • Image cleaning: noise scorer, remove objects, morphology (erosion + dilation)
  • Splitting images to segments: LayoutAnalyzer, SplitRegions, DrawRegions
  • Finding & highlighting the coordinates of extracted text
  • Unifying Spark OCR & Spark NLP pipelines

PREREQUISITE KNOWLEDGE:

  • A working knowledge of Python
  • Familiarity with the basics of machine learning, deep learning, and Apache Spark
  • Familiarity with basic medical terms, document types, and standard terminologies

MATERIALS OR DOWNLOADS NEEDED IN ADVANCE:

  • A laptop with the Spark NLP for Healthcare product & license (or trial license) installed 
  • A laptop with the tutorial environment installed (Python, Jupyter, Spark, etc.)
  • Complete the setup instructions (to be emailed before the workshop)

WHAT YOU'LL LEARN: 

  • Gain hands-on experience building complete healthcare NLP pipelines in Python
  • Understand the different features and tasks that healthcare NLP pipelines include
  • Know which pre-trained models, algorithms, and transformers are available with Spark NLP for Healthcare and how to use them
  • Understand when and how to train your own healthcare NLP models
  • Understand how to apply state-of-the-art deep learning, transfer learning, and transformers for common healthcare NLP use cases

Register now

 

 


John Snow Labs is an award-winning AI and NLP company, accelerating progress in data science by providing state-of-the-art models, data, and platforms. Founded in 2015, the company helps healthcare and life sciences companies – including Roche, Kaiser Permanente, Intel, and UCB – build, scale, deploy, and operate AI products and services. The company is the winner of CIO Review’s AI Solution Provider of the Year in 2018 and CIO Application’s AI Platform of the Year in 2019. It also won the Strata Data Award in 2019 for delivering Spark NLP – the world’s most widely used NLP library in the enterprise. John Snow Labs is a global team of data specialists – a third of the team has a Ph.D. or M.D. degree and 75% hold at least a master’s degree in disciplines covering data science, medicine, data engineering, pharma, data security, and DataOps.

 
