Webinars

Upcoming webinars presented by John Snow Labs

 

Watch Live: Nov 12 at 2pm EST

John Snow Labs NLU: Become a Data Science Superhero with One Line of Python code

Learn how to unleash the power of 350+ pre-trained NLP models, 100+ word embeddings, 50+ sentence embeddings, and 50+ classifiers in 46 languages with 1 line of Python code. John Snow Labs' new NLU library marries the power of Spark NLP with the simplicity of Python. Tackle NLP tasks such as named entity recognition (NER), part-of-speech (POS) tagging, emotion analysis, keyword extraction, question answering, sarcasm detection, and document classification using state-of-the-art techniques. The end-to-end library includes word and sentence embeddings such as BERT, ELMo, ALBERT, XLNet, ELECTRA, USE, Small-BERT, and others; text wrangling and cleaning such as tokenization, chunking, lemmatization, stemming, normalization, spell-checking, and matchers; and easy t-SNE visualization of your embedded data.

Christian Kasim Loan, the creator of NLU, will walk through NLU and show you how easy it is to generate t-SNE visualizations of 6 deep learning embeddings, achieve top classification results on text problems from Kaggle competitions with 1 line of NLU code, and leverage the latest & greatest advances in deep learning & transfer learning.

 

Register

 

Christian Kasim Loan

Data Scientist and Spark/Scala ML engineer

 


Recorded on: Sept 16th at 2pm EST

Answering natural language questions

The ability to directly answer medical questions asked in natural language either about a single entity (“what drugs has this patient been prescribed?”) or a set of entities (“list stage 4 lung cancer patients with no history of smoking”) has been a longstanding industry goal, given its broad applicability across many use cases.

This webinar presents a software solution, based on state-of-the-art deep learning and transfer learning research, for translating natural language questions into SQL statements. The case study is a real system that answers clinical questions by training domain-specific models and learning from reference data. This is a production-grade, trainable, and scalable capability of Spark NLP Enterprise. Live notebooks will be shared to show how you can use it in your own projects.

 

Watch recording

 

Prabod Rathnayaka

Graduate Research Assistant and PhD Student at La Trobe University

 


Recorded on: Aug 19th at 2pm EST

Accurate de-identification, obfuscation, and editing of scanned medical documents and images

One kind of noisy data that healthcare data scientists deal with is scanned documents and images: from PDF attachments of lab results, referrals, or genetic testing to DICOM files with medical imaging. These files are challenging to de-identify because protected health information (PHI) can appear anywhere in free text, so it cannot be removed with rules or regular expressions, or it can be “burned” into images, so it is not even available as digital text to begin with.

This webinar presents a software system that tackles these challenges, with lessons learned from applying it in real-world production systems. The workflow uses:

  • Spark OCR to extract both digital and scanned text from PDF and DICOM files
  • Spark NLP for Healthcare to recognize sensitive data in the extracted free text
  • The de-identification module to delete, replace, or obfuscate PHI
  • Spark OCR to generate new PDF or DICOM files with the de-identified data
  • A local, secure environment to run the whole workflow, with no need to share data with any third party or a public cloud API
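The de-identification step in the workflow above can delete, replace, or obfuscate each PHI mention. The following Python sketch illustrates the difference between those three strategies, assuming the PHI spans have already been found by an NER model (they are hard-coded here). The `PhiSpan` class, the `deidentify` function, and the surrogate values are hypothetical illustrations, not part of the Spark NLP for Healthcare API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhiSpan:
    """A detected PHI mention: character offsets [start, end) plus an entity label."""
    start: int
    end: int
    label: str


# Hypothetical surrogate values for obfuscation; a real system would generate
# realistic fakes that stay consistent across a patient's documents.
SURROGATES = {"NAME": "Jane Roe", "DATE": "01/01/2000", "LOCATION": "Springfield"}


def deidentify(text: str, spans: list[PhiSpan], mode: str = "mask") -> str:
    """Apply one strategy to already-detected PHI spans.

    mode="delete"    removes each span entirely
    mode="mask"      replaces each span with a <LABEL> placeholder
    mode="obfuscate" replaces each span with a surrogate of the same type
    """
    if mode not in {"delete", "mask", "obfuscate"}:
        raise ValueError(f"unknown mode: {mode}")
    out = []
    cursor = 0
    # Walk spans left to right so offsets into the original text stay valid.
    for span in sorted(spans, key=lambda s: s.start):
        out.append(text[cursor:span.start])
        if mode == "mask":
            out.append(f"<{span.label}>")
        elif mode == "obfuscate":
            out.append(SURROGATES.get(span.label, f"<{span.label}>"))
        # mode == "delete": nothing is appended for the span itself
        cursor = span.end
    out.append(text[cursor:])
    return "".join(out)


note = "John Smith was seen in Boston on 03/14/2019."
spans = [PhiSpan(0, 10, "NAME"), PhiSpan(23, 29, "LOCATION"), PhiSpan(33, 43, "DATE")]
print(deidentify(note, spans, "mask"))       # <NAME> was seen in <LOCATION> on <DATE>.
print(deidentify(note, spans, "obfuscate"))  # Jane Roe was seen in Springfield on 01/01/2000.
```

Masking keeps the entity type visible for downstream analysis, while obfuscation keeps the text looking natural, which can matter when the output is rendered back into a PDF or DICOM file.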
Watch recording

 

Dr. Alina Petukhova

Data Scientist at John Snow Labs

 


Recorded on: July 22nd at 2pm EST

Hardening a Cleanroom AI Platform to allow model training & inference on Protected Health Information

Artificial intelligence projects in high-compliance industries, like healthcare and life science, often require processing Protected Health Information (PHI). This may happen because the nature of the project does not allow full de-identification in advance (for example, when dealing with rare diseases, genetic sequencing data, identity theft, or training de-identification models), or because training uses anonymized data but inference must happen on data with PHI.

In such scenarios, the alternative is to create an “AI cleanroom”: an isolated, hardened, air-gapped environment where the work happens. Such a software platform should enable data scientists to log into the cleanroom and do all the development work inside it, from initial data exploration & experimentation to model deployment & operations, while no data, computation, or generated assets ever leave the cleanroom.

First, this webinar presents the architecture of such a Cleanroom AI Platform, which has been actively used by Fortune 500 companies for the past three years. Second, it surveys the hundreds of DevOps & SecOps features required to realize such a platform, from multi-factor authentication and point-to-point encryption to vulnerability scanning and network isolation. Third, it explains how a Kubernetes-based architecture enables “Cleanroom AI” without giving up the main benefits of cloud computing: elasticity, scalability, turnkey deployment, and a fully managed environment.

 

Watch recording

 

Ali Naqvi

Ali Naqvi is the lead product manager of the AI Platform at John Snow Labs. Ali has extensive experience building end-to-end data science platforms & solutions for the healthcare and life science industries, using modern technology stacks such as Kubernetes, TensorFlow, Spark, MLflow, Elastic, NiFi, and related tools. Ali has a Master’s degree in Molecular Science and over a decade of hands-on experience in software engineering and academic research.


Recorded on: June 24th at 2pm EST

Maximizing Text Recognition Accuracy with Image Transformers in Spark OCR

Spark OCR is an optical character recognition library that scales natively on any Spark cluster, enables processing documents privately without uploading them to a cloud service, and, most importantly, provides state-of-the-art accuracy for a variety of common use cases. A primary method of maximizing accuracy is using a set of pre-built image pre-processing transformers for noise reduction, skew correction, object removal, automated scaling, erosion, binarization, and dilation. These transformers can be combined into OCR pipelines that effectively resolve common 'document noise' issues that reduce OCR accuracy.
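To make two of these operations concrete, here is a minimal pure-Python sketch of global-threshold binarization followed by 3x3 dilation on a tiny grayscale grid. It only illustrates what the operations do to pixels; the function names and the threshold of 128 are assumptions chosen for illustration, and Spark OCR's own transformers are Spark pipeline stages with their own APIs and parameters.

```python
def binarize(img, threshold=128):
    """Map each grayscale pixel (0-255) to 1 (ink) if darker than threshold, else 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def dilate(mask):
    """3x3 binary dilation on a non-empty rectangular grid.

    A pixel becomes ink (1) if it or any of its 8 neighbors is ink,
    which thickens thin strokes and closes small gaps in characters.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the 3x3 window at the image borders.
            out[y][x] = int(any(
                mask[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ))
    return out


page = [
    [255, 255, 255, 255],
    [255,  40, 255, 255],
    [255, 255, 255,  90],
]
for row in dilate(binarize(page)):
    print(row)
```

Erosion is the counterpart of dilation: it shrinks ink regions instead of growing them, which helps remove isolated specks of noise.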

This webinar describes real-world OCR use cases, common accuracy issues they bring, and how to use image transformers in Spark OCR in order to resolve them at scale. Example Python code will be shared using executable notebooks that will be made publicly available.

 

Watch recording

 

Mykola Melnyk

Mykola Melnyk is a senior Scala, Python, and Spark software engineer with 15 years of industry experience. He has led teams and projects building machine learning and big data solutions in a variety of industries, and is currently the lead developer of the Spark OCR library at John Snow Labs.


Recorded on: May 27th at 2pm EST

Best Practices & Tools for Accurate Document Annotation and Data Abstraction

Are you working on machine learning tasks such as sentiment analysis, named entity recognition, text classification, image classification or audio segmentation? If so, you need training data adapted for your particular domain and task.

This webinar will explain the best practices and strategies for getting the training data you need. We will go over the setup of the annotation team, the workflows that need to be in place for guaranteeing high accuracy and labeler agreement, and the tools that will help you increase productivity and eliminate errors.

 

Watch Recording

 

Dia Trambitas

Dia Trambitas is a computer scientist with a rich background in Natural Language Processing. She has a Ph.D. in Semantic Web from the University of Grenoble, France, where she worked on ways of describing spatial and temporal data using OWL ontologies and reasoning based on semantic annotations. She then turned her attention to text processing and data extraction from unstructured documents, a subject she has been working on for the last 10 years. She has rich experience working with different annotation tools and leading document classification and NER projects in verticals such as Finance, Investment, Banking, and Healthcare.


Recorded on: April 29th at 2pm EST

Automated Mapping of Clinical Entities from Natural Language Text to Medical Terminologies

The immense variety of terms, jargon, and acronyms used in medical documents means that named entity recognition of diseases, drugs, procedures, and other clinical entities isn't enough for most real-world healthcare AI applications. For example, knowing that "renal insufficiency", "decreased renal function" and "renal failure" should be mapped to the same code, before using that code as a feature in a patient risk prediction or clinical guidelines recommendation model, is critical to that model's accuracy. Without it, the training algorithm will see these three terms as three separate features and will severely underestimate the relevance of this condition.
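The feature-splitting problem described above can be shown in a few lines of Python. This is a toy sketch: a hard-coded dictionary stands in for a trained entity resolution model, and `CONCEPT_RENAL` is a placeholder identifier, not a real SNOMED-CT or ICD-10-CM code.

```python
from collections import Counter

# Hypothetical lookup standing in for a trained entity resolver.
# CONCEPT_RENAL is a placeholder, not a real terminology code.
RESOLVER = {
    "renal insufficiency": "CONCEPT_RENAL",
    "decreased renal function": "CONCEPT_RENAL",
    "renal failure": "CONCEPT_RENAL",
}


def feature_counts(mentions, resolve=False):
    """Count features across extracted entity mentions, optionally normalizing them to codes first."""
    if resolve:
        # Unknown mentions fall back to their lowercased surface form.
        mentions = [RESOLVER.get(m.lower(), m.lower()) for m in mentions]
    return Counter(mentions)


mentions = ["renal insufficiency", "decreased renal function", "renal failure"]
print(feature_counts(mentions))                # three separate features, each counted once
print(feature_counts(mentions, resolve=True))  # one feature, counted three times
```

Without resolution, a downstream model sees three weak, unrelated signals; after resolution it sees a single feature with three times the evidence.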

This need for entity resolution, also known as entity normalization, is therefore a key requirement for a healthcare NLP library. This webinar explains how Spark NLP for Healthcare addresses this issue by providing trainable, deep-learning-based, clinical entity resolution, as well as pre-trained models for the most commonly used medical terminologies: SNOMED-CT, RxNorm, ICD-10-CM, ICD-10-PCS, and CPT.

 

Watch Recording

 

Andrés Fernández

Andrés Fernández is a Machine Learning Engineer and Data Scientist at John Snow Labs with 10 years of experience in the Finance, Retail and Healthcare industries.

Since completing his MSc in Software Engineering at the University of Málaga, he has helped Latin American and US companies conceptualize, design, and build AI solutions to automate their operations in functions such as Insurance Claims, Pricing, Retail Procurement, Marketing, and others. Andrés has spent the last 5 years working on real-world Natural Language Processing applications, focusing mainly on Log Processing, Text Clustering, and Entity Resolution.


Recorded on: April 8th at 2pm EST

AI Model Governance in a High-Compliance Industry

Model governance defines a collection of best practices for data science: versioning, reproducibility, experiment tracking, automated CI/CD, and others. Within a high-compliance setting where the data used for training or inference contains protected health information (PHI) or similarly sensitive data, additional requirements are added, such as strong identity management, role-based access control, approval workflows, and a full audit trail.

This webinar summarizes requirements and best practices for establishing a high-productivity data science team within a high-compliance environment. It then demonstrates how these requirements can be met using John Snow Labs’ Healthcare AI Platform.

 

Watch Recording

 

Ali Naqvi

Ali Naqvi is the lead product manager of the AI Platform at John Snow Labs. Ali has extensive experience building end-to-end data science platforms & solutions for the healthcare and life science industries, using modern technology stacks such as Kubernetes, TensorFlow, Spark, MLflow, Elastic, NiFi, and related tools. Ali has a Master’s degree in Molecular Science and over a decade of hands-on experience in software engineering and academic research.


Recorded on: March 18th at 2pm EST

Accurate De-Identification of Structured & Unstructured Medical Data at Scale

Recent advances in deep learning enable automated de-identification of medical data to approach the accuracy of manual de-identification. This includes accurate detection & obfuscation of patient names, doctor names, locations, organizations, and dates in unstructured documents, as well as accurate detection of column names & values in structured tables. This webinar explains:

  1. What’s required to de-identify medical records under the US HIPAA privacy rule
  2. Typical de-identification use cases, for structured and unstructured data
  3. How to implement de-identification of these use cases using Spark NLP for Healthcare

After the webinar, you will understand how to de-identify data automatically, accurately, and at scale, for the most common scenarios.

 

Watch Recording

 

Julio Bonis

Julio Bonis is a data scientist working on Spark NLP for Healthcare at John Snow Labs. Julio has broad experience in software development and design of complex data products within the scope of Real World Evidence (RWE) and Natural Language Processing (NLP). He also has substantial clinical and management experience, including entrepreneurship and Medical Affairs. Julio is a medical doctor specialized in Family Medicine (registered GP) and holds an Executive MBA from IESE, an MSc in Bioinformatics, and an MSc in Epidemiology.


Recorded on: February 26th at 2pm EST

State-of-the-art named entity recognition with BERT

Deep neural network models have recently achieved state-of-the-art performance gains in a variety of natural language processing (NLP) tasks. However, these gains rely on the availability of large amounts of annotated examples, without which state-of-the-art performance is rarely achievable. This is especially inconvenient for the many NLP fields where annotated examples are scarce, such as medical text.

Named entity recognition (NER) is one of the most important tasks for development of more sophisticated NLP systems. In this webinar, we will walk you through how to train a custom NER model using BERT embeddings in Spark NLP, taking advantage of transfer learning to greatly reduce the amount of annotated text needed to achieve accurate results. After the webinar, you will be able to train your own NER models with your own data in Spark NLP.

Watch Recording

 

Veysel Kocaman

Veysel Kocaman is a Senior Data Scientist and ML Engineer at John Snow Labs with a decade of industry experience. He is pursuing a PhD in Computer Science and giving lectures at Leiden University (NL), and he holds an MS degree in Operations Research from Penn State University. He is also a Google Developer Expert in Machine Learning.