Spark NLP for Data Scientists

Live Online Certification Training

April 22 | 12 - 4 PM EDT

Natural Language Processing (NLP) is a key component in many data science systems that must understand or reason about text. Common use cases include knowledge extraction, question answering, entity recognition, spell correction, sentiment analysis, and document classification.

This four-hour, hands-on workshop walks you through state-of-the-art NLP using John Snow Labs’ open-source Spark NLP library. You will write, edit, and run live Python notebooks that cover the majority of the open-source library’s functionality.

The workshop is organized in three hour-long sessions, each followed by 30 minutes of self-guided coding on Python notebooks relevant to that session. This is a live online workshop, and the instructors will be available during the self-guided sessions to answer questions.

This workshop is part of the recommended preparation for the “Certified Spark NLP Data Scientist” certification exam.

Outline:

Part I (90 minutes): Overview, Core Concepts, and Pre-Trained NLP Pipelines

Introduction to Spark NLP
Architecture and design goals
Core concepts: Pipelines, Annotators, Resources
Getting things done with pre-trained pipelines
Working with words: Sentence boundary detection, tokenization, stemming, lemmatization
Cleaning text: Normalization, stop-word remover, spell checking
Connecting words: Part of speech tagging, chunking, N-gram generator
Finding in text: Text matcher, date matcher, regex matcher, dependency parser

Part II (90 minutes): Custom NLP Pipelines & Named Entity Recognition

Building & configuring your own NLP pipeline
Understanding the Pipeline API and fit(), annotate(), and transform()
Named entity recognition (NER)
Using pre-trained NER models
Training your own NER model
Understanding and choosing embeddings
Using word, sentence, document, and universal embeddings
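
As a rough preview of the fit() and transform() distinction listed above, here is a minimal plain-Python sketch. It is a toy under stated assumptions: every class name here (Lowercaser, VocabularyEstimator, VocabularyModel, Pipeline, PipelineModel) is hypothetical and written only for illustration, and none of it is Spark NLP's actual API. The idea it shows is that fit() learns something from training data and returns a fitted model, while transform() applies that fitted model to new data. The annotate() call is not modeled here.

```python
# Toy illustration only: these classes are hypothetical and are NOT Spark NLP's API.
from collections import Counter


class Lowercaser:
    """Transformer-style stage: nothing to learn, so fit() returns itself."""

    def fit(self, docs):
        return self

    def transform(self, docs):
        return [d.lower() for d in docs]


class VocabularyModel:
    """Fitted stage: keeps only the tokens found in the learned vocabulary."""

    def __init__(self, vocab):
        self.vocab = vocab

    def transform(self, docs):
        return [[t for t in d.split() if t in self.vocab] for d in docs]


class VocabularyEstimator:
    """Estimator-style stage: fit() learns a vocabulary and returns a model."""

    def __init__(self, min_count=2):
        self.min_count = min_count

    def fit(self, docs):
        counts = Counter(tok for d in docs for tok in d.split())
        vocab = {t for t, c in counts.items() if c >= self.min_count}
        return VocabularyModel(vocab)


class PipelineModel:
    """Fitted pipeline: applies each fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, docs):
        for stage in self.stages:
            docs = stage.transform(docs)
        return docs


class Pipeline:
    """Unfitted pipeline: fit() fits each stage on the output of the previous one."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, docs):
        fitted = []
        for stage in self.stages:
            model = stage.fit(docs)
            docs = model.transform(docs)
            fitted.append(model)
        return PipelineModel(fitted)


train = ["Spark NLP scales", "spark nlp is open source"]
model = Pipeline([Lowercaser(), VocabularyEstimator(min_count=2)]).fit(train)
result = model.transform(["Spark is fast", "NLP with Spark"])
print(result)
```

The fitted model can be reused on any new text without refitting, which is the property the workshop's saving, loading, and inference topics build on.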

Part III (60 minutes): Document Classification and Inference

Understanding document classification use cases
Sentiment analysis annotators & models
Training your own document classifier
Integrating with other machine learning frameworks
Saving, loading, and sharing NLP models
Using LightPipeline for low-latency inference

PREREQUISITE KNOWLEDGE:

A working knowledge of Python
Familiarity with the basics of machine learning, deep learning, and Apache Spark

MATERIALS OR DOWNLOADS NEEDED IN ADVANCE:

A laptop with the tutorial environment installed
Complete the setup instructions (to be emailed before the workshop)

WHAT YOU'LL LEARN:

Gain hands-on experience building complete NLP pipelines in Python
Understand the different features and tasks that NLP pipelines include
Know which pre-trained models are available with Spark NLP and how to use them
Understand when and how to train your own NLP models
Understand how to apply state-of-the-art deep learning, transfer learning, and transformers in day-to-day NLP use


John Snow Labs is an award-winning AI and NLP company, accelerating progress in data science by providing state-of-the-art models, data, and platforms. Founded in 2015, the company helps healthcare and life sciences companies (including Roche, Kaiser Permanente, Intel, and UCB) build, scale, deploy, and operate AI products and services. The company won CIO Review’s AI Solution Provider of the Year in 2018 and CIO Applications’ AI Platform of the Year in 2019. It also won the Strata Data Award in 2019 for delivering Spark NLP, the world’s most widely used NLP library in the enterprise. John Snow Labs is a global team of data specialists: a third of the team has a Ph.D. or M.D. degree, and 75% hold at least a master’s degree in disciplines covering data science, medicine, data engineering, pharma, data security, and DataOps.
