Upcoming webinars presented by John Snow Labs
Are you working on machine learning tasks such as sentiment analysis, named entity recognition, text classification, image classification or audio segmentation? If so, you need training data adapted for your particular domain and task. This webinar will explain the best practices and strategies for getting the training data you need. We will go over the setup of the annotation team, the workflows that need to be in place for guaranteeing high accuracy and labeler agreement, and the tools that will help you increase productivity and eliminate errors.
Dia Trambitas is a computer scientist with a rich background in Natural Language Processing. She has a Ph.D. in Semantic Web from the University of Grenoble, France, where she worked on ways of describing spatial and temporal data using OWL ontologies and reasoning based on semantic annotations. She then changed her interest to text processing and data extraction from unstructured documents, a subject she has been working on for the last 10 years. She has a rich experience working with different annotation tools and leading document classification and NER extraction projects in verticals such as Finance, Investment, Banking, and Healthcare.
Spark OCR is an object character recognition library that can scale natively on any Spark cluster; enables processing documents privately without uploading them to a cloud service; and most importantly, provides state-of-the-art accuracy for a variety of common use cases. A primary method of maximizing accuracy is using a set of pre-built image pre-processing transformers - for noise reduction, skew correction, object removal, automated scaling, erosion, binarization, and dilation. These transformers can be combined into OCR pipelines that effectively resolve common 'document noise' issues that reduce OCR accuracy.
This webinar describes real-world OCR use cases, common accuracy issues they bring, and how to use image transformers in Spark OCR in order to resolve them at scale. Example Python code will be shared using executable notebooks that will be made publicly available.
Mykola Melnyk is a senior Scala, Python, and Spark software engineer with 15 years of industry experience. He has led teams and projects building machine learning and big data solutions in a variety of industries - and is currently the lead developer of the Spark OCR library at John Snow Labs.
The immense variety of terms, jargon, and acronyms used in medical documents means that named entity recognition of diseases, drugs, procedures, and other clinical entities isn't enough for most real-world healthcare AI applications. For example, knowing that "renal insufficiency", "decreased renal function" and "renal failure" should be mapped to the same code, before using that code as a feature in a patient risk prediction or clinical guidelines recommendation model, is critical to that's model's accuracy. Without it, the training algorithm will see these three terms as three separate features and will severely under-estimate the relevance of this condition.
This need for entity resolution, also known as entity normalization, is therefore a key requirement from a healthcare NLP library. This webinar explains how Spark NLP for Healthcare addresses this issue by providing trainable, deep-learning-based, clinical entity resolution, as well as pre-trained models for the most commonly used medical terminologies: SNOMED-CT, RxNorm, ICD-10-CM, ICD-10-PCS, and CPT.
Andrés Fernández is a Machine Learning Engineer and Data Scientist at John Snow Labs with 10 years of experience in the Finance, Retail and Healthcare industries.
After his MSc in Software Engineering at the University of Málaga, he has been helping Latin American and USA companies conceptualize, design and build AI solutions to automate their operations in functions like Insurance Claims, Pricing, Retail Procurement, Marketing, and others. Andrés has dedicated the last 5 years of his experience to deal with real-world applications for Natural Language Processing focusing mainly on Log Processing, Text Clustering, and Entity Resolution.
Model governance defines a collection of best practices for data science – versioning, reproducibility, experiment tracking, automated CI/CD, and others. Within a high-compliance setting where the data used for training or inference contains private health information (PHI) or similarly sensitive data, additional requirements such as strong identity management, role-based access control, approval workflows, and full audit trail are added.
This webinar summarizes requirements and best practices for establishing a high-productivity data science team within a high-compliance environment. It then demonstrates how these requirements can be met using John Snow Labs’ Healthcare AI Platform.
Ali Naqvi is the lead product manager of the AI Platform at John Snow Labs. Ali has extensive experience building end-to-end data science platform & solution for the healthcare and life science industries, using modern technology stacks such as Kubernetes, TensorFlow, Spark, mlFlow, Elastic, Nifi, and related tools. Ali has a Master’s degree in Molecular Science and over a decade of hands-on experience in software engineering and academic research.
Recent advances in deep learning enable automated de-identification of medical data to approach the accuracy achievable via manual effort. This includes accurate detection & obfuscation of patient names, doctor names, locations, organizations, and dates from unstructured documents – or accurate detection of column names & values in structured tables. This webinar explains:
After the webinar, you will understand how to de-identify data automatically, accurately, and at scale, for the most common scenarios.
Julio Bonis is a data scientist working on Spark NLP for Healthcare at John Snow Labs. Julio has broad experience in software development and design of complex data products within the scope of Real World Evidence (RWE) and Natural Language Processing (NLP). He also has substantial clinical and management experience – including entrepreneurship and Medical Affairs. Julio is a medical doctor specialized in Family Medicine (registered GP), has an Executive MBA - IESE, an MSc in Bioinformatics, and an MSc in Epidemiology.
Deep neural network models have recently achieved state-of-the-art performance gains in a variety of natural language processing (NLP) tasks. However, these gains rely on the availability of large amounts of annotated examples, without which state-of-the-art performance is rarely achievable. This is especially inconvenient for the many NLP fields where annotated examples are scarce, such as medical text.
Named entity recognition (NER) is one of the most important tasks for development of more sophisticated NLP systems. In this webinar, we will walk you through how to train a custom NER model using BERT embeddings in Spark NLP – taking advantage of transfer learning to greatly reduce the amount of annotated text to achieve accurate results. After the webinar, you will be able to train your own NER models with your own data in Spark NLP.
Veysel Kocaman is a Senior Data Scientist and ML Engineer at John Snow Labs and has a decade long industry experience. He is also pursuing his PhD in CS as well as giving lectures at Leiden University (NL) and holds an MS degree in Operations Research from Penn State University. He is affiliated with Google as a Developer Expert in Machine Learning.