Papers & Conferences

Peer-reviewed academic papers and talks at prominent data, AI, and NLP events

 

Biomedical Named Entity Recognition at Scale

November 2020
Authors: Veysel Kocaman, David Talby
Accepted to CADL 2020 (International Workshop on Computational Aspects of Deep Learning), organized in conjunction with ICPR 2020

Read the Paper

Named entity recognition (NER) is a widely applicable natural language processing task and a building block of question answering, topic modeling, information retrieval, and similar applications. In the medical domain, NER plays a crucial role by extracting meaningful chunks from clinical notes and reports, which are then fed to downstream tasks like assertion status detection, entity resolution, relation extraction, and de-identification.
Reimplementing a Bi-LSTM-CNN-Char deep learning architecture on top of Apache Spark, we present a single trainable NER model that obtains new state-of-the-art results on seven public biomedical benchmarks without using heavy contextual embeddings like BERT. This includes improving BC4CHEMD to 93.72% (4.1% gain), Species800 to 80.91% (4.6% gain), and JNLPBA to 81.29% (5.2% gain).

Improving Clinical Document Understanding on COVID-19 Research with Spark NLP

December 2020
Authors: Veysel Kocaman, David Talby
Accepted to SDU (Scientific Document Understanding) workshop at AAAI 2021

Read the Paper

Following the global COVID-19 pandemic, the number of scientific papers studying the virus has grown massively, leading to increased interest in automated literature review. We present a clinical text mining system that improves on previous efforts.
First, it can recognize over 100 different entity types including social determinants of health, anatomy, risk factors, and adverse events in addition to other commonly used clinical and biomedical entities. We illustrate extracting trends and insights, e.g. most frequent disorders and symptoms, and most common vital signs and EKG findings, from the COVID-19 Open Research Dataset (CORD-19).
Second, the text processing pipeline includes assertion status detection, to distinguish between clinical facts that are present, absent, conditional, or about someone other than the patient. The deep learning models used improve on the previous best performing benchmarks for assertion status detection.

Spark NLP: Natural language understanding at scale

January 2021
Authors: Veysel Kocaman, David Talby
Accepted to the journal Software Impacts (Elsevier)

Read the Paper

Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment.
Spark NLP comes with more than 1,100 pretrained pipelines and models in over 192 languages. It supports nearly all NLP tasks, and its modules can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations and is the world’s most widely used NLP library in the enterprise.

State-of-the-art Emotion and Sentiment Analysis with Spark NLP

Watch recording

 

 

Data Science Salon

November 18th, 2020
Speaker: Dia Trambitas

Automated Detection of Environmental, Social, and Governance Issues in Financial Documents

Watch Recording

 

 

Data Science Salon

December 10, 2020
Speaker: Alina Petrukhova

Spark NLP for Healthcare: Lessons Learned Building Real-World Healthcare AI Systems

Watch recording

 


July 9, 2020
Speaker: Veysel Kocaman, PhD

Using Spark NLP to Enable Real-World Evidence (RWE) and Clinical Decision Support in Oncology

Watch recording

 


April 13 - April 17, 2020
Speaker: Veysel Kocaman, PhD

Applying State-of-the-art Natural Language Processing for Personalized Healthcare

Watch recording

 


April 13 - April 17, 2020
Speaker: David Talby, PhD

Spark NLP for Healthcare: Lessons Learned Building Real-World Healthcare AI Systems

Watch recording

 


April 13 - April 17, 2020
Speaker: Veysel Kocaman, PhD

State-of-the-art Natural Language Processing at Scale

Watch recording

 


April 13 - April 17, 2020
Speaker: David Talby, PhD

Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Natural Language Processing

Watch recording

 

Spark + AI Summit

June 4 - June 6, 2018
Speaker: David Talby, PhD

State of the Art Natural Language Processing at Scale

Watch recording

 

Spark + AI Summit

June 4 - June 6, 2018

Speaker: David Talby, PhD


State of the Art Natural Language Processing at Scale

Watch recording

 


October 26 - October 30, 2018
Speaker: David Talby, PhD

Spark NLP in Action: Learning to read Life Science research

Watch recording

 

 

Data Science Summit 2018
May 28, 2018
Speaker: Saif Addin Ellafi

Natural Language Understanding at Scale with Spark-Native NLP, Spark ML, and TensorFlow

Watch recording

 

Spark Summit Europe

October 24 - October 26, 2017
Speaker: Alexander Thomas