Data Science Summit 2018

Spark NLP in Action: Learning to read Life Science research



Spark NLP is an open source library that natively extends Spark ML to provide natural language understanding (NLU) with performance and scale an order of magnitude better than what was previously available. This talk applies the library to classify and link research papers in life science, tackling the two main issues that make applying NLU in practice challenging.

First, human language is nuanced, fuzzy, and highly contextual, so most tasks require training domain-specific models. Second, NLP is usually just one part of a bigger machine learning or information retrieval pipeline that solves a real business use case. The Python notebooks for this case study will be made freely available after the talk.

About the speaker


Saif Addin Ellafi is a software developer at John Snow Labs, where he’s the main contributor to Spark NLP. A data scientist, forever student, and extreme sports and gaming enthusiast, Saif has wide experience in problem solving and quality assurance in the banking and finance industry.