Spark + AI Summit 2018

State of the Art Natural Language Processing at Scale



Natural language processing is a key component in many data science systems that must understand or reason about text. Common use cases include question answering, paraphrasing or summarization, sentiment analysis, natural language BI, language modeling, and disambiguation. Building such systems usually requires combining three types of software libraries: NLP annotation frameworks, machine learning frameworks, and deep learning frameworks. Ideally, all three of these pieces should be able to be integrated into a single workflow. This makes development, experimentation, and deploying results much easier.

Spark’s MLlib provides a number of machine learning algorithms, and now there are also projects making deep learning achievable in MLlib pipelines. All we need is the NLP annotation frameworks. SparkNLP adds NLP annotations into the MLlib ecosystem. This talk will introduce SparkNLP: how to use it, its current functionality, and where it is going in the future.

About the speaker


David Talby is a CTO at John Snow Labs, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, Agile, distributed teams.

Previously, he was with Microsoft’s Bing Group, where he led business operations for Bing Shopping in the US and Europe and worked at Amazon both in Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a Ph.D. in computer science and master’s degrees in both computer science and business administration.