Extract from George Socha’s article “BERT, MBERT, and the Quest to Understand”
Having used supervised machine learning with discovery documents for nearly two decades, we have come to appreciate that classifiers help us find important content, and understand its significance, faster. By appropriately deploying variations of supervised machine learning such as Active Learning, Continuous Active Learning (CAL), and Technology Assisted Review (TAR, in both its TAR 1.0 and TAR 2.0 forms), we have been able to train systems that often outperform more traditional approaches.
As powerful as these capabilities are, however, they can be improved. That is where the technologies we examine today, BERT and MBERT, fit in.
BERT is a form of deep learning natural language processing that can jump-start your review. It does this through a more nuanced “understanding” of text than traditional machine learning techniques can achieve. BERT arrives at this understanding by looking beyond individual words to the context in which those words appear. MBERT takes BERT one critical step further: it is the multilingual version of BERT, trained to work with 104 languages. For convenience, for most of this post I will simply refer to BERT rather than distinguishing between BERT and MBERT.
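To make the idea of context-dependent understanding concrete, here is a minimal sketch using the Hugging Face transformers library and the public bert-base-uncased checkpoint; neither is named in this article, and this is simply one common way to inspect BERT’s output. It embeds the word “bank” in two different sentences and compares the resulting vectors, showing that BERT assigns the same word different representations depending on its context.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pretrained BERT model and its matching tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# The same word, "bank", used in two different senses.
sentences = [
    "She deposited the check at the bank.",
    "They had a picnic on the river bank.",
]

bank_id = tokenizer.convert_tokens_to_ids("bank")
embeddings = []
with torch.no_grad():
    for sentence in sentences:
        inputs = tokenizer(sentence, return_tensors="pt")
        outputs = model(**inputs)
        # Locate "bank" in the token sequence and take its
        # context-dependent vector from the final hidden layer.
        position = inputs["input_ids"][0].tolist().index(bank_id)
        embeddings.append(outputs.last_hidden_state[0, position])

# A traditional static word embedding would give "bank" one fixed
# vector; BERT produces two different vectors because the contexts
# differ, so the similarity score falls well below 1.0.
similarity = torch.nn.functional.cosine_similarity(
    embeddings[0], embeddings[1], dim=0
)
print(f"Cosine similarity of the two 'bank' vectors: {similarity.item():.3f}")
```

Swapping the checkpoint name for bert-base-multilingual-cased loads the MBERT variant discussed above, which applies the same contextual approach across the 104 languages it was trained on.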