Even though computers can perform complex calculations and store vast amounts of data, they are not yet good at analyzing and understanding the written word. Despite being able to translate between hundreds of languages, computers still struggle to match human-level reading comprehension in English. To test this capability, researchers from New York University, the University of Washington, and the Google-owned artificial intelligence company DeepMind created a benchmark of nine reading comprehension tasks called the General Language Understanding Evaluation (GLUE).
Most of the models tested did poorly, with the highest score being 69 out of 100. These results suggested that solving GLUE was beyond the capabilities of the models of the time and that the systems were not learning about language itself. Rather, they were relying on statistical shortcuts, tricks, and patterns in the training data to tackle the problems at hand.
In October 2018, Google introduced a new model named Bidirectional Encoder Representations from Transformers, or BERT. BERT scored 80.5 out of 100 on the GLUE benchmark, significantly better than previous models. While working as an engineer at Google Brain, Jacob Uszkoreit and his collaborators had devised a mechanism based on the concept of attention: it guided the AI to assign more weight to certain input words than to others. Previous models, for instance, were likely to treat "A dog bites the man" and "A man bites the dog" as the same thing.
A model with attention, by contrast, could connect "bites" and "dog" as verb and object, capturing the sentence's meaning in a more expressive form. This proved an extremely effective way to provide contextual meaning for sentences with complex structures. In addition, BERT reads text both left to right and right to left at the same time, letting each word draw context from the words on either side of it.
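The weighting idea described above can be illustrated with a toy version of scaled dot-product self-attention, the building block used in Transformer models like BERT. The sketch below is a simplified illustration, not BERT itself: the embeddings are random, and real models learn separate query, key, and value projections rather than reusing the raw input for all three.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: each output position is a weighted
    # mix of the value vectors, where the weights come from query-key
    # similarity. This is how the model "pays more attention" to some
    # words than others.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # (n, n) word-to-word similarity
    weights = softmax(scores)       # each row sums to 1
    return weights @ V, weights

# Toy example: a 3-"word" sentence with 4-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = attention(X, X, X)         # self-attention: Q = K = V = X
```

Each row of `w` shows how strongly one word attends to every word in the sentence, including itself, which is what lets the model tie "bites" to "dog" regardless of word order.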
In 2018, OpenAI also introduced a system trained on language data from 11,038 books containing nearly a billion words; it scored 72.8 on GLUE. More recently, in July 2019, two researchers at Taiwan's National Cheng Kung University used BERT to achieve an even more impressive result with an improved model built around the Argument Reasoning Comprehension Task. Their system analyzes the premise of a passage and finds a reason to back up a claim, which lets it rule out unrelated answer choices and improve accuracy. More impressively still, a BERT-based system from Microsoft achieved a score of 87.6 on GLUE in June, surpassing the estimated human performance baseline.
Even though AI can now tackle tests like the LSAT and MCAT, it still falls short in genuine language understanding. However, with constant experimentation and advances in newly devised systems, computers may someday score higher than humans on language-based comprehension tests.