Microsoft’s Solution to Offensive Generated Text

Recently, I was made aware of the “Alex” linter, a tool that analyzes the words used in a piece of writing and flags ones that may be offensive or used in a harmful context. This got me interested in looking further into what other software is being used to perform similar checks, and where.
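To make the idea concrete, here is a minimal sketch of what a word-list linter does. Alex itself is a JavaScript tool with a much richer ruleset; the word list and suggested alternatives below are invented purely for illustration:

```python
# Minimal sketch of a word-list linter in the spirit of Alex.
# The flagged terms and alternatives here are invented examples,
# not Alex's actual ruleset.
import re

FLAGGED_TERMS = {
    "blacklist": "blocklist",
    "whitelist": "allowlist",
}

def lint(text: str) -> list[str]:
    """Return a warning for each flagged term found in the text."""
    warnings = []
    for term, alternative in FLAGGED_TERMS.items():
        for match in re.finditer(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            warnings.append(
                f"offset {match.start()}: consider '{alternative}' instead of '{match.group()}'"
            )
    return warnings

print(lint("Add the host to the whitelist, not the blacklist."))
```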

To learn more about the topic, I read an article titled “Microsoft claims its new tools make language models safer to use” by Kyle Wiggers. The article describes how Microsoft has been developing open-source tools to audit AI-generated content and automatically test language models for potential failures, especially in a content moderation context where “toxic speech” may appear. Microsoft has focused its efforts on two projects for this purpose.

ToxiGen is a dataset containing 274,000 example statements labeled as either “toxic” or “neutral”. It acts as a massive hate-speech dataset, operating on a principle similar to the “Alex” linter but at a much greater scale. Researchers use ToxiGen with large language models similar to ChatGPT to generate statements that are likely to be misidentified, which helps expose weaknesses in these generative tools.
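Here is a hedged sketch of how a labeled toxic/neutral dataset like ToxiGen might be used to probe a moderation classifier. The tiny inline sample and the classify() stub are stand-ins I made up, not the real 274,000-example dataset or any actual model:

```python
# Sketch: probing a moderation classifier with labeled toxic/neutral
# statements, in the spirit of how ToxiGen is used. The sample and the
# classifier below are placeholders for illustration only.

SAMPLE = [
    ("statement that subtly demeans a group", "toxic"),
    ("neutral statement that merely mentions a group", "neutral"),
]

def classify(text: str) -> str:
    """Placeholder for a real moderation model's prediction."""
    return "toxic" if "demeans" in text else "neutral"

# Collect the cases the classifier gets wrong; these misidentified
# statements are exactly the weaknesses such a dataset helps surface.
misidentified = [
    (text, expected)
    for text, expected in SAMPLE
    if classify(text) != expected
]
print(misidentified)
```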

AdaTest is the second project Microsoft is developing, and it addresses broader issues with AI language models. It generates a large number of test cases, steered by human guidance, and organizes them into groups of similar tests. The goal is to add diversity to test cases and improve the reliability of large language models.
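The adaptive loop is easier to see in code. The sketch below is my own rendering of the idea, not AdaTest’s actual API: a generator proposes candidate tests, a human keeps or rejects them, and accepted tests are grouped by topic and seed the next round:

```python
# Conceptual sketch of an AdaTest-style loop (not the real adatest API):
# a model proposes test inputs, a human accepts or rejects them, and
# accepted tests are grouped into topics and steer the next round.
from collections import defaultdict

def propose_tests(seeds: list[str], n: int = 3) -> list[str]:
    """Placeholder for an LLM generating variations on seed tests."""
    return [f"{seed} (variant {i})" for seed in seeds for i in range(n)]

def human_review(candidate: str) -> bool:
    """Placeholder for the human-in-the-loop accept/reject decision."""
    return "variant 1" in candidate  # pretend the reviewer keeps some

topics: dict[str, list[str]] = defaultdict(list)
seeds = ["model mislabels a neutral mention of a group as toxic"]

for round_number in range(2):
    accepted = [c for c in propose_tests(seeds) if human_review(c)]
    topics["misidentified-neutral"].extend(accepted)
    seeds = accepted or seeds  # accepted tests steer the next round

print(dict(topics))
```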

From my perspective, generative AI does not possess cognitive function comparable to a human’s, and until it does, AI will struggle to identify speech that is acceptable in one context or culture but deeply offensive in another. I also believe these new programs are built much like the “Alex” linter: someone provides a list of key words or phrases to cross-reference, and the tool cannot generate its own list of potentially harmful or “toxic” terms without human oversight. Because of that, the most these programs are likely to do is provide quality standards for LLMs through testing.
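A classic failure of pure keyword cross-referencing illustrates the point. Without context, a list-based check cannot tell a harmless use from a hostile one, and it misses simple variations of a listed word (the single-word list here is again invented for illustration):

```python
# Sketch of the context problem with keyword matching: the same flagged
# word appears in a harmless sentence (false positive) while a plural
# form slips through (false negative). Word list is illustrative only.
FLAGGED = {"savage"}

def keyword_flag(text: str) -> bool:
    return any(word.strip(".,!?").lower() in FLAGGED for word in text.split())

print(keyword_flag("That skateboard trick was savage!"))  # True: slang praise, flagged anyway
print(keyword_flag("Those people are savages."))          # False: hostile use, missed
```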

Through my research, I became aware of programs being developed by Microsoft to detect harmful speech in a way similar to the “Alex” linter, that is, by cross-referencing against a dataset. I also became aware of the many forms of bias that exist even in generative AI as a result of biased human input. Moving forward, I plan to be more careful with the phrasings I use or artificially generate when working on projects. Since AdaTest is open-source software, I am also interested in using it in the future to test for bias and offensive speech wherever I use generative AI.

Blog Referenced: Kyle Wiggers, “Microsoft claims its new tools make language models safer to use,” TechCrunch.
