I realised recently that the domains AskLeni.com and LeniBot.com are available for the taking as of this writing. There’s an idea right there for an interesting project or thesis for Computer Science students — building an artificial intelligence (AI) that can generate Leni-speak.
The AI could work off a digitised body of public statements from Philippine “vice president” Leni Robredo, quoted verbatim in published news reports. This body of text is called the training text. The model for generating a Leni-ism could be as simple as any of the probabilistic models that generate text based on how often words, and sequences of words or characters, occur in the training text. One such model is the Markov Chain Algorithm, which “determines the next most probable suffix word for a given prefix.” To do this, a Markov chain program typically breaks the training text into a series of words and slides a fixed-size window along them, storing each run of N words as a prefix and the word that follows it (word N + 1) as a member of that prefix's set of possible suffixes. Generating text is then a matter of starting from a prefix and repeatedly picking the next word at random from its suffix set.
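For students who want a starting point, here is a minimal sketch of the prefix/suffix idea in Python. It is one way to do it, not the definitive one. The function names build_chain and generate, the default window of N = 2 and the placeholder training sentence are all my own assumptions for illustration; a real project would load actual quoted statements instead.

```python
import random
from collections import defaultdict


def build_chain(text, n=2):
    """Map every n-word prefix in the text to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    # Slide a window of n words along the text; the word after the window is a suffix.
    for i in range(len(words) - n):
        prefix = tuple(words[i:i + n])
        chain[prefix].append(words[i + n])
    return dict(chain)


def generate(chain, seed, max_chars=500, rng=random):
    """Extend the seed prefix word by word until a dead end or the character cap."""
    n = len(seed)
    output = list(seed)
    while True:
        suffixes = chain.get(tuple(output[-n:]))
        if not suffixes:
            break  # this prefix never appeared in the training text
        # Repeated suffixes stay in the list, so frequent words are picked more often.
        next_word = rng.choice(suffixes)
        if len(" ".join(output + [next_word])) > max_chars:
            break
        output.append(next_word)
    return " ".join(output)


if __name__ == "__main__":
    # Placeholder text only; swap in the real training text.
    sample = ("the quick brown fox jumps over the lazy dog "
              "and the quick brown cat sleeps")
    chain = build_chain(sample, n=2)
    print(generate(chain, seed=("the", "quick"), max_chars=80))
```

Keeping duplicate suffixes in a plain list is the lazy way to get frequency-weighted choices without storing any counts.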
That’s the easy part.
To make things even more interesting, perhaps this AI could work behind a Web front-end that allows visitors to type in any question and then have the site return, say, a 500-character Leni-ism in reply. The challenge would be to develop an algorithm to interpret what users type into, say, the “Ask Leni” text box on the site and build the appropriate text strings to “seed” the Markov chain in such a way that the generated Leni-ism comes as close to looking like a reply as possible using these simple algorithms.
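One naive way to attack the seeding problem is to pick the prefix that shares the most words with the question, giving extra weight to words that are rare across the prefixes. The sketch below assumes the chain's prefixes are available as a list of word tuples; pick_seed and normalise are hypothetical names, and the rarity weighting is just a guess at a sensible heuristic, not a proven method.

```python
import random
import re
from collections import Counter


def normalise(word):
    """Lower-case a word and strip surrounding punctuation."""
    return word.lower().strip(".,!?;:\"'()")


def pick_seed(question, prefixes, rng=random):
    """Choose the prefix that best overlaps the question, favouring rare words."""
    if not prefixes:
        raise ValueError("the chain has no prefixes to choose from")
    question_words = set(re.findall(r"[a-z']+", question.lower()))
    # Words that occur in few prefixes say more about the topic than "the" or "of".
    freq = Counter(normalise(w) for p in prefixes for w in p)

    def score(prefix):
        return sum(1.0 / freq[normalise(w)]
                   for w in prefix if normalise(w) in question_words)

    scored = [(score(p), p) for p in prefixes]
    best = max(s for s, _ in scored)
    if best == 0:
        # Nothing in the question matches the training text: fall back to chance.
        return rng.choice(prefixes)
    return rng.choice([p for s, p in scored if s == best])


if __name__ == "__main__":
    prefixes = [("the", "economy"), ("our", "people"), ("the", "people")]
    print(pick_seed("What about the economy?", prefixes))
```

The chosen prefix can then be handed to whatever generator walks the chain. A reply that merely opens with words from the question is a long way from an actual answer, which is exactly why this part is the challenge.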
Here’s what such a website might look like.
For bonus points:
(1) Have different Leni images to support variety in emotional content of the generated Leni-ism (plus 5 points).
(2) Allow users to score the quality of the reply to the question they entered (plus 5 points).
(3) Use a machine learning algorithm that feeds the scoring data from Item 2 back into the Markov Chain seed algorithm, so that the system “learns” how to generate better replies (plus 15 points).
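For Item 3, about the simplest thing that could count as learning is an epsilon-greedy bandit: keep several seeding strategies, track the average user score each one earns, and mostly use the best one while occasionally trying the others. This is only a sketch under my own assumptions. The class name SeedStrategyBandit, the strategy labels and the 1-to-5 score scale are made up for illustration.

```python
import random


class SeedStrategyBandit:
    """Epsilon-greedy chooser over named seeding strategies."""

    def __init__(self, strategies, epsilon=0.1, rng=random):
        self.strategies = list(strategies)
        self.epsilon = epsilon  # fraction of requests spent exploring
        self.rng = rng
        self.counts = {s: 0 for s in self.strategies}
        self.mean_score = {s: 0.0 for s in self.strategies}

    def choose(self):
        """Usually return the best-scoring strategy, sometimes a random one."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.mean_score[s])

    def record(self, strategy, score):
        """Fold one user score into the running average for that strategy."""
        self.counts[strategy] += 1
        n = self.counts[strategy]
        self.mean_score[strategy] += (score - self.mean_score[strategy]) / n


if __name__ == "__main__":
    bandit = SeedStrategyBandit(["rare_word", "random"], epsilon=0.1)
    bandit.record("rare_word", 5)  # hypothetical user scores on a 1-5 scale
    bandit.record("random", 2)
    print(bandit.choose())
```

Each page request would call choose(), generate a reply with that strategy, and call record() once the visitor submits a score.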
Happy coding!