
Transforming Education: Using OpenAI and Ruby to Summarise Wikipedia for Children

Kane Hooper
February 20, 2023

Personally, I am not a big fan of Wikipedia. When I was studying vector mathematics for my Masters of Machine Learning degree, I tried to use Wikipedia to understand some of the key concepts. It was like walking into a buzzsaw. My head certainly felt like it had been split open.

When the first Encyclopaedia Britannica was written, it was designed for laypeople to look up and understand topics they otherwise wouldn’t be able to access.

My issue with Wikipedia, especially for technical topics, is that it is often edited by academics and individuals who want to show off their intellectual prowess. It makes reading the articles equivalent in difficulty to reading academic papers. The complexity can often be overwhelming.

Image: Obstetrics Wikipedia article, January 2023

That gave me the idea to write a script that would simplify Wikipedia articles.

This led to another thought: what if you could summarise all of the knowledge on Wikipedia so that children could understand it? That became the concept for WikiKids: using OpenAI’s natural language processing to make the vast array of information on Wikipedia available to children.

Image: An impressionist painting of a child studying at a computer, created by OpenAI

In this article, we’ll look at the beginnings of WikiKids with a Ruby code snippet that uses two libraries, ruby-openai and wikipedia-client, to retrieve a Wikipedia page’s introduction and summarise it.

The ruby-openai gem is a Ruby client library for the OpenAI API. It gives developers access to the OpenAI platform’s capabilities, such as its natural language processing (NLP) models, through a simple interface. The language processing itself is done by OpenAI’s models on OpenAI’s servers; the gem sends your requests and returns the responses.

The wikipedia-client gem is a Ruby library that gives developers easy access to the vast resources of Wikipedia. It provides a simple API for retrieving page content directly from Wikipedia.

Prerequisites

To follow along with this code, you need an intermediate understanding of Ruby.

You should also have a basic understanding of the OpenAI API. I have written a beginner’s guide to the OpenAI API.

The Code

Goal: To write a Ruby script which returns the introduction section from a Wikipedia page and summarises it in a way a 10-year-old can understand.

In this example, we will use the AI to summarise the introduction to the Wikipedia article on Obstetrics.


Wikipedia Introduction (Obstetrics):

Obstetrics is the field of study concentrated on pregnancy, 
childbirth and the postpartum period. As a medical specialty, 
obstetrics is combined with gynecology under the discipline known 
as obstetrics and gynecology (OB/GYN), which is a surgical field.

Begin by installing the two gems.


gem install ruby-openai
gem install wikipedia-client

You will need an OpenAI account and an API key. At the time of writing, you can sign up for a free account at https://openai.com/api/


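require 'openai'     # ruby-openai gem; 3.x versions used require 'ruby/openai'
require 'wikipedia'  # the wikipedia-client gem is loaded as 'wikipedia'

# Read the API key from the environment rather than hard-coding it
client = OpenAI::Client.new(access_token: ENV.fetch('OPENAI_API_KEY'))

wiki_page = 'Obstetrics'

# Look up the page; summary returns the article's introduction
page = Wikipedia.find(wiki_page)
introduction = page.summary
# Stop early if no introduction text came back
abort "No introduction found for #{wiki_page}" if introduction.to_s.empty?

prompt = "Summarize the following Wikipedia introduction text " \
         "so a 10-year-old can understand it:\n#{introduction}"

response = client.completions(
  parameters: {
    # text-davinci-003 has since been retired by OpenAI;
    # gpt-3.5-turbo-instruct is its suggested replacement
    model: "text-davinci-003",
    prompt: prompt,
    temperature: 0.5,
    max_tokens: 2000
  }
)

# The completion text is in the first choice; strip the leading newlines
puts response.dig('choices', 0, 'text').lstrip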

After a few seconds the AI provides its response.


AI Response:

Obstetrics is a medical field that focuses on the care of pregnant women, 
the process of giving birth and what happens afterwards. It is linked to 
gynecology, which is the study of women's health.

Compare the AI output to the Wikipedia entry from earlier. I think it has done a nice job simplifying the content.

What the Code is Doing

Using your API key, the Ruby script creates an instance of the OpenAI client. It then looks up the Wikipedia page on “Obstetrics” using the wikipedia-client library. The page.summary method returns the article’s introduction, which is the section at the beginning of a Wikipedia page.

The code then builds a prompt, the text that will be sent to the OpenAI model. The prompt is the most important part of the code, as it is effectively our instruction to the AI model. Prompt engineering (creating prompts that return the required results) is an important topic. There is an art to phrasing prompts in a way that causes the AI to provide the responses we want.
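Because so much depends on the prompt, it can help to keep it in one place. Below is a minimal sketch that wraps the summarisation prompt from the script in a method, so the target age can be changed without editing the string. The names build_summary_prompt and age: are hypothetical. They are not part of either gem.

```ruby
# Hypothetical helper: builds the summarisation prompt used in the script,
# with the reader's age as a parameter instead of a hard-coded 10.
def build_summary_prompt(introduction, age: 10)
  # Refuse to build a prompt around missing or blank text
  raise ArgumentError, "introduction is empty" if introduction.to_s.strip.empty?

  "Summarize the following Wikipedia introduction text " \
    "so a #{age}-year-old can understand it:\n#{introduction.strip}"
end

intro = "Obstetrics is the field of study concentrated on pregnancy, " \
        "childbirth and the postpartum period."
puts build_summary_prompt(intro)
puts build_summary_prompt(intro, age: 12)
```

Called with the default age, the method returns the same prompt text the script builds inline. The resulting string is passed as prompt: to client.completions.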

The code then sends this prompt to the OpenAI API and requests a response using the client object’s completions method. The method takes three parameters. The first is the model to use: text-davinci-003 was the latest and most advanced model at the time of writing. The second is the temperature, a number between 0 and 2 (0.5 here) that controls how random, or ‘creative’, the response is. Lower values give more focused and predictable answers, so you can adjust it to manage the AI’s creativity. The third is max_tokens, the maximum number of tokens the response may contain. A token is roughly four characters of English text.
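OpenAI’s rule of thumb is that one token is about four characters of English text. For text-davinci-003, the prompt plus max_tokens had to fit within a context window of about 4,097 tokens. The sketch below uses that heuristic to sanity-check a request before sending it. The method names and the 4,097 default are assumptions for illustration, and only a real tokenizer gives exact counts.

```ruby
# Rough heuristic from OpenAI's guidance: ~4 characters per token of English text.
CHARS_PER_TOKEN = 4.0

# Hypothetical helper: estimated token count for a piece of text.
def estimate_tokens(text)
  (text.length / CHARS_PER_TOKEN).ceil
end

# Hypothetical helper: does the prompt plus the requested completion length
# fit the model's context window? 4_097 is an assumed limit for text-davinci-003.
def fits_context?(prompt, max_tokens, context_limit: 4_097)
  estimate_tokens(prompt) + max_tokens <= context_limit
end

prompt = "Summarize the following Wikipedia introduction text so a 10-year-old can understand it:\n" +
         "Obstetrics is the field of study concentrated on pregnancy, childbirth and the postpartum period."
puts estimate_tokens(prompt)
puts fits_context?(prompt, 2000)
```

With max_tokens set to 2000, a long introduction could push the total past the limit. Checking this first lets you shorten the text or lower max_tokens instead of getting an API error.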

Finally, the script writes the AI output to the console: a summary of the Wikipedia introduction on Obstetrics that a 10-year-old can understand.
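The script assumes the request succeeded. When the API reports an error, the JSON body contains an error object instead of choices, so reading the first choice directly would fail with a NoMethodError. Below is a minimal sketch of a more defensive extraction. It assumes the response can be read like a Hash parsed from the API’s JSON, and the method name completion_text is hypothetical.

```ruby
# Hypothetical helper: pull the completion text out of a Completions API
# response (a Hash parsed from JSON), or raise a readable error.
def completion_text(response)
  # Error responses carry an "error" object with a human-readable message
  if (error = response['error'])
    raise "OpenAI API error: #{error['message']}"
  end

  # dig returns nil instead of raising if any level is missing
  text = response.dig('choices', 0, 'text')
  raise "No completion text in response" if text.nil?

  text.strip
end

# Example with a hand-written Hash shaped like the API's JSON
sample = { 'choices' => [{ 'text' => "\n\nObstetrics is a medical field." }] }
puts completion_text(sample) # prints "Obstetrics is a medical field."
```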

WikiKids

This script is just the beginning. I foresee a website that allows kids to visit any Wikipedia page, with the AI model rendering the simplified content in real time.

Imagine giving kids the ability to ask the AI questions about the content and get meaningful answers.

I modified the prompt to respond to children’s questions:


prompt = "Provide a very friendly and encouraging response to the following " \
         "question that a 10-year-old can understand: #{question}"
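The question prompt sends only the child’s question. Since the aim is for children to ask about the content they are reading, one option is to include the page’s introduction in the prompt so the answer stays anchored to the article. The sketch below shows that idea. The method name build_question_prompt and the extra wording are hypothetical. The resulting string would be passed as prompt: to client.completions, as in the first script.

```ruby
# Hypothetical helper: combines the friendly-answer instruction with the
# page introduction the child has been reading.
def build_question_prompt(question, context:, age: 10)
  <<~PROMPT
    Here is some text from a Wikipedia article:
    #{context.strip}

    Provide a very friendly and encouraging response to the following
    question that a #{age}-year-old can understand: #{question.strip}
  PROMPT
end

intro = "Obstetrics is the field of study concentrated on pregnancy, " \
        "childbirth and the postpartum period."
puts build_question_prompt(
  "What should I study at school to become an obstetrics doctor?",
  context: intro
)
```

Grounding the prompt this way trades a slightly longer request for answers that stay closer to what the article actually says.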

Here are some real questions and the AI model’s answers. Each question was inserted into the prompt above so the AI gives an answer a 10-year-old could understand.

Question: What should I study at school to become an obstetrics doctor?


AI Response: 

To become an obstetric doctor, you should study science subjects like biology 
and chemistry in high school. Then you would have to go to college and get a 
degree in a related field like pre-medicine or biology. Then you would go to 
medical school to learn how to become a doctor and after that, you would 
complete a special training program called a residency in obstetrics. 
It might seem like a lot of work but remember, you're not alone, you can do 
it and you will have lot of people to help you along the way.

Question: I don’t understand what a Gynaecologist is?

AI Response:


A gynaecologist is a doctor who helps women with their reproductive health. 
It's totally normal not to know what it is, there are many different types 
of doctors and it can be hard to keep track of them all. But don't worry, 
if you're interested in becoming a gynaecologist, you have plenty of time 
to learn more about it and see if it's something you're interested in 
pursuing.

Educational Impacts

I teach Finance at Deakin University in Australia. One problem I have always wanted to solve is how to reach and support every student, regardless of their ability to study.

With AI, we could provide educational content that is modified and adjusted to each individual student’s needs. We are no longer constrained by the need to provide a single educational model. Instead, we can meet each student or child where they are and give them the means to drive their own learning at a pace that suits them.

Summary

I am very interested in creating educational solutions utilising AI. If you would like to help me build WikiKids, please reach out; I would be very interested to speak with you.