SQL is one of the most helpful languages you can learn today. But what if I told you that you could generate SQL queries without knowing how to write SQL? In fact, all you’ll need to know is how to ask a question in plain English.
We’re going to take a shortcut with the help of GPT-3, which will generate SQL queries for us from some basic text input. The best part is, we’re going to do it using NBA data!
In this tutorial we’ll walk through how to create a local web app that takes in text and spits out the equivalent in SQL.
I want to give a special shoutout to Shreya Shankar, Bora Uyumazturk, Devin Stein, Gulan and Michael Lavelle, who published this incredible gpt3-sandbox repo on GitHub. We will more or less use it right out of the box with a few edits to fit our use case.
The backend uses Flask while the frontend is in React, but don’t worry, we will mostly be using Python. Let’s jump in.
I’d recommend working within VSCode for this project. Once there, you can open up a new terminal. Within the terminal, create a new folder in your working directory:
$ mkdir gpt3-sql-test
Then switch into that directory
$ cd gpt3-sql-test
Next thing we need to do is clone the gpt3-sandbox repository using the git command:

$ git clone https://github.com/shreyashankar/gpt3-sandbox.git

This will create a new directory called gpt3-sandbox and initialize a .git directory inside of it.
After cloning, switch into the new directory and you should see all the project files, which we will modify for our use case.
The one thing that I find helpful when git cloning is giving the cloned repository a different name. We can do this by adding a new directory name as an additional argument like this:
$ git clone https://github.com/shreyashankar/gpt3-sandbox.git demo-files
Make sure you are in the root directory of the repo we just cloned. From there we can set up our virtual environment.
Your future self will thank you for using a virtual environment whenever possible. It keeps your code contained and makes it more reproducible, since all the dependencies and site packages live in one place. People set up virtual environments numerous ways, but here are the commands I follow.
Create virtual environment
$ python3 -m venv venv
Activate virtual environment
$ source venv/bin/activate
You will notice a (venv) prefix in your terminal prompt, which indicates you are in your virtual environment.
Next we need to install our requirements, which live in the requirements.txt file within the api folder. We can do this by running the following command:

$ pip install -r api/requirements.txt
This will download and install all the packages we need for playing with the GPT3 sandbox. There is one other package we need to install on top of those, python-dotenv, which we will use to manage our environment variables.
We can use pip to install this package as well:

$ pip install python-dotenv
Create a new .env file within VSCode. This is going to be where we keep our super secret API keys. If you upload this project to GitHub, make sure you have a .gitignore file with the typical Python setup, which excludes .env files.
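As a minimal sketch (the standard Python .gitignore template covers much more), a .gitignore that keeps the key out of version control might look like:

```
# environment secrets
.env

# virtual environment
venv/

# Python bytecode
__pycache__/
```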
You will need access to OpenAI for this part. Head over to your account and go to your API Keys. From there you can copy your API key. If you don’t have one yet, click on “Create new secret key” and copy it.
Once you’ve copied your key, go ahead and paste it into your .env file like so:
export OPENAI_KEY='your-secret-api-key-here'
The above code is setting your environment variable to read in the secret OpenAI API key. We will come back to this later when using our python-dotenv package.
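To make what python-dotenv does less magical, here is a simplified stdlib-only sketch of roughly what load_dotenv() does for a file like ours (the real package handles many more edge cases, so use it rather than this):

```python
import os

def load_dotenv_minimal(path=".env"):
    """Rough sketch of python-dotenv's load_dotenv():
    read KEY=VALUE lines from a file and put them in os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments
            if not line or line.startswith("#"):
                continue
            # python-dotenv also tolerates an "export " prefix
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            # Don't override variables already set in the environment
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))
```

After loading, the key is available anywhere in the app via os.getenv("OPENAI_KEY").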
Next we need yarn. If you don’t have it set up yet, there are a few ways to go about it, and I’d recommend checking out the yarn installation guide. The first is using npm (which you’d need to install first). If you want to go this route, run:
$ npm install --global yarn
The other way is to use the homebrew package manager. This will also install node.js if it is not already installed.
$ brew install yarn
Once you have yarn installed you can run the yarn install command in the root directory. If you get no errors then you are ready to rock and roll!
One change we are going to make right away is modifying our demo_web_app function to use python-dotenv in order to access our OpenAI API key. Within the api/ folder, open the demo_web_app.py file. This file runs the web app given the GPT object and UI configuration, which we will go into more detail on later. What we’re doing now is changing how we pass in our secret key as a config variable. There are a few lines within the demo_web_app function we will swap out; this function creates a Flask app to serve the React app. At the top of the file, add the following to import and initialize dotenv:
import os
from dotenv import load_dotenv

load_dotenv()
Then we’ll change the KEY_NAME to run the dotenv equivalent which looks like this:
KEY_NAME = os.getenv('OPENAI_KEY')
OPENAI_KEY is what we exported in our .env file. Then within the demo_web_app function, swap out the hard-coded config var and key name for our new KEY_NAME.
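I won’t reproduce the sandbox’s exact original lines here, so treat this as a sketch of the idea: wherever demo_web_app previously read the key, it now uses the value that load_dotenv() pulled out of .env (the config key string below is my assumption, not necessarily the sandbox’s):

```python
import os

def get_openai_key():
    # KEY_NAME equivalent: read the key that load_dotenv()
    # placed into the environment from the .env file.
    return os.getenv("OPENAI_KEY")

def demo_web_app_sketch(app_config):
    # Instead of a hard-coded secret, the Flask-style config dict
    # gets the key from the environment at startup.
    app_config["OPENAI_API_KEY"] = get_openai_key()
    return app_config
```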
Now, I prematurely said we are ready to rock and roll. Let’s verify that everything in our environment is set up properly by running one of the example scripts from the examples folder of the cloned repo with python3.
We should get a success message in our terminal if everything is working properly. If not, review the Getting Started steps above.
I am not going to spend a lot of time going over exactly what GPT3 is or how it works. Partially because I still don’t fully understand it and mostly because there is a ton of literature out there if you want to learn more. I will pull some excerpts from the OG Twilio blog post to help paint the picture.
“GPT-3 (Generative Pre-trained Transformer 3) is a highly advanced language model trained on a very large corpus of text. In spite of its internal complexity, it is surprisingly simple to operate: you feed it some text, and the model generates some more, following a similar style and structure.”
“GPT-3 is non-deterministic, in the sense that given the same input, multiple runs of the engine will return different responses.”
**For more information I recommend reading the following: The Ultimate Guide to OpenAI’s GPT-3 Language Model**
You can also check out two of my other tutorials using GPT3.
So again, the general idea behind everything is that you “train” (aka prime) the GPT3 engine by giving examples for it to learn from. With just a few examples, GPT3 will fill in the blanks and basically mimic what you have taught it.
For our purposes, we need to provide it with basic text inputs that have the legit SQL queries as output. After that, we can just give it the basic text input and it will auto-magically create the SQL query to match.
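Under the hood, this few-shot priming boils down to concatenating the example pairs plus the new input into one prompt string for the engine to complete. The sandbox’s GPT class does something along these lines (a simplified sketch of the idea, not its actual code):

```python
def build_prompt(examples, new_input):
    """Simplified sketch of few-shot priming: each (input, output)
    pair becomes an input/output block, and the new input is left
    with an empty output for the model to fill in."""
    parts = []
    for text, sql in examples:
        parts.append(f"input: {text}\noutput: {sql}\n")
    parts.append(f"input: {new_input}\noutput:")
    return "\n".join(parts)

examples = [
    ("Get all players", "SELECT * FROM players;"),
]
prompt = build_prompt(examples, "Get all teams")
```

Everything before the final empty output is the “training” we give GPT3; the completion it returns is our SQL query.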
Let’s use the OpenAI Playground to go over our examples first. The main text area is where we provide the text example inputs. The right sidebar is where we modify variables to change the desired text output.
At the top of the playground we will add a brief overview of what we want GPT3 to do. I will then add 3 examples with the correct output. Now, I know some SQL and I’ve spent time using the NBA Stats API so I actually understand how the tables are created. I will use a variety of tables within the examples to give GPT3 a better chance at predicting the right SQL.
I will then leave a final example without the output below. After hitting submit, GPT3 will do its best to fill it in. Let’s see how it does.
Not bad! I would say the only part it got wrong is that three-point field goals made is not “threepm” but instead “FG3M.”
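For illustration, here is the corrected query run against a toy SQLite table. The table and column names (player_game_logs, FG3M, season_id) are my assumptions about the NBA stats schema, so adjust them to match your own database:

```python
import sqlite3

# Build a tiny in-memory stand-in for the player_game_logs table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE player_game_logs (
    player_name TEXT, season_id TEXT, FG3M INTEGER)""")
conn.executemany(
    "INSERT INTO player_game_logs VALUES (?, ?, ?)",
    [("Stephen Curry", "2018-19", 5), ("Stephen Curry", "2018-19", 7)],
)

# The corrected column is FG3M (3-point field goals made), not "threepm".
total = conn.execute(
    "SELECT SUM(FG3M) FROM player_game_logs "
    "WHERE player_name = 'Stephen Curry' AND season_id = '2018-19'"
).fetchone()[0]
```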
For more details on the (unofficial) NBA API you can check out this repo. Or take a peek at one of my other tutorials.
Ok, let’s convert this to actual code.
If we like how the playground is performing, we can actually export our code directly from the playground. If we go this route, make sure to remove the prompt, because we are going to use the add_example method instead.
But to get a better sense of how this works, let’s go into the examples folder and copy the code from the run_blank_example.py file into a new file. Paste in all the code and we’ll go over it together.
At the top we are importing the necessary constructors and methods from our api folder. Again, this is what will use the flask backend and react frontend.
First thing we need to do is construct the GPT object, which optionally accepts a handful of parameters. The example only uses engine, temperature and max_tokens, but as we saw in our playground export, there are a lot more variables we can pass if we want to get fancy. For now, let’s keep it simple.
gpt = GPT(engine="davinci", temperature=0.5, max_tokens=100)
Next is where we add the examples to help prime GPT3. The Example constructor accepts an input string and the matching output string. I am going to use three legit SQL queries matched to general text, referencing a few key tables like player_game_logs and a few key columns like player_name.
From there we can add the examples to the
GPT object by calling the
add_example() method and passing in each Example one at a time.
gpt.add_example(Example(
    "Get all Lebron James info from player_game_logs for last season in his wins",
    "SELECT * FROM player_game_logs WHERE player_name = 'Lebron James' AND season_id = '2018-19' AND wl = 'W';"))

gpt.add_example(Example(
    "Give me Anthony Davis field goal percentages for every game he lost last season",
    "SELECT matchup, fgm, fga, (round(fgm/fga,2)) as percentage FROM nba_stats.player_game_logs WHERE season_id = '2018-19' AND player_name = 'Anthony Davis' AND wl = 'L';"))

gpt.add_example(Example(
    "Show me the total number of times a player from the Cleveland Cavaliers had a plus or minus greater than 20 and order by the total number of times in descending order",
    "SELECT player_name, count(game_id) FROM player_game_logs WHERE season_id = '2015-16' AND team_abbreviation = 'CLE' AND plus_minus >= 20 GROUP BY player_name ORDER BY 2 DESC;"))
The same way we had the input and output in our playground example, we are doing that here using the Example constructor from the sandbox.
From there we have actually done all of the complicated GPT3 parts. Next we can focus on the web front end.
The sandbox makes it super easy to customize the frontend UI. We have a description, button text and placeholder text. That is what will go here.
Let’s update it so it reflects what we are trying to build.
# Define UI configuration
config = UIConfig(
    description="Learn SQL with NBA Stats",
    button_text="Query",
    placeholder="How many threes did Stephen Curry make last season?")
You will also see the ability to show the example form using a true/false boolean. If it’s set to true, you will see all the examples you added in the code; if it’s false, they will be hidden. This can be helpful in case you want to add another example (or delete one) through the front end.
After updating the text lets rerun the script and it should now look like this:
We are now ready to run the web app and interact with it. In your terminal, run your new file like you would any Python script, with python3 <name of new file>. You should see output in the terminal before it launches a new tab on your localhost.
Feel free to give it a try.
Now I can’t promise it’s going to be right every time, but I plan to work on a production-ready version that is as close to perfect as can be.
So there you have it, SQL queries without knowing how to write SQL queries. If you have any interest in me doing more demos on legit NBA SQL Data please drop me a note on twitter.
Thanks again to the gpt3-sandbox team for making such an awesome repo. They really did make it easy to spin up a local web app for showcasing ideas. And big ups to the OpenAI team. GPT3 is amazing. I can’t wait to get my hands on DALL-E.