Tutorial: Opening the Lipad Dataset in PostgreSQL
If you are familiar with basic SQL but have never used PostgreSQL, the following tutorial will help you get started querying the Lipad dataset. It covers installation of PostgreSQL on a Windows machine, importation of the Lipad dataset, and basic querying in pgAdmin3 (the graphical interface for PostgreSQL).
~~1) Download the PostgreSQL 9.6.1 visual installer, available at https://www.bigsql.org/postgresql/installers.jsp~~ The Lipad database dump has been updated to PostgreSQL 10.0. You can download an installation package here: https://www.postgresql.org/download/windows/ The below instructions are for an older version of Postgres, but most steps should remain similar.
2) Run the installer. First, the Setup Wizard will ask you to select the installation directory; leave this as default and click Next to proceed.
3) On the Select Components screen, un-check BigSQL Manager II and check pgAdmin3 LTS as shown in the diagram.
4) The next screen selects the location for database storage; leave this as default and proceed.
5) The Password screen asks you to provide a password for the default database user, "postgres". Select a password you will remember and make a note of it as you will need it at a later step in this tutorial.
6) At the Summary screen, click Next to proceed. Wait for the installer to finish, then click Finished when it has completed.
B. Database Setup
7) Start pgAdmin. On a Windows machine, it should be located at Start Menu > Programs > PostgreSQL > pgAdmin3 LTS by BigSQL.
8) Click on the "plug" icon at the top left of the pgAdmin window to begin setting up the connection to our local database. This will open a New Server Registration window.
9) In the New Server Registration window, enter the following information:
Name: localhost Host: localhost Port: 5432 Username: postgres Password: (enter the password you created earlier during installation)
10) When finished, click OK. Look at Object Browser pane of the pgAdmin3 window. You will now have a connection saved called "localhost" in the Object Browser, under Servers. Click the plus sign to expand "localhost" and view its databases. By default, there is one database created for our default user, "postgres"--it is used for maintenance and should not be modified.
11) Next, we will create a new database to hold the Lipad dataset. Right click on Databases and select New Database from the dropdown menu. Fill in the following information in the New Database... window:
Name: lipad Owner: postgres
When finished, click OK and wait a few moments as the database is created.
12) Click on the plus sign beside Databases and you should see that the new database "lipad" has been created. It will appear with a red X on its icon to signify that you are not currently connected to (working with) this database. Click on "lipad" to connect to the database.
C. Loading The Dataset
13) We are now ready to load the dataset into our newly-created "lipad" database. Download the latest database dump, basehansard-2.0.0.sql.bz2 from the Data page. You may need to right click and select Save Link As... from the menu.
14) Before using the dump file, it must be uncompressed. The free program 7zip, available at http://www.7-zip.org/download.html, is a great utility for this purpose. For a quick tutorial on using 7zip, click here.
Use 7zip to uncompress the file into basehansard-2.0.0.sql and make a note of where this new file is located. It may take some time to uncompress.
15) You will need to use a terminal window to load in the database. On Windows, type "cmd" in the Start Menu search and click on cmd.exe to start a terminal window.
16) You will see a command prompt ending with >. At the prompt, type the following command in the window, replacing the path to the file with the path where your basehansard-2.0.0.sql file is located. For example, I have saved the file at D:\Downloads\basehansard-2.0.0.sql
psql lipad < D:\Downloads\basehansard-2.0.0.sql postgres
17) Hit the Enter key to start the loading process. Words like "ALTER TABLE" and "COPY" will fill the window as the load progresses.
Please note: The restore process can take a few minutes to a few hours depending on your computer speed. The terminal window must be left open until the process completes.
18) When the loading process is complete, the terminal window will return to a prompt as shown in the diagram. You can now close the terminal.
D. Using the Database
19) Return to the pgAdmin window.
First, verify you are still connected to the "lipad" database. There should be no red X next to its name; if there is, click on its name to connect (as in step 12).
20) You can now see the database's tables by clicking the plus signs to expand in the following order: lipad > Schemas > Public > Tables.
The main Lipad dataset is contained in the table dilipadsite_basehansard. Other supplementary data tables include dilipadsite_constituency, dilipadsite_member, dilipadsite_party, and dilipadsite_position.
The remainder of tables are related to website functions. They are empty and can be ignored.
21) First, we will verify that the import has been successful. Right click on the main data table, dilipadsite_basehansard, and follow the dropdown menu to View Data > View Top 100 Rows. This will show you the first 100 rows of the table in a new window.
22) To perform SQL queries from the visual interface, click on the "SQL magnifying glass" at the top of the main pgAdmin3 window. A new Query window will open and a blank query tab is ready for you to type your query.
Click the green arrow in the toolbar to run the query and view output visually in this window.
If you wish to save the results of the query to a .csv file, click on the green arrow with the disk icon. Use the following settings to output a .csv file readable in Excel:
E. More Helpful Hints
Returning to Work: When starting a new session in pgAdmin, you will need to connect to the database to work with it. Start by right clicking the "localhost" entry under Servers in the Object Browser, then select Connect. You may need to enter the password you created earlier. Next, connect to the "lipad" database by clicking on it as normal. You can now begin to query the database.
Updating the Dataset: We update the Lipad database regularly with additions and error corrections. To update your local database with a newer copy, delete the existing "lipad" database by right clicking on its name and selecting Delete/Drop..., then clicking Yes to confirm the deletion. Then, simply repeat from step 11 to create a new database and load a newly-downloaded copy of the data file.