Thursday, 30 October 2008

Assignment - Planning the Database

The Assignment


The assignment is to create a database driven website with the subject of my choice. I decided to create a site that stores information on games for old gaming consoles.

Planning the Database

The first step is to plan the database design around the information it needs to contain. For my site on retro games, at least the following information for each game will be needed:

Game Name
Game Description
Game year
Console Name
Console Details
Console year
Developer Name
Developer Details


Initial Data Structure.

There will be separate tables for Developer and Console details as these can apply to several games. This ensures that there will be no need to have duplicate data.

Each table has a primary field (ending in ID) that each entry in the table can be referenced by.









To test this structure, I create the database in Microsoft Access. The Image below shows the data type for each field.



All of the ID fields are set to Autonumber (A new number is generated for each entry in the table) and are the primary fields.


The Name and Description/Details fields are set to 'Memo'. This can be a mixture of alpha numeric characters. Memo is similar to 'varchar' in other databases.


Fields requiring numbers are set to 'number'. Number can be a small integer (game year: always a 4 digit number), or a large integer (ID fields: Number needs to go as large as the amount of entrys).

Note: I have renamed 'developer' to 'producer' for those particular fields. Also I have added a field for the url of any emulators in tbConsole

I setup the table relationships as follows:




The relationships shown above are called 'one to many' relationships. In my data structure above, the table tbProducer can carry one entry with the producerID of '2' where tbGames can carry many entries with the ProducerID of '2'. The adjoining line signifies this.

I then populated the tables with some sample data as shown below:







Test Queries


Now I have the database populated with some data, I run some test queries. Databases understand a language known as SQL (or variations of), with which we query the database. My first test query is as follows:


SELECT * FROM tbGames


This instructs to SELECT all fields(*) FROM tbGames. The result is shown below.






To test further, I alter the last query with a new line to return only those games from a specific console:


SELECT * FROM tbGamesWHERE ConsoleID = 2


This should return those games available on the console with consoleID=2 (Nintendo NES) which it does.


I repeated similar queries on tbProducer and on tbConsole with correct results.


Inner Joins


If I were to search my website for Games, the ID number for the console is useless without the information it relates to. Therefore I create another query to list the game name, and the console name that the game plays on. For this I use an 'Inner Join'. Inner joins are used for finding common values from 2 or more tables.

The following image shows my SQL query. The INNER JOIN statement tells the database that the ConsoleID field is the same for tbGames as for tbConsole. The last line speaks for itself:

ORDER (the results) BY GameName








The results from this have produced the following correct information:






Although the query has returned the correct information, there is a problem here. At least 1 game is available on more than 1 of the available consoles (Flashback, for example, appeared on at least 3 of the consoles here). The problem is that only 1 number can be stored in the Console ID field for the table tbGames.


One solution to this is to add a new entry in tbGames for each console that the game is available on. This would involve duplicated data, so a better solution is to create a new table that stores only 'GameID' and 'ConsoleID'. The new table needs a primary key so I name this 'RecordID'.


I setup this new table and alter the relationships as follows:






With the structure above, numerous console ID's can be assigned to each gameID by simply inserting numbers into the new table. I inserted some data into the new table that applies to the games on multiple consoles.






Nested INNER JOINs


I needed to retrieve a list of games that showed every console that a game appeared on. To do this I needed to create an INNER JOINs to connect the Console ID field between tbGames and tbMultiConsole. Another INNER JOIN is needed between tbConsole and tbMultiConsole.


This was harder than I thought. I refered to a handout that shows how multiple INNER JOINS should be nested within each other, otherwise they wont work.


The query below shows how 1 INNER JOIN statement is nested within another using parenthesis.( ) The way in which this is nested differs from the example on the handout. I did it this way after referring to the help files in MS Access. This difference is due to differences in the way MS Access understands SQL.


If I were using visual studio to access a database then the INNER JOINS would be seperated by enclosing the first INNER JOIN in parenthesis, with the second INNER JOIN coming after the closing parenthesis.





The WHERE clause brings back the entrys with ConsoleID (1, 2, 3, 4, 5, 6, 7, 8, 9). In other words, every console. It seems a pointless statement but it isn't as I wanted to test results by removing some numbers and leaving others to check the validity of the results.


There was a problem, if a game is available on 4 consoles, then the game is listed 4 times but the same consoleName is listed 4 times also. As shown in the image below.




I realised the problem was that my query was pulling the information for the ConsoleID from tbGame instead of tbMultiConsole. In the following query I specified to pull the information from tbMultiConsole and display the related data from tbGames and tbConsoles.





This returned a better result.






This works well but I still want to have the producer name displayed as well. This involves 3 inner joins nested within each other (similar to the way HTML tags are nested) as shown with the query below.





This connects tables:
tbMultiConsole and tbconsole on ConsoleID
tbMultiConsole and tbGames on GameID
tbGames and tbProducer on ProducerID


and produces the following information: (The query had an ORDER BY GameYear statement)






If I were browsing my site, then I would probably be browsing by console.

I added a WHERE clause to the previous query to select only those games on the Sega Master System:


WHERE tbMultiConsole.ConsoleID = 3


Resulting in the following:


I'm happy with the database structure now although the GameYear field actually represents the year the game was first released, and not the year it was released on each specific console. This will not be a problem as the first release is the important version on a retro-games website. The release history can be supplied in the description field of the relevent games.

To ensure my database design is fit for purpose, I will now need to think about the website functionality to discover any other possible queries that may be needed for the site to operate correctly.

No comments: