Thursday, 30 October 2008

Assignment - Planning the Database

The Assignment


The assignment is to create a database driven website with the subject of my choice. I decided to create a site that stores information on games for old gaming consoles.

Planning the Database

The first step is to plan the database design around the information it needs to contain. For my site on retro games, at least the following information for each game will be needed:

Game Name
Game Description
Game year
Console Name
Console Details
Console year
Developer Name
Developer Details


Initial Data Structure.

There will be separate tables for Developer and Console details as these can apply to several games. This ensures that there will be no need to have duplicate data.

Each table has a primary field (ending in ID) that each entry in the table can be referenced by.









To test this structure, I create the database in Microsoft Access. The Image below shows the data type for each field.



All of the ID fields are set to Autonumber (A new number is generated for each entry in the table) and are the primary fields.


The Name and Description/Details fields are set to 'Memo'. This can be a mixture of alpha numeric characters. Memo is similar to 'varchar' in other databases.


Fields requiring numbers are set to 'number'. Number can be a small integer (game year: always a 4 digit number), or a large integer (ID fields: Number needs to go as large as the amount of entrys).

Note: I have renamed 'developer' to 'producer' for those particular fields. Also I have added a field for the url of any emulators in tbConsole

I setup the table relationships as follows:




The relationships shown above are called 'one to many' relationships. In my data structure above, the table tbProducer can carry one entry with the producerID of '2' where tbGames can carry many entries with the ProducerID of '2'. The adjoining line signifies this.

I then populated the tables with some sample data as shown below:







Test Queries


Now I have the database populated with some data, I run some test queries. Databases understand a language known as SQL (or variations of), with which we query the database. My first test query is as follows:


SELECT * FROM tbGames


This instructs to SELECT all fields(*) FROM tbGames. The result is shown below.






To test further, I alter the last query with a new line to return only those games from a specific console:


SELECT * FROM tbGamesWHERE ConsoleID = 2


This should return those games available on the console with consoleID=2 (Nintendo NES) which it does.


I repeated similar queries on tbProducer and on tbConsole with correct results.


Inner Joins


If I were to search my website for Games, the ID number for the console is useless without the information it relates to. Therefore I create another query to list the game name, and the console name that the game plays on. For this I use an 'Inner Join'. Inner joins are used for finding common values from 2 or more tables.

The following image shows my SQL query. The INNER JOIN statement tells the database that the ConsoleID field is the same for tbGames as for tbConsole. The last line speaks for itself:

ORDER (the results) BY GameName








The results from this have produced the following correct information:






Although the query has returned the correct information, there is a problem here. At least 1 game is available on more than 1 of the available consoles (Flashback, for example, appeared on at least 3 of the consoles here). The problem is that only 1 number can be stored in the Console ID field for the table tbGames.


One solution to this is to add a new entry in tbGames for each console that the game is available on. This would involve duplicated data, so a better solution is to create a new table that stores only 'GameID' and 'ConsoleID'. The new table needs a primary key so I name this 'RecordID'.


I setup this new table and alter the relationships as follows:






With the structure above, numerous console ID's can be assigned to each gameID by simply inserting numbers into the new table. I inserted some data into the new table that applies to the games on multiple consoles.






Nested INNER JOINs


I needed to retrieve a list of games that showed every console that a game appeared on. To do this I needed to create an INNER JOINs to connect the Console ID field between tbGames and tbMultiConsole. Another INNER JOIN is needed between tbConsole and tbMultiConsole.


This was harder than I thought. I refered to a handout that shows how multiple INNER JOINS should be nested within each other, otherwise they wont work.


The query below shows how 1 INNER JOIN statement is nested within another using parenthesis.( ) The way in which this is nested differs from the example on the handout. I did it this way after referring to the help files in MS Access. This difference is due to differences in the way MS Access understands SQL.


If I were using visual studio to access a database then the INNER JOINS would be seperated by enclosing the first INNER JOIN in parenthesis, with the second INNER JOIN coming after the closing parenthesis.





The WHERE clause brings back the entrys with ConsoleID (1, 2, 3, 4, 5, 6, 7, 8, 9). In other words, every console. It seems a pointless statement but it isn't as I wanted to test results by removing some numbers and leaving others to check the validity of the results.


There was a problem, if a game is available on 4 consoles, then the game is listed 4 times but the same consoleName is listed 4 times also. As shown in the image below.




I realised the problem was that my query was pulling the information for the ConsoleID from tbGame instead of tbMultiConsole. In the following query I specified to pull the information from tbMultiConsole and display the related data from tbGames and tbConsoles.





This returned a better result.






This works well but I still want to have the producer name displayed as well. This involves 3 inner joins nested within each other (similar to the way HTML tags are nested) as shown with the query below.





This connects tables:
tbMultiConsole and tbconsole on ConsoleID
tbMultiConsole and tbGames on GameID
tbGames and tbProducer on ProducerID


and produces the following information: (The query had an ORDER BY GameYear statement)






If I were browsing my site, then I would probably be browsing by console.

I added a WHERE clause to the previous query to select only those games on the Sega Master System:


WHERE tbMultiConsole.ConsoleID = 3


Resulting in the following:


I'm happy with the database structure now although the GameYear field actually represents the year the game was first released, and not the year it was released on each specific console. This will not be a problem as the first release is the important version on a retro-games website. The release history can be supplied in the description field of the relevent games.

To ensure my database design is fit for purpose, I will now need to think about the website functionality to discover any other possible queries that may be needed for the site to operate correctly.

Friday, 3 October 2008

Data Tiers and Information Architecture

What are Data tiers?

data
[dey-tuh, dat-uh, dah-tuh] noun
1.
a pl. of
datum.
2.
(used with a plural verb) individual facts, statistics, or items of information: These data represent the results of our analyses. Data are entered by terminal for immediate processing by the computer.
3.
(used with a singular verb) a body of facts; information: Additional data is available from the president of the firm.

(Dictionary.com Unabridged (v 1.1). Retrieved October 03, 2008, from Dictionary.com website:
http://dictionary.reference.com/browse/data)


tier [teer] noun
1.
one of a series of rows or ranks rising one behind or above another
2.
one of a number of galleries, as in a theater.
3.
a layer; level; stratum: The wedding cake had six tiers. All three tiers of the firm's management now report to one director.

(Dictionary.com Unabridged (v 1.1). Retrieved October 03, 2008, from Dictionary.com website:
http://dictionary.reference.com/browse/tier)


After checking the dictionary definitions, I would simplify the meanings as follows:

Data: Information
Tier: A level within a structure

Therefore Data tiers are:

A) Levels of information within a structure.
B) An informational structure itself.

In yesterdays class, the tutor talked about data tiers and gave examples of ways in which information can be structured (with a database driven ecommerce site in mind). He then demonstrated that running simple 'data tier' tests on paper can identify faults or elements that can be more 'normalised' (reducing the amount of tables fields in a table, to simplify the structure as much as possible.)

Exercise

The tutor set us this exercise:

--------------------------------------------

Data Tier - Planning exercise
'Homespare' are a small retail chain of 10 locally based stores selling home appliance spares. They have a centrally located warehouse that houses their catalogue of 2000 products, so at least 85% of their goods are availableas an in-stock item.

The owner of the business has decided he wants to put his catalogue online and create a web based online shop.

The online shop is to feature:

The full product catalogue
A category based navigation
A shopping cart system
Order tracking
Product reviews / info
Product images
A search based on manufacturer or product type


Tasks
1) Design the data tier for this system. Be sure to normalise the database as much as possible and describe the table relationships you will use.

2) Run some simple data tier tests - and identify if there are any points in which your data tier demonstrates either a fault or an element that could be more normalised.

-----------------------------------------------------

My response to this task is as in the following image. The fields shown in black, within the 5 tables at the top, are the fields i had before running any tests.

I ran 3 queries on paper:

#1-inserting some categories
#2-insert some products
#3-attempt to purchase using the 'cart' and 'checkout' tables.
As I ran the queries, the need for more fields arose(shown in red)




The following image shows how the tables would be populated after the previous tests. I have used coloured arrows to show the relationship between the tables and to show where data is input into the system.

My diagram of the table contents shows that an 'ID' field for each table is essential for creating data tiers as each table is referenced by a number. The further you move through this system the fields eventually contain numbers only. This is good because databases handle numbers faster.