Tuesday 17 April 2012

Boring theory time...

For those of you who are unfamiliar with the concept of relational databases I'll do my best to explain it in simple terms, as an understanding of this is key to getting the best out of the whole exercise.

Below is a representation of a simple trading spreadsheet such as you might already use:


In database terminology this is know as a 'flat field' database. All information is recorded in a single place, and, as can be seen, lots of the same data is repeated. As the sheet grows this duplication gets more and more prevalent and the sheer physical amount of data becomes harder and harder to draw meaningful conclusions from. It's unlikely to affect us sports traders, but a commercial file with millions of data entries would soon need vast amounts of storage space and would be very unwieldy.

The core concept of a relational database is that commonly repeated data is removed into a separate table on an 'enter once, use often' sort of deal. If we look at the data above we can see there are some elements which are common to all trades, and some which are unique to a particular trade. Get this right and we're well on the way!

Each trade takes place on a specific date, and whilst that date is not unique, it is definitely a fixed facet of that trade. The same is true of the stakes used, the profit / loss resulting and any notes you might make. Conversely all trades involve a particular sport, and you might employ a particular named strategy but neither of those are unique to that trade alone. By the way, 'Scatter Gun' and 'Lazy Lamb' are actual trading strategies - the 'Cassini Catch' and the 'Sultan of Swing Trade' are made up - hopefully those two gents won't think I'm taking their names in vain!

So, what we do is to extrapolate the various sports and strategies into two separate tables:

Each sport and each strategy are assigned an 'ID' number which is unique to each entity. After having done this we can draw up a representation of a particular trade in a 'Trades' table as you can see below:


You don't have to be a rocket scientist to appreciate that to a computer such a table is simplicity itself. Whilst relatively unintelligible to a human looking at it, it is simple to store and to cross refer, index and sort etc for a computer. It also uses far less disk space - instead of there being 2,000 'Scatter Gun' records stored there would be 2000 '1' s etc.

I remember when I first became interested in databases I read something similar to the above and thought 'OK - I see the power of that, but how the **** am I supposed to remember that the Lazy Lamb is StratID 2 and that the X Factor  is represented by SportID 5?'

 Fear not, as all will be revealed in due course.

If you wish to look at the relational model Google is your friend. And enemy. It's a massively involved and complicated topic to the noob, but you can read about the First Normal Form and subsequent 'Normal Forms' if you really want to. I'm sort of OK with this up to the Third Normal Form but thereafter, to be frank, it becomes gobbeldegook to me! Good luck if you try.


2 comments:

  1. Gun, i've been playing around with Access and have managed to strike up a reasonable understanding now. I am however confused by the need for the SportID and StratID column. I seem to hav ebeen able to set most things I need up without these. All will become apparant i'm sure in your next piece. I look forward to it.

    Cheers

    Fed

    ReplyDelete
  2. Fed, remember the old adage that 'a little knowledge can be a dangerous thing'! I think I know where you are coming from, and if I'm right then you're right. And wrong as well. As you say... all will become apparent :-)

    ReplyDelete