Bucket of Sparks: October 2011

Saturday 22 October 2011

Machine Learning - Week 2 Multivariate Linear Regression

Introducing Octave

Fitted line graph - the blue line is derived from the red crosses

If you got past the title you are doing well! More line fitting, but we're now into multi-dimensional space, captain! Put simply, this means that you have more parameters (different types of information) to match. So in the house price example used in the course, the simple version matches area (sq ft) to the price of the house; the multivariate version might use area, number of rooms, and age to get you a more accurate fit. Fortunately, it works as you'd hope, in that you just add up the effects of the different terms, or in programming terms you just have two loops:

for i = 1 to number_of_houses
    for j = 1 to number_of_things_i_know_about_a_house
        work_it_out()
    end
end
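As a rough illustration of those two loops, here is a minimal Python sketch. The house data, the weights and the name `predict_prices` are all made up for the example (none of them come from the course); the point is only that each prediction is a base price plus the summed effect of every feature.

```python
# Hypothetical data: each row is (area in sq ft, number of rooms, age in years).
houses = [
    (1000.0, 3.0, 20.0),
    (1500.0, 4.0, 5.0),
]

# Hypothetical parameters: a base price plus one weight per feature.
base_price = 50000.0
weights = (100.0, 10000.0, -500.0)


def predict_prices(houses, base_price, weights):
    """Predict a price for every house by adding up each feature's effect."""
    prices = []
    for house in houses:                            # outer loop: one pass per house
        price = base_price
        for value, weight in zip(house, weights):   # inner loop: one pass per feature
            price += weight * value
        prices.append(price)
    return prices


print(predict_prices(houses, base_price, weights))  # [170000.0, 237500.0]
```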

Cost function plot from Octave

Which leads nicely to Octave, a programming language designed specifically for doing maths; it combines with gnuplot to draw pretty pictures of your data. The sort of data we are talking about now comes in tables. Whether printed, or as database tables or spreadsheets, doesn't really matter: as a programmer you would normally bring this stuff in as arrays of whatever dimension and do the above. Octave, however, understands matrices and tables intuitively and can manipulate them directly, so the above might become:

number_of_houses * work_it_out(number_of_things_i_know_about_a_house)

A lot simpler and, because Octave is built for this sort of job, much quicker.
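Written out as maths, the double loop and the one-line matrix version say the same thing. The notation here is an assumption for illustration, not taken from the post: m houses, n features, a matrix X with one row per house (and a leading column of ones for the base price), and a column vector θ of weights.

```latex
h_i = \sum_{j=0}^{n} \theta_j \, x_{ij}, \quad i = 1, \dots, m, \quad x_{i0} = 1
\qquad \Longleftrightarrow \qquad
h = X\theta
```

With X and θ stored as an Octave matrix and column vector, the right-hand side is the single expression `X * theta`.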

Onwards and upwards. The top graph represents the point of all this: if you know the area of your house you can estimate its cost. The advantage of getting a computer to do this is that you can have enormous training sets (all house prices in the country) and do lots of subset plots: all semis, all semis in Suffolk, etc.

Friday 21 October 2011

AI - week 2 Bayes Networks, probably the best networks in the world

What are the chances of that?

After an easy introduction, via tree searching, last week, we're into the thick of it with Bayes Networks and stochastic reasoning.


The good reverend Bayes

Bayes networks deal with uncertainty; they can answer questions such as: given that my cancer test was positive, what are the chances that I have cancer? I'm not going to go into the details here (there are several resources on the web), but the answer is of the form

P(C|T) = ( P(T|C) . P(C) ) / P(T)

and is likely to be lower than you think. P(C|T) is the chance of you having cancer given a positive test; P(T|C) is the chance of the test being positive if you have cancer, which I guess would come from the testing of the test; P(C) is the chance of you having cancer in general, which would come from actuarial tables or the like; and P(T) is the chance of the test being positive whether or not you have cancer.
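To see why the answer tends to be lower than you think, here is a small Python sketch of the formula above. The numbers are purely illustrative assumptions (a 1% chance of cancer, a test that is positive for 90% of people with cancer and for 20% of people without), and the function name `posterior` is made up for the example. P(T) is worked out by adding the two ways a test can come back positive.

```python
def posterior(prior, p_pos_given_cancer, p_pos_given_healthy):
    """Return P(C|T) from Bayes' rule: P(T|C) * P(C) / P(T)."""
    # P(T): chance of a positive test whether or not you have cancer.
    p_pos = p_pos_given_cancer * prior + p_pos_given_healthy * (1.0 - prior)
    return p_pos_given_cancer * prior / p_pos


# Illustrative (made-up) inputs: P(C) = 0.01, P(T|C) = 0.9, P(T|not C) = 0.2.
p_cancer_given_positive = posterior(0.01, 0.9, 0.2)
print(round(p_cancer_given_positive, 4))  # 0.0435
```

So with these assumed figures a positive test only lifts the chance of cancer from 1% to a little over 4%, because the false positives from the much larger healthy group swamp the true positives.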

The course goes into more depth than this, showing how to reason from one test result to another, say, and how to chain probabilities.

All good stuff -but my brain aches.

Monday 10 October 2011

A.I. - Week 1, the Prologue

In which the first video goes live and we meet Profs. Thrun and Norvig.

They had a bit of a struggle but Stanford have got the introductory lectures up on their site (via YouTube). The profs. seem affable and there's nothing too scary in the first video series (mainly just definitions of terms and the illustration of some problems), but the reading list promises a hard climb.

Stanford seem to be encouraging these courses to be social, and for the videos to be viewed communally. It turns out that there's a group in London, so I'll trot along and take a look.

With apologies to Lurcio

Friday 7 October 2011

Machine Learning - Intro and Gradient Descent

Week 1 - In which we meet Prof. Ng and experience the delights of cost functions and Linear Regression with One Variable.

This is my first week on the Stanford University machine learning course and so far so good, as the man said when he jumped off Nelson's Column. Prof. Ng seems to be a pretty decent lecturer who doesn't overestimate the ability of his students, and there are plenty of examples and explanations to drive the points home. The lectures come as videos with a few embedded questions to keep you awake, and they are, thankfully, split into bite-sized chunks of 10-15 minutes.

In the intro we get introduced to terminology: the difference between supervised and unsupervised learning. Supervised learning requires a training set, the main example given being house price history vs. house size; unsupervised learning is just trying to make inferences from a bunch of data, say clustering news stories. We are also given the difference between regression (line fitting, trending) problems and classification ones (sorting into buckets, true/false &c.).

Thence we arrive at Linear Regression with One Variable, which is basically line fitting on a 2D graph. The maths arrives, but not too brutally: calculus shows its head, but fortunately you don't have to understand the whole of calculus to use the little bit we want, so all good.
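For the curious, the little bit of calculus ends up as a couple of averages. The following is a minimal Python sketch of gradient descent for a one-variable line fit, not the course's own code: the data points, the learning rate `alpha`, the iteration count and the function name are all assumptions chosen for illustration. It fits y = theta0 + theta1 * x by repeatedly nudging both parameters downhill on the squared-error cost.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Fit y = theta0 + theta1 * x by batch gradient descent on squared error."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # How far off the current line is at each training point.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost J = (1 / 2m) * sum(error^2).
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters together, using gradients from the same pass.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Made-up training set lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))  # 1.0 2.0
```

Too large a learning rate makes the updates overshoot and diverge, while too small a one just makes convergence slow; 0.1 happens to suit this tiny data set.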

As well as the videos there are also online review questions, which I believe contribute to the final score. The good news is that you not only can, but are actively encouraged to, retake them until you get a perfect score. The real benefit of this is that they become a learning tool rather than just a test: there are answers given as well as just scores, and I think that they have helped me understand what's going on rather better than I otherwise would.

I'm now most of the way through the optional linear algebra review -but I won't post on that.

Free university courses - computing at Stanford

This is a bit of a find: free Artificial Intelligence and Machine Learning courses at Stanford University in the US. These two courses are examined (optionally, in the Advanced Track) and you can get a certificate, but there are other lecture series on more 'normal' programming topics too.

I think the 'Seven Languages in Seven Weeks' is going to take a bit of a knock as I have signed up for both AI and ML courses. Neither has officially started yet, but the first week's lectures are available for machine learning.

I'll post on how my attempts at this go. Suffice to say week one reveals the course to be fairly academic, and a friend is struggling with the mathematical approach, more because he finds it off-putting than because it is difficult. I'm no Turing but so far so good, and there's always Khan Academy to fill in the gaps.
