Bucket of Sparks: 2014

Friday, 28 November 2014

5 reasons why we are happy with AngularJS

I promised a general post about our experiences on sorted.jobs with AngularJS, so here's the view from ten thousand feet, rather than the 50 foot view of other posts.

1. Good for non-designers

In some ways this is more down to Bootstrap, but combine AngularJS, Bootstrap 3 and AngularUI you have a quick and easy way to produce a responsive and dynamic website.

2. Good for non programming web designers

Angular hides most of the Javascript so it makes for easy collaboration between a front end programmer and a designer. Angularjs directives mean that some of the clever bits can be presented to the designer as new HTML tags, which he can manipulate in the same way as other tags.

3. Very clear view seperation

Views live in templates, which are just HTML files, controllers live in javascript controller files and never the twain shall meet. The actual controller function gets specified in an attribute to a <div> or other HTML tag -what could be easier (don't mention $scope)?

4. Decent documentation and community

The AngularJS site is pretty good, with a good tutorial, developer guide, and reference. Stackexchange is very active, Egghead.io is a good source of video tutorials although there are many others. Manning has a couple of MEAP books; I have used AngularJS in Action, which is now nearly completed, the other is AngularJS in Depth. Packt has a whole raft of AngularJS titles, I have used Mastering Web Application Development with AngularJS, I would guess the Manning books will be more up to date than the Packt one at the moment as they aren't formally published yet.

5. Testing tool.

Angular provides Protactor for end to end testing, this uses Selenium Webdriver to run Jasmine tests via standard web browsers. It is aware of Angular directives so you can run tests on Angular loops etc. It works well and the devlopers have been helpful when I found a problem.

That'll do for now, there's loads more and AngularJS has it's quirks, but I need coffee so that's another post.

Friday, 14 November 2014

Sorting and paging with AngularJS

We have two dashboard screens that make heavy use of AngularJS on sorted.jobs, one for candidates and one for recruiters. The recruiter screen in particular can end up with an awful lot of jobs on it, so the list needs to page, in addition it would be good to be able to sort by posting date, number of applicants, company name and so on.

Recruiter Dashboard

This is where we have got up to, not too happy about the UX/UI side, but it's workable. The paging is provided by the excellent AngularUI Booststrap library. Using it is pretty simple this is the AngularJS template directive ;

<pagination total-items="active_jobs.total" ng-model="currentActivePage" ng-change="activePageChanged()"></pagination>

Pagination needs to know the size of the array it is paging over (active_jobs.total), has a variable to store the current page in (currentActivePage) and a function to call when you change the page - activePageChanged().

And here's the controller function :

$scope.activePageChanged = function(i) {
$scope.activePageStart = ($scope.currentActivePage -1) * $scope.itemsPerPage;
$scope.activePageEnd = $scope.activePageStart + 10 ;
}

As you can see all we are doing is changing the start end end points of the items we are viewing in the array; we have been a bit lazy and not passed itemsPerPage to the directive as we are using the default of ten.

The ng-repeat call looks like this :

<div class='row' ng-repeat="job in active_jobs.hits.slice(activePageStart,activePageEnd)">

I did see pages offering a range filter but array.slice() seems more direct.

In this example the whole array is passed from the backend to the front in one go (this is so it can be sorted in the browser), but you don't have to do it that way, other pages we have make a call to the back end from within the activePageChanged() function to get the next page of results.

Sorting

AngularJS provides the orderBy filter that will sort an array, the documentation is pretty good, the only real point to pick up is that to use it in our pagination example we need to call the controller version and not use the template filter version (this would only sort the array slice and not the whole array). So in the template we make a function call :

<p><a href="" ng-click="reverse=!reverse;active_order('_source.reference', reverse)">Reference<span class="glyphicon glyphicon-resize-vertical sort-col-header"></a></p>

We call the active order function with the column we want to sort by and the direction of sort. The reverse=!reverse just twiddles the sort order.

We then need to set up the controller to use orderBy by injecting $filter :

function SomeCtrl ($scope, $location, $filter...

and using a bit of syntactic suger :

var orderBy = $filter('orderBy');

it is just a matter of defining the function :

$scope.active_order = function(col, reverse) {
    $scope.active_jobs['hits'] = orderBy($scope.active_jobs['hits'], col, reverse);
};

and Robert is your father's brother.

Thursday, 6 November 2014

Data Munging with MongoDB Aggregation and Python

I am evaluating some named entity recognition systems for sorted.jobs , trying to improve our search by sorting the wheat from the chaff, and some of the initial results look encouraging -but just how encouraging? We need to do a bit of analysis to find out.

The most hopeful results come from the extraction of programming language and operating system entities from the text -see the figure below :

Entity Types

On to MongoDB

This table was generated from a MongoDb database using two collections entities and missed_entities entities contains the terms that the program found and missed_entities the ones that I though it missed. Of the ones it found it either got it right ('Hit'), wrong ('Miss') or it was a bit dubious ('Null'). To get the stats I used the new (to me) MongoDB aggregation operations, analagous to the SQL GROUP BY, HAVING, SUM &c.

You could do all this in the old MongoDB map/reduce way, but aggregation seems a bit more intuitive.

So to get the 'Hit', 'Miss' and 'Null' columns the Python code looks like :

entity_c.aggregate([{"$group": {"_id": {"etype" : "$_type", "hit_miss" : "$hit_miss"} , "count": {"$sum": 1}}}])

which returns me rows like :

{u'count': 55, u'_id': {u'etype': u'ProgrammingLanguage'ProgrammingLanguage', u'hit_miss': u'1'}}

{u'count': 2, u'_id': {u'etype': u'ProgrammingLanguage'}}

and nothing for the misses because there weren't any.

The hard work occurs in the $group where I create a compound '_id' field made up of the entity type field and entity hit_miss field and then count all the matching entities.

The Aggregation Pipeline

But we can also look at the terms that the recogniser missed :


Missed entities

Here we only want the entities for the given type ('ProgrammingLanguage') and we want them in order, our PyMongo aggregate call now becomes :

missed_c.aggregate([
    {"$match" : {"_type" : etype}},
    {"$group": {"_id": "$name", "count": {"$sum": 1}}},
    {"$sort" : {"count" : -1}}
])

We have extra terms : '$match' which filters the documents so we only consider those with the passed in type (etype) and '$sort' which orders by the count we generated in the group. MongoDB pipelines these performing the match then the group and then the sort before returning you the result.

Finally, looking at the results we can see that there are some casing issues, we can make everything the same case by adding in a '$project' :

{ "$project" : { "name":{"$toUpper":"$name"} } },

$project creates a new field (or overwrites an existing one in the pipeline, not the real one in the collection) in this instance I have told it to make all the names uppercase and we get :


Normalised entities

It doesn't matter where in the array you place any of theses terms MongoDB will sort out the ordering.

What does this tell us? Well in this case if I could persuade the tagger to recognise CSS and variants of terms it already knows with a number on the end I would get a big jump in the overall quality of the results.

References

Angular Aggregation manual pages.

Wednesday, 5 November 2014

Retro Tech -letting out the inner anorack

Or what I did at the weekend. Time for a break from sorted.jobs, angularjs, elasticsearch and the rest, time to play with the HiFi. When I was a yoof, back in the Jurassic, we would obsess over getting the best sound from our LPs, spending ridiculous amounts of time (and money) over the biggest and most accurate soundscape. My daughter on the other hand now seems happy with playing stuff using her phone speaker, and that sounds worse than a 70s trannie, oops tranny.

Decca Kelly Ribbon Tweeters

So, back to the future, clearing out my lockup I disinterred my old Mordaunt Short 700 speakers. I bought these from a junk shop in the 80s as the tweeters were shot and I thought I could fix them. Not just any tweeters though, these were Decca Kelly Ribbon Tweeters, high end hen's teeth. Fortunately there is a nice man, called Howard Dawson who makes spares for these things, and indeed his own ribbon tweeters, a new pair of ribbons were fitted and we were off to the races.

Back then I thought they sounded excellent, there was only one drawback, they aren't small you could use them as coffins; like the man said 'there ain't no substitute for cubes'. So into the lockup they went, coming out for the odd outdoor party, and I ended up with a pair of KEF 104s good speakers, discreet (for HiFi speakers) better than any boombox going. But then the lockup had to be cleared, and we have a bigger house...

Gruesome Twosome

Time to see which ones I want to keep! The obvious thing to do was an old-style A/B test on the two of them, pick a few CDs play a track on one pair of speakers, switch over, play it on the other, see who wins. So 4 CDs :

Richard Thompson 'Rumour and Sigh'

If you are in to Richard Thompson and are (or can appear to be) in the UK, you can hear him interviewed on BBC Mastertapes.

Mozart 'Concerto for Flute and Harp K 299'
Kevin Macleod 'Highland Strands'
The Beautiful South 'Choke'

The idea was to get a mix of instrumental and vocal, acoustic and electronic , over a range of styles .

Who won?

The Mordaunt Shorts obviously, I wanted them to win :) To be fair I think they really did win, the bass on the KEFs are comparable to the MSs, surprising for such a small box -but then MS paired KEF units with Decca Horns in the later version. But the treble on the 700s, driven by those tweeters, is just brilliant (although not too brilliant), you hear so much more; some fingerpicking I hadn't noticed on 'Rumour and Sigh', the girl on 'Choke' sounding like she's on fast forward, and the flute and the harp on Mozart and the acoustic instruments on Kevin's CD all sound that bit more real.

Anyone want a pair of Kef 104s? Nicer than many speakers you will find today, just not as good as something ten years older (which was probably in a much higher price bracket).

Tuesday, 28 October 2014

Early Days with Ansible for Nginx and Elastic Search on EC2

As our new recruitment site, sorted.jobs edges towards production I have to start thinking more about the infrastructure side of things. The search part of the site uses ElasticSearch which is, by design, insecure if you can find an installation you can use it. Current best practice seems to be to hide it behind a web server running https, normally Nginx, and use that to control access.

This gives me two sets of machines to configure, the Nginx proxy and the Elasticsearch server, since we're trying to be a grown up company we don't want to do all this by hand every time, so it make sense to script it. In the bad old days we used to do this with the Unix shell, see I said Unix not Linux -that's how old those bad old days were! Now we don't need to do that we can have centralised deployments using a variety of tools such as Chef, Puppet and johnny-come-lately Ansible.

So why choose Ansible? I have briefly played with Chef, and looked at Puppet for another company, and I seem to remember them being fairly complicated. I did a web search to compare the two and Ansible popped up as well in several cases.

Ansible had a few things going for it ;configuration files are in standard YAML, no client to install, Jinja2 templates -which we are already using- and the words 'easy', 'simple' and 'uncomplicated' came up a lot. So I decided to give it a whirl.

Getting it going

Ansible uses a hosts file (held in /etc/ansible/hosts) to define the servers it wants to talk to. As well as defining hosts you can group them for use in playbooks.

We are running sorted.jobs ElasticSearch on EC2 so the definitions look like :

54.123.123.123 ansible_ssh_user=ansible_user ansible_ssh_private_key_file=ansible_key.pem

you can use either IP addresses or domain names to set up the server.

Running a simple command like `ansible all -m ping`or `ansible all -a "/bin/echo hello"` will let you test out the definitions.

Once you have the definitions sorted out it is time to get Ansible to actually do something useful, you do this with playbooks. A playbook is basically just a script to tell Ansible what to do when. You run them with ansible-playbook (e.g. ansible-playbook -v elasticsearch.yml). This caught me out initially as I was looking for an option to pass the playbook to the 'ansible' command.

First Playbook Nginx

This playbook installs Nginx uploads the certificates configuration and password files for https

---
- hosts: es_proxys
sudo: yes

tasks:
    - name: Installs nginx web server
      apt: pkg=nginx state=installed update_cache=true
      notify:
        - start nginx

    - name: Upload default ngix certs and conf
      copy: src=./es_proxys/conf.tar dest=/tmp

    - name: Untar
      command: tar xf conf.tar
      register: untarred
      ignore_errors: True

    - name: move to nginx etc
      command: mv conf /etc/nginx

    - name: move to nginx etc
      command: `mv .htpasswd /etc/nginx
      register: https_conf

    - name: Upload proxy vhost
      copy: src=es_proxys/es_proxy dest=/etc/nginx/sites-enabled
      when: https_conf|success
      notify:
        - restart nginx


handlers:
    - name: start nginx
      service: name=nginx state=started

    - name: restart nginx
      action: service name=nginx state=restarted

From the top , the names of the tasks should tell you what each one is trying to do :

hosts refer to the hosts -or host groups in the Ansible hosts file we talked about above.
sudo -run this as root.
tasks simply the list of things to do
apt the ansible module for the Ubuntu packaging system
notify call a handler
handlers commands that can be run on demand from tasks, typically used to do things like bouncing servers.
register the result of a command into a variable
when conditionally run a task based on the value of a variable. In the example above the `mv .htpasswd /etc/nginx` command must have succeeded (and, by implication, the earlier tasks) for the proxy upload to be run.

Basic ElasticSearch

This playybook installs Elasticsearch and sets it up with some extra Elasticsearch plugins and a backup configuration.

As well as the things we saw in the proxy Playbook there are some new features :

get_url does what it says on the tin, as you can see it also checks file checksums
changed_when tells Ansible when something has happened, in this case it's used because dpkg will succeed whether or not it installs anything
shell runs a Linux shhell command in the raw, command samitizes it.
cron sets up a cron job.

Note in one case I had to use a raw command (curl in the backup config) as I couldn't get the builtin (get_url) to work for me (horrendous quoting issues.

---
- hosts: es_servers
sudo: yes

tasks:
- name: Installs java JRE
    apt: pkg=openjdk-7-jre-headless state=installed update_cache=true
    register: jre

- name: Download ES
    get_url: url=https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.4.deb dest=/tmp/es.deb sha256sum=6a15ab0f8c13574162e98828d7ec0e1155e6136f9d45c54b88e39222bbbd53ca
    register: es_dl

- name: Install ES
    command: dpkg --skip-same-version -i /tmp/es.deb
    register: dpkg_result
    changed_when: "dpkg_result.stdout.startswith('Selecting')"
    when: jre|success and es_dl|success
    notify:
      - start es

- name: Remove ES Attachment plugin
    shell: /usr/share/elasticsearch/bin/plugin -r elasticsearch-mapper-attachments || /bin/true
    register: es_plug_result
    changed_when: "'Removed' in es_plug_result.stdout"
    when: dpkg_result|success

- name: Install ES Attachment plugin
    command: /usr/share/elasticsearch/bin/plugin -i elasticsearch/elasticsearch-mapper-attachments/2.3.0
    register: es_plug_result
    changed_when: "'Installed' in es_plug_result.stdout"
    when:
    notify:
      - restart es

- name: Remove ES S3 plugin
    shell: /usr/share/elasticsearch/bin/plugin -r elasticsearch/elasticsearch-cloud-aws || /bin/true
    register: es_plug_result
    changed_when: "'Removed' in es_plug_result.stdout"
    when: dpkg_result|success

- name: Install ES S3 plugin
    command: /usr/share/elasticsearch/bin/plugin -i elasticsearch/elasticsearch-cloud-aws/2.3.0
    register: es_plug_result
    changed_when: "'Installed' in es_plug_result.stdout"
    when:
    notify:
      - restart es

- name: Upload s3 config
    copy: src=./s3_config.json dest=/home/ubuntu

- name: Configure backup for s3
    command: curl -XPUT 'http://localhost:9200/_snapshot/s3_live' -d @/home/ubuntu/s3_config.json
    register: s3_result
    changed_when: "'acknowledged' in s3_result.stdout"

- name: Remove s3 config
    command: rm /home/ubuntu/s3_config.json

- name: S3 cron
    cron: name=s3_bup hour=1 minute=50 job='curl -XPUT "http://localhost:9200/_snapshot/s3_live/snapshot_$(date +\%Y\%m\%d)"'


handlers:
- name: start es
    service: name=elasticsearch state=started

- name: restart es
    service: name=elasticsearch state=restarted

The Book

If you want a book there's Ansible Configuration Management I did buy this, but I think you will do just as well with the Ansible documentation.

Friday, 24 October 2014

AngularJS ng-if and ng-show

We use AngularJS to do a lot of the front end on sorted.jobs, at some point I may do some posts on the pros and cons of AngularJS, but for now it's enough to say that -in conjunction with Bootstrap it makes producing a front end pretty easy for a back end developer like me.

Anyhoo one of the joys of Angular is how easy it is to produce a display that varies depending on the data provided to it (even dynamically -although that's not the subject of this post). When we started sorted.jobs the way to do this was via ng-show and ng-hide, which basically twiddles the CSS display property for the element to show or hide it. This works very well, but the element would always be fetched from the server, it just wouldn't appear on the screen.

Now we have ng-if, this will remove the element from the DOM altogether if the expression evaluates to false, and so, assuming it is some some of resource it won't fetch it from the server, thus speeding up the page.

Quick Example

ng-show :
<img ng-show="job.user_id && !job.system.logo_url" ng-src="/logo/{{job.user_id}}/{{job_id}}" class="img-responsive">

ng-if :
<img ng-if="job.user_id && !job.system.logo_url" ng-src="/logo/{{job.user_id}}/{{job_id}}" class="img-responsive">

The ng-show in this case has a particularly bad effect in that it will load, although hide, an image that isn't there, i.e. one with no user_id!

References :

ng-if in Angular Docs
Stackexchange -you may want to read this if unexpected things start happening with ng-if, like some other Angular commands it has a habit of creating child scopes.

Thursday, 2 October 2014

AngularJS, Google Search and SEO

Our new site sorted.jobs is now in a stealthy, pre-launch mode, so it is time to start thinking about getting it into the various search engines -especially the big one.

AJAX

Our Job Posts are normally served as AngularJS views -which Google can't parse, since they are AJAX based (although interestingly it can render them with Google Fetch) . However, Google can be persuaded to fetch another version of the page and index that by including this meta tag in the head :

<meta name="fragment" content="!">

If the crawler sees this tag it will then resend the original request with ?_escaped_fragment_= tacked on the end (more here : https://prerender.io/js-seo/angularjs-seo-get-your-site-indexed-and-to-the-top-of-the-search-results/)

so
http://www.sorted.jobs/job_post/UX+Developer/ULKYpZTSSRqBGvGirp_yOQ
would become
http://www.sorted.jobs/job_post/UX+Developer/ULKYpZTSSRqBGvGirp_yOQ?_escaped_fragment_=

on the server side in the handler I recognise the second form and render a page that is HTML only.

SEO

Since the user doesn't see this page it doesn't need the full functionality of the original and we also have the chance to tweak the page to do things like providing a more meaningful <title> and a <meta name=description> tag which Google can do things with as per this snippet post

Testing and Webmaster Tools

One minor 'gotcha' that we found was with Webmaster tools, in as much as the stats aren't up to date -according to tools we have 0 pages indexed wheres a site specific google search (site:sorted.jobs) shows us the ten pages we expect.

A second issue is that the 'Fetch' functionality within Webmaster Tools doesn't fire off the second request automatically -so you can't see the page Google would actually index from sorted.jobs on the Fetch results page, just the originally requested AngularJS page.

On the plus side if you 'Fetch' the page you can check that the <meta name="fragment" content="!"> is in the content, and if it is you can then 'Submit to Index' which will kick off Google's crawler on your page and put it into their index within minutes.

Bucket of Sparks

Pages