node.js knockout
Countdown to KO #23: Login with Password, Facebook, Twitter, and more with everyauth

This is the 23rd in a series of posts leading up to Node.js Knockout, and covers how to use everyauth to manage logins. This post was written by everyauth author and Node.js Knockout contestant Brian Noguchi.

Introduction

So you want to add logins to your web app? Assuming that you are using Connect or Express (and who isn’t these days?), then everyauth can get you up and running within minutes.

4 Steps to Get Up and Running

Setting up everyauth comes down to 3 steps with Connect and 4 steps with Express:

  1. Step 1 - Choose and configure one or more login types

    Currently, we support 19 different login types including password, Facebook, Twitter, GitHub, and more. For a full list, see the everyauth GitHub page.

  2. Step 2 - Specify a function for finding a user by id

    The function configuration here will depend on how you are storing your data — i.e., in memory or via a database. For in memory storage of users, this would look like:

    
       var usersById = {};
    
       everyauth.everymodule
         .findUserById( function (id, callback) {
           callback(null, usersById[id]);
         });
       
  3. Step 3 - Add your middleware to Connect/Express

    This will automatically set up routes and views for your app. For example, if you chose to set up password authentication, then you can now navigate to http://localhost:3000/login and http://localhost:3000/register, and log out with http://localhost:3000/logout.

    In Connect, this looks like:

    
       var everyauth = require('everyauth');
       // Step 1 code goes here
    
       // Step 2 code goes here
       // Step 3: add everyauth.middleware() to the middleware stack
       var connect = require('connect');
       var app = connect(
           connect.favicon()
         , connect.bodyParser()
         , connect.cookieParser()
         , connect.session({secret: 'mr ripley'})
         , everyauth.middleware()
         , connect.router(routes)
       );
       

    In Express, this looks like:

    
       var everyauth = require('everyauth');
       // Step 1 code goes here
    
       // Step 2 code goes here
       // Step 3: add everyauth.middleware() to the middleware stack
       var express = require('express');
       var app = express.createServer(
           express.favicon()
         , express.bodyParser()
         , express.cookieParser()
         , express.session({secret: 'mr ripley'})
         , everyauth.middleware()
         , express.router(routes)
       );
       
  4. Step 4 (Express only) - Add view helpers to Express

    
       // Step 1 code
       // ...
       // Step 2 code
       // ...
    
       // Step 3 code
       // ...

       // Step 4: add the view helpers
       everyauth.helpExpress(app);
    
       app.listen(3000);
       

Configuring Facebook Connect

This is how you would configure Step 1 from above to set up Facebook Connect.

First some boilerplate for creating and storing users in memory:

var nextUserId = 0;
var usersById = {};

function addUser (source, sourceUser) {
  var user;
  if (arguments.length === 1) { // password-based
    user = sourceUser = source;
    user.id = ++nextUserId;
    return usersById[nextUserId] = user;
  } else { // non-password-based
    user = usersById[++nextUserId] = {id: nextUserId};
    user[source] = sourceUser;
  }
  return user;
}

Now for the configuration (Step 1) that sets up Facebook Connect.

var usersByFbId = {};

everyauth
  .facebook
    .appId(YOUR_APP_ID)
    .appSecret(YOUR_APP_SECRET)
    .findOrCreateUser( function (session, accessToken, accessTokenExtra, fbUserMetadata) {
      return usersByFbId[fbUserMetadata.id] ||
        (usersByFbId[fbUserMetadata.id] = addUser('facebook', fbUserMetadata));
    })
    .redirectPath('/');

Get YOUR_APP_ID and YOUR_APP_SECRET by registering a Facebook app.

findOrCreateUser takes the incoming session object along with the data returned from Facebook’s OAuth2 process: accessToken, accessTokenExtra, and fbUserMetadata. This function should find or create a user object and then return it.
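To make the find-or-create contract concrete, here is a minimal sketch that pulls the callback out into a standalone function and calls it twice with the same Facebook id. The `findOrCreateFacebookUser` name and the sample metadata are made up for illustration, and `addUser` is reduced to the non-password branch of the boilerplate above.

```javascript
// In-memory stores, as in the boilerplate above.
var nextUserId = 0;
var usersById = {};
var usersByFbId = {};

// Non-password branch of addUser from the boilerplate.
function addUser (source, sourceUser) {
  var user = usersById[++nextUserId] = {id: nextUserId};
  user[source] = sourceUser;
  return user;
}

// Hypothetical standalone version of the findOrCreateUser callback.
function findOrCreateFacebookUser (session, accessToken, accessTokenExtra, fbUserMetadata) {
  return usersByFbId[fbUserMetadata.id] ||
    (usersByFbId[fbUserMetadata.id] = addUser('facebook', fbUserMetadata));
}

// Sample metadata; a real login would receive this from Facebook.
var fbUser = {id: 'fb-42', name: 'Example User'};

var first = findOrCreateFacebookUser({}, 'token', {}, fbUser);   // creates the user
var second = findOrCreateFacebookUser({}, 'token', {}, fbUser);  // finds the same user

console.log(first === second); // true
console.log(first.id);         // 1
```

The second call returns the object created by the first, so a returning Facebook user is never duplicated in usersById.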

redirectPath tells everyauth where to redirect your user after a successful Facebook Connect login.

With this code, you can now include links to /auth/facebook in your views.

For detailed instructions on setting up any of the 19 login strategies, please see the README.

Other Resources

Countdown to KO #22: PostageApp

This is the 22nd in a series of posts leading up to Node.js Knockout, and covers using PostageApp to send email in your node app.

Given the time crunch for Node.js Knockout, there’s barely enough time for anything. Getting your app configured to send email is one of those things that can prove to be far more time-consuming than you expect, especially if you’re not prepared.

Here’s a quick-start guide for getting your Node application set up and ready to roll in just a few minutes.

1. Install the Module

With Node, there are two really easy ways to bring a module into your app: using the Node Package Manager (NPM) or a manual download.

Using the Node Package Manager (NPM)

Using the Node Package Manager is by far the easiest and quickest way to get a module installed into your Node app. All you have to do is run this command in the root of your application:

npm install postageapp

Manual Install

Manually installing is just slightly more tedious than installing via NPM, but it’s just as simple!

  • Download the module files from our GitHub account
  • Unzip the file you just downloaded, and copy the contents over to your app’s node_modules folder
  • Rename the folder from postageapp-nodejs to postageapp

2. Get a PostageApp API Key

Log in to your PostageApp account and make sure you create at least one project. Once you have a project associated with your account, you should be able to see an API key specific to that project. Once you have the API key, you can include the PostageApp plugin in your Node.js app by using the following code:

var postageapp = require('postageapp')('YOUR API KEY GOES HERE');

3. Creating a Parent Template

It’s not hard to send great looking email messages if you have the right tools. Email clients are notoriously particular about what kind of HTML they accept, and even support for CSS is extremely limited. One big feature of PostageApp is that you’re able to create a nice HTML template, add in a CSS file, and the two will be combined in an email-friendly markup format that you can preview before sending to ensure it’s working properly.

Every new project comes with a sample layout you can customize with your own logo, CSS theme, and of course content. A parent template can be used to establish common headers and footers without having to cut and paste these to every type of message you’ll be making. These are available under the Message Templates tab of any project page.

To explain how this works, let’s create a very simple parent template by going to the Message Templates tab and clicking on the Create a New Template link just above the list of templates. You’ll get an empty editor screen you can use to create it.

Add a simple layout that looks something like this in the HTML tab edit area:

<h2>Awesome Web App</h2>

<hr>

<div>
    {{ * }}
</div>

<hr>

<a href="http://yoururl.com">Awesome Web App Page</a>

The mysterious double-curly symbol-with-a-star-in-it {{ * }} in the middle is the location where the template content will go.

You can preview the template at any time and see how it should look in a regular email client. Warnings about your HTML and CSS are reported here, so if you’re making use of exotic, cutting-edge features like background images that some ornery email clients like Outlook don’t support, you’ll get a heads up here. You can always use the Send test email button located just below the editor to see how the email looks in your own client, or through an email previewing service if you use one.

Without some CSS this is going to look really plain. As you design your app, it’s easy to snip key styles and paste them into the CSS tab of the template editor.

Before you can save this template, you have to give it a template slug. For layouts, this is really just a descriptive name you can use to remember what layout it is. In this case call it something like default_layout so it’s easily identified later.

The Subject and From fields generally only apply to child templates themselves, not parent templates.

Save your template and you should be ready for the next step.

4. Create a Child Template

Having a parent template is great, but without something to go into it, you won’t get much use out of it. A message template can be created as you usually would, within your Node application, but it’s usually far easier to have the templates within PostageApp so you can edit them without having to redeploy your application. Think of this as CMS for your email messages where you can make changes at any time and see the results immediately.

A typical application sends out dozens of different messages to its users. When you sign up, when you confirm your registration, when you forget your password, when you haven’t been active in a while, when you invite someone, when you receive a message from someone, or even for general announcements or special offers. It can be difficult to maintain these if you have to check in and deploy your application to make even the smallest change.

A good example is an invitation email sent by one user to someone else. Create a New Message Template again. This time we’ll use the parent template created in the last step to give an otherwise boring email some style.

Here’s a sample invitation that can be pasted into the HTML tab edit area:

<p>You've been invited to join {{ app_name }}!</p>

<p><a href="{{ signup_link }}">Sign up</a> now and receive five free invites you can share with your friends!</p>

There are two variables here you can customize with user data when sending the message, {{ app_name }} and {{ signup_link }}. Through the API you can set some of these the same for everyone, or customize each field individually for each recipient.

Set the parent layout to be the parent template created in the earlier step.

You can set the default From address here, or assign it later when making the API call. The same goes for the Subject. You can also use template variables in the subject to personalize it. In this case, set the subject to:

{{app_name}} - Invitation from {{user_name}}

If you preview the message now, you should see the template wrapped neatly inside the layout.
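As a rough mental model of what the preview does, the sketch below substitutes {{ name }} variables and drops the child template into the parent at the {{ * }} marker. The `render` and `wrap` helpers are hypothetical and only mimic the idea; they are not PostageApp's template engine, and the variable values are sample data.

```javascript
// Toy renderer: replaces {{ name }} placeholders with values from `vars`.
// Unknown placeholders are left untouched.
function render (template, vars) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, function (match, name) {
    return vars.hasOwnProperty(name) ? vars[name] : match;
  });
}

// Wraps child content in a parent layout at the {{ * }} marker.
function wrap (parent, child) {
  return parent.replace(/\{\{\s*\*\s*\}\}/, function () { return child; });
}

var parent = '<h2>Awesome Web App</h2><div>{{ * }}</div>';
var child = '<p>You\'ve been invited to join {{ app_name }}!</p>';
var subject = '{{app_name}} - Invitation from {{user_name}}';

var vars = {app_name: 'Awesome Web App', user_name: 'Alice'};

console.log(render(subject, vars));
// Awesome Web App - Invitation from Alice
console.log(render(wrap(parent, child), vars));
// <h2>Awesome Web App</h2><div><p>You've been invited to join Awesome Web App!</p></div>
```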

Set the Template Slug to be invitation and save the message.

You’re now ready to set up something to trigger this message.

5. Sending an Email with Node

To send emails through PostageApp using the Node plugin, you create a hash with all of the arguments that you need, and then make the API call with that payload. Here’s an example of what assembling a payload looks like:

var options = {
    recipients: 'email@address.com',

    subject: 'Subject Line',
    from: 'sender@example.org',

    content: {
        'text/html': '<strong>Sample bold content.</strong>',
        'text/plain': 'Plain text goes here'
    },

    template: 'sample_template_slug',

    variables: {
        'global_variable_1': 'First Name',
        'global_variable_2': 'Username'
    }
};

For a better idea of how to use the arguments, take a look at the Node.JS plugin’s GitHub page for further examples and elaboration.

Once you have your arguments set up, all you have to do is make an API call.

postageapp.sendMessage(options);
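For the invitation template created in step 4, the payload might be assembled along these lines. This is a sketch only: `buildInvitation` and its arguments are hypothetical, the addresses are placeholders, and it assumes the subject comes from the template itself, so none is set here.

```javascript
// Builds the options hash for the `invitation` template slug.
// The variable names match the placeholders used in the template and its subject.
function buildInvitation (recipient, inviterName, signupLink) {
  return {
    recipients: recipient,
    from: 'sender@example.org',
    template: 'invitation',
    variables: {
      app_name: 'Awesome Web App',
      user_name: inviterName,
      signup_link: signupLink
    }
  };
}

var invitation = buildInvitation('friend@example.com', 'Alice', 'http://yoururl.com/signup');
console.log(invitation.template);            // invitation
console.log(invitation.variables.user_name); // Alice

// With the module loaded as in step 2, the call would then be:
// postageapp.sendMessage(invitation);
```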

Recap!

From here you can go and customize this as required, add other notifications, and create new templates.

Hopefully this saves you a bunch of time so you can make an even better application this weekend.

More detailed documentation is available on our knowledge base.

Good luck with Node Knockout!

Countdown to KO #21: Using Spreecast during Node Knockout

This is the 21st in a series of posts leading up to Node.js Knockout, and covers using Spreecast to collaborate with your team and share with other participants around the world.

What is Spreecast?

Spreecast lets groups share video experiences in real time. With Spreecast, you can broadcast your live video to the world, pulling in viewers to share the experience when it suits you, and dismissing them when they’re done. It’s a fun format for sharing at event locations and an awesome new way for Node.js Knockout teams to coordinate their coding.

For Node.js Knockout you will be able to sign up and get exclusive early, pre-beta access to Spreecast and unlimited live video during the competition. Spreecast is pre-beta and working hard to squash bugs and tune user experience. We’d love to hear feedback on any bugs or issues that you encounter.

Getting Started

The URL for Node.js Knockout is beta.spreecast.com. Credentials for the basic auth prompt are on nodeknockout.com/services.

Once you are on the site, create an account and log in. To create a Spreecast, click the green Spreecast Now button at the top of the page.


When you are done with a Spreecast, click on the red “On Air” button. This ends the spreecast and brings you to a page where you can view the archive.

Sharing and Tagging

Be sure to tag your Spreecast with “nodeknockout”, so the Spreecast will show up in beta.spreecast.com/tags/nodeknockout. When you create a new Spreecast, click on “Add Details” to add tags before starting.


Feedback

There’s a feedback link at the bottom of the page. We’d love to hear what you think.

Have fun Spreecasting!

(protip) Add the Vote KO Badge to your App

Here’s a quick tip: you should link to your team’s page to get as many votes as possible.

Alternatively, if you want to let people vote from your app directly, you can use our “Vote KO” widget:


Here’s how to use it:

<iframe src="http://nodeknockout.com/iframe/YOUR_TEAM_SLUG" frameborder=0 scrolling=no allowtransparency=true width=115 height=25>
</iframe>

You can find your slug from your team’s page. The slug is the last segment of the URL (e.g. fortnight-labs in http://nodeknockout.com/teams/fortnight-labs).
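If you would rather generate the snippet than edit it by hand, something like the following would do. Both helper names are invented for this sketch; the iframe markup is the one shown above.

```javascript
// Returns the last path segment of a team page URL.
function slugFromTeamUrl (url) {
  var parts = url.replace(/\/+$/, '').split('/');
  return parts[parts.length - 1];
}

// Builds the Vote KO iframe markup for a given slug.
function voteWidget (slug) {
  return '<iframe src="http://nodeknockout.com/iframe/' + slug + '"' +
    ' frameborder=0 scrolling=no allowtransparency=true width=115 height=25>' +
    '</iframe>';
}

var slug = slugFromTeamUrl('http://nodeknockout.com/teams/fortnight-labs');
console.log(slug); // fortnight-labs
console.log(voteWidget(slug));
```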

Once you’ve deployed your widget, link to your app in the comments so other people can see how you integrated it.

Countdown to KO #20: no.de Getting Started Guide

This is the 20th in a series of posts leading up to Node.js Knockout, and covers how to use Joyent’s no.de service. This post was written by no.de architect and Node.js Knockout judge Isaac Schlueter.

These instructions will tell you how to deploy your code on Joyent’s no.de service.

Create an Account

Go to no.de and click “Sign up”.

Then fill in the stuff. You’ve done this before.

Now you’re logged in. If you’re not logged in now, email support.

Add an SSH Key

You need to add an SSH public key to your account to provision Node SmartMachines.

If you’re on a Windows computer, then use the puttygen.exe program which comes along with PuTTY. The key you want is the one marked Public key for pasting into OpenSSH authorized_keys file.

If you’re on any other kind of computer, then your SSH keys are probably in ~/.ssh/*.pub. If you don’t have one, then you can create it by using the ssh-keygen program.

Paste the key into the big box. You can also add a name for the key, if you like labels.

Save it. Now you’ve got a key.

Order a Machine

Click the button on the right that says “Order a Machine”.

Give it a name.

Click “Provision”.

Follow Instructions

On the machine details page, there are a bunch of instructions.

Follow them.

It won’t work unless you follow the instructions.

If you forget, and need to follow them later, that’s fine. They’ll still be there.

It involves pasting some stuff into your .ssh/config file. You can achieve a similar effect on Windows by using this method, or using git and ssh from Cygwin.

Bask in the Cool Glow of the Logo

On the machine details page is a hyperlink to your new zone. Click it.

Enjoy the logo.

When you’re done enjoying the logo, click the logo to return to the machine details page.

Repeat until bored.

Push Some Code

Use the power of the instructions! Push code to your machine! Be a winner!

Some tips:

  • If you have npm dependencies you can add them to a package.json file in the root of your repository.
  • The default start command is node server.js. If you want to have it start up some other way, then you can put something like this in your package.json file: "scripts": { "start" : "my-custom-command" }
  • If you have a dependency that takes a long time to install, you can make deploys faster by ssh-ing into your zone and running npm install <some-dependency> -g. The deploy script will reuse globally installed dependencies if they’re suitable.

If you run into trouble, email support.

Countdown to KO #19: A primer for GridFS using the Mongo DB driver

This is the 19th in a series of posts leading up to Node.js Knockout, and covers using mongodb with node-mongodb-native. This post was written by Node Knockout judge and node-mongodb-native author Christian Kvalheim.

In the first tutorial we targeted general usage of the database. But Mongo DB is much more than this. One very useful additional feature is its ability to act as a file storage system. This is accomplished in Mongo by having a files collection and a chunks collection, where each document in the chunks collection makes up a block of the file. In this tutorial we will look at how to use the GridFS functionality and what functions are available.
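To picture how a file maps onto the chunks collection, here is a small sketch that cuts a buffer into chunk-shaped documents. The `splitIntoChunks` helper and the 'example-id' value are hypothetical, the 256K chunk size is borrowed from the streaming example later in this post, and the driver does all of this for you.

```javascript
// Splits a buffer into GridFS-style chunk documents.
// Field names follow the GridFS layout: files_id, n (sequence number), data.
function splitIntoChunks (fileId, buffer, chunkSize) {
  var chunks = [];
  for (var offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId,
      n: n,
      data: buffer.subarray(offset, offset + chunkSize)
    });
  }
  return chunks;
}

var file = Buffer.alloc(1000000);   // about 1MB of zero bytes
var chunks = splitIntoChunks('example-id', file, 1024 * 256);

console.log(chunks.length);         // 4
console.log(chunks[3].data.length); // 213568 (the last chunk is only partly filled)
```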

A simple example

Let’s dive straight into a simple example on how to write a file to the grid using the simplified Grid class.

var mongo = require('mongodb'),
  Server = mongo.Server,
  Db = mongo.Db,
  Grid = mongo.Grid;

var server = new Server('localhost', 27017, {auto_reconnect: true});
var db = new Db('exampleDb', server);

db.open(function(err, db) {
  if(!err) {
    var grid = new Grid(db, 'fs');
    var buffer = new Buffer("Hello world");
    grid.put(buffer, {metadata:{category:'text'}, content_type: 'text'}, function(err, fileInfo) {
      if(!err) {
        console.log("Finished writing file to Mongo");
      }
    });
  }
});

All right let’s dissect the example. The first thing you’ll notice is the statement

var grid = new Grid(db, 'fs');

Since GridFS is actually a special structure stored as collections, you’ll notice that we are using the db connection that we used in the previous tutorial to operate on collections and documents. The second parameter ‘fs’ allows you to change the collections you want to store the data in. In this example the collections would be fs.files and fs.chunks.

Having a live grid instance, we now go ahead and create some test data stored in a Buffer instance, although you can pass in a string instead. We then write our data to Mongo DB.

var buffer = new Buffer("Hello world");
grid.put(buffer, {metadata:{category:'text'}, content_type: 'text'}, function(err, fileInfo) {
  if(!err) {
    console.log("Finished writing file to Mongo");
  }
});

Let’s deconstruct the call we just made. The put call will write the data you passed in as one or more chunks. The second parameter is a hash of options for the Grid class. In this case we wish to annotate the file we are writing to Mongo DB with some metadata and also specify a content type. Each file entry in GridFS has support for metadata documents, which might be very useful if you are, for example, storing images in your Mongo DB and need to store all the data associated with the image.

One important thing to note is that the put method returns a document containing an _id. This is an ObjectID identifier that you’ll need if you wish to retrieve the file contents later.

Right, so we have written our first file. Let’s look at the other two simple functions supported by the Grid class.

the requires and other initializing stuff omitted for brevity

db.open(function(err, db) {
  if(!err) {
    var grid = new Grid(db, 'fs');
    var buffer = new Buffer("Hello world");
    grid.put(buffer, {metadata:{category:'text'}, content_type: 'text'}, function(err, fileInfo) {
      grid.get(fileInfo._id, function(err, data) {
        console.log("Retrieved data: " + data.toString());
        grid.delete(fileInfo._id, function(err, result) {
        });
      });
    });
  }
});

Let’s have a look at the two operations get and delete

grid.get(fileInfo._id, function(err, data) {});

The get method takes an ObjectID as the first argument, and as we can see in the code we are using the one provided in fileInfo._id. This will read all the chunks for the file and return them as a Buffer object.

The delete method also takes an ObjectID as the first argument but will delete the file entry and the chunks associated with the file in Mongo.

This api is the simplest one you can use to interact with GridFS but it’s not suitable for all kinds of files. One of its main drawbacks shows up when you are trying to write large files to Mongo. This api will require you to read the entire file into memory when writing to and reading from Mongo, which most likely is not feasible if you have to store large files like video or RAW pictures. Luckily this is not the only way to work with GridFS. That’s not to say this api is not useful. If you are storing tons of small files, the simplicity might be worth the memory usage. Let’s dive into some of the more advanced ways of using GridFS.

Advanced GridFS or how not to run out of memory

As we just said, controlling memory consumption for your file writing and reading is key if you want to scale up the application. That means not reading in entire files before either writing to or reading from Mongo DB. The good news is that it’s supported. Let’s throw some code out there straight away and look at how to do chunk-sized streaming writes and reads.

the requires and other initializing stuff omitted for brevity

var fileId = new ObjectID();
var gridStore = new GridStore(db, fileId, "w", {root:'fs'});
gridStore.chunkSize = 1024 * 256;

gridStore.open(function(err, gridStore) {
 Step(
   function writeData() {
     var group = this.group();

     for(var i = 0; i < 1000000; i += 5000) {
       gridStore.write(new Buffer(5000), group());
     }
   },

   function doneWithWrite() {
     gridStore.close(function(err, result) {
       console.log("File has been written to GridFS");
     });
   }
 )
});

Before we jump into picking apart the code let’s look at

var gridStore = new GridStore(db, fileId, "w", {root:'fs'});

Notice the parameter “w”; this is important. It tells the driver that you are planning to write a new file. The modes you can use here are:

  • “r” - read only. This is the default mode
  • “w” - write in truncate mode. Existing data will be overwritten
  • “w+” - write in edit mode

Right, so there is a fair bit to digest here. We are simulating writing a file that’s about 1MB big to Mongo DB using GridFS. To do this we are writing it in chunks of 5000 bytes. To avoid a difficult callback setup we are using the Step library with its group functionality to ensure that we are notified when all of the writes are done. After all the writes are done, Step will invoke the next function (or step), called doneWithWrite, where we finish up by closing the file, which flushes out any remaining data to Mongo DB and updates the file document.
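If the group idea is new to you, the sketch below shows the underlying pattern in plain JavaScript: hand out one callback per write, count them, and fire a single completion function when the last one reports back. It is not Step's implementation; `createGroup`, `fakeWrite` and the manual queue are stand-ins invented here so the example runs without a database.

```javascript
// Collects any number of callbacks and calls `done` once all have fired.
function createGroup (done) {
  var pending = 0;
  var results = [];
  var failed = false;

  return function group () {
    var index = pending++;
    return function (err, result) {
      if (failed) return;
      if (err) { failed = true; return done(err); }
      results[index] = result;
      if (--pending === 0) done(null, results);
    };
  };
}

// Stand-in for gridStore.write: completion is deferred until the queue is flushed.
var queue = [];
function fakeWrite (chunk, callback) {
  queue.push(function () { callback(null, chunk.length); });
}

var totalWritten = 0;
var group = createGroup(function doneWithWrite (err, sizes) {
  totalWritten = sizes.reduce(function (sum, size) { return sum + size; }, 0);
  console.log('All writes done: ' + totalWritten + ' bytes');
});

for (var i = 0; i < 1000000; i += 5000) {
  fakeWrite(Buffer.alloc(5000), group());
}

// Simulate the 200 writes completing some time later.
queue.forEach(function (complete) { complete(); });
// All writes done: 1000000 bytes
```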

As we are doing it in chunks of 5000 bytes we will notice that memory consumption is low. This is the trick to write large files to GridFS. In pieces. Also notice this line.

gridStore.chunkSize = 1024 * 256;

This allows you to adjust how big the chunks that Mongo DB writes are, in bytes. You can tune the chunk size to your needs. If you need to write large files to GridFS it might be worthwhile to trade off memory for CPU by setting a larger chunk size.

Now let’s see how the actual streaming read works.

var gridStore = new GridStore(db, fileId, "r");
gridStore.open(function(err, gridStore) {
  var stream = gridStore.stream(true);

  stream.on("data", function(chunk) {
    console.log("Chunk of file data");
  });

  stream.on("end", function() {
    console.log("EOF of file");
  });

  stream.on("close", function() {
    console.log("Finished reading the file");
  });
});

Right, let’s have a quick look at the streaming functionality supplied with the driver (make sure you are using 0.9.6-12 or higher, as there is a bug fix for custom chunk sizes that you need).

var stream = gridStore.stream(true);

This opens a stream to our file. You can pass in a boolean parameter to tell the driver to close the file automatically when it reaches the end. This will fire the close event automatically. Otherwise you’ll have to handle cleanup when you receive the end event. Let’s have a look at the events supported.

  stream.on("data", function(chunk) {
    console.log("Chunk of file data");
  });

The data event is called for each chunk read, so the number of calls is determined by the chunk size of the written file. If your file is 1MB big and the file has a chunkSize of 256K, then you’ll get 4 calls to the event handler for data. The chunk returned is a Buffer object.

  stream.on("end", function() {
    console.log("EOF of file");
  });

The end event is called when the driver reaches the end of data for the file.

  stream.on("close", function() {
    console.log("Finished reading the file");
  });

The close event is only called if you set the autoclose parameter on the gridStore.stream method as shown above. If it’s false or not set, handle cleanup of the stream in the end event handler.

Right, that’s it for writing to GridFS in an efficient manner. I’ll outline some other useful functions on the GridStore object.

Other useful methods on the GridStore object

There are some other methods that are useful.

gridStore.writeFile(filename/filedescriptor, function(err, fileInfo) {});

writeFile takes either a file name or a file descriptor and writes it to GridFS. It does this in chunks to ensure the event loop is not tied up.

gridStore.read(length, function(err, data) {});

read/readBuffer lets you read length bytes from the current position in the file.

gridStore.seek(position, seekLocation, function(err, gridStore) {});

seek lets you navigate the file to read from different positions inside the chunks. The seekLocation allows you to specify how to seek. It can be one of three values.

  • GridStore.IO_SEEK_SET Seek mode where the given length is absolute
  • GridStore.IO_SEEK_CUR Seek mode where the given length is an offset to the current read/write head
  • GridStore.IO_SEEK_END Seek mode where the given length is an offset to the end of the file
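The three modes boil down to a little arithmetic, sketched below. The constants and the `resolveSeek` helper are made up for this example (the driver exposes its own as GridStore.IO_SEEK_SET and friends), and the use of a negative offset with the end-of-file mode is an assumption based on the descriptions above.

```javascript
// Illustrative stand-ins for the driver's seek mode constants.
var SEEK_SET = 0, SEEK_CUR = 1, SEEK_END = 2;

// Works out the absolute position a seek would land on.
function resolveSeek (position, seekLocation, current, length) {
  if (seekLocation === SEEK_CUR) return current + position; // offset from the read/write head
  if (seekLocation === SEEK_END) return length + position;  // offset from the end of the file
  return position;                                          // absolute position
}

// A 1000 byte file with the read/write head at byte 500.
console.log(resolveSeek(100, SEEK_SET, 500, 1000));  // 100
console.log(resolveSeek(100, SEEK_CUR, 500, 1000));  // 600
console.log(resolveSeek(-100, SEEK_END, 500, 1000)); // 900
```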

GridStore.list(dbInstance, collectionName, {id:true}, function(err, files) {});

list lists all the files in the collection in GridFS. If you have a lot of files the current version will not work very well, as it loads all files into memory first. You can have it return either the filenames or the ids for the files using the id option.

gridStore.unlink(function(err, result) {});

unlink deletes the file from Mongo DB, that’s to say all the file info and all the chunks.

This should be plenty to get you on your way building your first GridFS based application. As in the previous article the following links might be useful for you. Good luck and have fun.

Links and stuff

Countdown to KO #18: Load Testing with blitz.io

This is the 18th in a series of posts leading up to Node.js Knockout, and covers using blitz.io to load test your node app.

What’s blitz.io?


blitz.io, powered by Mu Dynamics, is a self-service load and performance testing platform. Built for API, cloud, web and mobile application developers, blitz.io quickly and inexpensively helps you ensure performance and scalability. And we make this super fun.

Why Load Test?

Node.js is purdy fast, but if you are not careful in the way you invoke backend services like CouchDB or MongoDB, you can easily cause pipeline stalls that keep your app from scaling to a large number of users. Typically you will end up with each concurrent request taking longer and longer, resulting in timeouts and fail whales. Load testing shows you what kind of concurrency you can achieve with your app and how it’s actually scaling out.

Signing up

Go to our login page and use your Facebook or Google account to log in with just 2 clicks. As simple as that. You will immediately be able to run load tests against your app from the blitz bar.

Running a Load Test (rush)

If your app is at http://my.cool.app, the following blitz line will generate concurrent hits against your app:

--pattern 1-250:60 --region virginia http://my.cool.app

As simple as that. If your express and connect routes have parameters in them that you use for looking up in your favorite database, you can read up on variables to parameterize query arguments and route paths so you can simulate production workloads on your app.

During the Node.js Knockout

We are super excited about sponsoring Node.js Knockout and have something fun planned.

At the start of the event, we are providing all contestants with enough blitz-power so you can generate lots of hits against your cool node.js app for 48 hours. We are also working on a scoreboard so you get bragging rights on the app with the most number of hits. Watch this page at the start of the event and you’ll know what to do.

Check it out!

Command-Line Testing

For those developers that don’t like UI and prefer command line, here’s the simplest way to run iterative load tests right after you git push your changes to the app:

$ gem install blitz
$ blitz api:init
$ blitz curl --pattern 1-250:60 --region virginia http://my.cool.app

To build cool node.js apps is awesome, to watch it scale out? priceless!

Countdown to KO #17: Natural Language Processing with Natural

This is the 17th in a series of posts leading up to Node.js Knockout, and covers using natural in your node app. This post was written by natural author and Node.js Knockout judge Chris Umbel.

“natural” is a general-purpose natural language processing library for node.js developed principally by Chris Umbel. Various algorithms in the way of stemming, classification, inflection, and phonetics are currently supported as well as basic WordNet usage.

At the time of writing “natural” is still young and support for new algorithms in the aforementioned categories or even other categories still are being feverishly developed. If you have anything to contribute consult the github repository.

This post will walk you through the installation of “natural”, consumption of the various components, and outline the future plans.

Installation

“natural” is available as an npm package and can be installed as such:

$ npm install natural

Consumption

Stemming

Stemming is the process of taking a word and stripping off affixes to get down to the base stem of the word. “natural” currently provides two algorithms for stemming: the Porter Stemmer and the Lancaster Stemmer.

Porter Stemmer

The Porter Stemmer was developed in 1979 by Martin Porter and was originally implemented in BCPL.

This example stems the string “words” to its root “word”.

var stemmer = require('natural').PorterStemmer;
console.log(stemmer.stem("words"));

The next example illustrates a common pattern used throughout “natural”. The attach() method patches String to have stem() and tokenizeAndStem() helper methods.

The tokenizeAndStem() splits the string up on whitespace and punctuation, removes noise words, and then stems each remaining token into an array.

var stemmer = require('natural').PorterStemmer;
stemmer.attach();
console.log("i am waking up to the sounds of chainsaws".tokenizeAndStem());
console.log("chainsaws".stem());
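To see what attach() is doing under the hood, here is a toy version of the pattern with no dependencies. Everything in it is invented for illustration: the "stemmer" just strips a trailing "s" (it is not the Porter algorithm), the noise word list is a guess, and the patched methods are given different names so they cannot clash with the real ones.

```javascript
// Hypothetical noise word list, just large enough for the sample sentence.
var noiseWords = ['i', 'am', 'to', 'the', 'of', 'up'];

var toyStemmer = {
  // Strips a trailing "s"; a real stemmer applies many more rules.
  stem: function (word) {
    return word.toLowerCase().replace(/s$/, '');
  },

  // Splits on anything that is not a letter, drops noise words, stems the rest.
  tokenizeAndStem: function (text) {
    var self = this;
    return text.toLowerCase()
      .split(/[^a-z]+/)
      .filter(function (token) {
        return token && noiseWords.indexOf(token) === -1;
      })
      .map(function (token) { return self.stem(token); });
  },

  // Patches String.prototype so strings can stem themselves.
  attach: function () {
    var stemmer = this;
    String.prototype.toyStem = function () {
      return stemmer.stem(String(this));
    };
    String.prototype.toyTokenizeAndStem = function () {
      return stemmer.tokenizeAndStem(String(this));
    };
  }
};

toyStemmer.attach();
console.log('chainsaws'.toyStem());
// chainsaw
console.log('i am waking up to the sounds of chainsaws'.toyTokenizeAndStem());
// [ 'waking', 'sound', 'chainsaw' ]
```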

Lancaster Stemmer

The Lancaster Stemmer (AKA Paice/Husk) algorithm was developed by Chris Paice at Lancaster University with some help from Gareth Husk. The Lancaster algorithm is somewhat aggressive in its removal of suffixes, resulting in stems that aren’t correct spellings of their respective words. If used for comparison in systems such as full-text searches, that’s typically acceptable.

var stemmer = require('natural').LancasterStemmer;
console.log(stemmer.stem("words"));
stemmer.attach();
console.log("i am waking up to the sounds of chainsaws".tokenizeAndStem());
console.log("chainsaws".stem());

Classification

Classification is the process of categorizing texts into predetermined classes automatically. Before the classification can occur it’s necessary to train the classifier on sample texts.

The only algorithm currently supported for classification in “natural” is Naive Bayes.

Notice that the training text can be either arrays of tokens or strings. Strings will be stemmed and have noise words removed, so if you want your training data to be unmodified, supply token arrays directly. This example will output “computing” on the first line and “literature” on the second.

var natural = require('natural'),
    classifier = new natural.BayesClassifier();

classifier.train([{classification: 'computing', text: ['fix', 'box']},
    {classification: 'computing', text: 'write some code.'},
    {classification: 'literature', text: ['write', 'script']},
    {classification: 'literature', text: 'read my book'}
]);

console.log(classifier.classify('there is a bug in my code.'));
console.log(classifier.classify('write a book.'));
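
If you are curious what a Naive Bayes classifier does with those training tokens, the following is a minimal sketch of the general technique (multinomial Naive Bayes with add-one smoothing). TinyBayes and its method signatures are hypothetical names for illustration and are not “natural”’s API. It also only accepts token arrays, so no stemming or noise word removal happens here.

```javascript
// Minimal multinomial Naive Bayes over token arrays, with add-one smoothing.
function TinyBayes() {
  this.docCounts = Object.create(null);   // label -> number of training documents
  this.tokenCounts = Object.create(null); // label -> { token -> count }
  this.totalTokens = Object.create(null); // label -> total tokens seen
  this.vocabulary = Object.create(null);  // token -> true
  this.totalDocs = 0;
}

TinyBayes.prototype.train = function (label, tokens) {
  if (!(label in this.docCounts)) {
    this.docCounts[label] = 0;
    this.tokenCounts[label] = Object.create(null);
    this.totalTokens[label] = 0;
  }
  this.docCounts[label] += 1;
  this.totalDocs += 1;
  tokens.forEach(function (token) {
    this.tokenCounts[label][token] = (this.tokenCounts[label][token] || 0) + 1;
    this.totalTokens[label] += 1;
    this.vocabulary[token] = true;
  }, this);
};

TinyBayes.prototype.classify = function (tokens) {
  var vocabSize = Object.keys(this.vocabulary).length;
  var best = null, bestScore = -Infinity;
  Object.keys(this.docCounts).forEach(function (label) {
    // Score is log P(label) plus the sum of log P(token | label).
    var score = Math.log(this.docCounts[label] / this.totalDocs);
    tokens.forEach(function (token) {
      var count = this.tokenCounts[label][token] || 0;
      score += Math.log((count + 1) / (this.totalTokens[label] + vocabSize));
    }, this);
    if (score > bestScore) { bestScore = score; best = label; }
  }, this);
  return best;
};

var bayes = new TinyBayes();
bayes.train('computing', ['fix', 'box']);
bayes.train('computing', ['write', 'code']);
bayes.train('literature', ['write', 'script']);
bayes.train('literature', ['read', 'book']);

console.log(bayes.classify(['bug', 'code']));   // computing
console.log(bayes.classify(['write', 'book'])); // literature
```

Working in log space avoids multiplying many small probabilities together, and the add-one smoothing keeps an unseen token such as “bug” from zeroing out a class.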

Inflection

“natural” provides inflectors for transforming words. Currently, a noun inflector is provided to pluralize and singularize nouns; a count inflector is provided to transform integers to their string ordinals, e.g. “1st”, “2nd”, “3rd”; and an experimental present tense verb inflector is provided for pluralizing/singularizing relevant verbs.

Noun Inflector

The following example uses the NounInflector to transform the word “beer” to “beers”.

var natural = require('natural'),
    nounInflector = new natural.NounInflector();

console.log(nounInflector.pluralize('beer'));
console.log(nounInflector.singularize('beers'));

Much like the stemmers, the NounInflector has an attach() method that patches String with pluralizeNoun() and singularizeNoun() methods.

nounInflector.attach();
console.log('radius'.pluralizeNoun());
console.log('radii'.singularizeNoun());
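
For the curious, here is a rough sketch of how an attach()-style helper can patch String. It assumes a deliberately naive pluralization rule and uses a made-up method name (pluralizeNounNaive) so it cannot collide with “natural”’s own helpers. It is an illustration of the pattern, not the library’s code.

```javascript
// Naive rule: add "es" after s, x, z, ch or sh; otherwise add "s".
// Irregular nouns such as "radius" are deliberately not handled.
var naiveInflector = {
  pluralize: function (noun) {
    return /(s|x|z|ch|sh)$/.test(noun) ? noun + 'es' : noun + 's';
  },

  // Patch String.prototype so any string can call the helper directly.
  attach: function () {
    var inflector = this;
    String.prototype.pluralizeNounNaive = function () {
      return inflector.pluralize(String(this));
    };
  }
};

naiveInflector.attach();
console.log('beer'.pluralizeNounNaive()); // beers
console.log('box'.pluralizeNounNaive());  // boxes
```

Patching a built-in prototype is convenient for short scripts, but it affects every string in the process, which is presumably why attach() is an explicit opt-in call rather than something that happens on require.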

Count Inflector

In this example the CountInflector converts the integers 1, 3 and 111 to “1st”, “3rd” and “111th” respectively.

var natural = require('natural'),
    countInflector = natural.CountInflector;

console.log(countInflector.nth(1));
console.log(countInflector.nth(3));
console.log(countInflector.nth(111));
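
The rule behind those ordinals is small enough to sketch by hand. The function below is a stand-alone illustration of the usual English rule, not “natural”’s source. The one subtlety is that numbers ending in 11, 12 or 13 take “th” regardless of their last digit.

```javascript
// Append the English ordinal suffix to an integer.
function nth(n) {
  var lastTwo = Math.abs(n) % 100;
  var last = lastTwo % 10;
  // 11, 12 and 13 (and 111, 112, 113, ...) always take "th".
  if (lastTwo >= 11 && lastTwo <= 13) return n + 'th';
  if (last === 1) return n + 'st';
  if (last === 2) return n + 'nd';
  if (last === 3) return n + 'rd';
  return n + 'th';
}

console.log(nth(1));   // 1st
console.log(nth(3));   // 3rd
console.log(nth(111)); // 111th
```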

Present Tense Verb Inflector

At the time of writing the PresentVerbInflector is still experimental and likely does not correctly handle all cases. It is, however, designed to transform present tense verbs between their singular and plural forms.

var natural = require('natural'),
    verbInflector = new natural.PresentVerbInflector();
console.log(verbInflector.singularize('become'));
console.log(verbInflector.pluralize('becomes'));

And, of course, the attach() method is provided to patch String.

verbInflector.attach();
console.log('walk'.singularizePresentVerb());
console.log('walks'.pluralizePresentVerb());

Phonetics

“natural” employs two phonetic algorithms to determine if words sound alike: SoundEx and Metaphone.

SoundEx

SoundEx is an old algorithm that was originally designed for use in physical filing systems and was patented in 1918. Despite its age it’s been widely adopted in modern computing to determine if words sound alike.

Here’s an example of using “natural”’s implementation.

var soundEx = require('natural').SoundEx;

if(soundEx.compare('ruby', 'rubie'))
    console.log('they sound alike');

The raw SoundEx phonetic code can be obtained with the process() method. The following example outputs a cryptic “R100”.

console.log(soundEx.process('rubie'));

Of course, an attach() method is provided to patch String with helpers. Note that the tokenizeAndPhoneticize() method splits a string up into words and returns an array of phonetic codes.

soundEx.attach();

console.log('phonetics'.phonetics());
console.log('phonetics rock'.tokenizeAndPhoneticize());

if('ruby'.soundsLike('rubie'))
    console.log('they sound alike');
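
To demystify codes like “R100”, here is a compact sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent duplicates, and pad to four characters. It is written from the commonly documented rules and is not guaranteed to match “natural”’s implementation in every edge case. The function names are my own.

```javascript
// Consonant groups used by American Soundex; vowels, h, w and y have no code.
var SOUNDEX_CODES = {
  b: '1', f: '1', p: '1', v: '1',
  c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
  d: '3', t: '3',
  l: '4',
  m: '5', n: '5',
  r: '6'
};

function soundex(word) {
  var letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';

  var result = letters[0].toUpperCase();
  var previous = SOUNDEX_CODES[letters[0]] || '';

  for (var i = 1; i < letters.length && result.length < 4; i++) {
    var ch = letters[i];
    if (ch === 'h' || ch === 'w') continue; // h and w do not separate equal codes
    var code = SOUNDEX_CODES[ch] || '';     // vowels and y map to no code
    if (code && code !== previous) result += code;
    previous = code;                        // a vowel resets the previous code
  }

  while (result.length < 4) result += '0';  // pad short codes with zeros
  return result;
}

// Two words "sound alike" when their codes match.
function soundsAlike(a, b) {
  return soundex(a) === soundex(b);
}

console.log(soundex('rubie'));            // R100
console.log(soundex('Robert'));           // R163
console.log(soundsAlike('ruby', 'rubie')); // true
```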

Metaphone

“natural” also implements the Metaphone phonetic algorithm which is considerably newer (developed in 1990 by Lawrence Philips) and more robust than SoundEx. Its implementation in “natural” mirrors SoundEx.

var metaphone = require('natural').Metaphone;

if(metaphone.compare('ruby', 'rubie'))
    console.log('they sound alike');

metaphone.attach();

console.log('phonetics'.phonetics());
console.log('phonetics rock'.tokenizeAndPhoneticize());

if('ruby'.soundsLike('rubie'))
    console.log('they sound alike');

WordNet

A new and somewhat experimental feature of “natural” is WordNet database integration. WordNet organizes English words into synsets (groups of synonyms), and contains example sentences and definitions.

Lookup

Consider the following example which looks up all entries for the word “node” in WordNet.

Note the path parameter passed in to the WordNet constructor. That’s the path where the WordNet database files are to be stored. If the files do not exist “natural” will download them for you.

var natural = require('natural'),
    wordnet = new natural.WordNet('.');

wordnet.lookup('node', function(results) {
    results.forEach(function(result) {
        console.log('------------------------------------');
        console.log(result.synsetOffset);
        console.log(result.pos);
        console.log(result.lemma);
        console.log(result.pos);
        console.log(result.gloss);
    });
});

Synonyms

In this example, a list of synonyms is retrieved for the first result of a lookup via the getSynonyms() method.

var natural = require('natural'),
    wordnet = new natural.WordNet('.');

wordnet.lookup('entity', function(results) {
    wordnet.getSynonyms(results[0], function(results) {
        results.forEach(function(result) {
            console.log('------------------------------------');
            console.log(result.synsetOffset);
            console.log(result.pos);
            console.log(result.lemma);
            console.log(result.pos);
            console.log(result.gloss);
        });
    });
});

Future Plans

While “natural” has a reasonable amount of functionality at this point it has quite a way to go to make it to the level of projects like Python’s Natural Language Toolkit.

To close that gap in the short term, plans are in the works to implement part-of-speech (POS) tagging, the double-metaphone phonetic algorithm, and a maximum entropy classifier.

In the longer term extending “natural” beyond English is a hope, but will require additional expertise.

If you have the interest to help out please do so!

Countdown to KO #16: Stock Market Mashups with TradeKing

This is the 16th in a series of posts leading up to Node.js Knockout, and covers using TradeKing in your node app.

At TradeKing, we’ve all been infatuated with Node. From its inception we’ve been touting its swift performance, reasonable learning curve, and its particular ability to add a completely new dimension to web applications.

While developing the API we were always thinking about the angles developers might use to create riveting new experiences for traders, and many of those angles have a very common intersection: real-time. Whether it’s streaming market data or interactive real-time charting, the financial industry moves incredibly quickly and requires web technologies to match its pace. Node combines perfectly with web sockets, allowing us to meet those needs in a very agile way. The latest example was a quick mashup demo for an internal board meeting.

Here is a quick tutorial on how we got Node and sockets working with our API in a demo watchlist application. The idea: a streaming watchlist tool that integrates with Twitter. What’s a watchlist? Think of it as an interactive list of stocks you might hold or be interested in holding.

TradeKing Screenshot

Installation

First things first, grab the project repository from http://github.com/tradeking/node-watchlist. Once you clone it locally, hop into the new repository and run npm install to grab all of the project’s dependencies.

Configuration

Crack open the server.js file and fill in the configuration here:

// Configuration!
global.tradeking = {
  api_url: "https://api.tradeking.com/v1",
  consumer_key: "",
  consumer_secret: "",
  access_token: "",
  access_secret: ""
}
global.twitter_user = {
  consumer_key : '',
  consumer_secret : '',
  access_token_key : '',
  access_token_secret : ''
}

You can get all of your TradeKing keys at https://developers.tradeking.com by creating a developer application. Create a Twitter application (http://dev.twitter.com) to get those keys as well.
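
If you would rather not paste secrets into server.js, one option is to read them from the environment. The sketch below is a suggestion rather than part of the demo app: the environment variable names and the loadTradeKingConfig() helper are hypothetical, and only the api_url value and the key names come from the configuration block above.

```javascript
// Hypothetical environment variable names; pick whatever suits your deployment.
var REQUIRED_VARS = [
  'TRADEKING_CONSUMER_KEY',
  'TRADEKING_CONSUMER_SECRET',
  'TRADEKING_ACCESS_TOKEN',
  'TRADEKING_ACCESS_SECRET'
];

// Build the tradeking config object from an env-like object, failing loudly
// if any key is missing so a bad deploy is caught at startup.
function loadTradeKingConfig(env) {
  var missing = REQUIRED_VARS.filter(function (name) { return !env[name]; });
  if (missing.length > 0) {
    throw new Error('Missing configuration: ' + missing.join(', '));
  }
  return {
    api_url: 'https://api.tradeking.com/v1',
    consumer_key: env.TRADEKING_CONSUMER_KEY,
    consumer_secret: env.TRADEKING_CONSUMER_SECRET,
    access_token: env.TRADEKING_ACCESS_TOKEN,
    access_secret: env.TRADEKING_ACCESS_SECRET
  };
}

// In server.js this could become: global.tradeking = loadTradeKingConfig(process.env);
var exampleConfig = loadTradeKingConfig({
  TRADEKING_CONSUMER_KEY: 'placeholder-key',
  TRADEKING_CONSUMER_SECRET: 'placeholder-secret',
  TRADEKING_ACCESS_TOKEN: 'placeholder-token',
  TRADEKING_ACCESS_SECRET: 'placeholder-token-secret'
});
console.log(exampleConfig.api_url);
```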

Authentication

The TradeKing API uses OAuth authentication, so it was a snap to start talking to the API, and there was no shortage of Twitter modules to snag their stream. Since we’ve supplied all of our keys, we don’t need the full OAuth flow, so we’ll just set up the consumers and bring our own access tokens to the table (see the next step).

// OAuth client from the "oauth" npm module
var oauth = require('oauth');

global.tradeking_consumer = new oauth.OAuth(
  "https://developers.tradeking.com/oauth/request_token",
  "https://developers.tradeking.com/oauth/access_token",
  tradeking.consumer_key,
  tradeking.consumer_secret,
  "1.0",
  "http://localhost:3000/tradeking/callback",
  "HMAC-SHA1");

global.twitter_consumer = new oauth.OAuth(
  "https://twitter.com/oauth/request_token",
  "https://twitter.com/oauth/access_token",
  twitter_user.consumer_key,
  twitter_user.consumer_secret,
  "1.0A",
  null,
  "HMAC-SHA1");

Making Requests to TradeKing

Now that the consumer is set up, making requests is a breeze!

tradeking_consumer.get(
  tradeking.api_url+'/market/quotes.json?watchlist=DEFAULT&delayed=false',
  tradeking.access_token,
  tradeking.access_secret,
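  function(error, data, response) {
    // Skip parsing if the request itself failed
    if(error) return console.error(error);
    var quotes = JSON.parse(data);
    if(quotes.response.type != "Error") {
      // Push the watchlist quotes to the connected socket client
      client.emit('watchlist-quotes', quotes.response.quotes.instrumentquote);
    }
  }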
);

This bit of code makes a GET request to the specified URL using our access token and secret. Once the request completes, the callback is executed. In this particular instance we parse the returned JSON data, check for errors, and then send a socket event to the client.
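
The consumer hides the signing step, so here is a rough sketch of what an OAuth 1.0 HMAC-SHA1 signature involves, following the general procedure in the OAuth 1.0 specification. This is not the oauth module’s code. The helper names, the placeholder credentials, the timestamp and the nonce are all made up for illustration.

```javascript
var crypto = require('crypto');

// RFC 3986 percent-encoding, which is stricter than encodeURIComponent.
function percentEncode(value) {
  return encodeURIComponent(value).replace(/[!'()*]/g, function (c) {
    return '%' + c.charCodeAt(0).toString(16).toUpperCase();
  });
}

// Base string: METHOD & encoded URL & encoded, sorted "key=value" pairs.
function signatureBaseString(method, url, params) {
  var pairs = Object.keys(params).map(function (key) {
    return [percentEncode(key), percentEncode(params[key])];
  });
  pairs.sort(function (a, b) {
    if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
    return a[1] < b[1] ? -1 : (a[1] > b[1] ? 1 : 0);
  });
  var normalized = pairs.map(function (pair) { return pair.join('='); }).join('&');
  return [method.toUpperCase(), percentEncode(url), percentEncode(normalized)].join('&');
}

// The HMAC key is "consumerSecret&tokenSecret"; the digest is base64 encoded.
function hmacSha1Signature(baseString, consumerSecret, tokenSecret) {
  var key = percentEncode(consumerSecret) + '&' + percentEncode(tokenSecret || '');
  return crypto.createHmac('sha1', key).update(baseString).digest('base64');
}

var base = signatureBaseString('GET', 'https://api.tradeking.com/v1/market/quotes.json', {
  watchlist: 'DEFAULT',
  delayed: 'false',
  oauth_consumer_key: 'placeholder-consumer-key',
  oauth_token: 'placeholder-access-token',
  oauth_signature_method: 'HMAC-SHA1',
  oauth_timestamp: '1312000000', // made-up timestamp
  oauth_nonce: 'made-up-nonce',  // must be unique per request in real use
  oauth_version: '1.0'
});

console.log(base);
console.log(hmacSha1Signature(base, 'placeholder-consumer-secret', 'placeholder-access-secret'));
```

The resulting value travels as the oauth_signature parameter. Because the query parameters are part of the base string, changing the watchlist or the timestamp changes the signature.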

Want to know more?

Since we’ve open sourced the whole application and slapped it up on GitHub, pull it down, throw your keys in and check out how it all works. Maybe even make some upgrades and submit a pull request! Head over to our forums to see what the rest of the devs are up to or to drop us a note about your progress with the API.

Online trading has inherent risk due to system response and access times that may vary due to market conditions, system performance, and other factors. An investor should understand these and additional risks before trading.

© 2011 TradeKing. All rights reserved. Member FINRA and SIPC

Countdown to KO #15: Publish/Subscribe with PubNub

This is the 15th in a series of posts leading up to Node.js Knockout, and covers using PubNub in your node app.

PubNub lets you connect mobile phones, tablets, web browsers and more with a two-function Publish/Subscribe API (send/receive).

HTML Interface

If you are building HTML5 Web Apps, start by copying and pasting the code snippet below. If not, skip to Other Languages.

<div pub-key="demo" sub-key="demo" id="pubnub"></div>
<script src="http://cdn.pubnub.com/pubnub-3.1.min.js"></script>
<script>(function(){

    // Listen For Events
    PUBNUB.subscribe({
        channel  : "hello_world",      // Channel
        error    : function() {        // Lost Connection (auto reconnects)
            alert("Connection Lost. Will auto-reconnect when Online.")
        },
        callback : function(message) { // Received An Event.
            alert(message.anything)
        },
        connect  : function() {        // Connection Established.

            // Send Message
            PUBNUB.publish({
                channel : "hello_world",
                message : { anything : "Hi from PubNub." }
            })

        }
    })

})();</script>
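
If publish/subscribe is new to you, the following in-process sketch shows the shape of the pattern without any network involved. createHub() is a hypothetical helper that only borrows the option names (channel, callback, connect, message) from the snippet above. It is not the PubNub client and delivers messages only inside a single process.

```javascript
// Minimal in-process publish/subscribe hub (no network, no persistence).
function createHub() {
  var channels = Object.create(null); // channel name -> array of callbacks

  return {
    // Register a callback for a channel, then signal that we are "connected".
    subscribe: function (options) {
      if (!channels[options.channel]) channels[options.channel] = [];
      channels[options.channel].push(options.callback);
      if (options.connect) options.connect();
    },

    // Deliver a message to every callback subscribed to the channel.
    publish: function (options) {
      (channels[options.channel] || []).slice().forEach(function (callback) {
        callback(options.message);
      });
    }
  };
}

var hub = createHub();
hub.subscribe({
  channel: 'hello_world',
  callback: function (message) { console.log(message.anything); },
  connect: function () {
    hub.publish({ channel: 'hello_world', message: { anything: 'Hi from the hub.' } });
  }
});
```

The publisher never references the subscriber directly. Both sides only agree on a channel name, which is what lets a hosted service sit in the middle and fan messages out to phones, browsers and servers alike.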

Other Languages

PubNub APIs are also available for other programming languages, including Node, Ruby, PHP, Python, Perl, and Erlang. See PubNub on GitHub for these and more.