bayes.js - naïve Bayesian classification in JavaScript
======================================================

This module is a JavaScript port of Divmod Reverend,
(c) 2003 Amir Bakhtiar

(c) 2007 Sam Angove

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation,
version 2.1 ONLY.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.

Version 0.7

Status
------

This software is not under active development. Notable omissions:

* I haven't implemented Robinson-Fisher.
* For obvious reasons there's no `load` or `save`, but there also
  isn't a JavaScripty analogue like `toJSONString()`.

Other files in this directory
-----------------------------

* `test.html`: JsUnit tests, not comprehensive
* `generate-tests.py`: generates tests to ensure that output matches
  that of the Python version. Generated tests are already included in
  test.html.

Basic usage
-----------

    var guesser = new Bayes();

    guesser.train("hannibal", "I love to kill people and eat them.");
    guesser.train("austen", "Come, let us have tea and scones in Mr. Bingley's gazebo.");

    guesser.guess("Jane, these scones are simply delightful!");
    // [["austen", 0.9999]]

    guesser.train("hannibal", "I love to kill people and eat them with tea and scones.");

    guesser.guess("Give me those scones or I'll kill and eat you.");
    // [["hannibal", 0.9481433307479079], ["austen", 0.6203339133520634]]

You'll have to read the source to get more than that, I'm afraid.
It's pretty flexible, though; e.g., you can replace the naïve
whitespace tokenizer with a just-as-naïve bigram tokenizer like this:

    // split into non-overlapping word pairs ("bigrams")
    function bigramTokenize(s) {
        var tokens = s.toLowerCase().split(/\s+/);
        var t1, t2, out = [];
        while (true) {
            t1 = tokens.shift();
            t2 = tokens.shift();
            if (typeof t1 == "undefined") {
                break;
            } else if (typeof t2 == "undefined") {
                // odd number of words: emit the last word on its own
                out.push(t1);
            } else {
                out.push(t1 + " " + t2);
            }
        }
        return out;
    }

    guesser.tokenize = bigramTokenize;

It's not designed for complicated usage, so you might have a harder
time using, say, a scheme that relies on choosing significant n-grams.
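
The bigram tokenizer above pairs words off without overlap. As a rough sketch of one alternative, a sliding-window tokenizer emits every run of `n` consecutive words instead. `makeNgramTokenizer` is a hypothetical helper, not part of bayes.js, and it assumes only what the example above shows: that `tokenize` takes a string and returns an array of token strings. It makes no attempt to choose significant n-grams.

```javascript
// Hypothetical helper (not part of bayes.js): builds a tokenizer that
// returns every run of n consecutive words, overlapping by n - 1 words.
function makeNgramTokenizer(n) {
    return function (s) {
        // Lowercase, split on whitespace, and drop empty strings left by
        // leading or trailing whitespace.
        var words = s.toLowerCase().split(/\s+/).filter(function (w) {
            return w.length > 0;
        });

        // Inputs with n words or fewer yield a single token (or none).
        if (words.length <= n) {
            return words.length ? [words.join(" ")] : [];
        }

        var out = [];
        for (var i = 0; i + n <= words.length; i++) {
            out.push(words.slice(i, i + n).join(" "));
        }
        return out;
    };
}
```

Assuming the tokenizer can be swapped in the same way as before, `guesser.tokenize = makeNgramTokenizer(2);` would make training and guessing work on overlapping word pairs.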