If you liked SpamBayes, you'll love SpamNeoBayes -- a fork of the project that uses a more accurate classifier and adds several important features.
The classifier is DMNBtext from Weka. Invented in 2008 by Dr. Jiang Su and his colleagues at the University of Ottawa, it is designed to be less likely to become biased when the same message is trained twice or when two very similar messages are trained. To learn more about it, see the paper "Discriminative Parameter Learning for Bayesian Networks". In my testing, this classifier was almost as accurate as a Support Vector Machine, which is too slow for a real-time application like this. I ported Su's Java implementation to Lua and made minor refinements.
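To illustrate why discriminative training resists bias from repeated messages, here is a minimal Python sketch of the general idea behind discriminative frequency estimation for a two-class multinomial naive Bayes model. It is not the addon's Lua code or Weka's DMNBtext source; the class name `DiscriminativeNB`, its methods, and the smoothing scheme are assumptions made for illustration. Each training step adds counts scaled by how wrong the current prediction is, rather than adding a fixed count of 1.

```python
import math


class DiscriminativeNB:
    """Hypothetical two-class (ham/spam) sketch, not the addon's implementation."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        # Weighted counts indexed by class: 0 = ham, 1 = spam.
        self.class_weight = [smoothing, smoothing]
        self.word_weight = [{}, {}]
        self.total_words = [0.0, 0.0]
        self.vocab = set()

    def prob_spam(self, words):
        """Estimate P(spam | words) from the current weighted counts."""
        v = max(len(self.vocab), 1)
        log_odds = math.log(self.class_weight[1] / self.class_weight[0])
        for w in words:
            p_spam = (self.word_weight[1].get(w, 0.0) + self.smoothing) / (
                self.total_words[1] + self.smoothing * v)
            p_ham = (self.word_weight[0].get(w, 0.0) + self.smoothing) / (
                self.total_words[0] + self.smoothing * v)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on extreme scores.
        log_odds = max(-30.0, min(30.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))

    def train(self, words, is_spam):
        """Add counts scaled by the prediction error; return that weight."""
        c = 1 if is_spam else 0
        p = self.prob_spam(words)
        p_true = p if is_spam else 1.0 - p
        weight = 1.0 - p_true  # 0 when already certain and correct
        self.vocab.update(words)
        for w in words:
            self.word_weight[c][w] = self.word_weight[c].get(w, 0.0) + weight
            self.total_words[c] += weight
        self.class_weight[c] += weight
        return weight


model = DiscriminativeNB()
message = "wts cheap gold".split()
first = model.train(message, is_spam=True)
second = model.train(message, is_spam=True)
print(first, second)  # the second update is smaller than the first
```

Because the update shrinks as the model becomes more confident about a message, training the same text again changes the counts less each time.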
Extra features include:
- Separate categories for reportable and non-reportable spam. This is useful when you want to auto-report gold spam, but also filter out guild-recruitment spam without reporting it.
- Doesn't auto-report innocent players if you use TradeForwarder or LFGForwarder.
- Training reportable spam automatically reports it when possible.
Planned future features include:
- Approximate word matching for unknown words that may have been accidentally or maliciously misspelled. Nice try, mister "WTS WOW DOLD"!
- Full integration with TradeForwarder and LFGForwarder.
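The approximate word matching mentioned in the planned features could take many forms, and the addon's eventual approach is not described here. One common technique is edit distance: an unknown word is mapped to the closest known word if it is only a few character edits away. The sketch below assumes this technique; the function names `edit_distance` and `closest_known_word` and the `max_distance` threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def closest_known_word(word, known_words, max_distance=1):
    """Return the nearest known word within max_distance edits, else None."""
    best, best_dist = None, max_distance + 1
    for candidate in known_words:
        d = edit_distance(word, candidate)
        if d < best_dist:
            best, best_dist = candidate, d
    return best


known = ["gold", "wts", "guild"]
print(closest_known_word("dold", known))   # gold
print(closest_known_word("xyzzy", known))  # None
```

A small `max_distance` keeps unrelated short words from being merged; a real filter would likely need to tune it by word length.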
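Note that with discriminative learning, it is impossible to support unlearning individual messages, because the effect of training a message depends on what the classifier had already learned at that point.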
How to use
Using SpamNeoBayes is similar to using SpamBayes; the following are the differences:
- Type /snb or /spamneobayes to open the window, then right-click a message or its author's name to train it. You can also train messages that haven't been filtered by right-clicking them in a chat frame.
- When training a message, you can choose one of three categories:
  - Ham: This message is an example of what you want to see.
  - Spam: You never want to see messages like this one, but you don't want to report them to the GMs as spam either. This is a good choice for things like guild recruitment messages and conversations about music genres you don't like. I also use it for anal/murloc/Thunderfury spam, since I feel GMs' time is better spent shutting down gold sellers.
  - Reportable: You never want to see messages like this one, and when automatic reporting is enabled, you want to report them to the GMs. To save you some clicks, training a message as Reportable reports it to the GMs even if automatic reporting isn't enabled.
- In the window, two percentages appear next to each message, separated by a slash. The first is the chance that the message is any kind of spam; the second is the chance that it's reportable spam.
- The more wrong SpamNeoBayes is about a message, the greater its "weight", and the more difference training it will make. Accidentally training the same message twice won't introduce as much bias as it would in the original SpamBayes.
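As a rough illustration of the two percentages and the "weight" described above, the Python sketch below assumes the classifier yields one probability per category (Ham, Spam, Reportable). How SpamNeoBayes actually computes these values internally is not stated here, so the functions `format_scores` and `training_weight` are hypothetical.

```python
def format_scores(p_ham, p_spam, p_reportable):
    """Render 'any spam % / reportable %' from three category scores."""
    total = p_ham + p_spam + p_reportable
    any_spam = (p_spam + p_reportable) / total  # spam of either kind
    reportable = p_reportable / total           # reportable spam only
    return f"{round(any_spam * 100)}%/{round(reportable * 100)}%"


def training_weight(probs, trained_as):
    """Weight grows with how little probability the chosen category had."""
    return 1.0 - probs[trained_as]


probs = {"ham": 0.25, "spam": 0.25, "reportable": 0.5}
print(format_scores(probs["ham"], probs["spam"], probs["reportable"]))  # 75%/50%
print(training_weight(probs, "ham"))  # 0.75
```

Under these assumptions the first percentage is never lower than the second, and a message the classifier already labels correctly with high confidence has a weight near zero.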
- Date created: Sep 19, 2011
- Last update: Nov 16, 2011
- License: GNU General Public License version 3 (GPLv3)