Spam Filtering: Understanding SEP and CEP

April 13, 2008

To help folks further understand the differences between CEP and SEP, and prompted by Marc’s reply in the blogosphere, More Cloudy Thoughts, here is the scoop.

In the early days of spam filtering, roughly ten years ago, spam was detected with rule-based systems.  In fact, here is a link to one of the first papers that documented rule-based approaches to spam filtering, E-Mail Bombs and Countermeasures: Cyber Attacks on Availability and Brand Integrity, published in IEEE Network Magazine, Volume 12, Issue 2, p. 10-17 (1998).  At the time, rule-based approaches were common (the state-of-the-art) in antispam filtering.

Over time, however, spammers got more clever and found many ways to poke holes in rule-based detection approaches.  They learned to write with spaces between the letters in words, they changed the subject and message text frequently, they randomized their originating IP addresses, they used the IP addresses of your best friends, they varied the timing and frequency of the spam, and so on, ad infinitum.
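
To make that brittleness concrete, here is a toy sketch of a keyword-rule filter (the rules and messages are invented for this post, not taken from any real product); one old spammer trick, spacing out the letters, is enough to slip past it:

```python
import re

# A toy rule set -- nothing like a production filter -- purely to show how
# brittle keyword rules become once spammers start obfuscating their text.
RULES = [
    re.compile(r"\bviagra\b", re.IGNORECASE),
    re.compile(r"\bfree money\b", re.IGNORECASE),
]

def is_spam(message: str) -> bool:
    """Flag a message as spam if any hand-written rule matches."""
    return any(rule.search(message) for rule in RULES)

print(is_spam("Cheap viagra, act now"))        # True  -- the keyword rule fires
print(is_spam("Cheap v i a g r a, act now"))   # False -- spaced-out letters slip past
```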

Not to sound like an elitist for speaking the truth, but the more operational experience you have with detection-oriented solutions, the more you will understand that rule-based approaches (alone) are neither scalable nor efficient.  If you followed a rules-based approach (only) against heavy, complex spam (the type of spam we see in cyberspace today), you would spend much of your time writing rules and still not stop very much of the spam!

The same is true for the security situation-detection scenario in Marc’s example.

Like Google’s Gmail spam filter, and Microsoft’s old Mr. Clippy (the goofy help algorithm of the past), you need detection techniques that use advanced statistical methods to detect complex situations as they emerge.  With rules, you can only detect simple situations unless you have a tremendous amount of resources to build and maintain very complex rule bases (and even then rules have limitations for real-time analytics).

We did not make this up at Techrotech, BTW.   Neither did our favorite search engine and leading free email provider, Google!   

This is precisely why Gmail has a great spam filter.  Google detects spam with a Bayesian classifier, not a rule-based system.  If they used (only) a rule-based approach, your Gmail inbox would be full of spam!
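
Google’s actual classifier is confidential, of course, so here is only a toy naive Bayes sketch of the general idea, trained on a few made-up messages, to show why word statistics beat hand-written rules:

```python
import math
from collections import Counter

# A toy naive Bayes spam classifier with add-one smoothing.  Purely
# illustrative: Google's real classifier is confidential and far more
# sophisticated, and the training messages below are made up.
class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, message: str, label: str) -> None:
        self.message_counts[label] += 1
        self.word_counts[label].update(message.lower().split())

    def spam_probability(self, message: str) -> float:
        total_messages = sum(self.message_counts.values())
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        log_scores = {}
        for label in ("spam", "ham"):
            n_words = sum(self.word_counts[label].values())
            # log prior for the class ...
            score = math.log(self.message_counts[label] / total_messages)
            # ... plus the smoothed log likelihood of each word in the message
            for word in message.lower().split():
                score += math.log(
                    (self.word_counts[label][word] + 1) / (n_words + len(vocabulary) + 1)
                )
            log_scores[label] = score
        # convert the two log scores back into P(spam | message)
        peak = max(log_scores.values())
        odds = {k: math.exp(v - peak) for k, v in log_scores.items()}
        return odds["spam"] / (odds["spam"] + odds["ham"])

spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("click here for a free prize", "spam")
spam_filter.train("meeting notes attached", "ham")
spam_filter.train("lunch at noon tomorrow", "ham")
print(spam_filter.spam_probability("free money now"))   # close to 1.0
```

Notice that nobody wrote a rule for “free money”; the classifier learned that association from the training data, which is exactly what rules alone cannot keep up with.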

The same is true for search and retrieval algorithms, but that is a topic for another day.  However, you can bet your annual paycheck that Google uses a Bayesian type of classifier in their highly confidential search and retrieval (and – hint – classification) algorithms.

In closing, don’t let the folks selling software and analysts promoting three-letter-acronyms (TLAs) cloud your thinking. 

What we are seeing in the marketplace, the so-called CEP marketplace, are simple event processing engines.  CEP is already happening in the operations of Google, a company that needs real-time CEP for spam filtering and also for search-and-retrieval.  We also see real-time CEP in top quality security products that use advanced neural networks, and Bayesian networks, to detect problems (fraud, abuse, denial-of-service attacks, phishing, identity theft) in cyberspace.


Threats to the Democratic Process

April 13, 2008

Our readers might recall this post by Tim Bass, The Top Ten Cybersecurity Threats for 2008.  One of the top ten threats to cybersecurity in 2008, according to this post, was:

    — Subversion of democratic political processes.

Interestingly enough, Electoral-Vote.com, a site maintained by Dr. Andrew Tanenbaum, Professor of Computer Science at the Vrije Universiteit in Amsterdam, links to a news story reporting that the US presidential election can be hacked.

This is not science fiction folks, it is simply the political and social realities of our brave new electronic world.


A Bitter Pill To Swallow: First Generation CEP Software Needs To Evolve

February 8, 2008

Frankly speaking, the CEP market is now saturated with hype about all the great things CEP can do, detecting opportunities and threats in real time and supporting the decision cycle.  However, in my opinion, it is time for the software vendors and analysts to move beyond the marketing hype and demonstrate real operational value with strong end user success, something seriously lacking today.

I have advocated this evolution for two years, including the notion of expanding CEP capabilities with proven techniques for event processing that have worked well long before current “Not yet CEP but called CEP” software hit the marketplace and airwaves.

For example, in my first CEP/EP presentation in New York in 1Q 2006, I presented Processing Patterns for Predictive Business and talked about how the US military has implemented high performance detection-oriented systems for many years (in the art-and-science of multisensor data fusion, MSDF), and how every day, when we sit at home (or at work or in transit), we are comforted to know we are safe from missile attacks because of what I would also call “complex event processing.”   There is a very rich history of “CEP but not called CEP” behind the scenes keeping people safe and warm. (The same thing can be said with many similar examples of complex event processing in use today, but not called “CEP” by CEP software vendors.)

This is one reason, when I read the “CEP history lessons,” I am amused at how, at times, the lessons appear self-serving, not end-user serving.  There is so much rich event processing history and so many proven architectures in “CEP but not called CEP” (CEP that actually works, in practice, every day, long before it was called CEP).  It continues to puzzle me that a few people in the CEP/EP community continue to take the “we invented EP” view.  Quite frankly, the history we read is missing most, if not all, of the history and practice of MSDF.

When we take the current CEP COTS software offerings and apply them to these working “CEP but not called CEP” applications, the folks with real operational “CEP but not called CEP” detection-oriented experience quickly cut through the hype because, based on their state-of-the-practice, they are now seeking self-learning, self-healing “real CEP type” systems.  They are not so excited about first generation technologies full of promises from software vendors with only a few years of experience in solving detection-oriented problems and very few real success stories.

The same is true for advanced fraud detection and other state-of-the-art detection-oriented processing of “complex events” and situations.  The state-of-the-art of complex event processing, in practice, is far beyond the first generation CEP engines on the market today. 

This is one of the reasons I have agreed with the IBM folks who are calling these first generation “CEP orchestration engines” BEP engines, because that view is closer to fact than fiction.  Frankly speaking again, process orchestration is much easier than complex detection with high situation detection confidence and also low false alarms.

Customers who are detection-savvy also know this, and I have blogged about a few of these meetings and customer concerns.  For example, please read my blog entry about a banker who was very sceptical at a recent wealth management conference in Bangkok.  I see this reaction all the time, in practice.

Complex problems are not new and they still cry out for solutions.  Furthermore, many current-generation event processing solutions are already more advanced than the first generation CEP engines on the “call it CEP” market today.  This is a very real inhibitor, in my opinion, to growth in the “call it CEP” software space today – and credibility may ultimately be “at risk.”  Caution is advised.

Candidly speaking again, there are too many red-herring CEP-related discussions and not enough solid results given the time software vendors have been promoting CEP/EP (again, this is simply my opinion).  The market is in danger of eventually losing credibility, at least in the circles I travel and the complex problems I enjoy solving, because the capabilities of the (so called) CEP technologies by software vendors in the (so called) CEP space have been oversold; and, frankly speaking, I have yet to see tangible proof of “real CEP capabilities” in the road maps and plans of the current CEP software vendors.  This is disappointing.

This pill is bitter and difficult to swallow, but most of my life’s work has been advising, formulating and architecting real-time solutions for the end user (the C-level executives and the operational experts with the complex problems to solve).  CEP software must evolve, and there need to be more tangible results, not more marketing hype.


Key Indicators (KIs) Versus Key Performance Indicators (KPIs)

January 31, 2008

SL’s new web page, Solutions for CEP Engine Users, discusses how CEP is a “technology that is used to help companies detect both opportunities and threats in real-time with minimal coding and reusable key performance indicators (KPIs) and business models.”

I agree with SL, but would like to suggest my friends at SL expand the notion of KPIs in CEP to include the idea of KIs.  In my opinion, the SL phrase should read,  “technology that is used to help companies detect both opportunities and threats in real-time with minimal coding and reusable key indicators (KIs) and business models.”  

The reason for my suggestion is that KPIs are a subset of KIs.   KIs designate, in my mind, more than just performance.  

CEP is used to detect both opportunities and threats in real time, which may or may not be performance related.  For example, when a CEP engine detects evidence of fraudulent behavior, this is a KI.  The knowledge, or pattern, used to estimate this situation is a KI, not a KPI, per se.  Also, when a CEP application is processing market data and indicates that it is the right time to purchase an equity and enter the market, the knowledge used in this decision support application is a KI, not a KPI.
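
To put the distinction in concrete (and admittedly simplified) terms, here is a hypothetical sketch, with invented names and numbers, in which a KPI is simply a KI with an extra performance target attached:

```python
from dataclasses import dataclass

# Hypothetical illustration (invented names and numbers): a key indicator (KI)
# is any knowledge pattern worth detecting in real time; a key performance
# indicator (KPI) is the subset of KIs that measures performance against a target.
@dataclass
class KeyIndicator:
    name: str
    value: float
    threshold: float

    def triggered(self) -> bool:
        return self.value >= self.threshold

@dataclass
class KeyPerformanceIndicator(KeyIndicator):
    target: float   # the extra "performance" dimension a plain KI does not need

# Evidence of fraudulent behavior is a KI, but not really about "performance".
fraud_ki = KeyIndicator(name="fraud_evidence_score", value=0.92, threshold=0.85)

# Order throughput measured against a business target is a classic KPI.
throughput_kpi = KeyPerformanceIndicator(
    name="orders_per_minute", value=410.0, threshold=350.0, target=500.0
)

print(fraud_ki.triggered(), throughput_kpi.triggered())   # True True
```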

Therefore, I recommend that when folks think about the notion of “key performance indicators” (KPIs) in CEP and BAM, they should also think in terms of “key indicators” (KIs).  Detecting opportunities and threats in real-time is much broader than the traditional notion of KPIs.


The ART of Event Processing: Agility, Reuse, Transparency

January 18, 2008

The other day I discussed CEP in Layman’s Terms: Reuse and Agility. Today, our topic is CEP and transparency. One of the major benefits of “white box” event processing solutions is transparency, something not readily available or obvious in black-box solutions.

Friend and colleague John Bates, Progress Apama, often discusses the benefits of white-box algorithmic trading platforms in terms of faster time-to-market and other competitive advantages. I agree with John and would like to point out that there is another key benefit, in simple layman’s terms: transparency.

For example, let’s say you have designed an event processing solution for operational risk management (ORM). It is time for your favorite auditors to come by and they wish to take a look at what is going on with that proprietary black-box ORM application running quietly in the server room.

The nice auditors ask you, “What does that application do?” and you reply, “Well, it looks for evidence of insider trading,” and they ask, “Do you mind if we ask how?” and you respond, “Good question, do you mind waiting a moment while I get you the contact info for the vendor, because we don’t have access to the source code or the actual key indicators (KIs)?”

Now, let’s look at the white-box scenario:

Again, the nice auditors ask you, “What does that application do?” and you reply “Well, it looks for evidence of insider trading,” and they ask “Do you mind if we ask how?” and you respond “Yes, sit down and we will pull up our insider trading key indicator models. These models are stored in XML format and viewable in our graphical KI design studio. We can print out the KI models for insider trading if you like!” and the smiling auditor says “Thank you, your system is much more transparent than the last place we visited!”
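
As a purely hypothetical illustration of what viewable KI models might look like (the XML schema, field names, and thresholds below are invented, not any vendor’s actual format), the point of the white-box approach is that the detection logic is ordinary data anyone, including an auditor, can read:

```python
import xml.etree.ElementTree as ET

# Hypothetical white-box KI model: the detection logic lives in data (XML here),
# so an auditor can read exactly what the system looks for.  The schema, field
# names and thresholds below are invented for illustration, not a vendor format.
KI_MODEL = """
<ki name="possible-insider-trading">
  <description>Unusually large trade placed shortly before a material announcement</description>
  <condition field="minutes_before_announcement" operator="lt" value="30"/>
  <condition field="trade_size_vs_90day_average" operator="gt" value="5.0"/>
</ki>
"""

root = ET.fromstring(KI_MODEL)
print(f"KI: {root.get('name')}")
print(f"  {root.findtext('description')}")
for condition in root.findall("condition"):
    print(f"  {condition.get('field')} {condition.get('operator')} {condition.get('value')}")
```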

This scenario also applies when investigating why certain KIs that should have been detected were not, or when performing a root cause analysis to see why the KI behind a wrong business decision was inaccurate.

So, CEP in layman’s terms is what we might refer to as the ART of event processing:

  • Agility
  • Reuse
  • Transparency

Please feel free to reuse these ideas, but please don’t forget to reference the author and this blog 🙂

Kindly share and reuse by reference, because all content in The CEP Blog is ©2007-2008 Tim Bass – All Rights Reserved. Thank you!


The Top Information Security Risks for 2008

January 15, 2008

Blogging has its rewards.

I recently published a list of the Top Ten Cybersecurity Threats for 2008.

This list motivated another collaborative list for 2008, organized by Dr. Gary Hinson: The Top Information Security Risks for 2008.


Keyloggers: Why Banks Need Two-Factor Authentication

January 14, 2008

Recently I briefed banking executives in Bangkok on how easy it is to steal userIDs and passwords from their on-line banking customers and why they must have two-factor authentication.   To illustrate my key points, I showed the captive audience various pictures of hardware keyloggers, for example the small black keylogger circled in the figure below.

A Keylogger

There are PS2 keyloggers (illustrated above) and USB keyloggers. There are even normal-looking keyboards with keyloggers built in, so you have no idea a keylogger is there.  Don’t believe me?  You can search the net and find so many!

Today I was reminded about my recent meeting in this Network World article, Two-factor authentication: Hot technology for 2008.  This article mentions numerous token-based two-factor authentication (2FA) solutions.  However, it misses a popular and inexpensive two-factor authentication used here in Thailand and APAC:  SMS-based 2FA.

In a nutshell, SMS-based 2FA involves having your on-line banking system send an SMS message with a one-time password (OTP) to your cell phone.   You then must enter the OTP to complete your transaction.
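
Here is a minimal sketch of that flow, assuming a hypothetical send_sms() helper in place of a real SMS gateway; a production banking system would of course add rate limiting, retry caps, and audit logging:

```python
import hmac
import secrets
import time

# Minimal sketch of SMS-based 2FA with a one-time password (OTP).  The
# send_sms() helper is a hypothetical stand-in for a real SMS gateway, and a
# real bank would add rate limiting, retry caps, expiry policy and audit logs.
_pending_otps = {}   # account_id -> (otp, issued_at)

def send_sms(phone_number: str, text: str) -> None:
    print(f"SMS to {phone_number}: {text}")   # stand-in for the gateway call

def issue_otp(account_id: str, phone_number: str) -> None:
    otp = f"{secrets.randbelow(1_000_000):06d}"   # 6-digit one-time password
    _pending_otps[account_id] = (otp, time.time())
    send_sms(phone_number, f"Your one-time password is {otp}")

def verify_otp(account_id: str, submitted: str, max_age_seconds: int = 180) -> bool:
    otp, issued_at = _pending_otps.pop(account_id, (None, 0.0))
    if otp is None or time.time() - issued_at > max_age_seconds:
        return False
    return hmac.compare_digest(otp, submitted)   # constant-time comparison

issue_otp("acct-123", "+66-81-555-0100")
# ...the customer reads the SMS and types the code back into the banking site...
```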

Is this a perfect solution?

No.

But, it is much better than just passwords!

A ten-year-old child can easily steal your userID and password, really.

So, the next time you are at an Internet cafe, trusting your SSL link to your bank, don’t forget to take a peek at the computer and look for a small keylogger.   

Well, on the other hand, also don’t forget to bring your own keyboard (or laptop) 🙂