Clouding and Confusing the CEP Community

April 20, 2008

Ironically, our favorite software vendors have decided, in a nutshell, to redefine Dr. David Luckham’s definition of “event cloud” to match the lack of capabilities in their products.

This is really funny, if you think about it. 

The definition of “event cloud” was coordinated over a long (over two year) period with the leading vendors in the event processing community and is based on the same concepts in David’s book, The Power of Events. 

But, since the stream-processing-oriented vendors do not yet have the analytical capability to discover unknown causal relationships in contextually complex data sets, they have chosen to reduce and redefine the term “event cloud” to match their products’ lack of capability.  Why not simply admit they can only process a subdomain of the CEP space as defined by both Dr. Luckham and the CEP community-at-large?

What’s the big deal?   Stream processing is a perfectly respectable profession!

David, along with the “event processing community,” defined the term “event cloud” as follows:

Event cloud: a partially ordered set of events (poset), either bounded or unbounded, where the partial orderings are imposed by the causal, timing and other relationships between the events.

Notes: Typically an event cloud is created by the events produced by one or more distributed systems. An event cloud may contain many event types, event streams and event channels. The difference between a cloud and a stream is that there is no event relationship that totally orders the events in a cloud. A stream is a cloud, but the converse is not necessarily true.

Note: CEP usually refers to event processing that assumes an event cloud as input, and thereby can make no assumptions about the arrival order of events.
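
As an aside, here is a minimal sketch (our own illustration, not from the CEP literature or any vendor) of what the cloud-versus-stream distinction looks like in code: events in a cloud carry only a partial causal/timing order, while a stream is totally ordered.

```python
# Illustrative sketch only: a toy model of the cloud-vs-stream distinction.
# The Event type, the 'causes' links and the helper names are our own assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    id: str
    timestamp: float                    # local clock; not assumed globally comparable
    causes: frozenset = frozenset()     # ids of events known to causally precede this one

def causally_precedes(a: Event, b: Event, events: dict) -> bool:
    """True if event a is reachable by following b's 'causes' links (a happened before b)."""
    stack, seen = list(b.causes), set()
    while stack:
        cid = stack.pop()
        if cid == a.id:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(events[cid].causes)
    return False

def is_stream(events: dict) -> bool:
    """In this toy model, a 'stream' means every pair of events is comparable (a total order)."""
    evs = list(events.values())
    for i, a in enumerate(evs):
        for b in evs[i + 1:]:
            if not (causally_precedes(a, b, events) or causally_precedes(b, a, events)):
                return False            # an incomparable pair -> only a partial order -> a cloud
    return True

# Two events from unrelated producers with no known relationship: a cloud, not a stream.
cloud = {e.id: e for e in (Event("order-1", 10.0), Event("quote-7", 10.0))}
print(is_stream(cloud))                 # False: partially ordered set (poset)
```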

Oddly enough, quite a few event processing vendors seem to have succeeded at confusing their customers, as is evident in this post, Abstracting the CEP Engine, where a customer has seemingly been convinced by the (disinformational) marketing pitches that “there are no clouds of events, only ordered streams.”

I think the problem is that folks are not comfortable with uncertainty and hidden causal relationships, so they give the standard “let’s run a calculation over a stream” example and state “that is all there is…,” confusing the customers who know there is more to solving complex event processing problems.

So, let’s make this simple (we hope) by referencing the invited keynote at DEBS 2007, Mythbusters: Event Stream Processing Versus Complex Event Processing.

In a nutshell…. (these examples are in the PDF above, BTW)

The set of market data from Citigroup (C) is an example of multiple “event streams.”

The set of all events that influence the NASDAQ is an “event cloud”.

Why?

Because a stream of market data is a linearly ordered set of transactions, related by timestamp and linked (relatively speaking) in context because it is all Citigroup market data.  So event processing software can process a stream of market data, compute a VWAP if it chooses, and estimate a good time to enter and exit the market.  This is “good”.
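
For the “good” case, a sliding-window VWAP over a single, totally ordered stream is a straightforward computation. Here is a minimal sketch; the tick layout and the 60-second window are assumptions made for the illustration, not any vendor’s API.

```python
# Illustrative sketch: VWAP over a sliding time window of one totally ordered market data stream.
# The tick format and window length are assumptions for the example, not a product API.
from collections import deque

class SlidingVWAP:
    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.ticks = deque()      # (timestamp, price, volume); arrival order == time order
        self.pv_sum = 0.0         # running sum of price * volume inside the window
        self.vol_sum = 0.0        # running sum of volume inside the window

    def on_tick(self, ts: float, price: float, volume: float) -> float:
        self.ticks.append((ts, price, volume))
        self.pv_sum += price * volume
        self.vol_sum += volume
        # Evict ticks that fell out of the window; this is possible only because
        # the stream is totally ordered by timestamp.
        while self.ticks and self.ticks[0][0] < ts - self.window:
            old_ts, old_price, old_vol = self.ticks.popleft()
            self.pv_sum -= old_price * old_vol
            self.vol_sum -= old_vol
        return self.pv_sum / self.vol_sum if self.vol_sum else 0.0

vwap = SlidingVWAP(window_seconds=60.0)
for ts, price, vol in [(0, 25.10, 500), (20, 25.15, 300), (75, 25.05, 700)]:
    print(round(vwap.on_tick(ts, price, vol), 4))
```

Notice that the eviction step leans entirely on the total ordering; there is no analogue of it for an unordered cloud of events from many sources.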

However, the same software, at this point in time, cannot process the many market data feeds on the NASDAQ and provide a reasonable estimate of why the market moved in a certain direction, based on a statistical analysis of a large set of event data where the cause-and-effect features (in this case, relationships) are difficult to extract.  (BTW, this is generally called “feature extraction” in the scientific community.)

Why?

Because the current state of the art in stream-processing-oriented event processing software does not perform the backwards chaining required to infer causality from large sets of data where causality is unknown, undiscovered and uncertain.

Forward chaining, continuous queries and time series analytics across sliding time windows of streaming data can only address a subset of the overall CEP domain as defined by Dr. Luckham et al.

It is really that simple.   Why cloud and confuse the community?

We like forward chaining using continuous queries and time series analysis across sliding time windows of streaming data. 

  • There is nothing dishonorable about forward chaining using continuous queries and time series analysis across sliding time windows of streaming data.   
  • There is nothing wrong with forward chaining using continuous queries and time series analysis across sliding time windows of streaming data. 
  • There is nothing embarrassing about forward chaining using continuous queries and time series analysis across sliding time windows of streaming data. 

Forward chaining using continuous queries and time series analysis across sliding time windows of streaming data is a subset of the CEP space, just like the definition above, repeated below:

The difference between a cloud and a stream is that there is no event relationship that totally orders the events in a cloud. A stream is a cloud, but the converse is not necessarily true.

It is really simple.   Why cloud a concept so simple and so accurate?


A Vocabulary of Confusion

April 16, 2008

The blog post, On Event Processing Agents, reminds me of a presentation back in March 2006, when TIBCO‘s ex-CEP evangelist Tim Bass (now busy working for a conservative business advisory company in Asia and off the blogosphere, as we all know) presented his keynote, Processing Patterns for Predictive Business, at the first event processing symposium.

In that presentation, Tim introduced a functional event processing reference architecture based on the long-established art-and-science of multisensor data fusion (MSDF).  He also highlighted the importance of mapping business requirements for event processing to established processing analytics and engineering patterns.

In addition, Tim introduced a new slide (shown below), “A Vocabulary of Confusion,” adapted from a figure in the Handbook of Multisensor Data Fusion, overlaying the engineering components of MSDF with CEP and ESP to illustrate the notional overlap (and confusion):

One idea behind the slide above, dubbed the “snowman” by Tim, was that there is a wealth of mature and applicable knowledge from highly functional, pre-existing event processing applications spanning many years and multiple disciplines in the art-and-science of MSDF.  Yet a few emerging event processing communities, vendors and analysts do not seem to be leveraging the art-and-science of these core engineering disciplines, including their well-established vocabularies and event processing architectures.

On Event Processing Agents implies a “new” event processing reference architecture with terms such as (1) simple event processing agents for filtering and routing, (2) mediated event processing agents for event enrichment, transformation and validation, (3) complex event processing agents for pattern detection, and (4) intelligent event processing agents for prediction and decisions.
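
To ground the terminology, here is a toy sketch of how those four agent roles might compose in a pipeline; the event shape, the thresholds and the function names are our own illustrative assumptions, taken neither from On Event Processing Agents nor from the MSDF literature.

```python
# Illustrative only: a toy pipeline using the four agent roles listed above.
# Event shape, thresholds and names are assumptions made for this sketch.

def simple_agent(events):
    """Filtering and routing: keep only the event type we care about."""
    return [e for e in events if e.get("type") == "login_failure"]

def mediated_agent(events):
    """Enrichment / transformation / validation: derive an extra field."""
    for e in events:
        e["internal"] = e.get("src_ip", "").startswith("10.")
    return events

def complex_agent(events, threshold=3):
    """Pattern detection: many failures from one source suggests a situation."""
    counts = {}
    for e in events:
        counts[e["src_ip"]] = counts.get(e["src_ip"], 0) + 1
    return [ip for ip, n in counts.items() if n >= threshold]

def intelligent_agent(suspects):
    """Prediction / decision: a real system might use a learned model here."""
    return {ip: "probable brute-force attempt" for ip in suspects}

raw = [{"type": "login_failure", "src_ip": "10.0.0.5"} for _ in range(4)] + \
      [{"type": "page_view", "src_ip": "10.0.0.9"}]
print(intelligent_agent(complex_agent(mediated_agent(simple_agent(raw)))))
```

As the next paragraphs argue, this staging is nothing new; it mirrors the level-by-level processing chain long documented in MSDF.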

Frankly, while I generally agree with the concepts, I think these terms tend to add to the confusion, because they follow, almost exactly, the same reference architecture (and terms) as MSDF, illustrated again below to aid the reader.

Unfortunately, On Event Processing Agents does not reference the prior art:

Event Processing Reference Architecture

My question is: instead of creating and advocating a seemingly “new vocabulary” and “new event processing theory,” why not leverage the excellent prior art of the past 30 years?

Why not leverage the deep (very complex) event processing knowledge, well documented by some of the top minds in the world, that already solves some of the challenging CEP/EP problems we face today?

Why not build upon the knowledge of a mature pre-existing CEP community (a community that does not call itself CEP) that has been building successful operational event processing applications for decades?

Why not move from a seemingly “not really invented here” approach to “let’s embrace the wealth of knowledge and experience already out there” worldview?

Since March 2006, this question remains unanswered and, in my opinion, the Vocabulary of Confusion, introduced in March 2006 at the first unofficial EPTS party, is even more relevant today.  Competition is good; new ideas are good; new perspectives are good; however, ignoring 30 years of critical prior art and not leveraging it is not very good, is it?

Frankly speaking, there is more than enough CEP theory in the art-and-science of MSDF.  If we map the prior art of operational MSDF systems against existing “CEP platforms,” we gain critical insight into just how far behind the emerging CEP/EP software vendors are in their understanding of where event processing has been and where the art-and-science is headed.

Well, enough blogging for now.  Time to get back to mundane SOA “herding cats” tasks at Techrotech, so I’ll be back Off The Grid for a while.

 


Event Processing in Twitter Space

April 14, 2008

I don’t Twitter.  

But….

Then all the Twitter jokes on Geek and Poke got my attention.

Then, again, I started thinking ….

What if we could process all those Twitter events, all the millions of answers to the little Twitter question:

What are you doing now?

What if your entire sales force Twittered?

Maybe a slick Twitter alliance with SalesForce.com?

Then, we process all the Twitter events in Twitter space.

What could we discover? 

Twitter Trends?   Twitter Demographics?    Twitter Agent Behavior?

Maybe Twitter can merge with Simutronics for real-time gaming with Twitterites?


Spam Filtering: Understanding SEP and CEP

April 13, 2008

In order to help folks further understand the differences between CEP and SEP, prompted by Marc’s reply in the blogosphere, More Cloudy Thoughts, here is the scoop.

In the early days of spam filtering, let’s go back around 10 years, detecting spam was performed with rule-based systems.  In fact, here is a link to one of the first papers that documented rule-based approaches to spam filtering, E-Mail Bombs and Countermeasures: Cyber Attacks on Availability and Brand Integrity, published in IEEE Network Magazine, Volume 12, Issue 2, pp. 10-17 (1998).  At the time, rule-based approaches were common (the state of the art) in antispam filtering.

Over time, however, spammers get more clever and find many ways to poke holes in rule-based detection approaches.  They learn to write with spaces between the letters in words, they change the subject and message text frequently, they randomize their originating IP addresses, they use the IP addresses of your best friends, they change the timing and frequency of the spam, and so on, ad infinitum.
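
To see just how easily those tricks defeat static rules, here is a deliberately tiny rule-based filter; the rules and the sample messages are invented for the illustration.

```python
# Illustrative sketch of a rule-based spam filter and why it is brittle.
# The rules and the sample messages are invented for this example.
import re

RULES = [
    re.compile(r"\bviagra\b", re.IGNORECASE),
    re.compile(r"\bfree money\b", re.IGNORECASE),
]

def is_spam_by_rules(message: str) -> bool:
    return any(rule.search(message) for rule in RULES)

print(is_spam_by_rules("Buy viagra now"))         # True: the rule fires
print(is_spam_by_rules("Buy v i a g r a now"))    # False: spaced-out letters slip past
print(is_spam_by_rules("Fr3e m0ney inside"))      # False: trivial obfuscation defeats both rules
```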

Not to sound like an elitist for speaking the truth, but the more operational experience you have with detection-oriented solutions, the more you will understand that rule-based approaches (alone) are neither scalable nor efficient.  If you followed a rules-based approach (only) against heavy, complex spam (the type of spam we see in cyberspace today), you would spend much of your time writing rules and still not stop very much of the spam!

The same is true for the security situation-detection example in Marc’s example.

Like Google’s Gmail spam filter, and Microsoft’s old Mr. Clippy (the goofy help algorithm of the past), you need detection techniques that use advanced statistical methods to detect complex situations as they emerge.  With rules, you can only detect simple situations, unless you have a tremendous amount of resources to build and maintain very complex rule bases (and even then, rules have limitations for real-time analytics).

We did not make this up at Techrotech, BTW.   Neither did our favorite search engine and leading free email provider, Google!   

This is precisely why Gmail has a great spam filter.  Google detects spam with a Bayesian classifier, not a rule-based system.  If they used (only) a rule-based approach, your Gmail inbox would be full of spam!
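
We obviously have no visibility into Gmail’s actual implementation; the point is only that a statistical classifier learns from labeled examples instead of relying on hand-written rules. Here is a minimal naive Bayes sketch, with invented training data and simple add-one smoothing:

```python
# Illustrative naive Bayes spam classifier. Nothing here reflects Gmail's real internals;
# the training data is invented and the smoothing is the simplest possible choice.
from collections import Counter
import math

def train(examples):
    """examples: list of (text, is_spam). Returns per-class word counts and document totals."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in examples:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals

def classify(text, counts, totals):
    vocab = set(counts[True]) | set(counts[False])
    scores = {}
    for label in (True, False):
        # log prior + sum of log likelihoods with add-one (Laplace) smoothing
        score = math.log(totals[label] / sum(totals.values()))
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]

examples = [("free money now", True), ("win a free prize", True),
            ("meeting at noon", False), ("project status report", False)]
counts, totals = train(examples)
print(classify("free prize now", counts, totals))          # True: looks like spam
print(classify("status of the project", counts, totals))   # False: looks like ham
```

Retraining on fresh examples adapts the filter as the spam changes, which is exactly what a growing pile of hand-written rules struggles to do.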

The same is true for search-and-retrieval algorithms, but that is a topic for another day.  However, you can bet your annual paycheck that Google uses a Bayesian type of classifier in their highly confidential search-and-retrieval (and – hint – classification) algorithms.

In closing, don’t let the folks selling software and analysts promoting three-letter-acronyms (TLAs) cloud your thinking. 

What we are seeing in the marketplace, the so-called CEP marketplace, are simple event processing engines.  CEP is already happening in the operations of Google, a company that needs real-time CEP for spam filtering and also for search-and-retrieval.  We also see real-time CEP in top-quality security products that use advanced neural networks and Bayesian networks to detect problems (fraud, abuse, denial-of-service attacks, phishing, identity theft) in cyberspace.


Models and Reductionism – Reducing Clouds Into Streams

April 13, 2008

Reducing complex problem sets to simple problem sets is an interesting, and sometimes valid, approach to complex event processing.  Transformations can be useful, especially when well defined.

For example, CEP was envisioned as a new technology to debug relatively large distributed systems and to discover hidden causal relationships in seemingly disconnected event space.  This “discovery” requires backwards chaining with uncertainty, for example.  Most of the current so-called “CEP software” on the market today (including Marc Adler’s SQL-based examples) does not perform backwards chaining (with uncertainty).  This is also true of other so-called CEP products, such as most forward-chaining RETE engines – for example, see this post.

Marc Adler says he is, “hunting for advice from people who might have implemented event clouds in Coral8, Streambase, and Aleri, all three which are based on SQL.”

Current streaming SQL engines cannot model true event clouds without reducing the cloud to causally ordered sets of linear streaming data.  These software tools are stream processing engines that process events in a time window of continuous streaming data.  These products are not, in reality, CEP engines – calling them “CEP engines” is marketing-speak, not technology-speak!

Reducing complex models to simple ones is a valid technique for problem solving.  Likewise, eliminating uncertainty and assuming causality is a way to reduce complexity. 

CEP was envisioned to discover causal relationships in complex, uncertain, “cloudy” data, and the current state-of-the-art software from the streaming SQL vendors does not have this capability, unless you reduce all event models to ordered sets of streaming data (reduce POSETS to TOSETS).
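
In order-theoretic terms, that reduction amounts to picking one linear extension of the causal partial order, i.e., a topological sort, and discarding the information about which events were incomparable. Here is a small sketch, with events and causal edges invented for the illustration (it assumes Python 3.9+ for graphlib):

```python
# Illustrative sketch: "reducing a POSET to a TOSET" is choosing one linear extension
# (a topological sort) of the causal partial order. Events and edges are invented.
from graphlib import TopologicalSorter   # standard library in Python 3.9+

# Causal partial order: each key lists the events known to precede it.
causes = {
    "trade-A": set(),
    "trade-B": set(),                    # incomparable with trade-A in the cloud
    "alert-1": {"trade-A", "trade-B"},   # caused by both trades
}

order = list(TopologicalSorter(causes).static_order())
print(order)   # one of several valid total orders, e.g. ['trade-A', 'trade-B', 'alert-1']
# The reduction forces an arbitrary choice between equally valid orders, and the fact
# that trade-A and trade-B had no known causal or timing relationship is lost.
```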

Reductionism can be a valid technique, of course.  We can eliminate uncertainty, force strict determinism, demand a priori system rules and perform all sorts of tricks that permit us to reduce complex problems to simple ones.

However, this also results in reducing CEP (complex event processing) to SEP (simple event processing).

 


Implementing the Event Cloud

April 13, 2008

In his post, Cloudy Thinking, Marc Adler asks how to implement the event cloud.

As a reminder, we process event clouds; we don’t implement them.   Event clouds simply exist, independent of our desire to process and extract meaningful information from the event cloud.

For example, there are many voices in a crowded stadium.  These voices make up the “sound cloud” (or maybe you prefer the term “voice cloud”), in a manner of speaking.   The “trick” is to have the processing capability to listen to the “sound cloud” and detect opportunities and threats in real-time.   So, in theory, we might call this “complex sound processing”.

Events exist.

The stated goal of CEP is to process event clouds in order to detect opportunities and threats in the business world, in real-time.

We don’t “implement” the event cloud because the events exist independent of our capability to process the cloud and extract meaningful and actionable situational knowledge from the cloud.

However, event clouds are represented as POSETS.  This is directly from the CEP literature.

 Note:  See also, these posts on POSETS.


Threats to the Democratic Process

April 13, 2008

Our readers might recall this post by Tim Bass, The Top Ten Cybersecurity Threats for 2008.  One of the top ten threats to cybersecurity in 2008, according to that post, was:

    — Subversion of democratic political processes.

Interestingly enough, Electoral-Vote.com, a site maintained by Dr. Andrew Tanenbaum, Professor of Computer Science at the Vrije Universiteit in Amsterdam, links to a news story reporting that the US presidential election can be hacked.

This is not science fiction folks, it is simply the political and social realities of our brave new electronic world.