Clouding and Confusing the CEP Community

Ironically, our favorite software vendors have decided to redefine Dr. David Luckham’s definition of “event cloud” to match the lack of capabilities in their products.

This is really funny, if you think about it. 

The definition of “event cloud” was coordinated over a long period (more than two years) with the leading vendors in the event processing community, and it is based on the same concepts in David’s book, The Power of Events.

But since the stream-processing-oriented vendors do not yet have the analytical capability to discover unknown causal relationships in contextually complex data sets, they have chosen to reduce and redefine the term “event cloud” to match their products’ lack of capability. Why not simply admit they can only process a subdomain of the CEP space as defined by both Dr. Luckham and the CEP community at large?

What’s the big deal?   Stream processing is a perfectly respectable profession!

David, along with the “event processing community,” defined the term “event cloud” as follows:

Event cloud: a partially ordered set of events (poset), either bounded or unbounded, where the partial orderings are imposed by the causal, timing and other relationships between the events.

Notes: Typically an event cloud is created by the events produced by one or more distributed systems. An event cloud may contain many event types, event streams and event channels. The difference between a cloud and a stream is that there is no event relationship that totally orders the events in a cloud. A stream is a cloud, but the converse is not necessarily true.

Note: CEP usually refers to event processing that assumes an event cloud as input, and thereby can make no assumptions about the arrival order of events.
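
To make the poset idea concrete, here is a minimal sketch in Python (my own illustration, not part of the community definition; the Event and EventCloud names are hypothetical):

    # Illustrative sketch only: a cloud orders events only where a known
    # relationship exists; a stream orders every pair of events.
    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Event:
        id: str
        timestamp: float  # may be unreliable across distributed sources

    @dataclass
    class EventCloud:
        events: set = field(default_factory=set)
        # (a, b) pairs meaning "a is known to precede b"; direct
        # relationships only, transitivity omitted for brevity.
        causes: set = field(default_factory=set)

        def ordered(self, a: Event, b: Event) -> bool:
            """True only if some known relationship orders a before b."""
            return (a, b) in self.causes

    def is_stream(cloud: EventCloud) -> bool:
        """A stream is the special case where every pair of events is ordered."""
        evs = list(cloud.events)
        return all(cloud.ordered(a, b) or cloud.ordered(b, a)
                   for i, a in enumerate(evs) for b in evs[i + 1:])

Every stream passes is_stream, so a stream is a cloud; a cloud with even one unordered pair of events is not a stream, which is exactly the converse failing.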

Oddly enough, quite a few event processing vendors seem to have succeeded at confusing their customers, as evidenced in the post Abstracting the CEP Engine, where a customer has seemingly been convinced by the (disinformational) marketing pitch that “there are no clouds of events, only ordered streams.”

I think the problem is that folks are not comfortable with uncertainty and hidden causal relationships, so they give the standard “let’s run a calculation over a stream” example and declare “that is all there is,” confusing the customers who know there is more to solving complex event processing problems.

So, let’s make this simple (we hope), referencing the invited keynote at DEBS 2007, Mythbusters: Event Stream Processing Versus Complex Event Processing.

In a nutshell (these examples are in the PDF above, BTW)…

The set of market data from Citigroup (C) is an example of multiple “event streams.”

The set of all events that influence the NASDAQ is an “event cloud”.

Why?

Because a stream of market data is a linearly ordered set of data, related by the timestamp of each transaction and linked (relatively speaking) in context because it is Citigroup market data. So, event processing software can process a stream of market data, compute a VWAP (volume-weighted average price) if it chooses, and estimate a good time to enter and exit the market. This is “good”.
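
For illustration, here is a minimal sketch of the kind of sliding-window VWAP calculation a stream processor performs; the tick format and the 60-second window are assumptions for the example:

    from collections import deque

    def vwap_stream(ticks, window_seconds=60.0):
        """Yield (timestamp, VWAP) over a sliding time window.

        Assumes ticks arrive as (timestamp, price, volume), totally
        ordered by timestamp -- exactly the property a stream has
        and a cloud lacks.
        """
        window = deque()   # ticks currently inside the window
        pv_sum = 0.0       # running sum of price * volume
        vol_sum = 0.0      # running sum of volume
        for ts, price, vol in ticks:
            window.append((ts, price, vol))
            pv_sum += price * vol
            vol_sum += vol
            # Evict ticks that have slid out of the time window.
            while window and window[0][0] < ts - window_seconds:
                _, old_price, old_vol = window.popleft()
                pv_sum -= old_price * old_vol
                vol_sum -= old_vol
            yield ts, (pv_sum / vol_sum if vol_sum else 0.0)

    # Example: list(vwap_stream([(0, 10.0, 100), (30, 11.0, 200), (90, 12.0, 50)]))
    # ends at a VWAP of 11.2, after the tick at ts=0 has been evicted.

Note that the eviction step is only well defined because the timestamps totally order the feed; drop that total order and the calculation no longer makes sense as written.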

However, the same software, at this point in time, cannot process the many market data feeds in the NASDAQ and provide a reasonable estimate of why the market moved in a certain direction, based on a statistical analysis of a large set of event data where the cause-and-effect features (in this case, relationships) are difficult to extract. (BTW, this is generally called “feature extraction” in the scientific community.)

Why?

Because the current state of the art of stream-processing-oriented event processing software does not perform the required backwards chaining to infer causality from large sets of data where causality is unknown, undiscovered and uncertain.

Forward chaining, continuous queries, and time series analytics across sliding time windows of streaming data cover only a subset of the overall CEP domain as defined by Dr. Luckham et al.
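
To illustrate the difference in direction, here is a toy sketch (not any vendor’s engine; the causal rules are invented for the example). Forward chaining fires rules as facts arrive; backward chaining starts from an observed effect and searches backwards for facts that could explain it:

    # Hypothetical causal rules: each conclusion maps to alternative premise sets.
    RULES = {
        "index_drop": [["sector_selloff"], ["rate_hike"]],
        "sector_selloff": [["bad_earnings", "high_volume"]],
    }

    def prove(goal, facts, rules=RULES):
        """Backward chaining: can `goal` be explained from the observed facts?"""
        if goal in facts:
            return True
        return any(all(prove(p, facts, rules) for p in premises)
                   for premises in rules.get(goal, []))

    # Ask "why did the index drop?" on demand, rather than pre-wiring a
    # forward rule for every possible cause:
    prove("index_drop", {"bad_earnings", "high_volume"})  # True

Of course, a real causal-discovery engine must also hypothesize and score the rules themselves from the event data; that discovery step is precisely what the stream-only products lack.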

It is really that simple.   Why cloud and confuse the community?

We like forward chaining using continuous queries and time series analysis across sliding time windows of streaming data. 

  • There is nothing dishonorable about forward chaining using continuous queries and time series analysis across sliding time windows of streaming data.   
  • There is nothing wrong with forward chaining using continuous queries and time series analysis across sliding time windows of streaming data. 
  • There is nothing embarrassing about forward chaining using continuous queries and time series analysis across sliding time windows of streaming data. 

Forward chaining using continuous queries and time series analysis across sliding time windows of streaming data is a subset of the CEP space, consistent with the definition above, repeated below:

The difference between a cloud and a stream is that there is no event relationship that totally orders the events in a cloud. A stream is a cloud, but the converse is not necessarily true.

It is really simple.   Why cloud a concept so simple and so accurate?

7 Responses to Clouding and Confusing the CEP Community

  1. peter lin says:

    Good post. On a technical note, it’s feasible to simulate backward chaining with forward chaining rule engines. The real downside is it has to be done manually and requires expertise in pattern matching. Using automatic subgoal generation like what Paul Haley describes is more powerful and flexible.

    Of the current CEP centric products, do any provide Truth Maintenance and removal of events? What I’m thinking of is the equivalent of retracting a fact from a Business Rule engine. I’ve only looked at current products a little, but it doesn’t appear the SQL based engines provide either TMS or retract.

    From my experience, providing TMS really requires support for retract. That in turn requires features like memory indexing similar to RETE for fast retract. Without an efficient design and implementation for TMS and retract, it will be difficult to simulate backward chaining in a forward chaining engine.

    Backward chaining engines often employ memory indexing to improve performance, which suggests we need a formal approach for fact management. In my biased opinion, a lot of work is needed to formalize temporal logic and fact management. I’m curious to hear how others are addressing these technical issues related to backward chaining, memory indexing and TMS.

    peter

  2. “But, since the stream-processing oriented vendors do not yet have the analytical capability to discover unknown causal relationship in contextually complex data sets”

    Not entirely true… well, at least for ruleCore: we have some simple capability to do one type of processing that fits this description.

    Our ruleCore CEP Server can discover which event sources are emitting events that are part of the same transaction. This works simply by pairing together events that are produced by one entity and consumed by another. We look for a matching id in the events to keep track of which transaction they belong to.

    Not a solution for the complete problem but a small step in that direction…

  3. Greg Reemler says:

    Hi Marco,

    This level of inference is interesting; but, as you seem to imply, it is not yet at the level of inference that was the main theme of the blog post.

    Hi Peter,

    As you mention, the SQL-based approaches do not currently have the features you highlight. However, I believe the TIBCO approach does (for the most part), since it is based on RETE, optimized by TIBCO’s development team for event processing. On the other hand, TIBCO’s approach does not (yet) perform backwards chaining; hence it has generally been used for scheduling- and orchestration-oriented event processing applications versus detection-oriented approaches.

    Thanks to both of you for visiting and for your much appreciated contributions to the CEP blog.

    Greg

  4. Peter Lin says:

    I asked Paul Vincent for details on the TIBCO blogs, but he wasn’t able to share any details. That is understandable. This is guesswork on my part, but once a formal model of fact management is established, it will be easier to build rule engines for CEP that handle both forward and backward chaining efficiently. Along the same lines, I think support for fuzzy logic is needed to handle the complex machine learning scenarios that Tim has mentioned in the past.

    Are there any groups out there working on a formal model of fact management? By formal model I mean a well-defined approach, capable of calculating the temporal distance and life cycle of rules and facts in any set of rules. Those results are then used to manage the rules and facts at runtime to ensure optimized performance and proper management of facts. The model should provide concrete algorithms which can be implemented. I’ve looked at existing literature, but I think quite a bit of work is still needed.

    peter

  5. Perhaps you should put this post and the ensuing discussion on the CEP Forum?
    It’s amazing that partially ordered sets of events still raise such confusion!

    One comment suggests that “order of arrival” is the only event relationship one should be interested in analyzing!

    And the idea that since the activity of a distributed system arrives at a rules engine in a one-at-a-time order, the event relationships in that activity can be represented by that order …!

    God knows how such people will deal with dynamically distributed rules engines in large systems. And that is an application of CEP that is happening (but the vendors aren’t doing it! The customers are.)

    I have no more time at the moment, but why not put this on the Forum?

  6. Tim Bass says:

    Dear Paul,

    Our apologies, your last comment was accidentally deleted.

    Please repost if you have a few spare moments. I reposted a cached version below, FYI.

    http://haleyai.com/wordpress/

    Yours sincerely, Tim

  7. Paul Haley says:

    It is refreshing to be reminded that reality may not be as synchronous and totally ordered as our simplifying assumptions and implementation techniques might prefer. It would be helpful to understand, in more practical terms, who deals with this asynchronicity well and what the tradeoffs are. For example, TIBCO and Oracle seem to be well positioned for this, but to lack the throughput / low latency required by front-office apps in the capital markets. What do you thought leaders think?

    You are clearly right on the critical leading edge of predictive analytics in the broader context of process and decision management. Incremental machine learning algorithms are a critical need and an open opportunity in this space. Can you identify any interesting approaches, visionaries or emerging leaders in this area?
