Chapter 1 Introduction --- Motivation and challenges
Malicious code is becoming increasingly common and can cause serious economic and military damage.
With the rapid development of networks, malicious code can also propagate more quickly and more easily than it ever could through floppy disks. It can be introduced into a system in many different ways, such as by embedding itself in a downloadable executable, attaching itself to an email, actively breaking into machines, or by means that are not yet known.
Virus detection
There are many ways to detect a virus, including checking for changes in file characteristics such as size or checksum, expert system analysis, monitoring system activity for suspicious behavior, and scanning for virus signatures.
Checking for changes in file characteristics cannot detect all viruses, since some viruses hide themselves so cleverly that an infected file shows no suspicious change. Expert system analysis and system monitoring usually incur heavy overhead, which may make them unsuitable for systems with limited resources, such as a small memory.
Virus signature scanning is easy and effective. However, only viruses known to the scanner can be detected. The number of viruses has always been growing, and there is no reason to believe this trend will not continue. In particular, as more and more computers are connected to the Internet, new viruses can propagate through the network very quickly, possibly before a virus scanner knows their signatures.
Currently, after a new virus is detected, the normal response is to update the anti-virus software so that it can detect the new virus. This involves manual intervention and is prone to mistakes. Furthermore, if a user or system administrator is not notified promptly, the software update can easily fall behind. The automatic notification mechanism provided by Revere addresses these problems.
The design and implementation of Revere should meet the following goals.
The security of networked machines is exactly the purpose of Revere. Revere provides the signatures of newly detected viruses so that these machines can be protected against newly emerging viruses. This goal leads to the goals that follow, such as scalability, heterogeneity, and fast and robust dissemination.
The security of Revere itself is of vital importance given this purpose, which necessitates the following capabilities:
The security of a data message does not mean the secrecy of the information it contains: by allowing any machine to join RBone, we already imply that any machine has the right to read a data message. However, nobody is allowed to modify the information in a data message, block its transmission, or replay previous data messages. Thus, if a data message is corrupted or blocked, a Revere node must be able to detect the problem. < This paragraph overlaps with the security goal section >
The fast dissemination of a data message is important because a virus signature is urgent information. The sooner a machine knows the signature of a new virus, the less likely it is to be infected by that virus.
The resiliency of a data message is necessary because (1) the Internet and RBone are both dynamic; for instance, a Revere node responsible for forwarding a Revere message may quit RBone or crash; and (2) both the Internet and RBone may also be hostile environments in which some machines are corrupted. If a Revere node detects that one of its incoming paths is corrupted, it should be able to find another incoming path for data messages.
This multipath capability can be implemented in two ways: (1) once a Revere node detects that a data message is corrupted or blocked, it immediately reacts by choosing another available Revere node as its new designated node, which sends it a genuine version of the message; or (2) the virus signature dissemination mechanism is inherently n-way, that is, each node has more than one fan-in and can potentially receive more than one copy of the same message announcing a new virus signature over different incoming paths, so that only when all the incoming paths are broken can the Revere node be prevented from receiving genuine virus signature information.
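The n-way variant (2) can be illustrated with a minimal sketch. It is not Revere's implementation: the class name MultipathReceiver, the path labels, and the verify callable (a stand-in for checking a center's signature) are all assumptions made for illustration. The node accepts the first copy of a session's message that verifies, ignores later copies, and remembers which paths delivered a bad copy.

```python
from typing import Callable, Dict, Optional, Set


class MultipathReceiver:
    """Hypothetical node-side logic for n-way reception of data messages."""

    def __init__(self, incoming_paths: Set[str], verify: Callable[[bytes], bool]):
        self.incoming_paths = set(incoming_paths)
        self.verify = verify                      # stand-in for signature verification
        self.accepted: Dict[str, bytes] = {}      # session id -> genuine payload
        self.suspect_paths: Set[str] = set()      # paths that delivered a bad copy

    def receive(self, path: str, session_id: str, payload: bytes) -> Optional[bytes]:
        """Return the payload if it is the first genuine copy, else None."""
        if path not in self.incoming_paths:
            return None                           # not one of our fan-ins
        if not self.verify(payload):
            self.suspect_paths.add(path)          # corrupted copy on this path
            return None
        if session_id in self.accepted:
            return None                           # redundant genuine copy
        self.accepted[session_id] = payload
        return payload


# Toy verifier (assumption): a payload is "genuine" if it starts with b"OK:".
rx = MultipathReceiver({"A", "B", "C"}, lambda p: p.startswith(b"OK:"))
first = rx.receive("A", "center1-7", b"garbage")      # corrupted copy
second = rx.receive("B", "center1-7", b"OK:signature")  # first genuine copy
third = rx.receive("C", "center1-7", b"OK:signature")   # redundant copy
```

In this sketch the node is cut off from a session only if every one of its incoming paths fails to deliver a verifiable copy.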
The adaptivity of data message dissemination is also necessary.
Revere needs to be able to adjust the RBone topology for dissemination purposes. For instance, if a new Revere node joins RBone, an existing Revere node quits RBone, a Revere node is believed to be corrupted or crashed, or a Revere node can route Revere messages more effectively because of its geographical location or its platform and configuration, Revere should be able to tune the system to achieve the best possible dissemination performance.
An optional goal for data message dissemination is the ability to monitor the progress of dissemination. Feedback information needs to be collected for this purpose so that the center can know the percentage of Revere nodes covered in a dissemination session, monitor the status of a Revere node, and adapt its dissemination strategies.
This problem is hard, because the large scale of RBone makes it impossible for any single machine (or even dozens of machines) to have complete knowledge of the whole RBone. Moreover, in RBone only centers have authentication information available, so that messages originating from them can be verified. No machine can hold the authentication information for all regular Revere nodes, and thus no machine can verify all feedback messages.
However, unlike virus signature messages, which all originate from virus signature centers, control information transmitted over RBone can be initiated by any Revere node in the RBone. How to prevent malicious nodes from replaying control information is an important security issue for the Revere message transmission mechanism.
The resiliency and adaptivity of control messages may rely on different mechanisms from those of data messages.
Moreover, depending on the purpose of a control message, some out-of-line control messages may not need to be delivered as fast as data messages. For instance, there is normally no strict timing requirement on a "keep alive" message sent to a Revere node from other nodes on RBone.
Control messages need to be lightweight, too.
Thus, each Revere node stores and manages only incomplete knowledge of the whole Revere world. Based on this partial information, each Revere node should be able to cooperate with other Revere nodes to receive and disseminate virus signatures in a way that is both effective and secure.
The same holds for the virus signature dissemination centers. Each center needs to be able to disseminate virus signatures into RBone without complete knowledge of the Revere world.
It is possible that, as the number of Revere nodes grows, the percentage of nodes that receive a new virus signature within a fixed period becomes lower than before. The design of Revere should minimize this reduction.
While we try to make Revere run on all kinds of hardware and operating systems, a Revere version running on the Windows platform is a must, since it is the most popular platform and the one on which most viruses have been found.
However, far fewer viruses on other platforms also implies far less caution against viruses there. So Revere should not limit its design to the Windows platform.
Furthermore, different machines may have different security capabilities. Some machines may be protected by a firewall or anti-virus software, some may have a security library installed to facilitate a security implementation, and some may have nothing. Different machines may also have different security impacts: the impact of a broken firewall is different from that of a broken ordinary PC.
Different machines may also have different communication configurations and capabilities. A machine connected to the Internet through a modem has much less communication capability than a machine on an Ethernet, which in turn has much less than a machine on the Internet backbone.
Thus, Revere needs to take the features of its platform and configuration into account. For example, the authentication of Revere messages on a firewall machine can be configured to be stronger than on a leaf machine (a machine that does not forward Revere data messages); or a machine with broadcast or multicast capability may use a different mechanism to send control messages or forward data messages than a machine that has only a point-to-point connection.
In many cases, some hard problems Revere must solve can be simplified merely by spending more cycles on them. For example, substantial resources can be devoted to intrusion detection and key distribution. The success of Revere depends on deciding wisely and carefully when to use extra system resources to solve a problem. Generally, we must design all Revere algorithms to work as efficiently as they possibly can, and must seek to make use of otherwise wasted resources on Revere nodes, such as CPU cycles during idle periods.
RBone Properties in terms of dissemination
By checking the signature of a received message about a virus and/or other information, a Revere node can tell whether the message is corrupted or not (for example, a replay), and then ask for a retransmission. However, since there is only a single path, the retransmission is not guaranteed to succeed.
If a Revere node cannot hear genuine Revere data messages about viruses, it cannot get Revere service, which is exactly what Revere is meant to provide.
So a single-path dissemination mechanism over RBone is subject to denial-of-service attacks, and Revere needs to be multipath when disseminating messages over RBone.
A multipath mechanism can provide more than one incoming path to a Revere node, so that if one path is corrupted, the node can resort to another.
They are also not disseminated frequently: a dissemination session is started only when a new virus is found.
This also makes multipath dissemination of virus information fairly acceptable. Though multipath dissemination uses more bandwidth as a security cost, the messages are lightweight enough that this cost is not a burden.
At the same time, a virus propagates on a human time scale, such as several minutes. A virus is normally activated by specific user behavior. Detecting a new virus and determining its remedy also take time on a human scale. So, while it is certainly desirable to disseminate virus information as quickly as possible, Revere can still be successful if it disseminates the information quickly enough on a human time scale.
The latency metric for a Revere session can be the time for a data message to reach all Revere nodes. However, since at any time there may be some disconnected nodes, reaching 100% of Revere nodes may be impossible. The metric can also be the time to reach a certain number or fraction of Revere nodes. To be more real-time, the metric can also be the percentage of Revere nodes covered so far by the Revere message of a specific session.
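The two partial-coverage metrics can be sketched as follows. This is only an illustration under assumptions: the arrival times are taken as already known (how a center would collect them is a separate problem), None marks a node that never received the message, and the function names and sample data are hypothetical.

```python
import math
from typing import Dict, Optional


def coverage(arrivals: Dict[str, Optional[float]], at_time: float) -> float:
    """Fraction of nodes that have received the message by `at_time`."""
    if not arrivals:
        return 0.0
    reached = sum(1 for t in arrivals.values() if t is not None and t <= at_time)
    return reached / len(arrivals)


def time_to_fraction(arrivals: Dict[str, Optional[float]], fraction: float) -> Optional[float]:
    """Earliest time at which `fraction` of all nodes are covered, or None if never."""
    times = sorted(t for t in arrivals.values() if t is not None)
    needed = math.ceil(fraction * len(arrivals))
    if needed == 0:
        return 0.0
    if needed > len(times):
        return None          # e.g. 100% is unreachable while a node is disconnected
    return times[needed - 1]


# Hypothetical session: seconds after the center started it; n4 is disconnected.
arrivals = {"n1": 2.0, "n2": 5.0, "n3": 9.0, "n4": None}
half_way = coverage(arrivals, 5.0)
t75 = time_to_fraction(arrivals, 0.75)
t100 = time_to_fraction(arrivals, 1.0)
```

With a disconnected node in the session, the time to reach 100% is undefined, while the fractional metrics still give a usable figure.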
Each session has a session ID. A session ID consists of the identification of the virus dissemination center which starts the session, and a sequence number. The sequence number is generated by the center.
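A session ID of this shape could be represented as below. This is a sketch under assumptions: the text does not specify the types of the two fields, so a string center identifier and an integer sequence number starting at 1 are chosen here for illustration, and the names SessionId and Center are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class SessionId:
    center_id: str   # identification of the center that starts the session
    sequence: int    # sequence number generated by that center


class Center:
    """Hypothetical dissemination center that numbers its own sessions."""

    def __init__(self, center_id: str):
        self.center_id = center_id
        self._next_sequence = count(1)   # assumption: numbering starts at 1

    def new_session(self) -> SessionId:
        return SessionId(self.center_id, next(self._next_sequence))


center = Center("center-A")
s1 = center.new_session()
s2 = center.new_session()
```

Because the center identification is part of the ID, two centers can use the same sequence number without their sessions being confused.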
No matter how RBone is organized, it needs to achieve the goals discussed in chapter 2.
< More discussion needed here >
RBone Autonomy
Some administrative domains may enforce their own Revere policies. For example, they may want to double-check a report of a new virus signature, and possibly sign the messages containing the virus information, before the machines in the domain can hear the messages or act upon the received information. A firewall is one place where such local Revere policies can be enforced.
A multipath mechanism can have two different styles. One is to have many paths at the same time, so that a Revere node can receive more than one copy of a data message of the same session. The alternative is to let each Revere node dynamically locate a new path only when an existing path is detected to be corrupted. More precisely, this latter method is potentially multipath, but still single-path in any given snapshot.
Choking point
Because of the randomness of the Internet's physical topology, given a virus dissemination center and a Revere node, there can be choking points that any message from the center to the node must pass through. Sometimes, wherever a center is located, any message from outside has to pass through a choking point.
If a LAN has only one router, this router is a choking point for every node in the LAN for any message from outside. For example, a firewall is a choking point for all nodes protected by it. A choking point can also be far away from a Revere node. For instance, if all Revere messages originating from a center have to pass through another machine, typically a router, this machine is a choking point for any Revere message initiated by this center.
These choking points are known ones. There can also be unknown choking points, which may be temporary and arise for routing reasons. If special handling is needed, these unknown points may have to be detected first.
The above choking points are unavoidable because of the physical topology; we call them hard choking points. Sometimes a soft choking point may be formed because the multipath algorithm of Revere results in two or more paths sharing a single point. The design of Revere should try to avoid the formation of soft choking points.
Choking points are of vital importance in terms of security. Even if there is more than one path from a center to a Revere node, all of those paths share the choking point. If a choking point is broken, then every channel to the Revere node is broken.
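To make the definition concrete, the sketch below finds the choking points between a center and a node in a small graph: a machine is a choking point if removing it leaves no path from the center to the node. The graph, the machine names, and the function names are hypothetical, and the brute-force search is only meant to illustrate the definition, not to suggest how Revere would detect choking points at scale. Run on the physical topology it yields hard choking points; run on the paths chosen by a multipath algorithm it would expose soft ones.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[str, List[str]]   # adjacency lists; every machine appears as a key


def reachable(graph: Graph, source: str, target: str, removed: Optional[str] = None) -> bool:
    """Breadth-first search from source to target, skipping the `removed` machine."""
    if removed in (source, target):
        return False
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for neighbor in graph.get(current, []):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def choking_points(graph: Graph, center: str, node: str) -> Set[str]:
    """Machines that every path from `center` to `node` must pass through."""
    if not reachable(graph, center, node):
        return set()
    return {
        machine
        for machine in graph
        if machine not in (center, node)
        and not reachable(graph, center, node, removed=machine)
    }


# Hypothetical topology: two routers lead to a firewall that protects one host.
topology = {
    "center": ["r1", "r2"],
    "r1": ["center", "fw"],
    "r2": ["center", "fw"],
    "fw": ["r1", "r2", "host"],
    "host": ["fw"],
}
points = choking_points(topology, "center", "host")
```

Here the two routers give the firewall two independent paths from the center, but the host behind the firewall still depends on that single machine.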
A choking point may or may not be a Revere node. Either way, a corrupted choking point should not have a serious impact on the rest of the world, apart from the domain downstream of it.
IP multicasting lets a sender reach many receivers by sending a message just once, instead of unicasting it to them one by one. This can save both time and bandwidth. Revere has similar requirements when disseminating virus information from a center to many Revere nodes, particularly when pushing. The problem is that IP multicasting works by routing messages over a sender-based tree or a shared tree, so there is only a single path from a sender to a receiver. To implement a multipath mechanism for Revere, multicasting alone is not satisfactory. Another problem is that multicast functionality is still not universal over the Internet, although multicast deployment, such as MBone, has been expanding.
Reliable multicasting focuses on the reliability of multicasting in terms of message loss or error. It still cannot provide the security Revere needs, such as resiliency through multipath, as long as the reliable multicasting is single-path.
Also, no matter whether multicasting is used or not, a disconnected machine needs to pull missed messages anyway.
The active element in a packet, normally a piece of code, can either be executable or be a pointer to a routine residing in an active node. An active node can be either a Revere node that receives Revere service or just a router able to follow the Revere instructions embedded in a Revere message. When a Revere center disseminates information about a virus that has particular dissemination requirements, the center can similarly code the packets containing the virus information to satisfy those requirements.
On the other hand, Revere also has the potential to protect active networks from being infected by viruses embedded in an active packet.
Worse than that, a virus dissemination center may also fail or be detached from the network.
So, since Revere is open to every machine to join and any joined machine has the right to view the data messages, there is no secrecy of the virus information contained in data messages.
* PULLING
In point-to-point communication, when a receiver wants to pull a message from a sender, the receiver can first send a random number to the sender, and the sender can then include that random number in the message, so that the receiver can check whether a received message is a replay by comparing the random numbers.
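The exchange can be sketched as follows. Two assumptions are made loudly here: the random number is a 16-byte value from Python's secrets module, and the sender's authentication is simulated with an HMAC over a shared demo key, which is only a stand-in for whatever signature scheme Revere actually uses. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

DEMO_KEY = b"demo-shared-key"   # assumption: stand-in for the sender's signing key
NONCE_LENGTH = 16

Reply = Tuple[bytes, bytes, bytes]   # (nonce, payload, authentication tag)


def make_pull_request() -> bytes:
    """Receiver side: pick a fresh random number to send with the pull request."""
    return secrets.token_bytes(NONCE_LENGTH)


def make_reply(key: bytes, nonce: bytes, payload: bytes) -> Reply:
    """Sender side: bind the receiver's random number to the payload."""
    tag = hmac.new(key, nonce + payload, hashlib.sha256).digest()
    return nonce, payload, tag


def accept_reply(key: bytes, expected_nonce: bytes, reply: Reply) -> bool:
    """Receiver side: accept only an authentic reply carrying our own nonce."""
    nonce, payload, tag = reply
    expected_tag = hmac.new(key, nonce + payload, hashlib.sha256).digest()
    return hmac.compare_digest(nonce, expected_nonce) and hmac.compare_digest(tag, expected_tag)


nonce = make_pull_request()
reply = make_reply(DEMO_KEY, nonce, b"virus signature data")
fresh_ok = accept_reply(DEMO_KEY, nonce, reply)

# An attacker replays the old reply in answer to a later pull request.
later_nonce = make_pull_request()
replay_ok = accept_reply(DEMO_KEY, later_nonce, reply)
```

The random number must be covered by the authentication tag; otherwise an attacker could paste a new random number onto an old message.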
A pulled message may be sent from a center, a neighboring node, or some other Revere nodes. Differentiating a pulled message from a replay is a challenging issue.
If each data message is assigned a sequence number and each Revere node keeps a record of the most recent sequence number, then, upon receiving a data message, the node can check the sequence number of the message (given that the signature has been verified).
It is possible that the sequence numbers of received data messages are not consecutive, for example, when the data message of a later session arrives before that of the current session.
It is also possible that a received data message with a correct sequence number is a replayed message. But this does not matter as long as the signature can be verified. In this situation, such a replayed message can only help the Revere dissemination session, although the true message that arrives later is judged to be a repetition and discarded.
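A node-side check along these lines might look like the sketch below. It assumes that the signature has already been verified before the check is made, that sequence numbers start at 1, and that one tracker is kept per center; besides the most recent sequence number it also remembers skipped numbers so that an out-of-order message is not mistaken for a repetition. The class and label names are hypothetical.

```python
from typing import Set


class SequenceTracker:
    """Hypothetical per-center record of data message sequence numbers."""

    def __init__(self) -> None:
        self.latest = 0                  # most recent sequence number seen
        self.missing: Set[int] = set()   # numbers skipped so far

    def classify(self, sequence: int) -> str:
        """Classify a message whose signature has already been verified."""
        if sequence > self.latest:
            # Anything between the old and new latest has not arrived yet.
            self.missing.update(range(self.latest + 1, sequence))
            self.latest = sequence
            return "new"
        if sequence in self.missing:
            self.missing.discard(sequence)
            return "late"                # an earlier session arriving out of order
        return "repetition"              # already seen: discard


tracker = SequenceTracker()
results = [tracker.classify(n) for n in (1, 3, 2, 3)]
```

A replayed copy that arrives before the true message is classified as "new" and accepted, and the true message that follows is then classified as a repetition, which matches the behavior described above.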
Secure multicasting (in SRI )
Replay prevention?
The difficulty in securing control messages is that they can come from anywhere.
The Revere intrusion detection facility running on a Revere node will work by monitoring Revere activities of the Revere node itself and its neighbors, the inbound and outbound traffic, and some other behavior patterns. Intrusion detection can help prevent and report the flooding by illegal control messages ( those messages such as joining Revere or a pull request ) and help to pinpoint which machine is causing trouble.
The multipath mechanism of Revere also provides a good opportunity for intrusion detection. A Revere node with more than one incoming path expects to receive more than one copy of a Revere message of a session. For instance, if the number of received copies is less than expected, Revere can tell that something is wrong with those incoming paths from which no Revere message arrived. We certainly need to make sure that the messages reporting problems are authentic, at least to some degree.
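As a rough sketch of this idea, a node could compare the paths it expects copies from with the paths that actually delivered one in each session, and flag a path that stays silent repeatedly. The function name, the session labels, and the threshold of two silent sessions are assumptions made for illustration; the text does not specify how many misses should raise suspicion.

```python
from typing import Dict, Iterable, Set


def suspicious_paths(
    expected: Iterable[str],
    delivered: Dict[str, Set[str]],
    threshold: int = 2,
) -> Set[str]:
    """Incoming paths that delivered no copy in at least `threshold` sessions."""
    misses = {path: 0 for path in expected}
    for paths_with_copy in delivered.values():
        for path in misses:
            if path not in paths_with_copy:
                misses[path] += 1
    return {path for path, count in misses.items() if count >= threshold}


# Hypothetical record: which incoming paths delivered a copy in each session.
delivered = {
    "s1": {"A", "B", "C"},
    "s2": {"A", "B"},
    "s3": {"A"},
}
flagged = suspicious_paths(["A", "B", "C"], delivered)
```

A single missing copy may only reflect a slow or briefly disconnected neighbor, which is why this sketch waits for repeated silence before flagging a path.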
Multiple signatures on a Revere data message may be needed, depending on the impact of a Revere node acting on the message. If the probability that a center is corrupted is a, then the probability that all m centers are corrupted is a^m, which is much less than a, provided these m centers are independent of each other and have the same probability (i.e. a) of being corrupted.
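As a worked instance of this estimate, take the purely illustrative values a = 0.01 and m = 3; these numbers are assumptions chosen for the arithmetic, not measurements.

```latex
\[
P(\text{all } m \text{ centers corrupted})
  = \prod_{i=1}^{m} a
  = a^{m},
\qquad
a = 0.01,\ m = 3
\;\Longrightarrow\;
a^{m} = 10^{-6} \ll a = 10^{-2}.
\]
```

The product form holds only under the independence assumption stated above; if the centers share a common weakness, the joint probability can be much closer to a.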
In case of key compromise, revoking the compromised public key is an issue. It is not easy to make a revocation message travel faster than a Revere message signed with the compromised key.
Multiple signatures may be used to overcome this problem, so that a message is corrupted only when none of its signatures is authentic. But this may slow down dissemination, since each message has to be signed more than once, independently.