Putting the magic in the machine since 1980.

Saturday, December 20, 2008

Multiagent Security

I was recently intervied by Mark Ingebretsen from IEEE Intelligent Systems about multiagent security, on which I did some work a few years ago. I don't know if my questions will appear on a future issue or not so I am posting my answers here.

Before I answer these questions I want to explain what I mean by a multiagent system and how I envision the future of security using multiagent systems.

Prototype Architecture When designing a multiagent system what we actually build is an interaction protocol which then dictates the agents' strategies. For example, I consider bittorrent to be the most successful multiagent system currently deployed. Bittorrent consists of an interaction protocol and a number of clients written anyone who wants to. Of course, the agents in bittorrent (the clients) are human/machine hybrids. The users get to make many decisions such as which files to upload and download, how long to seed a connection, how to limit bandwidth, etc. In a pure machine-only multiagent system all those decisions would be made by the software. I don't think we will want many machine-only multiagent systems, most will be hybrids.

In the case of a security multiagent system, I envision an open system where agents act like sensors in a distributed sensor network and publish or relay important information to each other. That is the first deployment step will be sensing. Once we get that working we can start thinking about having the agents act on that information: shutting down rogue PCs, dropping packets, filtering certain protocols, applying patches, etc. That is, the second deployment step will be acting. Of course, as we give the agents more and more capabilities we need to prepare for the possibility that some agents might start taking actions against the system.

Are multi-agent systems best used within a single organization's computer network or can they function as effectively if they reside on multiple connected networks? Similarly, should multi-agent systems be allowed to spread freely throughout the Internet (e.g. via voluntary downloads) or is it important that their propagation be strictly controlled?

The best way for multiagent security to be effective is by having one world-wide multiagent security protocol. Organizations would be free to choose to participate or not and there would be many different levels of participation. For example, at its simplest a company could offer a REST page with data on the security status of its internal network. The data on this page could be more detailed for a local connection than for outside connections, in case the owners are concerned about privacy. The data would be used by agents on each machine, whether local or remote, to detect and stop security threats. The same REST interface might then be extended to allow outside parties to make reports or requests. For example, an outside agent might ask another one to shut down a particular connection coming from its domain because it believes it to be a DoS attack.

Thus, I don't see agents “spreading” throughout the Internet. Instead, the protocol will be freely available and organizations will decide which parts of it to implement. Each organization must decide what information to make public, how to use information from others, and how to handle outside requests. The growth of the system is dictated by the individual desires of the participants.

Have multi-agent systems evolved to the extent that they can take collective action to actually halt a network threat? If the answer is yes, what sort of actions do they take? If no, is this a viable goal that we can expect to see implemented at some point.

I am unaware of any deployed systems that take autonomous action based on aggregate data, but there is no technical reason why these cannot exist. One problem obtaining the system administrators' trust. However, I do expect that as technology matures and research prototypes demonstrate their capabilities we will see more autonomous security systems.

What are the dangers and possible consequences that might occur if the agents were to misidentify a legitimate communication as a threat? Could the result be a serious slowing of network traffic?

That is exactly the type of problem one must keep in mind when designing an interaction protocol. The simplest way to minimize the threat of error is to minimize the agents' capabilities: if they can't do much then they can't do much damage. As we start giving them more capabilities, such as shutting down computers and networks, the threat of misuse becomes real. At this point we start looking at human management of the multiagent system. That is, the agents should present the system administrator with their case for why they want to perform a given action but only the administrator's password would allow the system to take the action. Note that this administrator only has control over his organization's agents.

Are there provisions built in allowing some sort of universal over-ride of the agents' collective actions? If so, who should have the authority to halt actions by the agents?

A universal override is a bad idea as it becomes a target for abuse. Notice that there is no universal override for the Internet or the web. I consider this a feature. In open multiagent systems we strive to distribute power, that is, to minimize the power of the most powerful agent in the system. In this way we also minimize the possibility of a catastrophe, either planned or accidental. A universal over-ride goes against these design guidelines.

Is there any danger that the agents themselves might be co-opted by a clever hacker and used to undermine a network?

Yes, individual agents can always be co-opted, that is the reality for every engineered product. But, a correctly designed protocol would have taken into account the possibility of rogue agents so their impact should be minimal. Also, a good system minimizes the power of the most powerful agent so a few compromised agents should not present too much of a problem.

Are the agents trained? If so, how? Through simulations of network activity incorporating known past threats? Or is it better to allow the agents to monitor actual network activity?

There is ongoing work on applying machine learning techniques to network activity in order to detect what is normal versus what is abnormal behavior: like an immune system. I believe that work shows a lot of promise, especially once we let these agents communicate with each other since they could then share local information in order to get a global view of an ongoing threat. For example, if an agent detects an abnormally high set of packets coming from another domain it could tell an agent on that domain, thereby possibly alerting him to a security problem within its network.

What are some of the new developments in this area that you see as particularly important?

The growth of semantic web technologies—semantic markup languages, inference engines, and web services—will greatly help speedup the adoption of open multiagent systems, such as the envisioned world-wide multiagent security protocol.

Thursday, December 4, 2008

Simple Programming Questions Reveal a Lot

This semester I am again teaching our Software Engineering Lab class. This is a senior-level class where we form groups of 4-5 students and develop a large software project. This semester I asked the students to write short programs on the whiteboard. I added this requirement because, after talking to our recent graduates and scouring the blogs, it is clear that nearly all employers are now using these questions to weed out possible candidates. It is not just Microsoft and Google anymore, it is also the small local insurance processing companies, etc. Everyone is asking programming questions, and I can see why.

FizzBuzz

I wanted to use really simple questions that real employers use, so I went to stackoverflow.com and asked them to give me some questions. They pointed out some simple popular questions like, among others, implementing the following:

  • int strstr(String a, String b): return the first location of string a within string b, or -1 if it does not appear.
  • String trim(String a): return a string that is like a but eliminates any spaces before and after a.
  • String strtok(Stream a): each time it is called it returns the next token in a, where a tokens are separated by spaces.
  • boolean isprime(int a): return true if a is a prime number, false otherwise.

I also came across the fizzbuzz question, repeatedly. The question is:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
The author claims that more than half of the people he interviews for programming position are unable to write this program, other commenters report seeing similar numbers. I also used this question for some students.

Luckily, my students performed better than that. However, not everyone was able to tackle these programs with complete ease, as I would have wanted. I found that surprising, especially since we require large amounts of programming in our undergraduate curriculum. Note that I do not require a perfect bug-free implementation. For these questions a solid answer simply has to implement most of the logic needed to solve the problem, even if the final program breaks for some boundary cases.

On the other hand, I also noted that all the students know the basic programming constructs. They know about strings, arrays, for loops, etc. The difficulty was always in the transformation from the English description to code. That is, the problem lies in algorithm generation.

At this point some might look to studies, such as The camel has two humps and assume that there is some gene that gives some people the ability to program. I don't believe that is correct. Current research in psychology and neuroscience shows that the vast majority of abilities are learned, not inherited (what a surprise!). That is, I believe people who are good at programming become so because they practice. My hypothesis is that some students manage to take all the classes but end up doing little programming. This is easy to do since most programming experience comes from homework problems which are extremely easy to copy.

I remember that when I was an undergraduate a study revealed that about 37% of MIT's undergraduate students taking a certain introductory-level programming class had copied some of their homework programs. Copying homework programs, or doing no programming in a group project, is an unfortunately common occurrence even in the best schools.

Beside being an honor code violation, the problem with copying someone else's program is that it leads to a vicious cycle. A student that copies one program misses out on some practice so the next program is much harder for him to write, so he has even more incentive to copy the next one, and so on. This accumulates not just over one class but over the whole undergraduate experience: programming is a skill that takes years to build. Thus, it is possible that by graduation some students have had 10 times as much programming experience as others, which we know will make them 1000 times faster programmers.

To summarize, from this experiment I have derived a couple of conclusions. For students I recommend to

  1. Remember to practice the craft of programming. It is easy in a CSE degree, with all the high-level talk about hashing functions, AVL trees, NP-Completeness, multiple inheritance, etc., to forget to practice the basics. Every practising programmer will tell you that most of the programming they do is of the simple kind, the kind reflected by the simple questions above. Complex algorithms, theory, and language structures will help you become a great programmer, but first you have to master the basic craft: writing small programs.
  2. Don't fall into the trap of believing that because you can read and understand a program that you can write code. That is like me saying that I can write a good novel because I can read and understand novels. Yes, you have to be able to read programs, but you also need to practice writing them: turning an English description of the problem into code. I think some students justify copying someone else's code by thinking that it is enough to read and understand what others wrote. It is not.
  3. Practice writing small programs on the whiteboard, or on paper, especially if you are going to be looking for a job in the near future. It is a different experience from typing with Intellisense on.
  4. In an inteview, first make sure you understand the question by repeating it back to the interviewer. Write a few sample input-output pairs on the board. Then implement the simplest solution you can think of. If you are worried that the interviewer might be looking for a better answer simply check with him. Say, “Well, a brute-force way of doing that is yada yada yada.” He will either ask you implement that or ask you for a faster solution.
  5. Don't copy that program. It might take you all night to write it but next time you will be able to do it 10 times faster, and the next time after that you will be another 10 times faster. If at first it seems like it takes you for-fscking-ever to write a simple program it is because it does, and that is normal! Everyone is slow to start. We only get faster with practice. Oh, and that kid in your class that can write code at ninja speed, he started programming in middle school. Don't worry though, one can only get so fast with practice and after a handful of years you will reach that limit.

I also have a few suggestions for employers:

  1. Students are easily intimidated by the whiteboard because most have never used one. Some will be too nervous to perform. Pencil and paper might be a better way.
  2. It is important to give feedback and hints during this process. I've found that several students erroneously (probably, my fault) assumed the problem was a different and much harder one than the one I was actually asking them solve. Other students immediately tried to find a, sometimes non-existent, super clever solution instead of starting with a brute-force approach.
  3. Ask programming questions. It is an amazingly simple and quick way to determine if someone can program. It will save your company a lot of money. But, ask several different questions of each applicant.

In any case, I will continue to use whiteboard questions as they provide valuable experience for the students. I am also planning on using more of this "face to face" type of quizing in my other classes. It is a good and quick way of getting a lot of information on a person's technical capabilities.

PS. If you want more programming practice check out my Programming Questions blog for some more examples.