Putting the magic in the machine since 1980.

Saturday, December 20, 2008

Multiagent Security

I was recently intervied by Mark Ingebretsen from IEEE Intelligent Systems about multiagent security, on which I did some work a few years ago. I don't know if my questions will appear on a future issue or not so I am posting my answers here.

Before I answer these questions I want to explain what I mean by a multiagent system and how I envision the future of security using multiagent systems.

Prototype Architecture When designing a multiagent system what we actually build is an interaction protocol which then dictates the agents' strategies. For example, I consider bittorrent to be the most successful multiagent system currently deployed. Bittorrent consists of an interaction protocol and a number of clients written anyone who wants to. Of course, the agents in bittorrent (the clients) are human/machine hybrids. The users get to make many decisions such as which files to upload and download, how long to seed a connection, how to limit bandwidth, etc. In a pure machine-only multiagent system all those decisions would be made by the software. I don't think we will want many machine-only multiagent systems, most will be hybrids.

In the case of a security multiagent system, I envision an open system where agents act like sensors in a distributed sensor network and publish or relay important information to each other. That is the first deployment step will be sensing. Once we get that working we can start thinking about having the agents act on that information: shutting down rogue PCs, dropping packets, filtering certain protocols, applying patches, etc. That is, the second deployment step will be acting. Of course, as we give the agents more and more capabilities we need to prepare for the possibility that some agents might start taking actions against the system.

Are multi-agent systems best used within a single organization's computer network or can they function as effectively if they reside on multiple connected networks? Similarly, should multi-agent systems be allowed to spread freely throughout the Internet (e.g. via voluntary downloads) or is it important that their propagation be strictly controlled?

The best way for multiagent security to be effective is by having one world-wide multiagent security protocol. Organizations would be free to choose to participate or not and there would be many different levels of participation. For example, at its simplest a company could offer a REST page with data on the security status of its internal network. The data on this page could be more detailed for a local connection than for outside connections, in case the owners are concerned about privacy. The data would be used by agents on each machine, whether local or remote, to detect and stop security threats. The same REST interface might then be extended to allow outside parties to make reports or requests. For example, an outside agent might ask another one to shut down a particular connection coming from its domain because it believes it to be a DoS attack.

Thus, I don't see agents “spreading” throughout the Internet. Instead, the protocol will be freely available and organizations will decide which parts of it to implement. Each organization must decide what information to make public, how to use information from others, and how to handle outside requests. The growth of the system is dictated by the individual desires of the participants.

Have multi-agent systems evolved to the extent that they can take collective action to actually halt a network threat? If the answer is yes, what sort of actions do they take? If no, is this a viable goal that we can expect to see implemented at some point.

I am unaware of any deployed systems that take autonomous action based on aggregate data, but there is no technical reason why these cannot exist. One problem obtaining the system administrators' trust. However, I do expect that as technology matures and research prototypes demonstrate their capabilities we will see more autonomous security systems.

What are the dangers and possible consequences that might occur if the agents were to misidentify a legitimate communication as a threat? Could the result be a serious slowing of network traffic?

That is exactly the type of problem one must keep in mind when designing an interaction protocol. The simplest way to minimize the threat of error is to minimize the agents' capabilities: if they can't do much then they can't do much damage. As we start giving them more capabilities, such as shutting down computers and networks, the threat of misuse becomes real. At this point we start looking at human management of the multiagent system. That is, the agents should present the system administrator with their case for why they want to perform a given action but only the administrator's password would allow the system to take the action. Note that this administrator only has control over his organization's agents.

Are there provisions built in allowing some sort of universal over-ride of the agents' collective actions? If so, who should have the authority to halt actions by the agents?

A universal override is a bad idea as it becomes a target for abuse. Notice that there is no universal override for the Internet or the web. I consider this a feature. In open multiagent systems we strive to distribute power, that is, to minimize the power of the most powerful agent in the system. In this way we also minimize the possibility of a catastrophe, either planned or accidental. A universal over-ride goes against these design guidelines.

Is there any danger that the agents themselves might be co-opted by a clever hacker and used to undermine a network?

Yes, individual agents can always be co-opted, that is the reality for every engineered product. But, a correctly designed protocol would have taken into account the possibility of rogue agents so their impact should be minimal. Also, a good system minimizes the power of the most powerful agent so a few compromised agents should not present too much of a problem.

Are the agents trained? If so, how? Through simulations of network activity incorporating known past threats? Or is it better to allow the agents to monitor actual network activity?

There is ongoing work on applying machine learning techniques to network activity in order to detect what is normal versus what is abnormal behavior: like an immune system. I believe that work shows a lot of promise, especially once we let these agents communicate with each other since they could then share local information in order to get a global view of an ongoing threat. For example, if an agent detects an abnormally high set of packets coming from another domain it could tell an agent on that domain, thereby possibly alerting him to a security problem within its network.

What are some of the new developments in this area that you see as particularly important?

The growth of semantic web technologies—semantic markup languages, inference engines, and web services—will greatly help speedup the adoption of open multiagent systems, such as the envisioned world-wide multiagent security protocol.

Thursday, December 4, 2008

Simple Programming Questions Reveal a Lot

This semester I am again teaching our Software Engineering Lab class. This is a senior-level class where we form groups of 4-5 students and develop a large software project. This semester I asked the students to write short programs on the whiteboard. I added this requirement because, after talking to our recent graduates and scouring the blogs, it is clear that nearly all employers are now using these questions to weed out possible candidates. It is not just Microsoft and Google anymore, it is also the small local insurance processing companies, etc. Everyone is asking programming questions, and I can see why.

FizzBuzz

I wanted to use really simple questions that real employers use, so I went to stackoverflow.com and asked them to give me some questions. They pointed out some simple popular questions like, among others, implementing the following:

  • int strstr(String a, String b): return the first location of string a within string b, or -1 if it does not appear.
  • String trim(String a): return a string that is like a but eliminates any spaces before and after a.
  • String strtok(Stream a): each time it is called it returns the next token in a, where a tokens are separated by spaces.
  • boolean isprime(int a): return true if a is a prime number, false otherwise.

I also came across the fizzbuzz question, repeatedly. The question is:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
The author claims that more than half of the people he interviews for programming position are unable to write this program, other commenters report seeing similar numbers. I also used this question for some students.

Luckily, my students performed better than that. However, not everyone was able to tackle these programs with complete ease, as I would have wanted. I found that surprising, especially since we require large amounts of programming in our undergraduate curriculum. Note that I do not require a perfect bug-free implementation. For these questions a solid answer simply has to implement most of the logic needed to solve the problem, even if the final program breaks for some boundary cases.

On the other hand, I also noted that all the students know the basic programming constructs. They know about strings, arrays, for loops, etc. The difficulty was always in the transformation from the English description to code. That is, the problem lies in algorithm generation.

At this point some might look to studies, such as The camel has two humps and assume that there is some gene that gives some people the ability to program. I don't believe that is correct. Current research in psychology and neuroscience shows that the vast majority of abilities are learned, not inherited (what a surprise!). That is, I believe people who are good at programming become so because they practice. My hypothesis is that some students manage to take all the classes but end up doing little programming. This is easy to do since most programming experience comes from homework problems which are extremely easy to copy.

I remember that when I was an undergraduate a study revealed that about 37% of MIT's undergraduate students taking a certain introductory-level programming class had copied some of their homework programs. Copying homework programs, or doing no programming in a group project, is an unfortunately common occurrence even in the best schools.

Beside being an honor code violation, the problem with copying someone else's program is that it leads to a vicious cycle. A student that copies one program misses out on some practice so the next program is much harder for him to write, so he has even more incentive to copy the next one, and so on. This accumulates not just over one class but over the whole undergraduate experience: programming is a skill that takes years to build. Thus, it is possible that by graduation some students have had 10 times as much programming experience as others, which we know will make them 1000 times faster programmers.

To summarize, from this experiment I have derived a couple of conclusions. For students I recommend to

  1. Remember to practice the craft of programming. It is easy in a CSE degree, with all the high-level talk about hashing functions, AVL trees, NP-Completeness, multiple inheritance, etc., to forget to practice the basics. Every practising programmer will tell you that most of the programming they do is of the simple kind, the kind reflected by the simple questions above. Complex algorithms, theory, and language structures will help you become a great programmer, but first you have to master the basic craft: writing small programs.
  2. Don't fall into the trap of believing that because you can read and understand a program that you can write code. That is like me saying that I can write a good novel because I can read and understand novels. Yes, you have to be able to read programs, but you also need to practice writing them: turning an English description of the problem into code. I think some students justify copying someone else's code by thinking that it is enough to read and understand what others wrote. It is not.
  3. Practice writing small programs on the whiteboard, or on paper, especially if you are going to be looking for a job in the near future. It is a different experience from typing with Intellisense on.
  4. In an inteview, first make sure you understand the question by repeating it back to the interviewer. Write a few sample input-output pairs on the board. Then implement the simplest solution you can think of. If you are worried that the interviewer might be looking for a better answer simply check with him. Say, “Well, a brute-force way of doing that is yada yada yada.” He will either ask you implement that or ask you for a faster solution.
  5. Don't copy that program. It might take you all night to write it but next time you will be able to do it 10 times faster, and the next time after that you will be another 10 times faster. If at first it seems like it takes you for-fscking-ever to write a simple program it is because it does, and that is normal! Everyone is slow to start. We only get faster with practice. Oh, and that kid in your class that can write code at ninja speed, he started programming in middle school. Don't worry though, one can only get so fast with practice and after a handful of years you will reach that limit.

I also have a few suggestions for employers:

  1. Students are easily intimidated by the whiteboard because most have never used one. Some will be too nervous to perform. Pencil and paper might be a better way.
  2. It is important to give feedback and hints during this process. I've found that several students erroneously (probably, my fault) assumed the problem was a different and much harder one than the one I was actually asking them solve. Other students immediately tried to find a, sometimes non-existent, super clever solution instead of starting with a brute-force approach.
  3. Ask programming questions. It is an amazingly simple and quick way to determine if someone can program. It will save your company a lot of money. But, ask several different questions of each applicant.

In any case, I will continue to use whiteboard questions as they provide valuable experience for the students. I am also planning on using more of this "face to face" type of quizing in my other classes. It is a good and quick way of getting a lot of information on a person's technical capabilities.

PS. If you want more programming practice check out my Programming Questions blog for some more examples.

Thursday, November 13, 2008

CSCE 242 Sign Up Now, Please!

As of this writing exactly two students have signed up for my CSCE 242: Client Server Computing class. With so few students signed up we will have to cancel the class, and I will be bummed.

I find it strange that there is so little interest when nearly all programming jobs nowadays require at least some web development, whether it is using a REST API, developing a web front-end, a mobile web front-end, or actually building a complete web application. Plus, javascript is a boatload of fun! Where else can you modify the inheritance hierarchy at runtime?

Maybe someone can comment and explain this lack of interest to me?

Wednesday, November 12, 2008

Mobile posting

I am posting this from my iphone using an app that connects to blogger using their REST API. One service in the cloud, multiple frontends: I think we shall see this a lot more often. If you want to learn how to build these services , sign up for 242!

Wednesday, August 27, 2008

Master and Undergraduate Projects

I have posted a list of MS thesis and undergraduate projects that I am interested in working on. If you are an undergraduate or graduate student interested in doing some research please check them out.

Saturday, August 9, 2008

NSF Award for Negotiation Networks Research

I am honored to have just received NSF award 0829580 to continue my research into negotiation networks. The total award is for $234K over the next 3 years. This research is under the Theoretical Foundations program, and the Scientific Foundations of the Internet's Next Generation (SING) sub-topic.

We hope that our research will lay the scientific foundations for the Internet's next generation of protocols and deployment strategies. The project summary of theproposal explains our goals:

We propose to develop automated negotiation protocols for autonomous agents in negotiation networks, which we define as a type of bargaining problem where every agent must negotiate with a fixed set of other agents in order to arrive at a unique deal. Negotiation networks are, in effect, a natural re-formulation of a winner determination in combinatorial auctions problem as a negotiation problem. They thereby distribute the costly winner determination computation among the agents and avoid the need for unnecessary exchange of currency or trusting of third parties.

Intellectual merit: Since the problem of negotiation networks combines features of characteristic form games and bargaining games from game theory, combinatorial auctions and distributed mechanism design from Economics, network exchange theory from Sociology, and distributed algorithm design from computer science, its solution should be a significant contribution to all these fields. While several parts of the proposed problem have been studied in the various disciplines, we will bring these disparate parts within one framework and provide solutions for this very important problem. Our approach is fresh in that we focus on algorithmic solutions for the dynamic, distributed problem of resource allocation among self-interested parties, in contrast with game theory and Economics' solutions which are typically steady-state axiomatic solutions. Success in this research is potentially transformative as it will provide the foundation for the engineering of protocols and efficient algorithms for distributed resource allocation, which is a pervasive problem in our ever-growing highly-interconnected society.

Broader impacts: The negotiation networks problem is not only significant from a research perspective, it also has some very immediate applications. One such application is the multiagent enactment of workflows for the growing SOAP and REST-based (Web Services) service Internet where complex tasks are dynamically and automatically bid for and allocated among arbitrarily large number of agents. Such systems would revolutionize just-in-time manufacturing and purchasing practices by automating not only the paperwork but also the allocation of resources and contract negotiation. Another near-term application lies in the development of incentive-compatible routing mechanism for the new Internet which would provide the proper incentives for companies to deploy more bandwidth and to make their existing bandwidth available to others without having to worry about freeloaders. Further down the road, negotiation networks could even eliminate the need for predefined workflows and lead to much more efficient and dynamic allocation of resources in our economy. These adaptive supply chains would be resilient to local failures and attacks as they dynamically re-negotiate deals in response to environmental changes. In fact, with slight modifications these distributed resource allocation algorithms could be used to coordinate first-responders in an emergency situation, to program distributed environmental sensor networks, and to instruct automated robotic swarms.

Wednesday, April 30, 2008

Marriage Problem

In the marriage problem we have an equal number of men and women who we want to pair up (it is presumed they are all heterosexual). Each one has an ordered list of their prefered mates. The goal is to find the set of marriages such that no two people would rather get divorced and marry each other. That is, not two prefer each other over their assigned mates.

This problem can be solved with a simple algorithm where all the men propose to the women. Then, each woman with more than one proposal rejects all but her most preferred which she temporarily accepts. The bachelors then repeat the process by asking their most favorite woman who has not rejected them. The process continues until all women have a partner. I have a NetLogo implementation of the marriage problem that illustrates this process.

It is interesting to notice that while the women are the ones that get to turn down the men, the men do overwhelmingly better than the women. I mentioned this to my wife and she said that is why she proposed to me. Of course, in the end it is always the women who propose to us.

I am interested in finding distributed algorithms that arrive at stable solutions, that is, those where no two people would rather be married to each other than to their partner. Here is a link to a set of slides that goes into more detail on the marriage problem, in case you want to learn a bit more.

Monday, April 28, 2008

Trading Houses

Imagine that all students are assigned dorm rooms in some ad-hoc way. After each one has a room, two students might find that they both prefer the room the other one has, thus they could switch rooms and both be happier. Similarly, three students (A,B, and C) might find that A prefer's B's room, B prefers C's room, and C prefers A's room. These three students might also trade rooms in a cycle. This can go on for even larger cycles. But, how do we find these cycles?

A very simple algorithm, proposed back in the seventies, is to have each agent point to its most preferred choice. If the resulting graph has any loops (it will) then all the agents in the loops exchange rooms and drop out of the game. The remaining agents then point to their most preferred room and we repeat the process until there are no more agents left.

I have implemented a simple demonstration of this top trading cycle algorithm using a simple distributed cycle detection protocol. Its fun to watch!

Thursday, April 24, 2008

My Programming Languages History

As faculty members in Computer Science and Engineering we often discuss the pros and cons of languages and which ones we should teach. The Tiobe software index shows us how the popularity of certain languages ebbs and flows. I think it is clear that it does not really matter which specific language you learn first or second, what matters is that you learn how to think clearly. And that you learn how to learn new languages.

As a demonstration, I thought I'd take a trip back memory lane and list the languages I learned (and forgotten) while still in school:

  1. BASIC - Freshman year in high school I took a summer class on Basic programming using the old Tandy PET (I also had the Atari 2600 basic programming cartridge, but it kinda sucked). Next year, my new Apple IIe had basic built in. I should also mention that back then it was common for high schools to teach Basic programming, Power Point and Excel did not yet exist.
  2. 6052 Assembly language - Of course, I wanted to write games and the only way to get any kind of animation in the Apple IIe was to progran in assembly.
  3. APL - I only worked with this extremely strange language for two weeks.
  4. Pascal - Somehow I got a hold of a copy of a Pascal compiler for the Apple IIe. I was surprised to learn that you could write programs without line numbers and GOTO/jmp statements.
  5. Scheme - I used it in my freshman year at college. Scheme is the simplest language I have seen. It is beautiful.
  6. CLU - In college we had to do our software projects using CLU. It is a pre-cursor to object-oriented languages.
  7. Emacs Lisp - I learned it for a summer job. This rss feed will be turned into HTML automatically by an Emacs Lisp function I wrote.
  8. C - Learned it for an OS class in graduate school.
  9. Lisp - thesis work.
  10. Tcl/tk - thesis work.
  11. C++ - thesis work.
  12. Java - I wanted to write some applets, for fun.
Those are just the major programming languages I encountered while still a student (pre 1998), I also learned bits and pieces of countless scripting languages (awk, sed, bash) or special purpose languages (latex, um-prs, sql). The point is, my experience is not uncommon. A good computer scientist or software engineer will learn at least one new language every year or so. After a while, one notices how they are all not that different but how each one teaches us something about the way we think, the way we solve problems. Writing software is about how we think, and how we translate these thoughts to meet the capabilities of the machine at our fingertips.

Thus, there is no need to get too hung up on which programming language you should learn first. If you choose software as a career, you will likely learn over 100 languages over your lifetime. I can only imagine what we will be using 10 years from today!

If you also have fun writing programs then maybe you would like to try to solve some of my programming questions.

Saturday, February 9, 2008

Fast Bidding in a Distributed Combinatorial Auction

Walmart needs stuff moved from point A to B, for many A's and Bs. Also, these deliveries have other possible requirements: one might need a refigerated truck, one might be a night drop off, one might be a rush order, etc. Similarly, trucking companies have complex requirements about where and when they can deliver. Put these millions of orders together and you have a complex resource allocation problem. If you are willing to have everyone send their requirements to a centralized auctioneer then, maybe, you can solve this problem.

However, what it you don't want to trust, or can't afford to pay a centralized auctioneer? We are studying automated incentive-compabile negotiation protocols for distributed resource allocations. I know this is a mouthfull but all it is saying is that we are developing algorithms that automated agents can use to negotiate with each other and solve these type of problems. Think of it as moving all buy/sell transaction to the web (we are nearly there) and then using software to decide who to buy what from and at what price. We can show that this will lead to more efficient solutions, that is, everybody wins.

Anyway, our latest paper detailing our efforts is
  • Benito Mendoza and José M. Vidal. Approximate Bidding Algorithms for a Distributed Combinatorial Auction (Short Paper). Padgham and Parkes and Müller and Parsons ed.In Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems, May; 2008.
    Distributed allocation and multiagent coordination problems can be solved through combinatorial auctions (CAs). However, most of the existing winner determination algorithms (WDAs) for CAs are centralized. The PAUSE auction is one of a few efforts to release the auctioneer from having to do all the work. The pausebid bidding algorithm generates myopically-optimal bids for agents in a PAUSE auction but its running time is exponential on the number of bids. We present new approximate bidding algorithms that not only run in linear time but also increase the utility of the bidders as result of small decrement in revenue.

Check it out, approximation mechanism let ut achieve 1000-fold speedups at only a small cost in utility.

Tuesday, January 8, 2008

A NetLogo Introduction

I will be speaking tomorrow Thursday January 10 6:30pm at the Columbia Linux User's Group about NetLogo. It will be an informal introduction. I will try to show why it is a great language for both learning to program and for building simulations that can be used to engage students in active learning of other subjects: such as physics, chemistry, economics, sociology, etc. The meeting is open to all.