What did attendees think were the hot trends at WWW2007? I’ve got my own ideas, but of course, all the parallel sessions I went to were “self-selected”. I saw a lot of discussion of security and privacy. I also saw a lot of interest in parlaying the extensive personal information that is available, on the web or on the desktop, into higher-quality and more targeted features and functions. And there was some overlap between those two topics.

My favorite quote so far:

If it’s visible, you fail.
        Bill Buxton

(Authored by Neel Sundaresan, eBay research lab, plenary speaker)

Over the past decade we have seen the web change in dramatic ways. Static HTML web pages are facing a stiff challenge from blogs and video clips. The notions of relations, networks, and reputation, which were largely built on web page linkage, have matured into community voting on sites and documents. Social network sites have become common, and the notions of identity, trust, and reputation are commonly defined between users and the pages or objects they interact on. As eCommerce matures and blends with social networking, search, classification, and recommender systems take new shapes. Social trust, reputation, and identity form key entities that help commerce thrive in the generation of the social web. Further, peer-to-peer networks provide new platforms and challenges for search and social network structures. eBay, as one of the earliest known social commerce companies, provides a great context in which to study these concepts as applied to a marketplace. In my discussion I will touch upon many of the topics mentioned here as we have been studying them at the research labs.

(authored by Prabhakar Raghavan, Yahoo!, plenary speaker)

I’ve been at Yahoo! for close to two years, aiming to build up a world-class research organization and expand our strengths beyond computer science to areas such as microeconomics and sociology. We are charged with developing the sciences that will deliver the next generation of business to Yahoo!, while helping to shape the future of the Web.

Yahoo! is in a nascent market, one where most of the technical and market action is still to come. The challenges we face do not have ready-made solutions – there is no common notion of the “sciences underlying the Web”, or of the tools and techniques needed to address the grand challenges of our industry.

We must therefore ask: to develop the future of online interactive media, what sciences must we develop today? Do we identify and expand existing scientific disciplines, or do we try to build new ones that are not currently pursued at academic institutions? Are these disciplines centered on computer science, or should other disciplines be incorporated?

In my talk “Web N.0: What sciences will it take?” on May 10, I hope to develop answers to these questions; these sciences appear to be a blend of the old and the new, of computer science and of the social sciences.

(authored by David Huynh, MIT CSAIL)

For those who want to browse the WWW2007 papers by authors’ affiliations, tracks, categories, and more, we have made a faceted browser from the WWW2007 data generously given to us by the WWW2007 organizers. We have also scraped information on papers at WWW 2001-2006 from ACM.org and included it in our faceted browser.

Note that this site has been put together using the Exhibit lightweight structured data publishing framework, which will be presented at WWW2007 itself.

Web History Events: 10 Year Anniversary

In 1997, the Web History Day and Exhibit was one of the most popular programs of the 6th International World Wide Web Conference (WWW6) in Santa Clara, California. This year, ten years after the original event, a reprise of this program will take place at the 16th Web Conference (WWW2007) in Banff, Canada. The conference will host multiple Web History-related events and a weeklong Web History Exhibit area, open to conference attendees, where attending pioneers can donate historical materials and add recollections.

The 1997 event brought together many of the major pioneers of the early Web and hypermedia for the very first time, from Douglas Engelbart, Tim Berners-Lee, Brewster Kahle and Ted Nelson to authors of early browsers – Viola, Mosaic, Netscape, Cello, Internet Explorer, Midas and more (http://1997.webhistory.org/historyday). The hands-on exhibit featured pioneering software and sites, from the first browser/editor running on its original NeXT cube to the White House site and HotWired. The program was co-organized by Web pioneer Kevin Hughes and Web historian Marc Weber with help from pioneer Jean-François Groff, at the invitation of conference organizer Bebo White.

At WWW2007, the Web History events will focus on the history of E-Commerce, with speakers ranging from Marty Tenenbaum of CommerceNet to blogger Robert Scoble. Also featured will be a history of the conference series, which began in 1994 at CERN. The events will also bring attendees together with leaders of the museum and archiving communities, who are becoming increasingly convinced of the importance of collecting artifacts from the early days of the Web and documenting the historical evolution of Web technology. The preliminary event program can be seen at http://www2007.org/webhistory.php.

The organizers of this year’s Web History Day include many of the same people involved with the original program (Marc Weber, Bebo White, Kevin Hughes, and Jean-François Groff), with some additions. In the past year Marc Weber and Bill Pickett have co-founded The Web History Center, www.webhistory.org, which has absorbed Weber and Hughes’s older Web History Project and adds key pioneers like Robert Cailliau and Marty Tenenbaum to its Advisory Board.

You are invited to look at the Web History Center Web site. The Center’s goal is to attract attention to the need to save records and memories of the origins and development of the Web, and to put individuals and organizations that have such materials in contact with archives and museums that are interested in preserving these materials and making them available to researchers and educators. Twelve such institutions have joined as members of the Center.

Concurrently, the Center is creating a digital library that will ultimately become a definitive resource on the history of the Web. This library will allow anyone with an Internet connection (students, researchers, entrepreneurs) to view original documents, videos, and images.

The Center is working with Bebo White to bring together and find an archival home for the records of the International World Wide Web Conference Committee (IW3C2). The history of the Web Conference series closely parallels the evolution of Web technology and the activities of the World Wide Web Consortium (W3C). White is a member of the Web History Center’s Advisory Board and serves as liaison with the IW3C2 and the Stanford Linear Accelerator Center (SLAC), home of the first Web site in the United States. Anyone having materials relevant to any of the above categories is invited to contact White, Weber, or Pickett.

For the Web History event at WWW2007, we are especially interested in items from past conferences that can be included in the exhibit—the original sites, posters, T-shirts, pins, badges, printed collateral, and more. As a part of the conference program, the WHC is sponsoring a weeklong exhibit and hosting a light buffet reception on Tuesday night to open the event and introduce key speakers. Wednesday’s plenary address by Sir Tim Berners-Lee, co-organized by the WHC and WWW2007, will provide an opportunity for him to reflect on both past and future. Finally, all day Wednesday, the main Web History track will feature speakers on the history of E-Commerce, the conference series, and ways of preserving and making public the Web’s history—both past and ongoing. Please join us!

(authored by Jimmy Nilsson)

Even though I have a background in academia, I haven’t read too many research papers, and especially not recently. That’s one reason why it was especially fun to be on the program committee for papers for WWW’2007. I read eight papers and learned a lot about new and interesting ideas. Afterwards I heard that only one of the papers I read was accepted. I was positive about most of my eight, but the level was obviously very high and only my very favorite made it. It’s called “Introduction and Evaluation of Martlet, a Scientific Workflow Language for Abstracted Parallelisation” by Daniel Goodman.

(authored by Peter F. Patel-Schneider, Program Co-Chair)

Now that the dust has mostly settled and my nerves have calmed down, I decided that I would write a short note about the WWW2007 paper submission process.

When running any conference there is always the worry that the number of papers will not be what was expected. There is the possibility of a disaster – perhaps the research area is imploding, perhaps conference publicity went astray, perhaps something will go wrong with the submission process, etc. – and too few papers are submitted. (With all the problems with spam email, one worry is that conference announcements will be caught by over-zealous spam filters.)
There is also the possibility of a success-disaster, with so many papers submitted that the reviewing machinery and program committee are overloaded.

To add to these general worries, WWW has a history of problems.  Last year the building housing the computers for the conference web site experienced a major fire just before submissions were due.  In previous years, the submission site had serious capacity problems.

For WWW2007 there was also a new submission system, EasyChair. In addition, the reviewing schedule for WWW2007 had essentially no slack in it, so slipping the submission deadline, as has become quite common, was not an option.
With all these issues, I was rather nervous about the number of submissions for WWW2007.  To try to calm my nerves I planned on counting the number of submissions at various points.  In my previous experience with running conferences (admittedly a long time ago) the rule of thumb was that 1/3 of the submissions arrived the last day, 1/3 the day before, and 1/3 before that, so I had some expectations on how the numbers would go.

Unfortunately, the early “returns” were very low. One week before the deadline there were only 55 submissions. Two days before the deadline there were only 132 submissions. By my rule of thumb this would mean about 400 total submissions – a rather large drop from the 716 submissions in 2006. One day before the deadline there were only 252 submissions, indicating only about 380 total submissions. I was now definitely beginning to worry. Although the pace of submissions picked up during the last day, by the time I went to sleep about six hours before the deadline, there were only 522 submissions, and I was still quite nervous.
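
The two projections above follow directly from the rule of thumb: if a third of the submissions arrive on the last day and a third on the day before, then the count two days out is about one third of the final total, and the count one day out is about two thirds. A minimal sketch of that arithmetic (the helper name and its parameters are illustrative assumptions, not part of any conference tooling):

```python
from fractions import Fraction

# Rule of thumb from the text: 1/3 of submissions arrive on the last day,
# 1/3 on the day before, and 1/3 earlier than that.
# Hypothetical lookup: days before the deadline -> share expected so far.
SHARE_RECEIVED = {2: Fraction(1, 3), 1: Fraction(2, 3), 0: Fraction(1, 1)}


def projected_total(count_so_far: int, days_before_deadline: int) -> int:
    """Estimate the final number of submissions from a count taken early."""
    share = SHARE_RECEIVED[days_before_deadline]
    return round(count_so_far / share)


print(projected_total(132, 2))  # 396, i.e. "about 400"
print(projected_total(252, 1))  # 378, i.e. "about 380"
```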

Of course, all my worries turned out to be unfounded.  A very late surge of submissions (253 submissions in the last six hours) resulted in 775 submissions to WWW2007, more than in any previous year, but not more than had been allowed for. 
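
For comparison with the one-third rule of thumb, the counts quoted in this post can be turned into the shares that actually arrived in each period. This is only a back-of-the-envelope sketch using the numbers given here (132 two days out, 252 one day out, 522 six hours out, 775 final); the variable names are my own:

```python
# Counts quoted in the post.
two_days_before = 132
one_day_before = 252
six_hours_before = 522
final_total = 775

# Submissions that arrived in each period.
earlier = two_days_before
day_before = one_day_before - two_days_before
last_day = final_total - one_day_before
last_six_hours = final_total - six_hours_before

for name, count in [
    ("more than two days before", earlier),
    ("the day before", day_before),
    ("the last day", last_day),
]:
    print(f"{name}: {count} ({count / final_total:.1%})")

print(f"last six hours alone: {last_six_hours} ({last_six_hours / final_total:.1%})")
```

On these numbers roughly two thirds of all submissions arrived on the last day, rather than the one third the old rule of thumb predicts.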

In retrospect, I should have expected this late surge, as electronic submission allows for last-minute behaviour and researchers are notorious for not being early. However, I instead expected that the history of problems with WWW would have made authors more conservative. There were a couple of tracks that had to add a few extra PC members, but surprisingly little had to be done to react to the submissions.

Now if only the reviewing process works as well….

(invitation to the AIRWeb’2007 workshop, authored by ChaTo (Carlos Alberto-Alejandro CASTILLO-Ocaranza))

The AIRWeb workshop is now in its third edition. This workshop covers several topics related to Adversarial Information Retrieval on the Web, that is, how to search, rank, or classify documents if a fraction of the documents has been manipulated with malicious intent. This includes search engine spam as well as comment spam, splogs, click fraud, and several other themes.

The dominant topic in past years has been search engine spam, an obnoxious problem that affects all major search engines, either by tricking them into showing irrelevant results for some queries, or simply by wasting a part of their network and storage resources. This year, the AIRWeb workshop will include a novel element: a reference collection of Web pages, in which over 3,000 hosts have been labeled by a team of volunteers as spam or non-spam.

[Figure: a partial view of the corpus, in which black nodes are spam and white nodes are non-spam.]

The organizers of the Web Spam Challenge provide the graph, training labels, the contents for the pages, and a set of pre-computed feature vectors. The goal is to predict the label (non-spam or spam) for a test set of hosts for which labels are not given. For more information, check the challenge web site.
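
To make the shape of the task concrete, here is a toy sketch of predicting host labels from feature vectors. Everything in it is an assumption made for illustration: the host names, the two-dimensional features, and the nearest-centroid rule are invented, and are not the challenge's data, features, or any recommended method.

```python
from math import dist

# Toy training data: hypothetical 2-dimensional feature vectors per host.
# The real challenge supplies its own precomputed features; these numbers
# and host names are invented purely for illustration.
train = {
    "host-a.example": ([0.9, 0.8], "spam"),
    "host-b.example": ([0.8, 0.9], "spam"),
    "host-c.example": ([0.1, 0.2], "non-spam"),
    "host-d.example": ([0.2, 0.1], "non-spam"),
}


def centroids(labeled):
    """Average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in labeled.values():
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(features, centers):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centers, key=lambda label: dist(features, centers[label]))


centers = centroids(train)
# Unlabeled test hosts (again hypothetical).
test_hosts = {"host-x.example": [0.85, 0.75], "host-y.example": [0.15, 0.3]}
predictions = {host: predict(f, centers) for host, f in test_hosts.items()}
print(predictions)
```

A real entry would presumably also draw on the link graph and page contents; the point here is only the input and output format of the prediction step.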

See you in Banff!

(invitation to the Query Log Analysis workshop, authored by Einat Amitay, IBM Research, Haifa, Israel)

The dilemma of whether to use or not to use the AOL query log data for research is described in detail in a NYTimes article: “Researchers Yearn to Use AOL Logs, but They Hesitate”. Search engine companies no longer support independent academic research and have stopped sharing their data with graduate students and university professors. The hesitation and the data embargo are stopping research from being conducted, which in turn increases the gap between what is known to the public via published research and what is hidden behind corporate legalese.

We initiated this workshop thinking that WWW 2007 is the right place to open this issue and find a solution that will allow researchers to use query log data without the fear of being accused of a crime.

There are many ways in which we can help amend the situation. We can establish a research collection of query logs donated by consenting individual users. We can create a standard for accepting or rejecting log recording, similar to the robots.txt solution. We can promote research on the anonymization of logs. And we can help persuade the public that our intentions are good and that search engines live and die by their data.

We hope to have all sides represented in our workshop. Please come and join us!

WWW’2007 will be held at the Fairmont Banff Springs hotel, in Banff National Park.

Banff is famous for its spectacular setting in the Canadian Rocky Mountains. The town of Banff is a small community (population a bit over 7000). The center of town is within walking distance of the hotel, and there are quite a few very good restaurants and bars there, as well as numerous gift shops. A variety of hikes are offered around Banff, some as short as an hour in the woods around the hotel, for those who wish to explore Banff during some of the breaks.

Ski season is November 10th to May 24th, so WWW’2007 attendees may actually enjoy some end-of-the-season skiing.

Some of us have already had a chance to visit Banff, in preparation for WWW’2007 or for other conferences.
Let’s use this post for sharing experiences and tips, through talkbacks or trackbacks.
One tip already received is a suggestion to join the Fairmont Presidential Club prior to arriving, for some on-site benefits.

These two pictures are just samples of the many pictures taken by WWW’2007 people when visiting Banff:

A photo of Banff in snow, taken by Dr. Ivan Herman

A photo of Fairmont Banff Springs hotel, taken by Michal Jacovi, 8.11.06


Please use talkbacks and trackbacks to this post for sharing your Banff experiences with the rest of us!