<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:evnet="http://www.mscommunities.com/rssmodule/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"><channel><title>Entries tagged with podcasts - Channel 10</title><atom:link rel="self" type="application/rss+xml" href="http://www.on10.net/tags/podcasts/feed/ipod/default.aspx" /><itunes:summary>podcasts</itunes:summary><itunes:author>Sampy, Larry, allenjs, Mossyblog, Michael Lehman, dshadle, krobi, sarahintampa, Grace Francisco, Erik, Laura, Adam, kleneway, Jeff, Tina, Duncan, MaxPowerhouse7</itunes:author><image><url>http://mschnlnine.vo.llnwd.net/d1/Dev/App_Themes/Channel10/images/feedimage.png</url><title>Entries tagged with podcasts - Channel 10</title><link>http://on10.net/tags/podcasts/</link></image><itunes:image href="http://mschnlnine.vo.llnwd.net/d1/Dev/App_Themes/Channel10/images/feedimage.png" /><itunes:category text="Technology" /><description>podcasts</description><link>http://on10.net/tags/podcasts/</link><language>en-us</language><pubDate>Fri, 26 Sep 2008 12:00:37 GMT</pubDate><lastBuildDate>Fri, 26 Sep 2008 12:00:37 GMT</lastBuildDate><generator>EvNet (EvNet, Version=1.0.3143.743, Culture=neutral, PublicKeyToken=null)</generator><item><title>Scott Prevost explains Powerset's hybrid approach to semantic search</title><description>&lt;p&gt;
Scott Prevost is General Manager and Director of Product for Powerset, the company whose semantic search engine was recently acquired by Microsoft. In this interview he describes the history of Powerset's natural language engine, and explains how it works as part of a hybrid approach to indexing, retrieval, and ranking.
&lt;/p&gt;
&lt;p&gt;
Scott will expand on these topics in his keynote address at &lt;a href="http://www.web3event.com/"&gt;Web 3.0&lt;/a&gt; in October.
&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/prevost/prevost.gif" /&gt;
&lt;div&gt;
&lt;b&gt;Scott Prevost&lt;/b&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: The notion of search enhanced by natural language understanding has a long history. I was just reading Danny Sullivan's rant about how he's been hearing about this for years, but it's never amounted to anything.&lt;/p&gt;

&lt;p&gt;Of course, people are all over the map on this topic, but nonetheless you guys are doing certain demonstrable things, and working on other things. So I'd like to find out more about how the technology -- which was acquired from Xerox, where it had been worked on for a long time -- actually works. What you mean by natural language understanding, how you're applying the technology, and where this is going.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Well, there are a lot of questions tucked in there, but maybe we can start with what we licensed from PARC, what was formerly Xerox PARC. They had been working for 30 years in a linguistic framework called LFG -- &lt;a href="http://www.powerset.com/explore/semhtml/Lexical_functional_grammar"&gt;lexical functional grammar&lt;/a&gt; -- and they built a very robust parser. It's probably parsed more sentences than any other parser in the world. &lt;/p&gt;

&lt;p&gt;What it allows us to do is take apart every document that we index, sentence by sentence, uncover its linguistic structure, and then translate that into a semantic representation we can encode in our index.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Can you confirm or deny something that Danny Sullivan reported, which is that it takes on the order of two months to index Wikipedia one time using this method?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: [laughs] That's a very, very old number. It all depends on the number of machines, but we do it now on the order of a couple of days.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And it scales linearly?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yes. And in fact we're working really hard to bring those numbers down. We have a very small data center right now. We're looking at what it takes to stand up a 2 billion document index, and it's absolutely attainable.&lt;/p&gt;

&lt;p&gt;I think Danny Sullivan realized, when he wrote another article on the day we launched, that we're doing something different. He called us an understanding engine. It's not the case that we're just applying linguistic technology at runtime, by parsing the query and then trying to use the same old kind of keyword index for retrieval. We're actually doing the heavy lifting at index time.&lt;/p&gt;

&lt;p&gt;We're actually reading each sentence in the corpus, pulling out semantic representations, indexing those semantic representations, and then at query time we try to match the meaning of the query to the meaning of what's in the document. That allows us to both increase precision and improve recall.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: When you say semantic representation, what it means -- or anyway what's evident in the current version -- is subject/verb/object triples, basically. That seems to be how things are organized.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: That's one small part of what the engine does. It's the part we've exposed in the user interface in a very direct way. But actually those are only three of several dozen semantic roles that we uncover at index time, and all of those roles go into selecting documents, and snippets of documents, when we present the organic results. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Really? So even though the patterns aren't exposed in the advanced query interface, they're still used?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: That's right, they're still used.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What would be an example of one of those other patterns, and how it's applied?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: So, you ask a question like: "When did Hurricane Katrina strike?" The 'when' is a certain kind of semantic role that we've indexed, separately from the subject, verb, and object. There are a number of other roles like that: location, time, other types of relationships.&lt;/p&gt;
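&lt;p&gt;As a rough illustration of indexing semantic roles rather than keywords, the following Python sketch stores a few role-labelled facts and answers a 'when' question by reading the time role. The fact records, the role names, and the &lt;code&gt;build_index&lt;/code&gt; and &lt;code&gt;answer&lt;/code&gt; helpers are assumptions made for this example; they are not Powerset's data model or API.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Hypothetical role-labelled facts, as an index-time parser might emit them.
FACTS = [
    {"doc": "Hurricane_Katrina", "subject": "hurricane katrina", "verb": "strike",
     "object": "louisiana", "time": "august 2005"},
    {"doc": "IBM", "subject": "ibm", "verb": "acquire",
     "object": "lotus", "time": "1995"},
]


def build_index(facts):
    """Map each (role, value) pair to the positions of the facts containing it."""
    index = defaultdict(set)
    for position, fact in enumerate(facts):
        for role, value in fact.items():
            if role != "doc":
                index[(role, value)].add(position)
    return index


def answer(index, facts, known, wanted_role):
    """Return (doc, value) pairs for wanted_role among facts matching every known role."""
    candidate_sets = [index.get(pair, set()) for pair in known.items()]
    matches = set.intersection(*candidate_sets) if candidate_sets else set()
    return [(facts[i]["doc"], facts[i][wanted_role])
            for i in sorted(matches) if wanted_role in facts[i]]


index = build_index(FACTS)
# "When did Hurricane Katrina strike?" expressed as known roles plus a wanted role.
print(answer(index, FACTS, {"subject": "hurricane katrina", "verb": "strike"}, "time"))
# prints [('Hurricane_Katrina', 'august 2005')]
```
&lt;/pre&gt;
&lt;p&gt;The point of the sketch is only that the work happens when the facts are indexed: the query is reduced to a set of role constraints, and the answer comes from a role that was stored ahead of time.&lt;/p&gt;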

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: I saw a private demo, about a year ago, in which one of the most striking examples was something like: "Companies acquired by IBM between 1996 and 2003". At that point, I think the light bulb goes on in people's heads about what this could really be.&lt;/p&gt;

&lt;p&gt;That class of query isn't exposed yet, but it's an example of what's possible, right?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Absolutely. That's exactly the direction we're moving in. Initially most of the work we've done has been on the index side. Now we're starting to catch up on the query side, which allows us to complete the loop and do queries like that.&lt;/p&gt;
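&lt;p&gt;A query such as "Companies acquired by IBM between 1996 and 2003" can be read as constraints on the subject, verb, and time roles. The Python sketch below assumes the question has already been translated into those constraints, which is the hard part a linguistic parser would do. The facts and the &lt;code&gt;objects_in_period&lt;/code&gt; helper are hypothetical and exist only for this example.&lt;/p&gt;
&lt;pre&gt;
```python
# Toy role-labelled facts; illustrative data, not output from Powerset's index.
FACTS = [
    {"subject": "ibm", "verb": "acquire", "object": "lotus", "time": 1995},
    {"subject": "ibm", "verb": "acquire", "object": "tivoli", "time": 1996},
    {"subject": "ibm", "verb": "acquire", "object": "rational", "time": 2003},
    {"subject": "microsoft", "verb": "acquire", "object": "powerset", "time": 2008},
]


def objects_in_period(facts, subject, verb, first_year, last_year):
    """Return objects of facts matching subject and verb whose time role is in the period."""
    period = range(first_year, last_year + 1)  # inclusive of both end years
    return [fact["object"] for fact in facts
            if fact["subject"] == subject
            and fact["verb"] == verb
            and fact["time"] in period]


# "Companies acquired by IBM between 1996 and 2003" as explicit constraints.
print(objects_in_period(FACTS, "ibm", "acquire", 1996, 2003))
# prints ['tivoli', 'rational']
```
&lt;/pre&gt;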

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: The other piece that's visible on the website, in addition to the Wikipedia stuff, is the Freebase material that you've recently integrated. That's an interesting case because there you can pull semantics directly from Freebase. So this becomes more of a query-time interface to something which is already structured and understandable.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah, that's right. Freebase is kind of like Wikipedia, except it's all structured data. Unlike with our core technology, which turns unstructured data into structured data, with Freebase we just go directly to the structured data. But it uses the same linguistic technology on the front end to parse the query, which then gets mapped into a Freebase database call.&lt;/p&gt;

&lt;p&gt;But by using linguistic technology to parse the query, we're able to match very flexible ways of saying things. We don't have to imagine every possible way someone might ask for a particular piece of information. The linguistic engine takes care of a lot of that for us.&lt;/p&gt;
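&lt;p&gt;To picture how differently worded queries can end up as the same structured lookup, here is a minimal Python sketch. It uses two hand-written regular expressions as a crude stand-in for the linguistic parser, which is exactly the kind of enumeration the real engine is said to avoid. The &lt;code&gt;STORE&lt;/code&gt; dictionary, the patterns, and the helper names are assumptions for this example and are not the Freebase API.&lt;/p&gt;
&lt;pre&gt;
```python
import re

# Toy structured store keyed by (entity, property); a stand-in for a database call.
STORE = {("barack obama", "book"): ["Dreams From My Father"]}

# Surface forms that should reduce to the same (entity, property) pair. Each entry
# gives the pattern, the group holding the entity, and the group holding the property.
FORMS = [
    (re.compile(r"(.+)'s (\w+)"), 1, 2),            # "barack obama's book"
    (re.compile(r"(\w+) (?:by|of) (.+)"), 2, 1),    # "book by barack obama"
]


def to_lookup(text):
    """Normalize a query string to an (entity, property) pair, or None if no form fits."""
    cleaned = text.strip().lower().rstrip("?")
    for pattern, entity_group, property_group in FORMS:
        match = pattern.fullmatch(cleaned)
        if match:
            return match.group(entity_group), match.group(property_group)
    return None


def answer(text):
    """Look up the normalized query in the toy store; unknown queries return an empty list."""
    return STORE.get(to_lookup(text), [])


print(answer("Barack Obama's book"))    # prints ['Dreams From My Father']
print(answer("Book by Barack Obama"))   # prints ['Dreams From My Father']
```
&lt;/pre&gt;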

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: That's why I can type in something like "Barack Obama's book" and get back the answer &lt;i&gt;Dreams From My Father&lt;/i&gt; directly from Freebase.&lt;/p&gt;

&lt;p&gt;So, what was the intent of including Freebase along with Wikipedia? What are you trying to show there?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: That the linguistic technology can be used with both structured and unstructured data. Freebase just has a lot of really great information.&lt;/p&gt;

&lt;p&gt;One of the things about a natural language front end is that it encourages people to ask questions and expect answers. With the Freebase database, it's pretty easy to provide direct answers right at the top of the search results page, which users find to be a nice experience.&lt;/p&gt;

&lt;p&gt;Of course you have to be very high-precision, so we've tuned the Freebase stuff for precision rather than recall.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Tell me about the natural language landscape: the variety of approaches that exist, the style that you're using, how that compares to others, how all this fits into the history of the technology.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: The technology goes back a long way, three decades or so. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Longer, actually.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah, really since the beginnings of AI people have been trying to use computers to understand and generate language. There have been a number of different approaches: purely symbolic approaches, statistical approaches. We really use a hybrid. &lt;/p&gt;

&lt;p&gt;The Xerox technology uses a particular grammatical formalism, and we do use symbolic approaches to our semantic rules. But we also then put these semantic features into our index, and use machine learning and statistical approaches to retrieve and rank results. &lt;/p&gt;

&lt;p&gt;It really is a combination. We try not to be religious about these things, but just use best of breed, and choose the right tools for the jobs we're doing. &lt;/p&gt;
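&lt;p&gt;One way to picture the hybrid is a ranking function that mixes a surface feature (shared keywords) with a symbolic one (shared semantic roles). The Python sketch below is a simplified stand-in: the feature names, the fixed weights, and the toy query and documents are all assumptions for this example, and a real system would presumably learn its weights from relevance data.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical weights; in a real system they would be learned from relevance data.
WEIGHTS = {"keyword_overlap": 1.0, "role_match": 2.5}

# Toy query and documents, each with surface words and parser-style role labels.
QUERY = {"words": ["ibm", "acquire"], "roles": {"subject": "ibm", "verb": "acquire"}}
DOCS = [
    {"id": "keyword_only", "words": ["ibm", "acquire", "rumor"],
     "roles": {"subject": "analyst", "verb": "discuss"}},
    {"id": "meaning_match", "words": ["ibm", "buy", "lotus"],
     "roles": {"subject": "ibm", "verb": "acquire", "object": "lotus"}},
]


def features(query, doc):
    """Compute one surface feature and one symbolic (role-based) feature."""
    shared_words = set(query["words"]).intersection(doc["words"])
    shared_roles = set(query["roles"].items()).intersection(doc["roles"].items())
    return {"keyword_overlap": len(shared_words), "role_match": len(shared_roles)}


def score(query, doc):
    """Weighted sum of the features; a stand-in for a learned ranking function."""
    values = features(query, doc)
    return sum(WEIGHTS[name] * values[name] for name in WEIGHTS)


def rank(query, docs):
    """Order documents from highest to lowest score."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)


print([doc["id"] for doc in rank(QUERY, DOCS)])
# prints ['meaning_match', 'keyword_only']
```
&lt;/pre&gt;
&lt;p&gt;In this toy data the second document shares fewer keywords with the query but matches two of its roles, so it ranks first.&lt;/p&gt;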

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: One of the things that Peter Norvig at Google is always saying is that the real secret to their success is vast quantities of data, and that in the end you don't really need AI, you just need lots and lots of data and the ability to crunch through it.&lt;/p&gt;

&lt;p&gt;I assume you would argue that the natural language techniques are also helpful, and that as the quantity of data in your possession grows, the power that it brings to the table will also grow.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah. One thing we try to do with the natural language technology is give a leg up to the statistical and machine learning approaches. If you look at a search engine that just uses keywords, the information you have about the page is pretty slim. &lt;/p&gt;

&lt;p&gt;We're trying to capture more information about each page that we index, which enhances our ability to retrieve and rank. For example, it allows us to retrieve documents where there are no keyword matches, but there's a good meaning match.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: For example?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: So, consider a query like: "What politicians were killed by disease?" Powerset will retrieve documents that don't include the words 'disease' or 'politician' or 'kill', but that are about particular politicians who died from particular diseases.  &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Is the process of mapping generic terms to specific instances a hybrid of human editorial effort and statistical techniques?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah. We use things like &lt;a href="http://wordnet.princeton.edu/"&gt;WordNet&lt;/a&gt;, which is a giant dictionary or thesaurus of the English language that shows how various word senses relate to each other. We use that with some editing on our own. We also use machine learning techniques to figure out some word relationships, and which are most helpful in retrieval and ranking. So it really is a combination. &lt;/p&gt;
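&lt;p&gt;The "meaning match without keyword match" idea can be sketched with a small table that maps specific terms to more general concepts. The table below is a hand-written stand-in for a resource such as WordNet; its entries, including the crude mapping of "died" to "kill", and the helper names are assumptions for this example only.&lt;/p&gt;
&lt;pre&gt;
```python
# Hand-written stand-in for a lexical resource such as WordNet: each specific
# term maps to a more general concept. Entries are illustrative assumptions.
GENERALIZES_TO = {
    "senator": "politician",
    "president": "politician",
    "pneumonia": "disease",
    "tuberculosis": "disease",
    "died": "kill",
    "succumbed": "kill",
}


def concepts(words):
    """Replace each word by its general concept when one is known."""
    return {GENERALIZES_TO.get(word, word) for word in words}


def meaning_match(query_words, doc_words):
    """True when every query concept is covered by some document concept."""
    return concepts(query_words).issubset(concepts(doc_words))


query = ["politician", "kill", "disease"]
doc = ["the", "senator", "died", "of", "pneumonia"]

print(set(query).intersection(doc))   # prints set() because no keyword is shared
print(meaning_match(query, doc))      # prints True
```
&lt;/pre&gt;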

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: When did you start this work?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: The company was founded three years ago, and I joined two years ago. But of course the work at PARC goes back 30 years.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: You obviously have an academic background in this field.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah, I have a Ph.D. in computational linguistics, as do probably about twenty other people at Powerset. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What's your take on how this engine will start to surface through the various Microsoft online properties?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: The two areas where we can make a big impact are, first of all, improving core relevance, which is an absolute must for every search engine. And then also the user experience. Some of the technology -- and you start to see it in the Wikipedia search engine that we put out -- some of it really allows us to do different things in the presentation of these results. Things that can save the user time, by getting the answer right on the search results page. &lt;/p&gt;

&lt;p&gt;Our goal is to continue to work on improving relevance, and we've shown that by using these semantic features we can drive large relevance improvements, but there's still a lot of work to be done there.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: In that case, the improvements would be under the covers; the person using Live Search wouldn't know that you were contributing to the relevance of the result.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: That's correct. Another way it can happen is by creating a different quality of snippet or caption, things that highlight the parts that match the query instead of just bolding the keywords. Actually highlight the answer right there on the search results page, so you don't have to click through to determine if it's the right page.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: There's a related area called entity extraction, and there's been a lot of action there. For example there's a company called ClearForest, recently acquired by Reuters, which has put a lot of work into entity extraction. What's the story on that front?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: A lot of companies are working on this. We have our own in-house effort for name recognition and entity recognition, and this is of course really helpful as a kind of light semantic layer. But for us, it becomes deeper because we can start to relate all kinds of entities to one another, based on where we've seen them, and also with the help of things like Freebase. &lt;/p&gt;

&lt;p&gt;To follow up on how you'll see the impact in things like Live Search, beyond the improvement in relevance and in the quality of snippets, I think you'll see features like related searches and other ways of presenting information similar to the Factz that are shown in our Wikipedia product. I think you'll also see a lot more work on the instant answers, with a database that extends beyond Freebase.&lt;/p&gt;

&lt;p&gt;Without committing to particular deliverables, these are the kinds of things I think you can expect to see. And you'll also continue to see growth on powerset.com, where we can be a bit more daring in terms of ways of presenting search results.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Well thanks for your time. This has been interesting, and I'll be fascinated to see how things unfold over the next few years. I've got a feeling you'll have access to a pile of resources to work with...&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah, we're really excited about it. As a startup, it's hard to build a full-scale web search engine. Having the resources available, and the really smart people at Live Search, is just a tremendous boost to us. &lt;/p&gt;&lt;img src="http://on10.net/23618/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://on10.net/blogs/jonudell/Scott-Prevost-explains-Powersets-hybrid-approach-to-semantic-search/</comments><itunes:summary>
Scott Prevost is General Manager and Director of Product for Powerset, the company whose semantic search engine was recently acquired by Microsoft. In this interview he describes the history of Powerset's natural language engine, and explains how it works as part of a hybrid approach to indexing, retrieval, and ranking.


Scott will expand on these topics in his keynote address at Web 3.0 in October.






Scott Prevost




JU: The notion of search enhanced by natural language understanding has a long history. I was just reading Danny Sullivan's rant about how he's been hearing about this for years, but it's never amounted to anything.

Of course, people are all over the map on this topic, but nonetheless you guys are doing certain demonstrable things, and working on other things. So I'd like to find out more about how the technology -- which was acquired from Xerox, where it had been worked on for a long time -- actually works. What you mean by natural language understanding, how you're applying the technology, and where this is going.

SP: Well, there are a lot of questions tucked in there, but maybe we can start with what we licensed from PARC, what was formerly Xerox PARC. They had been working for 30 years in a linguistic framework called LFG -- lexical functional grammar -- and they built a very robust parser. It's probably parsed more sentences than any other parser in the world. 

What it allows us to do is take apart every document that we index, sentence by sentence, uncover its linguistic structure, and then translate that into a semantic representation we can encode in our index.

JU: Can you confirm or deny something that Danny Sullivan reported, which is that it takes on the order of two months to index Wikipedia one time using this method?

SP: [laughs] That's a very, very old number. It all depends on the number of machines, but we do it now on the order of a couple of days.

JU: And it scales linearly?

SP: Yes. And in fact we're working really hard to bring those numbers down. We have a very small data center right now. We're looking at what it takes to stand up a 2 billion document index, and it's absolutely attainable.

I think Danny Sullivan realized, when he wrote another article on the day we launched, that we're doing something different. He called us an understanding engine. It's not the case that we're just applying linguistic technology at runtime, by parsing the query and then trying to use the same old kind of keyword index for retrieval. We're actually doing the heavy lifting at index time.

We're actually reading each sentence in the corpus, pulling out semantic representations, indexing those semantic representations, and then at query time we try to match the meaning of the query to the meaning of what's in the document. That allows us to both increase precision and improve recall.

JU: When you say semantic representation, what it means -- or anyway what's evident in the current version -- is subject/verb/object triples, basically. That seems to be how things are organized.

SP: That's one small part of what the engine does. It's the part we've exposed in the user interface in a very direct way. But actually those are only three of several dozen semantic roles that we uncover at index time, and all of those roles go into selecting documents, and snippets of documents, when we present the organic results. 

JU: Really? So even though the patterns aren't exposed in the advanced query interface, they're still used?

SP: That's right, they're still used.

JU: What would be an example of one of those other patterns, and how it's applied?

SP: So, you ask a question like: "When did Hurricane Katrina strike?" The 'when' is a certain kind of semantic role that we've indexed, separately from the subject, verb, and object. There are a number of other roles like that: location, time, other types of relationships.

JU: I saw a private demo, about a year ago, in which one of the most striking examples was something like: "Companies acquired by IBM between 1996 and 2003". At that point, I think the light bulb goes on in people's heads about what this could really be.

That class of query isn't exposed yet, but it's an example of what's possible, right?

SP: Absolutely. That's exactly the direction we're moving in. Initially most of the work we've done has been on the index side. Now we're starting to catch up on the query side, which allows us to complete the loop and do queries like that.

JU: The other piece that's visible on the website, in addition to the Wikipedia stuff, is the Freebase material that you've recently integrated. That's an interesting case because there you can pull semantics directly from Freebase. So this becomes more of a query-time interface to something which is already structured and understandable.

SP: Yeah, that's right. Freebase is kind of like Wikipedia, except it's all structured data. Unlike with our core technology, which turns unstructured data into structured data, with Freebase we just go directly to the structured data. But it uses the same linguistic technology on the front end to parse the query, which then gets mapped into a Freebase database call.

But by using linguistic technology to parse the query, we're able to match very flexible ways of saying things. We don't have to imagine every possible way someone might ask for a particular piece of information. The linguistic engine takes care of a lot of that for us.

JU: That's why I can type in something like "Barack Obama's book" and get back the answer Dreams From My Father directly from Freebase.

So, what was the intent of including Freebase along with Wikipedia? What are you trying to show there?

SP: That the linguistic technology can be used with both structured and unstructured data. Freebase just has a lot of really great information.

One of the things about a natural language front end is that it encourages people to ask questions and expect answers. With the Freebase database, it's pretty easy to provide direct answers right at the top of the search results page, which users find to be a nice experience.

Of course you have to be very high-precision, so we've tuned the Freebase stuff for precision rather than recall.

JU: Tell me about the natural language landscape: the variety of approaches that exist, the style that you're using, how that compares to others, how all this fits into the history of the technology.

SP: The technology goes back a long way, three decades or so. 

JU: Longer, actually.

SP: Yeah, really since the beginnings of AI people have been trying to use computers to understand and generate language. There have been a number of different approaches: purely symbolic approaches, statistical approaches. We really use a hybrid. 

The Xerox technology uses a particular grammatical formalism, and we do use symbolic approaches to our semantic rules. But we also then put these semantic features into our index, and use machine learning and statistical approaches to retrieve and rank results. 

It really is a combination. We try not to be religious about these things, but just use best of breed, and choose the right tools for the jobs we're doing. 

JU: One of the things that Peter Norvig at Google is always saying is that the real secret to their success is vast quantities of data, and that in the end you don't really need AI, you just need lots and lots of data and the ability to crunch through it.

I assume you would argue that the natural language techniques are also helpful, and that as the quantity of data in your possession grows, the power that it brings to the table will also grow.

SP: Yeah. One thing we try to do with the natural language technology is give a leg up to the statistical and machine learning approaches. If you look at a search engine that just uses keywords, the information you have about the page is pretty slim. 

We're trying to capture more information about each page that we index, which enhances our ability to retrieve and rank. For example, it allows us to retrieve documents where there are no keyword matches, but there's a good meaning match.

JU: For example?

SP: So, consider a query like: "What politicians were killed by disease?" Powerset will retrieve documents that don't include the words 'disease' or 'politician' or 'kill', but that are about particular politicians who died from particular diseases.  

JU: Is the process of mapping generic terms to specific instances a hybrid of human editorial effort and statistical techniques?

SP: Yeah. We use things like WordNet, which is a giant dictionary or thesaurus of the English language that shows how various word senses relate to each other. We use that with some editing on our own. We also use machine learning techniques to figure out some word relationships, and which are most helpful in retrieval and ranking. So it really is a combination. 

JU: When did you start this work?

SP: The company was founded three years ago, and I joined two years ago. But of course the work at PARC goes back 30 years.

JU: You obviously have an academic background in this field.

SP: Yeah, I have a Ph.D. in computational linguistics, as do probably about twenty other people at Powerset. 

JU: What's your take on how this engine will start to surface through the various Microsoft online properties?

SP: The two areas where we can make a big impact are, first of all, improving core relevance, which is an absolute must for every search engine. And then also the user experience. Some of the technology -- and you start to see it in the Wikipedia search engine that we put out -- some of it really allows us to do different things in the presentation of these results. Things that can save the user time, by getting the answer right on the search results page. 

Our goal is to continue to work on improving relevance, and we've shown that by using these semantic features we can drive large relevance improvements, but there's still a lot of work to be done there.

JU: In that case, the improvements would be under the covers; the person using Live Search wouldn't know that you were contributing to the relevance of the result.

SP: That's correct. Another way it can happen is by creating a different quality of snippet or caption, things that highlight the parts that match the query instead of just bolding the keywords. Actually highlight the answer right there on the search results page, so you don't have to click through to determine if it's the right page.

JU: There's a related area called entity extraction, and there's been a lot of action there. For example there's a company called ClearForest, recently acquired by Reuters, which has put a lot of work into entity extraction. What's the story on that front?

SP: A lot of companies are working on this. We have our own in-house effort for name recognition and entity recognition, and this is of course really helpful as a kind of light semantic layer. But for us, it becomes deeper because we can start to relate all kinds of entities to one another, based on where we've seen them, and also with the help of things like Freebase. 

To follow up on how you'll see the impact in things like Live Search, beyond the improvement in relevance and in the quality of snippets, I think you'll see features like related searches and other ways of presenting information similar to the Factz that are shown in our Wikipedia product. I think you'll also see a lot more work on the instant answers, with a database that extends beyond Freebase.

Without committing to particular deliverables, these are the kinds of things I think you can expect to see. And you'll also continue to see growth on powerset.com, where we can be a bit more daring in terms of ways of presenting search results.

JU: Well thanks for your time. This has been interesting, and I'll be fascinated to see how things unfold over the next few years. I've got a feeling you'll have access to a pile of resources to work with...

SP: Yeah, we're really excited about it. As a startup, it's hard to build a full-scale web search engine. Having the resources available, and the really smart people at Live Search, is just a tremendous boost to us. </itunes:summary><link>http://on10.net/blogs/jonudell/Scott-Prevost-explains-Powersets-hybrid-approach-to-semantic-search/</link><pubDate>Fri, 26 Sep 2008 09:12:00 GMT</pubDate><guid isPermaLink="true">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/prevost/prevost.mp3</guid><evnet:views>962</evnet:views><evnet:viewtrackingurl>http://on10.net/23618/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;
Scott Prevost is General Manager and Director of Product for Powerset, the company whose semantic search engine was recently acquired by Microsoft. In this interview he describes the history of Powerset's natural language engine, and explains how it works as part of a hybrid approach to indexing, retrieval, and ranking.
&lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/prevost/prevost.mp3" expression="full" duration="930" fileSize="7638144" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/prevost/prevost.wma" expression="full" duration="930" fileSize="7727573" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/prevost/prevost.mp3" length="7638144" type="audio/mp3" /><dc:creator>JonUdell</dc:creator><itunes:author>JonUdell</itunes:author><slash:comments>0</slash:comments><wfw:commentRss>http:/on10.net/blogs/jonudell/Scott-Prevost-explains-Powersets-hybrid-approach-to-semantic-search/RSS/</wfw:commentRss><trackback:ping>http://on10.net/23618/Trackback.aspx</trackback:ping><category>podcasts</category><category>powerset</category><category>search</category><category>semantic</category></item><item><title>Roger Barga on Trident, a workbench for scientific workflow</title><description>&lt;p&gt;
Roger Barga, a principal architect with Microsoft's Technical Computing Initiative, is leading the development of Trident, a "workflow workbench" for science. In its first incarnation, the tool will enable oceanographers to automate the management and analysis of vast quantities of data produced by the &lt;a href="http://en.wikipedia.org/wiki/NEPTUNE"&gt;Neptune sensor array&lt;/a&gt;. But as Roger explains in this interview, it's not just about oceanography. Every science is becoming data-intensive. Trident's graphical workflow authoring, reusable data transforms, and support for provenance -- the ability to reliably track and reproduce all the analytic steps leading to a scientific result -- are being used by astronomers too, and are expected to find their way into many other disciplines as well.
&lt;/p&gt;
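&lt;p&gt;As a rough picture of what provenance tracking in a workflow involves, the Python sketch below runs a chain of named transforms and records a fingerprint of each step's input and output, so the same result can be checked by replaying the steps. It is not Trident's API; the function names, the made-up depth readings, and the two transforms are assumptions for this example.&lt;/p&gt;
&lt;pre&gt;
```python
import hashlib
import json


def fingerprint(data):
    """Stable short hash of JSON-serializable data, used to identify inputs and outputs."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_workflow(data, steps):
    """Apply each named transform in order, recording what went in and what came out."""
    provenance = []
    for name, transform in steps:
        result = transform(data)
        provenance.append({"step": name,
                           "input": fingerprint(data),
                           "output": fingerprint(result)})
        data = result
    return data, provenance


# Hypothetical reusable transforms over made-up sensor depth readings (meters).
def drop_missing(readings):
    return [reading for reading in readings if reading is not None]


def to_centimeters(readings):
    return [reading * 100 for reading in readings]


STEPS = [("drop_missing", drop_missing), ("to_centimeters", to_centimeters)]
result, log = run_workflow([12, None, 15], STEPS)
print(result)                              # prints [1200, 1500]
print([entry["step"] for entry in log])    # prints ['drop_missing', 'to_centimeters']
```
&lt;/pre&gt;
&lt;p&gt;Because each step's output fingerprint is the next step's input fingerprint, the log forms a chain from raw data to final product, and rerunning the same steps on the same input reproduces the same log.&lt;/p&gt;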
&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/barga/barga.jpg" /&gt;
            &lt;div&gt;
            &lt;strong&gt;Roger Barga&lt;/strong&gt;
            &lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; We're here to talk about &lt;a href="http://www.microsoft.com/mscorp/tc/trident.mspx"&gt;Trident&lt;/a&gt;, the scientific workflow workbench for oceanography. Give us the 50,000-foot overview, then we'll zoom in.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Scientists are increasingly dealing with large volumes of data coming from disparate sources. The process used to be manageable. You'd get post-docs to convert the raw data from the instruments into readable formats, and there was a manual workflow to process the data into useful data products. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Those were the good old days. Or maybe not so good.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Right. Because the time to get from raw data to those useful products was often measured in weeks or months. But now our ability to capture data has outpaced our ability to process and visualize it. And it's rising exponentially with the rapid deployment of cheap sensors.&lt;/p&gt;
&lt;p&gt;The oceanographic project we're working on, Neptune, is just one example of this. Astronomy, and all other sciences, are experiencing the same trend.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Neptune is a University of Washington oceanographic project ...&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; ... it's actually an NSF project. The proper name is the &lt;a href="http://www.joiscience.org/ocean_observing/initiative"&gt;Ocean Observatories Initiative&lt;/a&gt;, and it's being funded for several hundred million dollars. The University of Washington is one of the partners. The Monterey Bay Aquarium Research Institute and a number of coastal observatories are involved as well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So fiber-optic cables are being laid, and lots of oceanographic data will be pouring in.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Exactly. It's transformed oceanography from a data-poor discipline to a data-rich one. They're going to be able to monitor the oceans 24x7 over long periods of time. So the kinds of processes they can study were never within reach before. They could collect data when there was an episodic event, or when they could get funding. Now they'll be collecting permanently.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What's the scope of the sensor network?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; They're laying the trench in Monterey to test and deploy the sensors. NSF is reviewing the larger program, and getting ready to fund the Neptune array which will be off the coast of Washington and Oregon. The Canadian version of the Neptune array is up and running and collecting data, but the software infrastructure is still being built as we speak.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What quantities of data is the Canadian array producing?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Gigabytes per day. It can easily handle a couple of high-def video streams coming from the ocean floor.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Really?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Yes. And also in-situ devices that can sequence organisms. It really is like not only taking Internet and power out to the ocean, but also a USB bus that instruments can be plugged into.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What are some of the experiments that become possible with this setup?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; For example, being able to understand sediment flows across the ocean floor, how temperature and salinity change, how fresh water flows in from rivers, what kind of life exists at those margins. And understanding that interesting narrow band where life thrives in the ocean. Too high up and the tides affect it, too low and there's not enough light. But really, there are a myriad of things like that. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So an experiment, in this data-intensive new world, involves formulating a hypothesis, looking for patterns in previously-collected data, and then seeing whether data collected in the future supports the hypothesis. &lt;/p&gt;
&lt;p&gt;That means you not only need to run an analysis on data, but that you have to be able to repeat that analysis on an evolving body of data. Hence the need for the workflow automation that you're providing in the workbench.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Yes. Another aspect is the need to calibrate and tune the models. If they can do that based on long-term monitoring, it'll remove a lot of the uncertainty in our understanding of the oceans. Versus now, where the data are so sparse that it's hard to validate the model.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; I guess also that as your understanding of the data and the models evolves, you might want to rethink what data you're capturing and how you're interpreting it. So, what is it that you've built with Trident, and how does it help you do those things?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Jim Gray was the first person who had the vision of an oceanographer's workbench. His insight was that scientists really want to interact with visualizations of the ocean, but there was a huge gap between the raw data and those visualizations. &lt;/p&gt;
&lt;p&gt;Managing information and managing data is one of Microsoft's core strengths. In &lt;a href="http://research.microsoft.com/erp/"&gt;External Research&lt;/a&gt;, we look for partnership opportunities where we can bring our technology, learn from applying it to data-intensive stress tests that involve even more data than our commercial products currently handle, and figure out how to use or extend our technology to provide a solution.&lt;/p&gt;
&lt;p&gt;Jim pointed out that workflow was one of the key missing ingredients. We looked at the in-house tools, and Windows Workflow was the engine of choice...&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; ...although it didn't exist at the time Jim floated this idea, right?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Well, yes, it was around in alpha and beta form internally. Jim knew I was doing some of my research using Windows Workflow. Of course he left the solution up to us, but he accurately identified workflow as being a way that the scientist could not only manage the data transformations that were needed, but also create a library of solutions that could be shared and reused.&lt;/p&gt;
&lt;p&gt;If you look at how Microsoft works as a company, we build platforms and then we expect ISVs to come in and bridge the gap between the platforms and the user communities. That's the role our group has played. We're looking at the requirements of the scientists, we're looking at the platform Microsoft provides, and we're building on that platform to provide a custom solution to the scientists that will not only accelerate their work, but change how they do science -- enable them to ask and answer questions they couldn't before.&lt;/p&gt;
&lt;p&gt;We partnered initially with the University of Washington and Monterey Bay Aquarium Research Institute, or MBARI. They're already gathering data from sensors, so they could describe the spectrum of data we'd have to ingest into our workflows. The University of Washington has a visualization tool called &lt;a href="http://www.cs.washington.edu/homes/keithg/oceans.html"&gt;COVE&lt;/a&gt;, which scientists are adopting as the preferred way to look at the ocean floor. You can think of it as Virtual Earth for the ocean. If there's bathymetry data, you can pull it in and see the ocean floor.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What kinds of data transformations are needed to get from the sensor outputs to COVE's inputs?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; There are probably about two dozen kinds of data sources we need to be able to ingest, based on the instruments and the types of data they put out. Typically it's streaming data in &lt;a href="http://www.unidata.ucar.edu/software/netcdf/"&gt;NetCDF format&lt;/a&gt;, or some other common format. So the first step is to recognize what kind of data format an instrument or model is kicking out, and transform it into an internal structure that our tool can use.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; But the workflow engine is abstracted from the instrumentation data formats and from the visualization tools, right? It's a mechanism for reproducibly running transformations, and managing that pipeline.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Right. But let's start with how we interacted with the scientists. Jim Gray would ask scientists: "What are the top 20 questions you want to ask, and queries you want to run?" From that, he'd get an understanding of how they viewed the data, and what kind of processing was required.&lt;/p&gt;
&lt;p&gt;We took the same approach, and asked the scientists which top 20 workflows they perform and which top 20 visualizations they like to see. Then we went through them from top to bottom, talking about the transforms and data integration that were required. We wound up with a set of two dozen transformations that were common across all of these workflows. That became the library of activities -- reusable chunks of code -- that the scientists could call upon to author not only these 20 workflows, but the next 20.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Can you give a couple of examples?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Sure. Regridding. You have two data sets: one's from a model and the other's from a set of deployed sensors out in the ocean. They're on different grid coordinate systems and you need to be able to bring those two together. That may require some interpolation; you might need to drop or add data points, transform coordinates, or join data sets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; There might be a temporal variant of the spatial gridding as well, to align different time scales? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Right. Some instruments are getting things every second, some are getting them every 15 minutes. You can ask the user: "Do you want interpolation to take place? Do you want the system to match up the points?" Based on these inputs, the correct workflow gets configured and they see the resulting visualization for the region of ocean they're interested in.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; It sounds like some of these primitives will wind up being fairly general, not just specific to oceanography.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Indeed they are. We're producing a version of Trident for oceanography, but many of these activities could be useful for other sciences as well. People in earth sciences, for example, are also using NetCDF and many of the same operations.&lt;/p&gt;
&lt;p&gt;Because we're building a tool that is extensible, and agnostic in terms of the science it supports, you can imagine it being used, for example, to understand the interaction between oceans and warm air currents.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What does the Trident user see and do?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; We realized that the authoring experience for scientific workflow is very different from, say, business workflow. In business, you'd have your accountant write your expense report workflow. They'd lock it down, they'd deploy it, everybody would use it from then on, and nobody would touch it until it came back for bug fixes or enhancements. &lt;/p&gt;
&lt;p&gt;What we found with scientists is that they want to borrow somebody's workflow that does what they want, or close to it, load that workflow, and then start authoring from that point on. &lt;/p&gt;
&lt;p&gt;So we implemented that in Trident. You can search for workflows by purpose, or by the inputs they process. You click on one and load it into a visual browser, because while the oceanographers understand the workflows, they don't want to see C# or Java; they want to see something visual -- boxes that represent the transformations they want to apply.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; We've mentioned the Windows Workflow Foundation. For folks who aren't familiar with that system, how would you characterize it? How is it like and unlike a script execution engine?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; What's unique about workflow, versus scripting, is that with workflow you tease apart the notion of a schedule, which is the sequence of actions you'd like to have performed, from the code that performs each action. If you were to look inside of each of those steps, you'd see code similar to what you'd find in a script. But on top of the sequence of steps you have an orchestration engine. When you pass this workflow -- this sequence of steps -- over to the orchestration engine, it runs the code inside each of the boxes, but as each one completes, control passes back to the orchestration engine.&lt;/p&gt;
&lt;p&gt;So we have an abstraction layer, we've opened up the opportunity for reuse, and the steps or activities become building blocks. In addition, the orchestration engine can monitor the execution of the workflow, or change the way it executes -- for example, by running blocks in parallel on a multicore machine.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What struck me about the Workflow Foundation was the way in which workflows can be very big or very small. As small as the sequence of interactions with a form on a web page, in which case the orchestration engine can be embedded entirely in the code that's behind that web page. &lt;/p&gt;
&lt;p&gt;Or it can be a very big thing. But in any case, since it's part of the .NET Framework, it can exist in a variety of places. It can run locally on a laptop, it can run on a server in the cloud. There's an interesting amount of flexibility in terms of how workflows can be deployed. An application could embed Trident, or Trident could be used as a service.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; That's right. That's the magic of it. Yes, it could be hosted in an environment that the scientist is already familiar with. Or for a big institution, you could post it up as a service. Anybody could access it from a browser. And that's part of our mantra here. If we provide this to the scientists, we have to make sure it works with the tools they're comfortable using. You should be able to point your Linux box running Firefox at this tool.&lt;/p&gt;
&lt;p&gt;But to your other point, we're experimenting here with workflows that are resource-seeking. You could launch one, perhaps even on your cellphone, and that scheduling engine's going to look for systems that have resources for that workflow, tap into them, and give the user on the cellphone the impression it's running locally. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; You've mentioned that the workflow style encourages a level of modularity that you might not otherwise get. It also provides a level of monitoring, control, and auditing. The reason that's important goes back to the idea of reproducibility. &lt;/p&gt;
&lt;p&gt;A friend of mine is an HPC expert, and one of his pet peeves is that when people look at HPC they tend to focus on how much raw horsepower can be thrown at a problem. His question is: "Who's worrying about reproducibility and correctness?" It's a really important question. &lt;/p&gt;
&lt;p&gt;In your environment, as I understand it, one of the things that you get is the ability to capture and replay and analyze what happened in a workflow, and the ability to faithfully reproduce a sequence of steps. You talked about enabling things that scientists couldn't do before. It's not only that they couldn't analyze large quantities of data, but also that they couldn't automate their own methods, and be able to reflect on them in an automated way.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Right. Even if we couldn't run a workflow faster, and even if we weren't processing a lot more data, one of our key features is support for provenance. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Explain what you mean by provenance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Think about it in terms of art. For a given piece of art, we're able to establish through authorities that it's original, where it came from, and who's had their hands on it through its lifetime. Provenance for a workflow result is the same thing. Minimally we want to be able to establish trust in a result. If you think about how that happens, it often starts by considering who wrote the workflow. So with Trident you can click on a result and interrogate the history of the workflow: who wrote it, who reviewed it, who revised it, when it first entered the system.&lt;/p&gt;
&lt;p&gt;We do versioning as well, so you can look at an old result and know that it was created by an old version of the workflow. And then you have the ability to run the new version on the old dataset to see if it makes a difference.&lt;/p&gt;
&lt;p&gt;We capture execution provenance so you know exactly how your result was created. We capture provenance on the workflows themselves so you know who created them, and who's touched them. &lt;/p&gt;
&lt;p&gt;You might be thinking about creating a community, where you click on a workflow and can say: "OK, I trust that post-doc."&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; I've been reflecting on what Microsoft brings to the world of science, in yours and in other collaborations that I've been talking to MSR folks about. One is clearly the special competence and expertise in data management and processing. Even for computationally-oriented scientists, that data expertise isn't necessarily a core competence. &lt;/p&gt;
&lt;p&gt;Another is the software tradition of version control. Again, that hasn't been a traditional strength of scientists. So this looks like a fruitful partnership on both fronts. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Agreed. It would be nice to get &lt;a href="http://perspectives.on10.net/blogs/jonudell/Making-sense-of-C02-data/"&gt;Catharine van Ingen&lt;/a&gt;, or perhaps Alex Szalay, to chime in on how this is being used for astronomy, because we're giving drops of this code to our e-science researchers for use in other areas.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; I'd love to talk with Alex. I had a couple of in-depth conversations about the WorldWide Telescope, one with &lt;a href="http://blog.jonudell.net/2008/06/23/the-story-of-the-worldwide-telescope/"&gt;Curtis Wong&lt;/a&gt; and the other with &lt;a href="http://blog.jonudell.net/2008/07/14/how-the-worldwide-telescope-works/"&gt;Jonathan Fay&lt;/a&gt;, and we touched on the work Alex has done. He's using your stuff as well?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Not him personally, but his project -- &lt;a href="http://pan-starrs.ifa.hawaii.edu/public/"&gt;Pan-STARRS&lt;/a&gt; -- is. Catharine van Ingen and Yogesh Simmhan are co-architects of that system along with Alex. And they're bringing workflow to the table. It's becoming the way scientists upload their data into Pan-STARRS and get it back out, and Trident is the workflow engine for that.&lt;/p&gt;
&lt;p&gt;You've probably also heard about other activities here in External Research. Perhaps the scholarly communications aspect?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Yep. I've talked to &lt;a href="http://perspectives.on10.net/blogs/jonudell/Word-for-scientific-publishing/"&gt;Pablo Fernicola&lt;/a&gt; about the Word add-in for authoring scientific papers in the National Library of Medicine XML format. And recently I got the &lt;a href="http://blog.jonudell.net/2008/07/31/a-conversation-with-tony-hey-about-microsoft-external-research-and-the-new-breed-of-e-scientists/"&gt;overview of External Research&lt;/a&gt; from Tony Hey.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; When you think about Trident in the context of scholarly communication -- and to your point about the importance of provenance, we see eye to eye on that -- not only can we use these tools for e-science data management, but we're focusing on reproducible research. When Trident has finished running a workflow, we'll create an XML structure that describes how to call back into Trident to recreate the result. We're really keen on the idea that not only is it easier to do the science, and publish the science, but also to actually reproduce it. And that XML description should be able to be embedded in the published work.&lt;/p&gt;
&lt;p&gt;That's really exciting. It's been talked about in the computational sciences, but never addressed end to end with a tool that's instrumented, that produces an XML standard the community can own which describes how the science was done, and that gets carried along with the publication, either physically or by reference, and we store this execution script in a database somewhere. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; It's a really big idea.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; It is. I think it could be transformational.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; I do too.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Right now, reproducibility means that if you happen to know the person who did the experiment, or you happen to capture enough stuff in your lab notebook or on your whiteboard, then you have a chance of being able to do it again. But imagine being able to click any result, and automatically and transparently reproduce that result.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; In reality it won't necessarily be the case that you can punch a button and have everything replayed exactly. But having the documentation, at that level of detail, and in that form, would be an incredible asset.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Agreed. The hope is that here in External Research, because we're building these tools not just in the context of one science project, but many, you can have community tools that bridge communities. We're talking to people in the earth sciences doing atmospheric studies, and their workflows and analyses are so similar to what the oceanographers are doing. But right now, since those two communities aren't talking or sharing tools, it's very difficult for one community to interact with the other.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; That's a really nice point. Well, thanks Roger!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; See you later.&lt;/p&gt;&lt;img src="http://on10.net/23408/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://on10.net/blogs/jonudell/Roger-Barga-on-Trident-a-workbench-for-scientific-workflow/</comments><itunes:summary>
Roger Barga, a principal architect with Microsoft's Technical Computing Initiative, is leading the development of Trident, a "workflow workbench" for science. In its first incarnation, the tool will enable oceanographers to automate the management and analysis of vast quantities of data produced by the Neptune sensor array. But as Roger explains in this interview, it's not just about oceanography. Every science is becoming data-intensive. Trident's graphical workflow authoring, reusable data transforms, and support for provenance -- the ability to reliably track and reproduce all the analytic steps leading to a scientific result -- are being used by astronomers too, and are expected to find their way into many other disciplines as well.
JU: We're here to talk about Trident, the scientific workflow workbench for oceanography. Give us the 50,000-foot overview, then we'll zoom in.
RB: Scientists are increasingly dealing with large volumes of data coming from disparate sources. The process used to be manageable. You'd get post-docs to convert the raw data from the instruments into readable formats, and there was a manual workflow to process the data into useful data products.
JU: Those were the good old days. Or maybe not so good.
RB: Right. Because the time to get from raw data to those useful products was often measured in weeks or months. But now our ability to capture data has outpaced our ability to process and visualize it. And it's rising exponentially with the rapid deployment of cheap sensors.
The oceanographic project we're working on, Neptune, is just one example of this. Astronomy, and all other sciences, are experiencing the same trend.
JU: Neptune is a University of Washington oceanographic project ...
RB: ... it's actually an NSF project. The proper name is the Ocean Observatories Initiative, and it's being funded for several hundred million dollars. The University of Washington is one of the partners. The Monterey Bay Aquarium Research Institute and a number of coastal observatories are involved as well.
JU: So fiber-optic cables are being laid, and lots of oceanographic data will be pouring in.
RB: Exactly. It's transformed oceanography from a data-poor discipline to a data-rich one. They're going to be able to monitor the oceans 24x7 over long periods of time. So the kinds of processes they can study were never within reach before. They could collect data when there was an episodic event, or when they could get funding. Now they'll be collecting permanently.
JU: What's the scope of the sensor network?
RB: They're laying the trench in Monterey to test and deploy the sensors. NSF is reviewing the larger program, and getting ready to fund the Neptune array which will be off the coast of Washington and Oregon. The Canadian version of the Neptune array is up and running and collecting data, but the software infrastructure is still being built as we speak.
JU: What quantities of data is the Canadian array producing?
RB: Gigabytes per day. It can easily handle a couple of high-def video streams coming from the ocean floor.
JU: Really?
RB: Yes. And also in-situ devices that can sequence organisms. It really is like not only taking Internet and power out to the ocean, but also a USB bus that instruments can be plugged into.
JU: What are some of the experiments that become possible with this setup?
RB: For example, being able to understand sediment flows across the ocean floor, how temperature and salinity change, how fresh water flows in from rivers, what kind of life exists at those margins. And understanding that interesting narrow band where life thrives in the ocean. Too high up and the tides affect it, too low and there's not enough light. But really, there are a myriad of things like that. 
JU: So an experiment, in this data-intensive new world, involves formulating a hypothesis, looking for patterns in previously-collected data, and then seeing whether data collected in the future supports the hypothesis. 
That means you not only need to run an analysis on data, but that you have to be able to repeat that analysis on an evolving body of data. Hence the need for the workflow automation that you're providing in the workbench.
RB: Yes. Another aspect is the need to calibrate and tune the models. If they can do that based on long-term monitoring, it'll remove a lot of the uncertainty in our understanding of the oceans. Versus now, where the data are so sparse that it's hard to validate the model.
JU: I guess also that as your understanding of the data and the models evolves, you might want to rethink what data you're capturing and how you're interpreting it. So, what is it that you've built with Trident, and how does it help you do those things?
RB: Jim Gray was the first person who had the vision of an oceanographer's workbench. His insight was that scientists really want to interact with visualizations of the ocean, but there was a huge gap between the raw data and those visualizations. 
Managing information and managing data is one of Microsoft's core strengths. In External Research, we look for partnership opportunities where we can bring our technology, learn from applying it to data-intensive stress tests that involve even more data than our commercial products currently handle, and figure out how to use or extend our technology to provide a solution.
Jim pointed out that workflow was one of the key missing ingredients. We looked at the in-house tools, and Windows Workflow was the engine of choice...
JU: ...although it didn't exist at the time Jim floated this idea, right?
RB: Well, yes, it was around in alpha and beta form internally. Jim knew I was doing some of my research using Windows Workflow. Of course he left the solution up to us, but he accurately identified workflow as being a way that the scientist could not only manage the data transformations that were needed, but also create a library of solutions that could be shared and reused.
If you look at how Microsoft works as a company, we build platforms and then we expect ISVs to come in and bridge the gap between the platforms and the user communities. That's the role our group has played. We're looking at the requirements of the scientists, we're looking at the platform Microsoft provides, and we're building on that platform to provide a custom solution to the scientists that will not only accelerate their work, but change how they do science -- enable them to ask and answer questions they couldn't before.
We partnered initially with the University of Washington and Monterey Bay Aquarium Research Institute, or MBARI. They're already gathering data from sensors, so they could describe the spectrum of data we'd have to ingest into our workflows. The University of Washington has a visualization tool called COVE, which scientists are adopting as the preferred way to look at the ocean floor. You can think of it as Virtual Earth for the ocean. If there's bathymetry data, you can pull it in and see the ocean floor.
JU: What kinds of data transformations are needed to get from the sensor outputs to COVE's inputs?
RB: There are probably about two dozen kinds of data sources we need to be able to ingest, based on the instruments and the types of data they put out. Typically it's streaming data in NetCDF format, or some other common format. So the first step is to recognize what kind of data format an instrument or model is kicking out, and transform it into an internal structure that our tool can use.
JU: But the workflow engine is abstracted from the instrumentation data formats and from the visualization tools, right? It's a mechanism for reproducibly running transformations, and managing that pipeline.
RB: Right. But let's start with how we interacted with the scientists. Jim Gray would ask scientists: "What are the top 20 questions you want to ask, and queries you want to run?" From that, he'd get an understanding of how they viewed the data, and what kind of processing was required.
We took the same approach, and asked the scientists which top 20 workflows they perform and which top 20 visualizations they like to see. Then we went through them from top to bottom, talking about the transforms and data integration that were required. We wound up with a set of two dozen transformations that were common across all of these workflows. That became the library of activities -- reusable chunks of code -- that the scientists could call upon to author not only these 20 workflows, but the next 20.
JU: Can you give a couple of examples?
RB: Sure. Regridding. You have two data sets: one's from a model and the other's from a set of deployed sensors out in the ocean. They're on different grid coordinate systems and you need to be able to bring those two together. That may require some interpolation; you might need to drop or add data points, transform coordinates, or join data sets.
JU: There might be a temporal variant of the spatial gridding as well, to align different time scales? 
RB: Right. Some instruments are getting things every second, some are getting them every 15 minutes. You can ask the user: "Do you want interpolation to take place? Do you want the system to match up the points?" Based on these inputs, the correct workflow gets configured and they see the resulting visualization for the region of ocean they're interested in.
JU: It sounds like some of these primitives will wind up being fairly general, not just specific to oceanography.
RB: Indeed they are. We're producing a version of Trident for oceanography, but many of these activities could be useful for other sciences as well. People in earth sciences, for example, are also using NetCDF and many of the same operations.
Because we're building a tool that is extensible, and agnostic in terms of the science it supports, you can imagine it being used, for example, to understand the interaction between oceans and warm air currents. 
JU: What does the Trident user see and do?
RB: We realized that the authoring experience for scientific workflow is very different from, say, business workflow. In business, you'd have your accountant write your expense report workflow. They'd lock it down, they'd deploy it, everybody would use it from then on, and nobody would touch it until it came back for bug fixes or enhancements. 
What we found with scientists is that they want to borrow somebody's workflow that does what they want, or close to it, load that workflow, and then start authoring from that point on. 
So we implemented that in Trident. You can search for workflows by purpose, or by the inputs they process. You click on one, and load it into a visual browser because while the oceanographers understand the workflows, they don't want to see C# or Java, they want to see something visual -- boxes that represent the transformations they want to apply. 
JU: We've mentioned the Windows Workflow Foundation. For folks who aren't familiar with that system, how would you characterize it? How is it like and unlike a script execution engine?
RB: What's unique about workflow, versus scripting, is that with workflow you tease apart the notion of a schedule, which is the sequence of actions you'd like to have performed. If you were to look inside of each of those steps, you'd see code similar to what you'd find in a script. But on top of the sequence of steps you have an orchestration engine. When you pass this workflow -- this sequence of steps -- over to the orchestration engine, it runs the code inside each of the boxes, but as each one completes, control passes to the orchestration engine. 
So we have an abstraction layer, we've opened up the opportunity for reuse, the steps or activities become building blocks. In addition, the orchestration engine can monitor the execution of the workflow, or change the way it executes -- for example, by running blocks in parallel on a multicore machine. 
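The split between a schedule and an orchestration engine can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Windows Workflow Foundation API: run_workflow, the monitor callback, and the two sample activities are hypothetical names chosen for illustration.

```python
def run_workflow(schedule, data, monitor=None):
    """Toy orchestration engine: run each activity in the schedule in order.

    Control returns to this loop after every activity, which is where the
    engine can observe progress (and, in a real system, reschedule work).
    """
    for activity in schedule:
        data = activity(data)
        if monitor is not None:
            monitor(activity.__name__, data)
    return data


# Activities are reusable building blocks: plain functions from data to data.
def drop_missing(readings):
    return [r for r in readings if r is not None]


def to_celsius(readings):
    return [(f - 32) * 5 / 9 for f in readings]


# The schedule is just the sequence of steps, kept separate from their code.
events = []
result = run_workflow(
    [drop_missing, to_celsius],
    [32, None, 212],
    monitor=lambda name, output: events.append(name),
)
print(result)
print(events)
```

Because the engine, rather than the activities, owns the loop, monitoring can be added or changed without touching any activity.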
JU: What struck me about the Workflow Foundation was the way in which workflows can be very big or very small. As small as the sequence of interactions with a form on a web page, in which case the orchestration engine can be embedded entirely in the code that's behind that web page. 
Or it can be a very big thing. But in any case, since it's part of the .NET Framework, it can exist in a variety of places. It can run locally on a laptop, it can run on a server in the cloud. There's an interesting amount of flexibility in terms of how workflows can be deployed. An application could embed Trident, or Trident could be used as a service.
RB: That's right. That's the magic of it. Yes, it could be hosted in an environment that the scientist is already familiar with. Or for a big institution, you could post it up as a service. Anybody could access it from a browser. And that's part of our mantra here. If we provide this to the scientists, we have to make sure it works with the tools they're comfortable using. You should be able to point your Linux box running Firefox at this tool.
But to your other point, we're experimenting here with workflows that are resource-seeking. You could launch one, perhaps even on your cellphone, and that scheduling engine's going to look for systems that have resources for that workflow, tap into them, and give the user on the cellphone the impression it's running locally. 
JU: You've mentioned that the workflow style encourages a level of modularity that you might not otherwise get. It also provides a level of monitoring, control, and auditing. The reason that's important goes back to the idea of reproducibility. 
A friend of mine is an HPC expert, and one of his pet peeves is that when people look at HPC they tend to focus on how much raw horsepower can be thrown at a problem. His question is: "Who's worrying about reproducibility and correctness?" It's a really important question. 
In your environment, as I understand it, one of the things that you get is the ability to capture and replay and analyze what happened in a workflow, and the ability to faithfully reproduce a sequence of steps. You talked about enabling things that scientists couldn't do before. It's not only that they couldn't analyze large quantities of data, but also that they couldn't automate their own methods, and be able to reflect on them in an automated way.
RB: Right. Even if we couldn't run a workflow faster, and even if we weren't processing a lot more data, one of our key features is support for provenance. 
JU: Explain what you mean by provenance.
RB: Think about it in terms of art. For a given piece of art, we're able to establish through authorities that it's original, where it came from, and who's had their hands on it through its lifetime. Provenance for a workflow result is the same thing. Minimally we want to be able to establish trust in a result. If you think about how that happens, it often starts by considering who wrote the workflow. So with Trident you can click on a result and interrogate the history of the workflow: who wrote it, who reviewed it, who revised it, when it first entered the system.
We do versioning as well, so you can look at an old result and know that it was created by an old version of the workflow. And then have the ability to run the new version on the old dataset to see if it makes a difference. 
We capture execution provenance so you know exactly how your result was created. We capture provenance on the workflows themselves so you know who created them, and who's touched them. 
You might be thinking about creating a community, where you click on a workflow and can say: "OK, I trust that post-doc."
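One way to picture the two kinds of provenance described above is the following minimal Python sketch. All names here (WorkflowVersion, ExecutionRecord, run_with_provenance, the authors) are hypothetical and do not reflect Trident's data model. Each result carries the workflow version that produced it and a digest of its input, so a new version can be rerun on the old dataset and the two results compared.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class WorkflowVersion:
    """Provenance of the workflow itself: who wrote and reviewed which version."""
    name: str
    version: int
    author: str
    reviewers: list = field(default_factory=list)


@dataclass
class ExecutionRecord:
    """Execution provenance: which version ran on which input, and the result."""
    workflow: WorkflowVersion
    input_digest: str
    result: object


def run_with_provenance(workflow, func, dataset):
    # Fingerprint the input so later runs can prove they used the same data.
    digest = hashlib.sha256(repr(dataset).encode("utf-8")).hexdigest()
    return ExecutionRecord(workflow, digest, func(dataset))


dataset = [1.0, 2.0, 3.0, 10.0]
v1 = WorkflowVersion("typical_temp", 1, author="alice", reviewers=["bob"])
v2 = WorkflowVersion("typical_temp", 2, author="carol")

# Version 1 uses the mean; version 2 switches to the median.
old = run_with_provenance(v1, lambda d: sum(d) / len(d), dataset)
new = run_with_provenance(v2, lambda d: sorted(d)[len(d) // 2 - 1], dataset)

print(old.workflow.author, old.workflow.version, old.result)
print(new.workflow.author, new.workflow.version, new.result)
print("same input:", old.input_digest == new.input_digest)
```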
JU: I've been reflecting on what Microsoft brings to the world of science, in yours and in other collaborations that I've been talking to MSR folks about. One is clearly the special competence and expertise in data management and processing. Even for computationally-oriented scientists, that data expertise isn't necessarily a core competence. 
Another is the software tradition of version control. Again, that hasn't been a traditional strength of scientists. So this looks like a fruitful partnership on both fronts. 
RB: Agreed. It would be nice to get Catharine van Ingen, or perhaps Alex Szalay to chime in on how this is being used for astronomy. Because we're giving drops of this code to our e-science researchers for use in other areas. 
JU: I'd love to talk with Alex. I had a couple of in-depth conversations about the WorldWide Telescope, one with Curtis Wong and the other with Jonathan Fay, and we touched on the work Alex has done. He's using your stuff as well?
RB: Not him personally, but his project -- Pan-STARRS -- is. Catharine van Ingen and Yogesh Simmhan are co-architects of that system along with Alex. And they're bringing workflow to the table. It's becoming the way scientists upload their data into Pan-STARRS and get it back out, and Trident is the workflow engine for that.
You've probably also heard about other activities here in External Research. Perhaps the scholarly communications aspect?
JU: Yep. I've talked to Pablo Fernicola about the Word add-in for authoring scientific papers in the National Library of Medicine XML format. And recently I got the overview of External Research from Tony Hey.
RB: When you think about Trident in the context of scholarly communication -- and to your point about the importance of provenance, we see eye to eye on that -- not only can we use these tools for e-science data management, but we're focusing on reproducible research. When Trident has finished running a workflow, we'll create an XML structure that describes how to call back into Trident to recreate the result. We're really keen on the idea that not only is it easier to do the science, and publish the science, but actually reproduce it. And that XML description should be able to be embedded in the published work.
That's really exciting. It's been talked about in the computational sciences, but never addressed end to end with a tool that's instrumented, that produces an XML standard the community can own which describes how the science was done, and that gets carried along with the publication, either physically or by reference, and we store this execution script in a database somewhere. 
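To make the idea concrete, here is a minimal Python sketch of what such a description might look like and how it could be read back to repeat a run. The element and attribute names (execution, workflow, input, parameter) and the file path are invented for illustration; the interview does not specify Trident's actual XML format.

```python
import xml.etree.ElementTree as ET


def describe_run(workflow_name, version, inputs, parameters):
    """Serialize the details needed to repeat a run as an XML string."""
    root = ET.Element("execution")
    ET.SubElement(root, "workflow", name=workflow_name, version=str(version))
    for uri in inputs:
        ET.SubElement(root, "input", uri=uri)
    for key, value in sorted(parameters.items()):
        ET.SubElement(root, "parameter", name=key, value=str(value))
    return ET.tostring(root, encoding="unicode")


def read_run(xml_text):
    """Parse a description back into what a rerun would need."""
    root = ET.fromstring(xml_text)
    workflow = root.find("workflow")
    return {
        "workflow": workflow.get("name"),
        "version": int(workflow.get("version")),
        "inputs": [e.get("uri") for e in root.findall("input")],
        "parameters": {
            e.get("name"): e.get("value") for e in root.findall("parameter")
        },
    }


# Hypothetical run: workflow name, input path, and parameter are made up.
description = describe_run("regrid", 2, ["data/model.nc"], {"interpolate": True})
print(description)
print(read_run(description))
```

A description like this is small enough to embed in a publication or to store in a database and cite by reference.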
JU: It's a really big idea.

RB: It is, I think it could be transformational.

JU: I do too.

RB: Right now, reproducibility means that if you happen to know the person who did the experiment, or you happen to capture enough stuff in your lab notebook or on your whiteboard, then you have a chance of being able to do it again. But imagine being able to click any result, and automatically and transparently reproduce that result.

JU: In reality it won't necessarily be the case that you can punch a button and have everything replayed exactly. But having the documentation, at that level of detail, and in that form, would be an incredible asset.

RB: Agreed. The hope is that here in External Research, because we're building these tools not just in the context of one science project, but many, you can have community tools that bridge communities. We're talking to people in the earth sciences doing atmospheric studies, and their workflows and analyses are so similar to what the oceanographers are doing. But right now, since those two communities aren't talking or sharing tools, it's very difficult for one community to interact with the other.

JU: That's a really nice point. Well, thanks Roger!

RB: See you later.</itunes:summary><link>http://on10.net/blogs/jonudell/Roger-Barga-on-Trident-a-workbench-for-scientific-workflow/</link><pubDate>Thu, 28 Aug 2008 17:41:00 GMT</pubDate><guid isPermaLink="true">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/barga/barga.mp3</guid><evnet:views>1192</evnet:views><evnet:viewtrackingurl>http://on10.net/23408/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;
Roger Barga, a principal architect with Microsoft's Technical Computing Initiative, is leading the development of Trident, a "workflow workbench" for science. In its first incarnation, the tool will enable oceanographers to automate the management and analysis of vast quantities of data produced by the &lt;a href="http://en.wikipedia.org/wiki/NEPTUNE"&gt;Neptune sensor array&lt;/a&gt;. But as Roger explains in this interview, it's not just about oceanography. Every science is becoming data-intensive. Trident's graphical workflow authoring, reusable data transforms, and support for provenance -- the ability to reliably track and reproduce all the analytic steps leading to a scientific result -- are being used by astronomers too, and are expected to find their way into many other disciplines as well.
&lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/barga/barga.mp3" expression="full" duration="1890" fileSize="15136512" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/barga/barga.wma" expression="full" duration="1890" fileSize="15312203" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/barga/barga.mp3" length="15136512" type="audio/mp3" /><dc:creator>JonUdell</dc:creator><itunes:author>JonUdell</itunes:author><slash:comments>0</slash:comments><wfw:commentRss>http:/on10.net/blogs/jonudell/Roger-Barga-on-Trident-a-workbench-for-scientific-workflow/RSS/</wfw:commentRss><trackback:ping>http://on10.net/23408/Trackback.aspx</trackback:ping><category>e-science</category><category>oceanography</category><category>podcasts</category><category>Workflow</category></item><item><title>Ted Semon reflects on the 2008 Space Elevator Conference</title><description>&lt;p&gt;Ted Semon, a retired software engineer, chronicles the efforts to develop a space elevator on the &lt;a href="http://www.spaceelevatorblog.com/"&gt;Space Elevator Blog&lt;/a&gt;, and volunteers for &lt;a href="http://www.spaceward.org/"&gt;The Spaceward Foundation&lt;/a&gt; which administers &lt;a href="http://www.spaceward.org/elevator2010"&gt;competitions&lt;/a&gt; to develop several of the core technologies that will be needed to build the elevator. &lt;/p&gt;
&lt;p&gt;Ted attended and spoke at the &lt;a href="http://www.spaceelevatorconference.org/"&gt;2008 Space Elevator Conference&lt;/a&gt; held at the Microsoft Conference Center in Redmond. In this interview he discusses the concept of the space elevator, and the status of current efforts to bring it to life. &lt;/p&gt;
&lt;p&gt;In a &lt;a href="http://perspectives.on10.net/blogs/jonudell/Maurice-Franklin-reflects-on-the-2008-Space-Elevator-Conference/"&gt;related interview&lt;/a&gt;, Maurice Franklin, the Microsoft employee who brought the conference to Redmond this year, reflects on the conference and on the goals and status of the project. &lt;/p&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/semon/semon.jpg" /&gt;
            &lt;div&gt;&lt;strong&gt;Ted Semon&lt;/strong&gt; &lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: How did you become interested in the space elevator? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: I've always been a science fiction fan, and I read Arthur C. Clarke's &lt;a href="http://www.worldcat.org/oclc/4606759"&gt;Fountains of Paradise&lt;/a&gt; many years ago. The idea of the space elevator seemed so obviously the right way to get up out of Earth's gravity well. &lt;/p&gt;
&lt;p&gt;When I retired from the software world a few years ago, I decided to learn what was happening with the concept. There were blogs and websites, but nothing coherent, so I decided to pull the information together myself on the &lt;a href="http://www.spaceelevatorblog.com/"&gt;space elevator blog&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: At this point were we into the modern era of the development of the concept? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes, this was in 2006. I'd read the &lt;a href="http://www.worldcat.org/oclc/52067341"&gt;Brad Edwards book&lt;/a&gt;, but it was hard to find out what was currently going on, so I started the blog. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The concept as described by Clarke is quite different from the modern one that's emerging, right? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: In some ways yes, in some ways no. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Can you spell out the differences? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: OK, there are several. He had located his port on the island of Sri Lanka. Current thinking is that it won't be a land port, it'll be an ocean-going port, so you can move the space elevator if you need to, and get it out of the way of satellites and other things in orbit. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: I gather that's a "when", not an "if". &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Exactly. There's stuff up there, it's going to intersect the elevator, you've got to deal with that. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And the object being moved, just to be clear, is a 100,000 kilometer strand of material. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Right. It's a carbon nanotube tether, or rope, or ribbon, whatever you want to call it. One end is anchored to an earth port, something like an ocean-going oil platform, and the counterweight end is 100,000 kilometers up. &lt;/p&gt;
&lt;p&gt;By moving the ocean-going platform you can induce a wave that travels up the ribbon. You know which objects in space to worry about, at least the big ones, because you track them. And you know what's going on with the ribbon because you have sensors embedded in it, and climbers going up and down that signal their location. So you should be able to always move the ribbon out of the way of a collision. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So one difference from Clarke's original vision is that the platform is mobile and sea-based. What are some other differences? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Cost. He had imagined the cost would be something like the Earth's combined gross national products for a year, or some enormous number like that. The number now that looks more realistic is on the order of 10 billion dollars. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And what accounts for that lower estimate? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: More knowledge now about how it's going to be built. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Maurice Franklin and I discussed this, and his take was that the Clarke scenario assumed a huge mass parked in geosynchronous orbit, and that mass would be very expensive to lift. That ties into another evolution of the concept, which is that it's not now anchored with a large mass at 22,000 miles, but extends far beyond that. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Right. Something like 100,000 kilometers. The counterweight in the Edwards plan is about 600 metric tons, quite a bit smaller than the Clarke scenario. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The reason that's possible is? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Because it's farther out in orbit. Well, it's hard to say an object anchored to Earth is in orbit, but it's 100,000 kilometers out. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: At its endpoint. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes. &lt;/p&gt;
&lt;p&gt;Let's see. He had also talked about the material being a carbon or diamond monofilament. I guess that's similar to carbon nanotube, and we should probably say he was right on that score. &lt;/p&gt;
&lt;p&gt;He hadn't talked about powering the climbers, though. They used batteries. In the Edwards concept, the climbers are laser-powered. Lasers will be aimed at photovoltaic cells on the bottom of the climbers. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So the climber is the robot that's attached to the tether, and ascends and descends? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Right. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: I've got some sense of what a carbon nanotube is. A sheet of carbon atoms folded into a cylinder. But I'm not at all clear how that translates into a 100,000 kilometer cable. What's the architecture of that cable? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: It's composed of fibers. When you buy a 50-foot rope at a store, there's no fiber in there 50 feet long. They're all woven together, and that's what'll happen with carbon nanotubes too. &lt;/p&gt;
&lt;p&gt;Right now the longest ones I know of, and have actually seen, are 5, 10, 15 millimeters long. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The individual fibers? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Right. So one challenge is to grow a longer fiber. MIT, for example, is working with a company called NanoComp. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: There's obviously a limit to how far you can go in that dimension, so then it's a question of how to compose a ribbon out of these strands, probably at several levels of hierarchy. Just like the way the Golden Gate Bridge cables are multistranded at several levels of hierarchy. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes. The textile mills are very good at this stuff. If you give them fibers, they will weave you cables. The issue is going to be giving them carbon nanotube fibers of sufficient length and strength. That's where the bottleneck is now. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: What kind of diameter of cable are we talking about? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: If you're looking at the Edwards scenario, it's going to be a ribbon that's roughly 20 inches wide. And it is a ribbon. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Why? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: When you're in space, you want as wide a surface as possible. So a micrometeorite strike won't sever the ribbon, it'll only poke a hole in it. And if you have it woven correctly, the strain is taken up by nearby fibers. &lt;/p&gt;
&lt;p&gt;However the ribbon can be problematic in the atmosphere, because of wind effects. So it may be a cable in the atmosphere, widening out to a ribbon above. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: There needs to be a procedure for maintenance and repair. What's being discussed there? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Some people are talking about making the tether into a big loop that's constantly rotated down to Earth where you do the maintenance. Nice in theory, but you've doubled the length. And what do you do about having a cable in the atmosphere and a ribbon above? &lt;/p&gt;
&lt;p&gt;Another scenario is that the tether is made of segments. People worry that if the ribbon were cut, the two ends would fly apart. Not so. They'll sit there for some time, then gradually pull apart, but not like a snapping cable. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Not catastrophic? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: No, as long as you get to it in time. So you should be able to disconnect and reconnect segments. &lt;/p&gt;
&lt;p&gt;Another possibility: The climbers continuously reweave the ribbon as they go up and down. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The carbon nanotube fiber and laser power beaming technologies seem to be two key ingredients in development. And those are what the &lt;a href="http://www.spaceward.org/games07.html"&gt;space elevator games&lt;/a&gt; test, right? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes. On the carbon nanotube front, there's a lot of work being done by industry and universities, and not only for the space elevator. Most people don't know or care about that, they just see a market for things much lighter and stronger than steel. &lt;/p&gt;
&lt;p&gt;And with carbon nanotubes being measured at 2, 4, 6, maybe even 8 gigapascals -- and these are big jumps over a few years ago -- there's a real sense that we're getting close to being able to make a ribbon strong enough to support an elevator. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: How strong is that? What are the forces acting on the ribbon? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: The original Edwards scenario called for 130 gigapascals. Since then there's been some rethinking. Some good aerospace engineers think it can be dropped to 60 or even 40 gigapascals. That doesn't mean 130 is outside the realm of possibility, but we'll get from 8 to 40 and 60 a lot sooner. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: But the existing results are for radically shorter lengths. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes, but you just need something long enough to be woven. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Ropes stretch, though, and we don't have any examples of 100,000-kilometer ropes or cables. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Well, the total amount of cable in the San Francisco Bay Bridge would exceed the length of the space elevator. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Really? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: It's different of course because it winds back and forth and around things, but there is some experience there. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: In terms of the laser power beaming, is this also a case where development is occurring for all sorts of other reasons? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Exactly. NASA sponsors the space elevator games, but they mainly care about very strong materials, i.e. carbon nanotube tethers, and they care about power beaming, because they see applications for these things. &lt;/p&gt;
&lt;p&gt;Boeing has just come out with a solid state laser in the 25 kilowatt range, and they say they can go to the 100 kilowatt range. If you can get 20 of those, that's enough to power your climbers. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: What are the current applications of those? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Beaming power to a moon buggy so it doesn't have to carry batteries. Airships that stay up for weeks at a time. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Are any of these concepts real yet? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: No, not yet, but the needs exist and they're trying to develop the technology to satisfy those needs. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: I gather one potential showstopper is the threat of natural or manmade attack. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Actually the latter wasn't discussed too much at the conference, mostly the former. Micrometeorites, space junk. &lt;/p&gt;
&lt;p&gt;That's being addressed in two ways. For small things, make the tethers wide enough, and engineer a replacement lifecycle. For large things, move the elevator out of the way. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: That computational grand challenge dovetails with Microsoft's strengths and interests, so that might be one interesting outgrowth of having had Microsoft sponsor and host the conference. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: That'd be great! &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Give us a sense of who was at the conference, what was discussed, and what emerged. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: There are two answers, I guess. First, there were a lot of old pros, people who've been working on this for years, and have come up with the initial concepts and solutions. &lt;/p&gt;
&lt;p&gt;Then there are some new people in the last year. Some were invited, some just showed up. &lt;/p&gt;
&lt;p&gt;There's an effort to make this into an international campaign. We've adopted the "four pillar" concept. It's something you need for any huge infrastructure project. The pillars are: technical capabilities, a business plan, a legal and insurance framework, and public support. &lt;/p&gt;
&lt;p&gt;That hadn't come together in the past, but this year we think we've gotten the enthusiasm, and especially the international support, to sustain that four-pillar approach going forward. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: We mentioned the Golden Gate Bridge. I recently learned that it wasn't a federal project, it was a municipal project. Likewise, the space elevator would perhaps ideally not be a big federal project. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: I don't think it has to be. I gave a talk this year on who I thought would build the first one. Ten billion dollars is a large sum, but not out of the reach of non-governmental entities. Money's an issue, but it's not going to be the showstopper. &lt;/p&gt;
&lt;p&gt;I do think you'll need a government involved for defense, and for insurability, because international treaties will have to be written, and I think a government will be able to do that more easily than a business consortium. &lt;/p&gt;
&lt;p&gt;I could see a group of US businesses getting together and saying to the US government, we'll take the financial and technical risk, in return please defend our elevator and help us deal with the insurability. If you do, we'll make you a deal: free launches, or discount launches. I'm sure there's a deal that can be made. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: In terms of why to do it, obviously everyone close to the concept takes it on faith that it's a good thing to do for all sorts of compelling reasons. To me, the solar satellite concept is maybe the most compelling. Is that the application advocates tend to lead with? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Many do. I don't personally. I'm skeptical. We use so much energy, and to put enough stuff into space to create space-based solar power that would make a significant dent, well, the amount of material is huge. &lt;/p&gt;
&lt;p&gt;And we're not going to have a space elevator for 20 or 30 years. Meanwhile our problems will keep getting worse. We may have pilot projects, but nothing that'll power your refrigerator... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: ...or have any significant effect on greenhouse gases. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Correct. Having said that, the concept is outstanding. And while I'm skeptical, I'm in the minority. Most advocates see it as a huge reason to build the space elevator. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: What are the other reasons that come up? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Look at what you get: Enormous capacity, low cost, safer launches, and low environmental impact. Any industry that needs those benefits will want the space elevator. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Such as? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Well, what's making money right now is communications satellites. That's a big and growing industry. It'll be much easier to build satellites that don't have to reach orbit in rockets, and much cheaper to send them up. &lt;/p&gt;
&lt;p&gt;Another will be orbital tourism. Being able to go up 100 miles, spend the afternoon, and come down -- we think that'll be a big moneymaker. &lt;/p&gt;
&lt;p&gt;Then there are industries that don't exist today, except in labs, that need a space environment. To get them up today, you're talking about thousands of dollars a pound in rockets, and not a whole lot of pounds. With a space elevator it's hundreds of dollars a pound or less, and a lot of capacity. &lt;/p&gt;
&lt;p&gt;I think once it's there it'll make a ton of money for somebody, maybe lots of somebodies, because you want more than one space elevator. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Right. The first one is the bootstrap that gets you to others. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes, and once you've got two, now you're in business. One of your failsafe scenarios is that you can leverage one to fix the other. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So at the conference, the discussion was more about how to get it done than why to do it. What were the conclusions? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: That we're closer than ever. Arthur C. Clarke said that a space elevator would be built 50 years after people quit laughing about it. Well, people quit laughing some time ago. I think his prediction is a bit pessimistic. I think we're looking at 2020 to 2030 to actually be able to put one up. There's a general feeling that this is a real possibility, that it could happen in our lifetimes. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: If I were to attend the space elevator games, what would I see? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: You'll see a helicopter lifting a steel cable 1 kilometer up, and you'll see teams attaching climbers to the cable, and beaming power to photovoltaic cells on the climbers. We're hoping to have several competitors this year with a real shot at winning the prize. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And the prize is? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: There are two. If you can get up the kilometer cable at two meters per second, and you're the only one who does, there's a million dollar prize. If you can do it at five meters per second, and you're the only one who does, there's a two million dollar prize. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Really? So a million bucks for a laser-powered climber to go two meters per second, and nobody's claimed that yet? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: That's right. Last year it was 100 meters, and before that 50. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So you've raised the bar? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: And the prize money, yes. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And what's the other prize? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: For the strongest tether. Also a one million and two million dollar prize. You have to beat that house tether. Yours can be two grams, the house tether can be three, and it's made from commercially available material. So if you bring something new, like a carbon nanotube tether, and it can beat the house tether which is heavier, you can win the prize. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And the lengths are? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Two meters I think. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Oh, OK, so nothing like the kilometer climb. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: No, it's a two-meter loop. Yours and the house tether are placed onto a special machine designed for this event. It stresses them equally, and whichever breaks first loses. &lt;/p&gt;
&lt;p&gt;Nobody's come close to winning that one yet. But last year we had our first carbon nanotube tether. MIT brought it, working with NanoComp. But it had been done so close to the competition that they weren't able to weave it into a loop. So they actually tied a knot. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: They tied a knot!? [laughs] &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Exactly. But this year they'll have more time to prepare, and we know of at least one other team bringing a carbon nanotube tether. &lt;/p&gt;
&lt;p&gt;So our ideal scenario for this year is that we have a carbon nanotube tether that blows away the house tether, and a 5-meter-per-second climber. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And it'll happen where? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Not definite yet, but we're hoping for Meteor Crater in Arizona. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: It's inspiring to think about this stuff! &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: It's very inspiring to be on the inside. I got involved just because I was interested, but now I'm a huge fan and I'll do everything I can to help make it happen. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The project seems to be attracting a variety of folks, from all walks of life, who are showing up, and wanting to participate, and finding ways to participate. &lt;/p&gt;
&lt;p&gt;Maurice Franklin, for example, a Microsoft employee, has now made a real contribution by organizing this year's conference. But he also talks about some other folks who showed up, uninvited, with relevant engineering credentials, and made real contributions. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: That's right. People like Maurice will be the lifeblood of this project. And when you get involved, and start to see that this isn't some science fiction idea that's never going to happen... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: ... and that there are serious people, with serious engineering credentials, working the problem in a pragmatic way. It might not happen, but it could. Thanks Ted! &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Thank you, Jon. &lt;/p&gt;&lt;img src="http://on10.net/23234/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://on10.net/blogs/jonudell/Ted-Semon-reflects-on-the-2008-Space-Elevator-Conference/</comments><itunes:summary>Ted Semon, a retired software engineer, chronicles the efforts to develop a space elevator on the Space Elevator Blog, and volunteers for The Spaceward Foundation which administers competitions to develop several of the core technologies that will be needed to build the elevator. 
Ted attended and spoke at the 2008 Space Elevator Conference held at the Microsoft Conference Center in Redmond. In this interview he discusses the concept of the space elevator, and the status of current efforts to bring it to life. 
In a related interview, Maurice Franklin, the Microsoft employee who brought the conference to Redmond this year, reflects on the conference and on the goals and status of the project. 
JU: How did you become interested in the space elevator? 
TS: I've always been a science fiction fan, and I read Arthur C. Clarke's Fountains of Paradise many years ago. The idea of the space elevator seemed so obviously the right way to get up out of Earth's gravity well. 
When I retired from the software world a few years ago, I decided to learn what was happening with the concept. There were blogs and websites, but nothing coherent, so I decided to pull the information together myself on the space elevator blog. 
JU: At this point were we into the modern era of the development of the concept? 
TS: Yes, this was in 2006. I'd read the Brad Edwards book, but it was hard to find out what was currently going on, so I started the blog. 
JU: The concept as described by Clarke is quite different from the modern one that's emerging, right? 
TS: In some ways yes, in some ways no. 
JU: Can you spell out the differences? 
TS: OK, there are several. He had located his port on the island of Sri Lanka. Current thinking is that it won't be a land port, it'll be an ocean-going port, so you can move the space elevator if you need to, and get it out of the way of satellites and other things in orbit. 
JU: I gather that's a "when", not an "if". 
TS: Exactly. There's stuff up there, it's going to intersect the elevator, you've got to deal with that. 
JU: And the object being moved, just to be clear, is a 100,000 kilometer strand of material. 
TS: Right. It's a carbon nanotube tether, or rope, or ribbon, whatever you want to call it. One end is anchored to an earth port, something like an ocean-going oil platform, and the counterweight is at 100,000 kilometers up. 
By moving the ocean-going platform you can induce a wave that travels up the ribbon. You know which objects in space to worry about, at least the big ones, because you track them. And you know what's going on with the ribbon because you have sensors embedded in it, and climbers going up and down that signal their location. So you should be able to always move the ribbon out of the way of a collision. 
JU: So one difference from Clarke's original vision is that the platform is mobile and sea-based. What are some other differences? 
TS: Cost. He had imagined the cost would be something like the Earth's combined gross national products for a year, or some enormous number like that. The number now that looks more realistic is on the order of 10 billion dollars. 
JU: And what accounts for that lower estimate? 
TS: More knowledge now about how it's going to be built. 
JU: Maurice Franklin and I discussed this, and his take was that the Clarke scenario assumed a huge mass parked in geosynchronous orbit, and that mass would be very expensive to lift. That ties into another evolution of the concept, which is that it's not now anchored with a large mass at 22,000 miles, but extends far beyond that. 
TS: Right. Something like 100,000 kilometers. The counterweight in the Edwards plan is about 600 metric tons, quite a bit smaller than the Clarke scenario. 
JU: The reason that's possible is? 
TS: Because it's farther out in orbit. Well, it's hard to say an object anchored to Earth is in orbit, but it's 100,000 kilometers out. 
JU: At its endpoint. 
TS: Yes. 
Let's see. He had also talked about the material being a carbon or diamond monofilament. I guess that's similar to carbon nanotube, and we should probably say he was right on that score. 
He hadn't talked about powering the climbers, though. They used batteries. In the Edwards concept, the climbers are laser-powered. Lasers will be aimed at photovoltaic cells on the bottom of the climbers. 
JU: So the climber is the robot that's attached to the tether, and ascends and descends? 
TS: Right. 
JU: I've got some sense of what a carbon nanotube is. A sheet of carbon atoms folded into a cylinder. But I'm not at all clear how that translates into a 100,000 kilometer cable. What's the architecture of that cable? 
TS: It's composed of fibers. When you buy a 50-foot rope at a store, there's no fiber in there 50 feet long. They're all woven together, and that's what'll happen with carbon nanotubes too. 
Right now the longest ones I know of, and have actually seen, are 5, 10, 15 millimeters long. 
JU: The individual fibers? 
TS: Right. So one challenge is to grow a longer fiber. MIT, for example, is working with a company called NanoComp. 
JU: There's obviously a limit to how far you can go in that dimension, so then it's a question of how to compose a ribbon out of these strands, probably at several levels of hierarchy. Just like the way the Golden Gate Bridge cables are multistranded at several levels of hierarchy. 
TS: Yes. The textile mills are very good at this stuff. If you give them fibers, they will weave you cables. The issue is going to be giving them carbon nanotube fibers of sufficient length and strength. That's where the bottleneck is now. 
JU: What kind of diameter of cable are we talking about? 
TS: If you're looking at the Edwards scenario, it's going to be a ribbon that's roughly 20 inches wide. And it is a ribbon. 
JU: Why? 
TS: When you're in space, you want as wide a surface as possible. So a micrometeorite strike won't sever the ribbon, it'll only poke a hole in it. And if you have it woven correctly, the strain is taken up by nearby fibers. 
However the ribbon can be problematic in the atmosphere, because of wind effects. So it may be a cable in the atmosphere, widening out to a ribbon above. 
JU: There needs to be a procedure for maintenance and repair. What's being discussed there? 
TS: Some people are talking about making the tether into a big loop that's constantly rotated down to Earth where you do the maintenance. Nice in theory, but you've doubled the length. And what do you do about having a cable in the atmosphere and a ribbon above? 
Another scenario is that the tether is made of segments. People worry that if the ribbon were cut, the two ends would fly apart. Not so. They'll sit there for some time, then gradually pull apart, but not like a snapping cable. 
JU: Not catastrophic? 
TS: No, as long as you get to it in time. So you should be able to disconnect and reconnect segments. 
Another possibility: The climbers continuously reweave the ribbon as they go up and down. 
JU: The carbon nanotube fiber and laser power beaming technologies seem to be two key ingredients in development. And those are what the space elevator games test, right? 
TS: Yes. On the carbon nanotube front, there's a lot of work being done by industry and universities, and not only for the space elevator. Most people don't know or care about that, they just see a market for things much lighter and stronger than steel. 
And with carbon nanotubes being measured at 2, 4, 6, maybe even 8 gigapascals -- and these are big jumps over a few years ago -- there's a real sense that we're getting close to being able to make a ribbon strong enough to support an elevator. 
JU: How strong is that? What are the forces acting on the ribbon? 
TS: The original Edwards scenario called for 130 gigapascals. Since then there's been some rethinking. Some good aerospace engineers think it can be dropped to 60 or even 40 gigapascals. That doesn't mean 130 is outside the realm of possibility, but we'll get from 8 to 40 and 60 a lot sooner. 
JU: But the existing results are for radically shorter lengths. 
TS: Yes, but you just need something long enough to be woven. 
JU: Ropes stretch, though, and we don't have any examples of 100,000-kilometer ropes or cables. 
TS: Well, the total amount of cable in the San Francisco Bay Bridge would exceed the length of the space elevator. 
JU: Really? 
TS: It's different of course because it winds back and forth and around things, but there is some experience there. 
JU: In terms of the laser power beaming, is this also a case where development is occurring for all sorts of other reasons? 
TS: Exactly. NASA sponsors the space elevator games, but they mainly care about very strong materials, i.e. carbon nanotube tethers, and they care about power beaming, because they see applications for these things. 
Boeing has just come out with a solid state laser in the 25 kilowatt range, and they say they can go to the 100 kilowatt range. If you can get 20 of those, that's enough to power your climbers. 
JU: What are the current applications of those? 
TS: Beaming power to a moon buggy so it doesn't have to carry batteries. Airships that stay up for weeks at a time. 
JU: Are any of these concepts real yet? 
TS: No, not yet, but the needs exist and they're trying to develop the technology to satisfy those needs. 
JU: I gather one potential showstopper is the threat of natural or manmade attack. 
TS: Actually the latter wasn't discussed too much at the conference, mostly the former. Micrometeorites, space junk. 
That's being addressed in two ways. For small things, make the tethers wide enough, and engineer a replacement lifecycle. For large things, move the elevator out of the way. 
JU: That computational grand challenge dovetails with Microsoft's strengths and interests, so that might be one interesting outgrowth of having had Microsoft sponsor and host the conference. 
TS: That'd be great! 
JU: Give us a sense of who was at the conference, what was discussed, and what emerged. 
TS: There are two answers, I guess. First, there were a lot of old pros, people who've been working on this for years, and have come up with the initial concepts and solutions. 
Then there are some new people in the last year. Some were invited, some just showed up. 
There's an effort to make this into an international campaign. We've adopted the "four pillar" concept. It's something you need for any huge infrastructure project. The pillars are: technical capabilities, a business plan, a legal and insurance framework, and public support. 
That hadn't come together in the past, but this year we think we've gotten the enthusiasm, and especially the international support, to sustain that four-pillar approach going forward. 
JU: We mentioned the Golden Gate Bridge. I recently learned that it wasn't a federal project, it was a municipal project. Likewise, the space elevator would perhaps ideally not be a big federal project. 
TS: I don't think it has to be. I gave a talk this year on who I thought would build the first one. Ten billion dollars is a large sum, but not out of the reach of non-governmental entities. Money's an issue, but it's not going to be the showstopper. 
I do think you'll need a government involved for defense, and for insurability, because international treaties will have to be written, and I think a government will be able to do that more easily than a business consortium. 
I could see a group of US businesses getting together and saying to the US government, we'll take the financial and technical risk, in return please defend our elevator and help us deal with the insurability. If you do, we'll make you a deal: free launches, or discount launches. I'm sure there's a deal that can be made. 
JU: In terms of why to do it, obviously everyone close to the concept takes it on faith that it's a good thing to do for all sorts of compelling reasons. To me, the solar satellite concept is maybe the most compelling. Is that the application advocates tend to lead with? 
TS: Many do. I don't personally. I'm skeptical. We use so much energy, and to put enough stuff into space to create space-based solar power that would make a significant dent, well, the amount of material is huge. 
And we're not going to have a space elevator for 20 or 30 years. Meanwhile our problems will keep getting worse. We may have pilot projects, but nothing that'll power your refrigerator... 
JU: ...or have any significant effect on greenhouse gases. 
TS: Correct. Having said that, the concept is outstanding. And while I'm skeptical, I'm in the minority. Most advocates see it as a huge reason to build the space elevator. 
JU: What are the other reasons that come up? 
TS: Look at what you get: Enormous capacity, low cost, safer launches, and low environmental impact. Any industry that needs those benefits will want the space elevator. 
JU: Such as? 
TS: Well, what's making money right now is communications satellites. That's a big and growing industry. It'll be much easier to build satellites that don't have to reach orbit in rockets, and much cheaper to send them up. 
Another will be orbital tourism. Being able to go up 100 miles, spend the afternoon, and come down -- we think that'll be a big moneymaker. 
Then there are industries that don't exist today, except in labs, that need a space environment. To get them up today, you're talking about thousands of dollars a pound in rockets, and not a whole lot of pounds. With a space elevator it's hundreds of dollars a pound or less, and a lot of capacity. 
I think once it's there it'll make a ton of money for somebody, maybe lots of somebodies, because you want more than one space elevator. 
JU: Right. The first one is the bootstrap that gets you to others. 
TS: Yes, and once you've got two, now you're in business. One of your failsafe scenarios is that you can leverage one to fix the other. 
JU: So at the conference, the discussion was more about how to get it done than why to do it. What were the conclusions? 
TS: That we're closer than ever. Arthur C. Clarke said that a space elevator would be built 50 years after people quit laughing about it. Well, people quit laughing some time ago. I think his prediction is a bit pessimistic. I think we're looking at 2020 to 2030 to actually be able to put one up. There's a general feeling that this is a real possibility, that it could happen in our lifetimes. 
JU: If I were to attend the space elevator games, what would I see? 
TS: You'll see a helicopter lifting a steel cable 1 kilometer up, and you'll see teams attaching climbers to the cable, and beaming power to photovoltaic cells on the climbers. We're hoping to have several competitors this year with a real shot at winning the prize. 
JU: And the prize is? 
TS: There are two. If you can get up the kilometer cable at two meters per second, and you're the only one who does, there's a million dollar prize. If you can do it at five meters per second, and you're the only one who does, there's a two million dollar prize. 
JU: Really? So a million bucks for a laser-powered climber to go two meters per second, and nobody's claimed that yet? 
TS: That's right. Last year it was 100 meters, and before that 50. 
JU: So you've raised the bar? 
TS: And the prize money, yes. 
JU: And what's the other prize? 
TS: For the strongest tether. Also a one million and two million dollar prize. You have to beat that house tether. Yours can be two grams, the house tether can be three, and it's made from commercially available material. So if you bring something new, like a carbon nanotube tether, and it can beat the house tether which is heavier, you can win the prize. 
JU: And the lengths are? 
TS: Two meters I think. 
JU: Oh, OK, so nothing like the kilometer climb. 
TS: No, it's a two-meter loop. Yours and the house tether are placed onto a special machine designed for this event. It stresses them equally, and whichever breaks first loses. 
Nobody's come close to winning that one yet. But last year we had our first carbon nanotube tether. MIT brought it, working with NanoComp. But it had been done so close to the competition that they weren't able to weave it into a loop. So they actually tied a knot. 
JU: They tied a knot!? [laughs] 
TS: Exactly. But this year they'll have more time to prepare, and we know of at least one other team bringing a carbon nanotube tether. 
So our ideal scenario for this year is that we have a carbon nanotube tether that blows away the house tether, and a 5-meter-per-second climber. 
JU: And it'll happen where? 
TS: Not definite yet, but we're hoping for Meteor Crater in Arizona. 
JU: It's inspiring to think about this stuff! 
TS: It's very inspiring to be on the inside. I got involved just because I was interested, but now I'm a huge fan and I'll do everything I can to help make it happen. 
JU: The project seems to be attracting a variety of folks, from all walks of life, who are showing up, and wanting to participate, and finding ways to participate. 
Maurice Franklin, for example, a Microsoft employee, has now made a real contribution by organizing this year's conference. But he also talks about some other folks who showed up, uninvited, with relevant engineering credentials, and made real contributions. 
TS: That's right. People like Maurice will be the lifeblood of this project. And when you get involved, and start to see that this isn't some science fiction idea that's never going to happen... 
JU: ... and that there are serious people, with serious engineering credentials, working the problem in a pragmatic way. It might not happen, but it could. Thanks Ted! 
TS: Thank you, Jon. </itunes:summary><link>http://on10.net/blogs/jonudell/Ted-Semon-reflects-on-the-2008-Space-Elevator-Conference/</link><pubDate>Fri, 08 Aug 2008 14:21:00 GMT</pubDate><guid isPermaLink="true">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/semon/semon.mp3</guid><evnet:views>1066</evnet:views><evnet:viewtrackingurl>http://on10.net/23234/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;Ted Semon, a retired software engineer, chronicles the efforts to develop a space elevator on the &lt;a href="http://www.spaceelevatorblog.com/"&gt;Space Elevator Blog&lt;/a&gt;, and volunteers for &lt;a href="http://www.spaceward.org/"&gt;The Spaceward Foundation&lt;/a&gt; which administers &lt;a href="http://www.spaceward.org/elevator2010"&gt;competitions&lt;/a&gt; to develop several of the core technologies that will be needed to build the elevator. &lt;/p&gt;
&lt;p&gt;Ted attended and spoke at the &lt;a href="http://www.spaceelevatorconference.org/"&gt;2008 Space Elevator Conference&lt;/a&gt; held at the Microsoft Conference Center in Redmond. In this interview he discusses the concept of the space elevator, and the status of current efforts to bring it to life. &lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/semon/semon.mp3" expression="full" duration="36" fileSize="17451840" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/semon/semon.wma" expression="full" duration="36" fileSize="17659877" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/semon/semon.mp3" length="17451840" type="audio/mp3" /><dc:creator>JonUdell</dc:creator><itunes:author>JonUdell</itunes:author><slash:comments>0</slash:comments><wfw:commentRss>http:/on10.net/blogs/jonudell/Ted-Semon-reflects-on-the-2008-Space-Elevator-Conference/RSS/</wfw:commentRss><trackback:ping>http://on10.net/23234/Trackback.aspx</trackback:ping><category>podcasts</category><category>space elevator</category></item><item><title>How Microsoft's External Research Division works with a new breed of e-scientists</title><description>&lt;p&gt;Tony Hey, VP for the External Research Division within Microsoft Research, leads the company's efforts to build external partnerships in key areas of scientific research, education, and computing. He's been a physicist, a computer scientist, and dean of engineering, and for five years ran the UK's e-Science program. These experiences have given him a broad view of the ways in which all the sciences are becoming both computational and data-intensive. Microsoft tools and services, he says, will support and sustain the new breed of scientists riding this new wave. &lt;/p&gt;
&lt;p&gt;
Audio: &lt;a href="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hey/hey.wma"&gt;WMA&lt;/a&gt;, &lt;a href="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hey/hey.mp3"&gt;MP3&lt;/a&gt;
&lt;/p&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hey/hey.jpg" /&gt;
            &lt;div&gt;&lt;b&gt;Tony Hey&lt;/b&gt; &lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: For this series of interviews I've spoken to a number of Microsoft folks who are working with external academic partners on projects that fall under your purview. The list includes Pablo Fernicola's &lt;a href="http://perspectives.on10.net/blogs/jonudell/Word-for-scientific-publishing/"&gt;Word add-in for scientific publishing&lt;/a&gt;, Catharine van Ingen's collaboration with Dennis Baldocchi at Berkeley on the &lt;a href="http://perspectives.on10.net/blogs/jonudell/Making-sense-of-C02-data/"&gt;analysis of CO2 data&lt;/a&gt;, and Kyril Faenov's HPC++ project to bring &lt;a href="http://perspectives.on10.net/blogs/jonudell/Cluster-computing-for-the-classroom/"&gt;cluster computing to the classroom&lt;/a&gt;. These are all pieces of your puzzle, right?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Absolutely.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: By way of background, you've been a physicist, then a computer scientist, and then for a time led the UK's e-science program.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Which would be called cyberinfrastructure in the US, yes. I'm on the NSF's advisory committee for cyberinfrastructure, it's a very similar goal.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And then you surprised a lot of people by joining Microsoft. Take us through your initial role leading the TCI [technical computing initiative] and on to your current expanded role leading MSR's external research efforts.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Right. So having been a physicist, and then a computer scientist working on parallel computing for years, and then chair of my computer science department and then dean of engineering, I think I understand the community we're trying to work with pretty well.&lt;/p&gt;
&lt;p&gt;Also, as you mentioned, I worked for 5 years running the UK e-science program. That was about huge amounts of distributed data, and collaborative multi-disciplinary research in a variety of fields. The environment, bioinformatics, almost every field of science now has some element of distributed and networked collaboration.&lt;/p&gt;
&lt;p&gt;The science agenda was for the tools and technologies to make that collaboration trivial, just as with Web 2.0 your grandmother can do a mashup.&lt;/p&gt;
&lt;p&gt;I don't think the UK e-science program achieved that, but I do believe that Microsoft can help make tools and technologies available that will help scientists and researchers do their work.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: In your parallel computing phase, you helped write the MPI [message passing interface] specification, correct?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Yes. I've been in this for 30 years, on and off. I have very good friends in the high-performance and parallel computing communities here in the US, and I was involved in European projects. There was a danger that the Europeans would go one way, and the US another, so it was time to see if we could get the community to put together a community standard. &lt;/p&gt;
&lt;p&gt;It isn't an ISO standard, there wasn't a big standards body, it was a group of experts who got together with the academics and with the industry players. Rather a small set, and we used to meet every 6 weeks in Dallas airport, so you really had to be dedicated to go there.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: [laughs]&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: But what came out of it was a standard which has stood the test of time. I co-authored and initiated the first draft. It's been much changed since then, and I don't take credit for the final thing, but I did try, with Jack Dongarra, to initiate the standards process, and I think I remember buying the beer at the first session.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What's interesting to me is that despite that, you've been a vocal skeptic regarding raw grid capability. And you've been very careful to stress that in your view, the real challenges have to do with data -- the ability to combine large quantities of data from multiple sources, and enable people to make sense of it.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Yes. I used to work in high-end supercomputing and parallel computing, but what distinguishes this decade is that we'll collect more scientific data than we have collected in the whole of human history. Instead of struggling with the problem of too little data, scientists will be struggling with the problem of huge amounts that they can't process or analyze. And it may be stored in different places, on different continents, so how do you put it together? How do you federate?&lt;/p&gt;
&lt;p&gt;That's the real challenge. Very few people want to use petaflop computers. Most of the biologists, chemists, and engineers only need lesser capabilities that can be provided by just a simple cluster. And then you put the cluster where the data is, because that's what's difficult to move around. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Yes, Kyril Faenov made this same point in my interview with him. There are only a handful of intergalactic cloud infrastructures of the sort that a Google or Amazon or Microsoft can support, they're one-of-a-kind beasts, and you can't always bring your data to them. So he's interested in enabling organizations to stand up their own more modest clusters at the sites where the data lives. &lt;/p&gt;
&lt;p&gt;So, let's discuss the opportunity that you see. In another interview you said: &lt;/p&gt;
&lt;blockquote&gt;Rather than wasting the enthusiasm and talents of science graduate students by assigning them the task of building systems capable of handling, analyzing and mining literally petabytes of data, scientists should look to computer scientists and the IT companies to raise the level of abstraction and to provide them with the components of a reliable and functional cyberinfrastructure. &lt;/blockquote&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;That's the most concise mission statement I've found for what you're doing.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Exactly right. Part of my reason for joining Microsoft was having had a great friendship, and many discussions and arguments, with Jim Gray, from 2001 onwards. &lt;/p&gt;
&lt;p&gt;We argued and disagreed on many things, but we also agreed on things, and what we agreed on in particular is that a different paradigm is emerging. So for example there's experimental physics, there's theoretical physics, and now the third paradigm, it's clear, is computational physics based on simulation. &lt;/p&gt;
&lt;p&gt;What we're looking at here is data-centric science, where you'll do collections-based research -- like you do in mashups, but now with scientific datasets. And increasingly, you'll use semantics to get from data to information to real knowledge. &lt;/p&gt;
&lt;p&gt;So I came to Microsoft partly because of Jim Gray, but partly because I think companies can help. I struggled mightily with just open source tools. I used to produce open source tools myself, as an academic. MPI has a wonderful open source implementation, and that was one of the key things that we did.&lt;/p&gt;
&lt;p&gt;But I also know that open source, particularly when produced by academics like myself, well, it works on my machine, but if you want it to work on your machine, that's your problem. &lt;/p&gt;
&lt;p&gt;So one of the things I set up in the UK was, in fact, a software engineering center called the &lt;a href="http://www.omii.ac.uk/"&gt;Open Middleware Infrastructure Institute&lt;/a&gt;, where I put a lot of money in to get these open source codes tested and documented and made more reliable and sharable.&lt;/p&gt;
&lt;p&gt;That's why I think that a judicious mix of open source with commercial -- it could be from IBM, from Oracle, from Microsoft -- is the way to provide a more reliable infrastructure.&lt;/p&gt;
&lt;p&gt;That's part of the motivation for the tools we're producing around the technologies that scientists use to do their publication, their data mining, and so on. I think Microsoft can really take a lead here, and that's why I joined.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Elsewhere you've said: &lt;/p&gt;
&lt;blockquote&gt;Essentially I match up Microsoft researchers with major scientific problems that computer science technology can help to solve. &lt;/blockquote&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;What are those major problems?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: So, I came with a purely scientific mission with TCI. But now I've moved into Microsoft Research, and we have a bigger agenda. In terms of external research, we focus on four themes. &lt;/p&gt;
&lt;p&gt;One is health and wellness. That's bioinformatics, medical solutions, and so on. Really exciting, we've got some good projects in that area.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: I've talked to Kris Tolle and have done an &lt;a href="http://perspective