Lots of papers, few implementations

Why aren't there more open source text summarization projects?

There seems to be a ton of papers on text summarization. I would like to get my hands on something decent to experiment with. And don't tell me Mac OS X Summarize service or Microsoft Word. :)

Predict and Prevent

While the hot new thing from Google is knol, defined as a unit of knowledge, I believe something else pretty darn hot was just announced too, one that can hopefully identify hot spots (you knew that pun was coming) - Predict and Prevent at google.org.

Rapid ecological and social changes are increasing the risk of emerging threats, from infectious diseases to drought and other environmental disasters. This initiative will use information and technology to empower communities to predict and prevent emerging threats before they become local, regional, or global crises.


This is part of Larry Brilliant's vision for the future - see my earlier blog post on this.

Just in case you missed it...

Google has open sourced protocol buffers.

Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats.

Protocol buffers are a great technology. It is one of those things that you typically don't stop to think about - what kind of technology goes into a car, a computer or a plane when you are using it?

And that's not the only cool technology that Google has released - check out our Google C++ Testing Framework and Google commandline flags module for C++.

Software Engineering

"Make everything as simple as possible, but not simpler."

- Albert Einstein

Writing software is a messy business - you are tasked with coming up with a solution that works. This is probably best exemplified by the Netflix Prize competition (see the Wikipedia article for a quick overview if you have not heard of it before).

One consensus I got at KDD Cup 2007 was a slight disappointment at the "hackish" nature of the leading team, which later won the 2007 progress prize. Their winning solution is described here.

It is easy to have bloat creep into code as bugs are fixed and features are added.

One plus of working at Google is the inclination to frown on code bloat, and engineers are recognized for coming up with simpler implementations. This philosophy of constant iteration in software engineering ensures that systems can keep up with new features while staying maintainable. (Unit testing is HUGE here too.)

MOM

Doing a Google search on "mom" shows how overloaded the term is. Poor mothers... Not getting the mindshare of searches all over the world.

And of course, again we see evidence of Singaporeans' love for acronyms, as the Ministry of Manpower [mom.gov.sg] is such a high-ranking result.

Singapore finding unknown unknown threats

Interesting nugget - Larry Brilliant won the TED Prize in 2006, and his TED wish was to create a new global system that can identify and contain pandemics before they spread. See video [ted.com].

Singapore's version - RAHS (Seriously, do we need to make Risk Assessment And Horizon Scanning an acronym?). If only we could have used it to find, oh I don't know, missing terrorists or recalcitrant political activists. :)

Orh

Today I found out that my wife nodding and replying "orh" to what I say does not constitute agreement or even understanding of what I am saying.

Case in point: Lately it has been very hot in Mountain View, reaching over 37 degrees Celsius in the past two days. Our house, which does not have any inlets except for the main door, is generally not in the direction of the wind. This turns the apartment into a furnace on hot days and starts baking the occupants - something that I start hearing about in the afternoon when my wife calls me, yelping for mercy from the heat.

One solution we found is to switch on the fan while the door is open. The fan draws the cooler air from the outside and cools our apartment. This solution is not ideal as it leaves, well, the door open (i.e. you can't be sleeping). As an added bonus, insects get in.

I started proposing an alternate solution to my wife over lunch - use a fan to blow air out of a window, and open the other windows while leaving the main door closed. As air is being drawn out of that one window, this creates moving air (aka wind) and circulates the cooler air from outside to inside. And even if the air outside is just as warm as it is indoors, the air movement will cool the insides of the house, thus preventing baking of the occupants.

Arriving home in the evening, I proceeded to close the main door, bring the fan to the bedroom window and face it outside. My wife walks in and demands to know what I am doing. To which I replied, "trying out that hypothesis I mentioned earlier today". I walk out to the living room and show her that cool air is starting to circulate through the house, and say "see... it works!"

My wife exclaims: "oh is this what you were talking about over lunch? I still have no idea what you are talking about though."

Canonical Strings, or, why I like Python

I needed a quick and easy function to map strings into a canonical form. In this case, punctuation, upper/lower case, and word order are not important. i.e. "!$%!@$!@!This!?! is... a test" == "a test this is". Less than 1 minute and I am good to go with...

import re
re_punctuation = re.compile(
    r"[`~!@#\$%\^&\*\(\)\-_\+={\[}\]\\|;:\'\",<\.>/\?]")
def GetCanonical(input):
    canonical = re_punctuation.sub(" ", input.lower()).split()
    canonical.sort()
    return ' '.join(canonical)

GetCanonical("This is a test") == GetCanonical("a test this is")
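
For anyone who wants to try the idea without retyping the character class, here is a minimal sketch of the same approach. It assumes that Python's string.punctuation is an acceptable stand-in for the hand-written punctuation list above, and the name get_canonical is just an illustrative one, not something from the original snippet.

```python
import re
import string

# Assumption: string.punctuation replaces the hand-written character class.
# re.escape makes every punctuation character safe inside the [...] class.
_PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "]")


def get_canonical(text):
    """Lowercase the text, turn punctuation into spaces, and sort the words."""
    words = _PUNCTUATION.sub(" ", text.lower()).split()
    return " ".join(sorted(words))


# The two strings from the example above map to the same canonical form.
noisy = get_canonical("!$%!@$!@!This!?! is... a test")
clean = get_canonical("a test this is")
print(noisy)           # a is test this
print(noisy == clean)  # True
```

Sorting the words is what makes word order irrelevant, and replacing punctuation with a space (rather than deleting it) keeps words separated when they were joined only by punctuation, as in "This!?!is".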

Cookie Monster Meditates

Me love cookies. Me tend to get out of control when me see cookies. Me know it not natural to react so strongly to cookies, but me have weakness. Me know me do wrong. Me know it isn't normal. Me see disapproving looks. Me see stares. Me hurt inside.


When me get back to apartment, after cookie binge, me can't stand looking in mirror—fur matted with chocolate-chip smears and infested with crumbs. Me try but me never able to wash all of them out. Me don't think me is monster. Me just furry blue person who love cookies too much. Me no ask for it. Me just born that way.


Me was thinking and me just don't get it. Why is me a monster? No one else called monster on Sesame Street. Well, no one who isn't really monster. Two-Headed Monster have two heads, so he real monster. Herry Monster strong and look angry, so he probably real monster, too. But is me really monster?


Me thinks me have serious problem. Me thinks me addicted. But since when it acceptable to call addict monster? It affliction. It disease. It burden. But does it make me monster?


To read more, click on the original article Cookie Monster Searches Deep Within Himself and Asks: Is Me really Monster? by Andy F. Bryan.

We're having too much fun here

The article based on the interview with the Straits Times is now available - Our Guys in Google [straitstimes.com].

I am now wondering if I remembered wrongly - is it supposed to be the 100 feet or 150 feet rule?

Oh well, hopefully this nugget of information does not become a circular reference like the Sacha Baron Cohen past job history factoid as featured on slashdot. :P
