Wednesday, June 30, 2010

You put the teapot on the ground
The world is ending all around
Water boils
Stirs and whirls
Then silence, nothingness surrounds

Monday, April 26, 2010

The Problem Blog

Imagine a web page that attempted to ask 6 billion people the same question: "What problem bugged you the most today?"
How do you make this data into a resource?  Ideally, at the end of the project, you could answer the following:
What are the top 10 problems that we face as humans?  The top 100?
What are group X's top 10 problems?  group Y?
What problems does group X face that group Y does not?
What problems do we all face?
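
The last three of those questions reduce to simple set operations once the answers are binned.  A minimal sketch in Python, with hypothetical group names and problem buckets:

```python
# Hypothetical per-group problem sets (already normalized into buckets):
group_x = {"clean water", "long commute", "high rent"}
group_y = {"high rent", "job security"}

print(group_x - group_y)  # problems X faces that Y does not
print(group_x & group_y)  # problems both groups face
# With every group's set in hand, "what problems do we all face?"
# is just the intersection across all of them:
# set.intersection(*all_group_sets)
```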

So, now we have some requirements.
You would need the page to be developed in multiple languages to reach the entire global population.
I assume you would need to restrict the answers to only a few words, and you would need an algorithm that translated all of the answers into a single language for consolidation and analysis.
You would need to bin the statements into buckets in order to group the information (for the top-10 counts).  This would require an understanding of grammar and context.
You would need some user information to generate the groupings (age, sex, location, income...), or a reference ID that would let the system mine another data source for that information - membership in social media sites, perhaps (LinkedIn, Facebook, Twitter, Amazon wish list...), or government records (SSN, passport number).
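
The binning is the hard part; deciding that two differently worded answers name the same problem is an open problem in itself.  A minimal sketch of the counting side, assuming the answers have already been translated to English, with normalize() as a stand-in for the real grammar-and-context step:

```python
from collections import Counter

def normalize(answer: str) -> str:
    # Stand-in for real grammar/context handling: here we only
    # lowercase and collapse whitespace, so only near-identical
    # wordings fall into the same bucket.
    return " ".join(answer.lower().split())

def top_problems(answers, n=10):
    # Count the normalized answers and return the n most common buckets.
    return Counter(normalize(a) for a in answers).most_common(n)

answers = [
    "I cannot get clean water",
    "i cannot  get clean water",
    "My commute is too long",
]
print(top_problems(answers))
# [('i cannot get clean water', 2), ('my commute is too long', 1)]
```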

Some problems:
For example:  My problem is that I cannot get clean water.
Problems:  How do I (as the one entering the data) know that this is a problem and that the water I have is not clean?  Relatively speaking, it may be the cleanest water available.  I would need access to information about global standards of water cleanliness in order to diagnose this.  What problem am I more likely to announce?  One where someone else (whom I am aware of) has something that I don't.

Artificial intelligence

What is artificial intelligence?  In short, it is statistics.

Where the story is the unit of measure for human intelligence, data is the unit of measure for computer intelligence.  This is where we will always be different.  In the human mind, the one can be greater than the one thousand.

Asking an AI "should we do this or that" asks two questions.  One: Have you seen this situation, or a comparable situation, before - do you have data on this question?  And two: What is the statistically correct response for the desired outcome?  But humans are opposed to this.  We are passionate about the outliers and want to know more about them.  We fall in love with the one-off result, because that situation makes the best story, and the best story is the one most remembered.
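
Reduced to code, that two-question framing is just a lookup followed by an average.  A minimal sketch, assuming a history of (situation, action, outcome) records; best_action() and the sample data are hypothetical:

```python
from collections import defaultdict

def best_action(history, situation):
    # Question one: do we have data on this situation?
    outcomes = defaultdict(list)
    for s, action, outcome in history:
        if s == situation:
            outcomes[action].append(outcome)
    if not outcomes:
        return None  # no data; the AI has no answer here
    # Question two: which action has the best average outcome?
    return max(outcomes, key=lambda a: sum(outcomes[a]) / len(outcomes[a]))

history = [
    ("rainy", "this", 0.9),
    ("rainy", "that", 0.4),
    ("rainy", "this", 0.7),
]
print(best_action(history, "rainy"))  # 'this' (average 0.8 beats 0.4)
print(best_action(history, "sunny"))  # None - never seen this situation
```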

The one-off may also hold clues for us to arrive at new solutions.  Sometimes the human is right and the AI is wrong.  There are times when statistical outliers begin to correlate with certain characteristics, and in this way they are sub-segmented and cease to be statistical outliers.  The idea is that if you could replicate exactly the same circumstances, you could replicate the outlier.  In some ways this may be true, but there is still randomness in the data.  If you flip a coin - regardless of the circumstances - you will get some heads and some tails.
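
A minimal simulation of that last point: identical circumstances, a different result every run.

```python
import random

# 1,000 flips of a fair coin under "exactly the same circumstances":
heads = sum(random.random() < 0.5 for _ in range(1000))
print(f"{heads} heads, {1000 - heads} tails")
# Prints roughly 500/500, but a different split every run; no amount
# of replicating the setup pins down the individual outcomes.
```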

How do you separate correlation to characteristics (which subsets the data) from true randomness, and how much data do you need to make that call?  Is it possible to arrive at this separation at all?  Is this the new truth of human/AI dependency: that there will never be a perfect AI, because of the chance that an outlier can be replicated and because of the subsets in the data we have not yet found?
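
One standard answer to the first question is a significance test: ask how likely the apparent pattern would be under pure chance.  A minimal sketch, assuming (hypothetically) that 14 of 20 outliers share a characteristic we would expect in only half of them by chance:

```python
from math import comb

def upper_tail(k, n, p=0.5):
    # Probability of seeing k or more "hits" in n trials if the true
    # rate is p - the chance the pattern is just randomness.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(upper_tail(14, 20))  # ~0.058: suggestive, but not enough data
# More outliers sharing the trait, or more data, pushes this toward 0 -
# but no finite sample pushes it all the way there.
```

That last comment is the point: the test only ever says "unlikely to be chance," never "not chance," which is one way of stating why the separation is never perfect.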