Wayne Danielson

Oct. 19, 1994

Cognitive Science Club

Experimental Science Building

ESB 137. 471-3358



Inventing an Artificial Intelligence Personal Editor for News


Will speak from notes tonight. 


This is an experiment -- but it is a long one.   In one way or another it has occupied important portions of the scholarly part of my life for 30 years or so.  Be prepared for a little history.


Furthermore, the experiment is not complete.  In all probability, I will work  on aspects of the problem for the rest of my academic life.  And I suspect -- and hope -- that some of you here tonight will continue to work on this problem after I am contentedly sitting in my wheelchair enjoying the spring sunshine at the Bide a While Nursing Home.


Which is to say that I think the problem is an important one, one worthy of the best thought of our society’s best young minds.  What is the problem?  It is simply this -- we are leaving behind the age of mass communication, the age that began with the invention of printing with movable type 500 years ago in Germany.  We are leaving behind the age of identical products like books and newspapers and magazines and tv newscasts distributed to vast numbers of people who find in them bits and pieces of what they really want to know.  We are entering an age of individualized communication systems that enable us to go out into vast fields of electronically stored information and return with what we want to know when we want to know it.


And what’s the problem with that?


The problem is simply that more is out there than we can handle. 


We are awash in  information.  And the problem grows worse every day.


We need help.  We need to create and send out into the information network intelligent agents -- clones of our cognitive selves --  who can return with the information we need and want, in forms intelligible to us, and edited into  time frames that fit into our daily lives.


Imagine this scene.  It is a beautiful fall day.  You have just had lunch on the West Mall.  You look at your watch.  You  see that you have 15 minutes before your 1 o’clock class.  You reach into your notebook and pull out a thin plastic sheet -- about the same size as a sheet of paper.  You press  the upper right hand corner and it lights up, in color, offering you some choices.

You select news.

It offers some additional choices, including how much time you have to spend.

You say 15 minutes.

At once, you begin to see on your screen the news and pictures, the sights and sounds you would have chosen from all the stories being told around the world if you had had the time to go out into the information network and retrieve them and edit them for your typical reading speed.

Furthermore, as you scan the news, your AI personal editor -- for that is what we are talking about tonight -- is noticing how much time you spend with each story, and what category of news that story represents.  It is learning more about your interests and how they  are changing and how it might do its job better the next time.

Your 15 minutes is up, and you have finished the last story.

You are ready to quit, but your editor  flashes one more message:  Here’s a story  not in your typical category that lots of people have chosen today.  Are you interested?  The first paragraph appears.   It’s another story about Prince Charles.  Ho hum.  You say thanks and tuck your device away and head off for class.

Okay.  That’s the end state.  The question is -- how do we get there?  How far do we have to go?  How far along are we?  What are the intellectual, the theoretical problems that need to be solved?  What are the practical problems?  What can the academy contribute?  What needs to be done by government?  What needs to be done by industry?  How will we pay for what we get?  What are the likely effects on our society when mass communication as we know it gradually fades away to be replaced by highly individualized communication systems?

It is an interesting problem, isn’t it? 

And interesting solutions to various aspects of the problem are being discovered daily.  I think we will see more change in the next 10 years in the information industry than we have seen in the last half century.

Tonight, I would like to discuss some of my own work that may contribute toward the  emergence of the AI personal editor.

I. Chapel Hill in 1960s.  Can put the contents of an entire newspaper in the computer.

A. Hyphenation, justification.  Mousek-eteers.  Crui-sers.  Gol-dwater.

B. Electronic editing.  Deletions and Insertions.

C.  An editing algorithm.

1. percentages of news in various categories.

2. adjusted by actual flow of the news.

3. within categories, an editing formula -- Zipf-Danielson -- that assigns space realistically, giving more space to more important stories and less space to less important stories.

4. adjusted by actual available news copy.

D. Simulating various newspapers.  No thought of an individualized paper.  No way to deliver one.

E. Published in Journal of the ACM under the heading “unusual applications.”
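
A minimal sketch of the editing algorithm outlined in item C, in Python.  These notes do not give the Zipf-Danielson formula itself, so the 1/rank weighting below is an assumed Zipf-like stand-in, not the original.  The function name, the category shares and the story ids are all hypothetical.  Adjusting for available copy is read here as "drop categories with no stories and renormalize," which is one possible interpretation.

```python
def allocate_space(total_inches, category_shares, stories_by_category):
    """Split total space across categories, then across ranked stories.

    category_shares: {category: target fraction of the paper}
    stories_by_category: {category: [story ids, most important first]}
    Returns {story id: column inches}.
    """
    # Assumed adjustment: categories with no copy today are dropped and the
    # remaining shares are renormalized so the whole paper is still filled.
    active = {c: s for c, s in category_shares.items()
              if stories_by_category.get(c)}
    share_sum = sum(active.values())
    allocation = {}
    for category, share in active.items():
        budget = total_inches * share / share_sum
        stories = stories_by_category[category]
        # Assumed Zipf-like weights, 1/rank (NOT the original formula).
        weights = [1.0 / rank for rank in range(1, len(stories) + 1)]
        weight_sum = sum(weights)
        for story, weight in zip(stories, weights):
            allocation[story] = budget * weight / weight_sum
    return allocation


# Hypothetical inputs, purely for illustration.
shares = {"sports": 0.25, "business": 0.25, "world": 0.5}
copy = {"sports": ["s1", "s2"], "world": ["w1", "w2", "w3"], "business": []}
plan = allocate_space(90.0, shares, copy)
```

With these inputs the top-ranked story in each category gets the most space and each later story gets less, while the allocations still add up to the full 90 inches.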

II. Texas in the 1970s.  Automating content analysis.

A. Content analysis involves reading text and putting it in categories.

B. Could we get a computer to do this?

C. Work at Harvard with General Inquirer. Philip Stone.

D. Answer was yes, if you weren’t too particular.

E. Computer could read the news and do a simple coding.  Could recognize sports stories by the specialized words that occur in sports.  Could recognize business stories.  Could recognize weather stories.  Could recognize stories about accidents and disasters.  Could recognize stories about war, rebellion, defense.  But it had trouble in closely overlapping areas:

Some crime stories about guns sounded like war.

Some disaster sports stories sounded like business.

Some science and technology stories sounded like defense.

Some stories about medicine were really stories about politics -- and vice versa.

Agreement with human coders was about 85 percent.  Human coders had trouble too, although they could make finer distinctions with careful training.
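
A minimal sketch of the kind of word-list coding described in this section, in Python.  It is not the General Inquirer and not the system used at Texas; the word lists, function names and sample sentences are hypothetical and far smaller than any real coding dictionary.

```python
import re
from collections import Counter

# Hypothetical word lists; a real dictionary would be far larger.
CATEGORY_WORDS = {
    "sports": {"game", "score", "team", "coach", "touchdown", "inning"},
    "business": {"market", "stock", "profit", "shares", "earnings"},
    "weather": {"rain", "storm", "forecast", "temperature", "snow"},
    "war": {"troops", "army", "rebels", "guns", "defense"},
}


def classify(text):
    """Return the category whose word list matches the most tokens, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for token in tokens:
        for category, words in CATEGORY_WORDS.items():
            if token in words:
                hits[category] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]


def agreement(machine_codes, human_codes):
    """Fraction of stories on which the two codings agree."""
    matches = sum(m == h for m, h in zip(machine_codes, human_codes))
    return matches / len(human_codes)


sports_code = classify("The team won the game after a late touchdown")
crime_code = classify("Police seized guns after the robbery")
```

Here `sports_code` comes out as "sports", while the crime sentence is coded "war" because of the word "guns."  That is the overlap problem noted above, in miniature.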

III. Texas in the 1980s.  Advent of the personal computer changed everything.

A. We were getting out of big mainframes.  Same algorithms the publishing industry had devised the decade before were now available on the desktop.  Took a few years to understand what all this meant.


B. Desktop publishing arrived.  People could take a product from initial key strokes to what looked like a published page.   A cottage industry sprang up in publishing.  Didn’t need a  printer any more.  Could do it all yourself. 


C. Particularly interesting was the notion that the pc was not only an analytical device -- a computer as we had known computers in the past -- it was also a communication device.  It could go out and fetch stuff and bring it back home and work on it.

D. Began not merely to use computers to classify text in isolation -- but to study changes in text over time.  Long-range social trends could be identified.  Disappearance of agriculture news.  Increasing attention to technology and numbers.  Increase in attribution.  Decrease in attention to persons.


E. More subtle categories were identifiable in the news.  Stories about men, women?  Stories about conflict?  Stories about innovation, invention?   Broad categorization of stories was easier now.  Computer’s ability to understand the news was improving.  Once you knew story was about a football game, for example, the computer could extract information from the story and answer questions about what happened. 
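
A minimal sketch of the trend tracking in item D, in Python: once stories are coded by category, the share of each category can be compared year by year.  The function name is hypothetical and the sample codings are made up purely to show the shape of the data; they are not results.

```python
from collections import Counter, defaultdict


def category_shares_by_year(coded_stories):
    """coded_stories: iterable of (year, category) pairs.

    Returns {year: {category: share of that year's stories}}.
    """
    counts = defaultdict(Counter)
    for year, category in coded_stories:
        counts[year][category] += 1
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {c: n / total for c, n in counter.items()}
    return shares


# Made-up codings, for illustration only.
sample = [
    (1950, "agriculture"), (1950, "agriculture"), (1950, "technology"),
    (1980, "technology"), (1980, "technology"), (1980, "agriculture"),
    (1980, "technology"),
]
trend = category_shares_by_year(sample)
```

A falling share for one category across years, such as agriculture in this toy sample, is the kind of long-range signal described above.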

IV. Texas in the early 1990s.  Not just content analysis, but content synthesis. 

A. You don’t just take text apart, trying to understand what it’s about, what it means.

B. You can also put it together.  You can synthesize content. 

C. Arrival of hand-held communication devices -- the Newton.

D. Widespread use of broadcast communication networks -- mobile phones and paging services, message services.

E. Installation of high-speed, optical cable serving schools, homes.

F. Incredible growth of the internet -- computers talking to computers. 


V. What we have now:

A. Computers that can read stories and categorize them.

B. Computers that can judge the relative importance of stories according to conventional news values:

     * Prominence

     * Immediacy

     * Significance / consequence

     * Conflict / suspense

     * Novelty, innovation

C. Computers that are fast enough and portable enough to serve as communication devices.

D. Computers that can observe what we do with stories -- at least at such simple levels as how much time we spend with different categories of news -- and change or shape their future behavior based on these observations.  In other words, computers can learn.

E. Algorithms that can approximate what editors do when they cut and trim stories to make editions of the news for newspapers, magazines, radio and tv news.

F. Put all this together, and you have the prototype of a personal AI editor.   And that’s about where we are today.  It’s an interesting mixture of old and new.  Some ideas were arrived at years ago out of the sheer interest in research.  No one in the 1960s, I think, envisioned what we have today.  Other ideas are still being developed by students in classes and seminars.
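
A minimal sketch of the learning step in item D, in Python.  These notes do not specify the learning algorithm, so the exponential moving average below is an assumed stand-in.  The function name, the smoothing rate and the sample log are hypothetical.

```python
def update_interests(interests, reading_log, rate=0.2):
    """Blend observed attention into stored interest weights.

    interests: {category: weight}, weights summing to 1
    reading_log: list of (category, seconds spent) from one session
    rate: assumed smoothing factor; higher means faster adaptation
    """
    seconds = {}
    for category, spent in reading_log:
        seconds[category] = seconds.get(category, 0.0) + spent
    total = sum(seconds.values())
    if total == 0:
        return dict(interests)  # nothing observed, nothing to learn
    updated = {}
    for category in set(interests) | set(seconds):
        observed = seconds.get(category, 0.0) / total
        previous = interests.get(category, 0.0)
        updated[category] = (1 - rate) * previous + rate * observed
    return updated


# Hypothetical session: the reader lingers on a science story.
profile = {"sports": 0.5, "business": 0.5}
log = [("sports", 60), ("science", 180), ("sports", 60)]
profile = update_interests(profile, log)
```

After this one session the profile gains a small weight for science and loses a little weight for business, which got no reading time.  The weights still sum to 1.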

All the separate pieces are in place and work and can be demonstrated now.

We have the content analysis techniques.

We have the editing algorithms.

We have the learning algorithms.

We should have an integrated model by the end of the spring term that will work on pc’s you can plug into news data sets such as the AP, UPI or Reuters.  The portable, carry-around-with-you models will take longer to develop.
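
A minimal sketch, in Python, of how the pieces might fit together to fill a fixed reading time such as the 15 minutes in the opening scene.  This is not the integrated model itself.  The function name, the 200 words-per-minute default, the ranking rule (importance times interest) and the sample stories are all assumptions.  Stories are trimmed from the bottom, on the further assumption that paragraphs are ordered most important first.

```python
def build_edition(stories, interests, minutes, words_per_minute=200):
    """Pick and trim stories to fit a reading-time budget.

    stories: list of dicts with 'category', 'importance' (0..1) and
             'paragraphs' (list of strings, most important first)
    interests: {category: weight}
    """
    budget = int(minutes * words_per_minute)  # words the reader has time for
    ranked = sorted(
        stories,
        key=lambda s: s["importance"] * interests.get(s["category"], 0.0),
        reverse=True,
    )
    edition = []
    for story in ranked:
        kept = []
        for paragraph in story["paragraphs"]:
            length = len(paragraph.split())
            if length > budget:
                break  # cut the rest of this story, as an editor trims copy
            kept.append(paragraph)
            budget -= length
        if kept:
            edition.append({"category": story["category"], "paragraphs": kept})
    return edition


# Hypothetical stories and a tiny budget (10 words) to keep the example short.
stories = [
    {"category": "sports", "importance": 0.9,
     "paragraphs": ["one two three four", "five six seven eight"]},
    {"category": "business", "importance": 0.5,
     "paragraphs": ["alpha beta gamma", "delta epsilon"]},
]
edition = build_edition(stories, {"sports": 0.2, "business": 0.8},
                        minutes=1, words_per_minute=10)
```

In this toy run the business story leads because the reader's interest outweighs the sports story's higher importance score.  The sports story is cut to its first paragraph so the edition fits the budget.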