context free press
"Many types of textsthough not all of themare so monotonous on the language level that they can be produced automatically without any very complicated cognitive Artificial Intelligence models. A meticulous corpus analysis can reveal how similar elements and rules are used over and over again: one then only needs to reproduce them with a computer. This procedure will be demonstrated and discussed using television news..."Right on!
"...[Our computer] program is based on an empirical analysis of news bulletins which have actually already been broadcast; it could, however, keep on generating texts which could be broadcast right into the next century. This is due to the fact that news texts acquire their topicality solely from the random mixture of constant elements belonging to the "Symbolfeld" and the variables belonging to the "Zeigfeld" valid at the time..."I love this guy! That's "Zeigfeld" rhyming with "Seinfeld" (not the follies). zeigen: a german transitive verb meaning "to show."
"...the concept presented here for a computer simulation of German television news can be perceived both as a project for the entire setting up of a news machine and as a criticism of the hollow simplicity of the usual news bulletins. In other (polemic) words: artificial intelligenceof a particularly simple kindcould replace natural stupidity or could show it up. The political conclusions of the "dequalification of knowledge leading to the fact that people are informed rather than being able to discuss issues" (Schmidt 1986) have not yet been drawn...I *really* love this guy!!
Here's a copy of the whole paper: [Schmitz 1994].
February 2002 Notes
So. To get started, I used 1990 US Census data to web-whack together random people, random places, and random company names.
Here's a description and local copy of the US Census name data I used for the random people [HTML].
Here are some interesting notes from the Census bureau on their methods [HTML].
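Here's a rough sketch of how the random people could be drawn from name data like this. It assumes each row carries a name and a cumulative percentage, so that common names come up more often than rare ones. The rows below are invented placeholders (not the real Census figures), and pick_name and random_person are names made up for this sketch.

```python
import bisect
import random

# Placeholder rows of (NAME, cumulative percent). These values are invented
# for illustration; real rows would be read from the Census name files.
SAMPLE_FIRST_NAMES = [("MARY", 2.5), ("JAMES", 5.0), ("LINDA", 6.0), ("ROBERT", 7.5)]
SAMPLE_LAST_NAMES = [("SMITH", 1.0), ("JOHNSON", 1.8), ("WILLIAMS", 2.5), ("JONES", 3.0)]


def pick_name(rows, rng):
    """Pick one name, weighted by the gaps between cumulative percentages."""
    names = [name for name, _ in rows]
    cumulative = [cum for _, cum in rows]
    # Draw a point somewhere in the covered range, then find its bucket.
    target = rng.uniform(0, cumulative[-1])
    index = bisect.bisect_left(cumulative, target)
    return names[min(index, len(names) - 1)]


def random_person(rng):
    """Combine a weighted first name and last name into a display name."""
    first = pick_name(SAMPLE_FIRST_NAMES, rng).title()
    last = pick_name(SAMPLE_LAST_NAMES, rng).title()
    return f"{first} {last}"


if __name__ == "__main__":
    print(random_person(random.Random()))
```

Passing in the random generator makes a run repeatable: seed it to get the same cast of fictional people every time.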
(August 2001) I typed in some notes on fictive news story generation [PDF].
(September 2000) I formed the unlikely conception that people would actually buy a consumer device that spit out totally random stuff, and started thinking about the business side of it, as if there were any [PDF].
The thing is, *I* would buy a device like this, instantly. But other people?
Jan 2002: Random crimes
Crime reporting seems like a natural place to start. How to generate a random crime?
The U.S. Department of Justice and the FBI collect and publish "Uniform Crime Reporting" statistics that follow data formats outlined in a 135-page document, "National Incident-Based Reporting System (NIBRS): Volume I: Data Collection Guidelines" [PDF].
This FBI stuff lets you select a crime at random according to appropriate observed distributions, and also fill out some relevant details on victims, weapons, circumstances, etc.
Once the "crime" is known, it remains to fill out the news story by surrounding it (at random) with some salient features, often driven (I'm thinking) by the progress of the case in the judicial system.
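A minimal sketch of the selection step, assuming the published counts have been boiled down to a table of offense weights. Everything in the tables below is a placeholder invented for illustration (not real FBI figures, and not the actual NIBRS field layout), and random_crime is a hypothetical helper name.

```python
import random

# Placeholder weights, NOT real statistics. Real weights would come from
# published Uniform Crime Reporting counts.
OFFENSE_WEIGHTS = {
    "burglary": 30,
    "larceny": 25,
    "aggravated assault": 20,
    "motor vehicle theft": 15,
    "robbery": 10,
}

# Placeholder detail table: which weapons make sense for which offenses.
WEAPONS_BY_OFFENSE = {
    "robbery": ["handgun", "knife", "no weapon"],
    "aggravated assault": ["handgun", "knife", "blunt object"],
}


def random_crime(rng):
    """Draw an offense by weight, then fill in a few dependent details."""
    offenses = list(OFFENSE_WEIGHTS)
    weights = [OFFENSE_WEIGHTS[o] for o in offenses]
    offense = rng.choices(offenses, weights=weights, k=1)[0]
    weapons = WEAPONS_BY_OFFENSE.get(offense)
    return {
        "offense": offense,
        "weapon": rng.choice(weapons) if weapons else None,
        # Crude placeholder distribution: most incidents have one victim.
        "victims": rng.choice([1, 1, 1, 2, 3]),
    }


if __name__ == "__main__":
    print(random_crime(random.Random()))
```

The point of drawing the offense first is that the later details (weapon, victims, circumstances) can then be conditioned on it, so the generated "crime" stays internally plausible.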
Feb 2002: Data
Some word lists [directory]
An online Plain Text English dictionary [turned off the link to kill robots]
List started 20 March 2002
SIGGEN Resources in text generation.
CLINT, a Template/Word-based Text Generator. [local HTML]
CLAWS7 Tag Set, part of a part-of-speech tagging system [web]
Grady Ward's Moby, a lexicon project including parts of speech databases.
2 April 2002
Large US Cities & Automatically Constructed Geographic Phrases
64 cities have more than 250,000 people.
200 cities have more than 100,000 people.
555 cities have more than 50,000 people.
I also found a handy Perl program, dist_pl, that computes great circle distances and direction data from lat/long pairs such as are found in the US census data I'm using. [My notes].
I intend to use this data to form geographical fragments such as:
...in Leland, a rural Iowa town...
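For a feel of how the distance and direction data could feed such fragments, here is a sketch in Python (not the Perl program mentioned above, whose internals aren't shown here). It uses the standard haversine and initial-bearing formulas. The function names, the eight-point compass, and the phrase wording are all assumptions made for this example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; good enough for news copy

COMPASS = ["north", "northeast", "east", "southeast",
           "south", "southwest", "west", "northwest"]


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two lat/long points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def initial_bearing(lat1, lon1, lat2, lon2):
    """Bearing in degrees (0 = north, 90 = east) from point 1 toward point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360) % 360


def compass_word(bearing):
    """Map a bearing to the nearest of eight compass directions."""
    return COMPASS[int((bearing + 22.5) // 45) % 8]


def geo_phrase(town, town_lat, town_lon, city, city_lat, city_lon):
    """Describe a town by its distance and direction from a larger city."""
    miles = great_circle_miles(city_lat, city_lon, town_lat, town_lon)
    direction = compass_word(initial_bearing(city_lat, city_lon, town_lat, town_lon))
    return f"...in {town}, about {round(miles)} miles {direction} of {city}..."


if __name__ == "__main__":
    # Made-up coordinates: the town sits one degree of latitude north of the city.
    print(geo_phrase("Smalltown", 42.0, -93.0, "Big City", 41.0, -93.0))
    # prints "...in Smalltown, about 69 miles north of Big City..."
```

Pairing each small town with the nearest city from the large-city lists gives the reader an anchor they might actually recognize.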
3 April 2002
Anaphoric & Cataphoric
Main Entry: an·a·phor·ic Pronunciation: "a-n&-'for-ik, -'fär- Function: adjective Date: 1904
3 April 2002
The Prison Escape
I've done some experiments putting together a runtime system for the automatic generation of news stories about prison escapes.
I used a short initial fragment of this actual Associated Press story as a model:
GUTHRIE, Okla. -- Four prisoners broke out of a county jail Wednesday by smashing through a ceiling and an inner wall and escaping through an air conditioning duct.
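To illustrate the general idea (this is not the runtime system itself), here is a sketch that treats the AP sentence as a template with slots and fills each slot at random. Only the first method is taken from the AP model; the other slot fillers are invented for the example, and escape_lede is a hypothetical name.

```python
import random

# The fixed frame of the lede, with named slots for the parts that vary.
TEMPLATE = "{dateline} -- {count} prisoners broke out of a {facility} {day} by {method}."

# Slot fillers. Only the first method comes from the AP model sentence;
# the rest are invented for illustration.
SLOTS = {
    "count": ["Two", "Three", "Four", "Five"],
    "facility": ["county jail", "city jail", "state prison"],
    "day": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    "method": [
        "smashing through a ceiling and an inner wall and escaping "
        "through an air conditioning duct",
        "cutting through a perimeter fence",
        "overpowering a guard during a meal break",
    ],
}


def escape_lede(dateline, rng):
    """Fill every slot in the template with a randomly chosen option."""
    fills = {slot: rng.choice(options) for slot, options in SLOTS.items()}
    return TEMPLATE.format(dateline=dateline, **fills)


if __name__ == "__main__":
    print(escape_lede("GUTHRIE, Okla.", random.Random()))
```

The dateline is passed in rather than drawn from a slot so that it can come from the random-places data instead.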
23 September 2002
Civil servants have "made up" personal details for at least 1 million people and added them to the results of the 2001 census...(read more)
8 Nov 2002
Plot units include success, failure, motivation, change of mind, perseverance, loss, resolution, trade-off, mixed blessing, hidden blessing, sacrifice, killing two birds with one stone, fleeting success, starting over, giving up, intentional problem resolution, fortuitous problem resolution, success born of adversity, threat, and promise.
12 November 2002
Fatal Car Crash Model
Onondaga County Sheriff Kevin E. Walsh said Friday that deputies are investigating a two-vehicle crash that claimed the life of a Fulton teen-ager.
23 July 2004
The Context Free Press
I've returned to this idea (after a break of about two years), and have started to put together a fictive news service, the Context Free Press.