Thursday, November 10, 2016

Ode to Daylight (1997)

Ode to Daylight*

by Jeremy Yang (1997)
*May be sung to the tune of the "Beverly Hillbillies Theme".

Come and listen to the story of a man named Dave,
He was working at the EPA, intent the world to save.
Then one day, he was hacking on some files,
When up off the screen came the very first SMILES.
Simplified Molecular Input Line Entry System, that is -- compact and canonical!

Next thing you know, Al Leo's on the horn,
With Dave's help, CLogP is gonna be reborn.
Says California's the place you ought to be,
For awesome scuba diving, and hydrophobicity.

Now the CLogP program, was only just the start,
MedChem grew, with David's brother Art.
Arthur is a juggler, he's on his zillionth throw,
He's got more balls than you even want to know.

One day a strange thing arrived in the mail:
Money for some software, a really big sale.
The college accountants said this will never do,
So hit the road guys, and take your software too.

Now software needs hardware, like a bear needs his woods,
And a fellow named Yosi soon arrived with the goods.
Yosi is from Israel, a man of the world,
With a dangerous grin, and a natural curl.

A company was started, Daylight Software had begun.
Dave Leo joined the gang; he's Dr. Al's son,
Dave is a veteran, though his scars are fading now boys,
All ready for the rough and tumble world of software cowboys.

Next Daylight split, from a bizarre twist of fate,
Some stayed in the town LA and some went to the state.
Now that bizarre twist of fate -- Dawn is what she is called --
She's a doctor and she likes to doctor Dave best of all.

Right about this time a chap named JJ joined the krewe,
You can call him "J", "JY", or "Jeremy" will do.
After 6 months of hard training JJ said "I'm up to speed",
"I can pass 6 clubs with Art and synchronize a 3-way-feed."

In 1990 MUG was such a bon temps celebration,
That a California dude named Craig decided on migration.
With Sandy and the kids, to HP they said "Goodbye!",
To the land of gumbo, crawfish, gators, swamps and jambalaya.

In 1992 to Santa Fe we moved the store,
From low to high and wet to dry and where they check your aura.
We get the best psychic surgeons and aromatherapy drugs,
If only shamanistic healing worked on software bugs.

Jack and Liz Delany next appeared on the scene,
They moved from Rochester, where the weather's really mean.
Jack is flying planes now, he's earned his pilot's wings,
He's used to crashing software -- we hope the only things.

The month it was December, but then arrived our June,
She's charming and she's pretty and she really sings a tune,
Administratively exceptional and well endowed with words,
She's ruining our reputation as a screwy bunch of nurds.

Then Norah Shemetulskis arrived to join the krewe one day,
The Wild West called her name and she just could not delay,
The code corrallin' software cowgirl game is what she'd play,
So she loaded up the cats (and Mom) and left for Santa Fe.

Art's now in Seattle, he's there with his new bride --
Dave Leo's now with BioByte, MedChem's new joy and pride --
Craig has moved just down the street since he's been MSI-ed --
But the software keeps on growing, taking us all for a ride.

With networks reaching out around the world so very wide,
With Internet and WWW there's no place left to hide.
Daylight Inc. says we make the net work for chemistry,
But the net is us and we're the net in holistic harmony.

So take my hand and click my mouse and off we'll go exploring,
It may be weird, it may be strange, but it ain't gonna be boring.

Thursday, January 2, 2014

The Mythic and Mystic Origins of Informatics

Late one lonely, dark night in the 1980s, out on the rugged frontier of knowledge,  Library Science and Artificial Intelligence found each other.  In yearning and need they reached out, beyond the familiar.  LS was drawn to AI’s raw power and bold, adventurous spirit.  AI was attracted by LS’s ample stacks of data, bursting with meaning, earthy and redolent with real, precious knowledge.   Once together, their union was unbreakable.  The spawn of that union?  Informatics.

During the Manhattan Project in Los Alamos in the 1940s, nuclear fission was modeled by semi-automated computation, using adding machines and expert human operators.  Through electronics, computers became increasingly powerful and automated mathematical calculators.  Early programming referred to hardware configuration of digital or analog electronic circuits.  With data storage via magnetic tape and punch cards, “software” was born.  And Electrical Engineering begat Computer Science.  Computer Science branched into theory of languages and algorithms, and allowed us to reconsider the computer, not just as a super calculator, but as a symbol processor, maybe even as a cognition engine.


Others might have courted and canoodled with Library Science but were too dazzled by their own sounds and lights.  Computer Science was ascendant, exciting, in demand.  The Computer Age was upon us, and Computer Science had prodigious attitude (though the adoption of “science” in its name reflected a certain insecurity).  Data, it was thought, is data, only a single glyph on a complex flowchart of algorithms, symbols, formalized logic, the languages of cyber cognition.

Two tipsy bar patrons settle a dispute on the roster of the 1969 Mets using their smartphones in a few seconds, then move on to another crucial topic.  During a keynote lecture at a scientific conference, a student in the audience checks factual claims on her laptop and kindly offers corrections to the (one hopes) grateful expert.  3rd graders know how to look things up “on the Internet” and are evolving a cultural understanding of this shared and readily accessible knowledge-base quite new to our species.  The research of the 3rd grader or drunken debater is not so much analogous to the literature research of the professional scientist as much the same.  Of course not everything you read on the web is correct.  Neither is everything you read in scientific journals.  This equivalence can inform our progress as we all face similar challenges in improving online resources and our shared cyberinfrastructure and cyberculture.  Wikipedia and Google are hyper-examples, too big to be only examples of Informatics.  Google’s mission statement, "to organize the world's information and make it universally accessible and useful", describes what librarians have done since librarians began.  Arguably there is an analogy between Wikipedia and Google and the pre-Gutenberg great libraries (e.g. Alexandria) which preserved, promoted, and propagated knowledge and culture through scrolls and books.  Like the ancient libraries, these modern structures convey huge volumes of information and thereby assume special, though not unique, roles in the culture and collective knowledge.


Philosophy and Mathematics are understandably self-satisfied in their supremacy in the knowledge hierarchy.  The immutable, perfect forms of Socrates, numbers and theorems of Pythagoras and Euclid continue to be worshipped by lesser fields.  Why condescend to the level of data?  Data, as Library Science will readily admit, is error-ridden, inconsistent, noisy, untidy.  But there is beauty in tidying.

The meaning of “Artificial Intelligence” has evolved and deserves explanation.  The Turing test concept of artificial intelligence as indistinguishable from human intelligence is an important reference.  But it now can be applied and refined given the variety of modern computing methodology.  Some computation may be regarded as superior to human intelligence (e.g. chess playing).  However, visions of computers learning in the flexible, adaptable way of human toddlers seem far from realization.  Computer cognition is an important kind of intelligence whether or not it is like human intelligence.  It is important now, in 2014, in very concrete ways, because it is already controlling much of what goes on in our world.  In addition to questions of human intelligence and computer intelligence, what about combinations, as humans develop devices which enhance cognition and memory and, maybe dramatizing a bit, co-evolve towards some cognitive sym-cyber-biote?


Natural Sciences enjoy their disdain for Social Sciences.  How could belief and behavior possibly be as rigorous and scientific as the mass and velocity of an electron?  However, eventually Artificial Intelligence recognized that to develop computer cognition, it became necessary to better understand human cognition and social science.  Neuropsychologists are fond of scanned, quantified brain activity, but must also infer cognitive state in very human ways.  To the discomfort of some, Philosophy is not far from questions like What does it mean to know?  How can we learn?  What is the fundamental particle of knowledge?

What is “informatics”?  And why might “informatics” be a very big deal right now?  Why might “informatics” need quotes?  Because, as with any new field, terminology is in flux, obfuscated and distorted by hype.  Popular magazines and scientific journals alike note that we are in “The Information Age”, the era of “Big Data”, and that science has evolved a “4th Paradigm”.  The Internet and World Wide Web have apparently given rise to “Web 2.0”, and the “Semantic Web.”  We apparently suffer from “information overload” and “data loss” due to limited “Cyberinfrastructure”.  We must be “data driven” and avoid “data silos” in our “4th Paradigm” thinking.  Thanks to modern social networking technology, we can upload, in a few touches of a handheld, megabytes of images (happy hour hijinks, a new gas grill, a cat making a face, a homebrew recipe), to “The Cloud”, our cultural computing commonwealth.  As if we didn’t have enough data to worry about, we should also mind our “metadata” and beyond that our “ontologies”.  Data loss, data corruption and cybersecurity issues keep us nervous, as do concerns that we are falling behind the technology curve and our smart phones are not keeping up with the neighbors.


The emergence and ascendance of Informatics is unnoticed by some, admired and even revered by others.  There is a predictable tendency of some to elevate and expand Informatics beyond even its great importance.  When overhyped and overstated, Informatics becomes synonymous with Knowledge, or Science, or Language, Culture or Civilization.  All good, but we don’t need new names for them.

From Big Data, sophisticated computer models produce economic forecasts, or suggest what we should consume for better health and longer life.  Simultaneously, data can be selected and shaped to statistically support almost any claim.  Scientists and citizens alike are vulnerable to misrepresentations and abuse of data, both accidental and intentional.  Computer models now dominate fields from stock trading to baseball management.  For those who must take risks on future outcomes, Informatics is a game changer, separating winners and losers (e.g. 2012 election, multiple professional sports leagues).


The epic union of Library Science and Artificial Intelligence may well inspire religions and myths if history is a guide.  There is a resonant truth in the indivisibility of memory, information, knowledge, cognition, and computation.  All logic is expressed in symbols built upon elemental experience.  All experience is defined by logical frameworks of cognition.  Somehow the ultimate learning genius, a human baby, in the ultimate boot-strapping algorithm, begins all with the force and warmth of lips on nipple, and develops a system of knowledge which encompasses quantum physics or abstract art.

Keywords: Library science, artificial intelligence, information systems, automated reasoning, machine learning, computer science, electrical engineering, cognition, 4th Paradigm, Big Data, The Information Age, Internet, World Wide Web, Semantic Web, semantic technology, data science.

Wednesday, November 13, 2013

Evidence, patterns, filtering. Charles Darwin, Sherlock Holmes, and Google.

Much has been made of the human ability to recognize patterns, seeing shapes and trends in complex scenes.  We can detect familiar faces in crowds, clouds which look like dragons, trends in data.  Also notable is the human ability to filter and ignore data.  Filtering is essential in processing new data.  

Data overload is arguably not a new problem.  Consider a World Cup spectator in a crowded stadium.  A pre-literate tribe member in a dense rainforest.  Charles Darwin exploring new ecosystems during the voyages of the Beagle.  Sherlock Holmes at a crime scene.  Human neurology and cognition enable these observers to filter what are potentially paralyzing volumes of data, focus on relevant information, and detect patterns.

The Sherlock Holmes example is different, and not just because he is a fictional character (supposedly).  Dr. Watson might not agree that a crime scene is rich with observable data.  But to Sherlock Holmes it always is, since he is exceptionally skilled as a detective, and highly trained in esoteric arts such as classification of tobacco ash and natural fibers.  So data overload can depend on the tools of observation used -- or misused.  A modern example is that a few cheap webcams can easily generate terabytes of mostly useless data per day (e.g. surveillance video) which can be stored and consequently require effort to distinguish from more valuable data.  We can agree that Hubble telescope video is more precious than surveillance video, but question the costs vs. the benefits of the data volume produced by our most prized and advanced observational tools.

If a crime scene included a corpse lying prone with a dagger in the back, I think Sherlock Holmes would refrain from tobacco ash analysis and focus on the dagger, at least initially.  Since he is not called to easy cases, maybe eventually the dagger will prove to be a false lead, and the tobacco ash will deserve attention.  But Holmes, like any rational observer, will prioritize, will rank the evidence.  If Holmes were unable to focus, filter and rank, the bloody dagger would blend into the observable thousands of coat fibers and tobacco ashes.  Which brings us to Google.

The original Google PageRank algorithm was a game changer, enabling navigation of an expanding WWW, and propelling the success of Google.   Crawling and caching the web was difficult, but had been done (e.g. AltaVista).  Google’s key innovation was to in some sense replicate the essential human need to focus, filter, and rank.  However, despite the genius behind it, Google is fundamentally a very crude, blunt observational tool.  The top hits in a Google search can only be as good as the query, which up till now has essentially been a free text search.  The PageRank algorithm depends on page links to implement a kind of crowdsourced, popularity-contest-like scoring, vulnerable to manipulation and misinformation.   We can do better than Google, and often do, but for realms less than “all the World’s information”.
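The link-based scoring described above can be illustrated with a minimal sketch.  This is a textbook-style power iteration on a made-up four-page web, not Google's actual implementation; the function name, the toy link graph, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share, independent of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page "c" ranks first simply because the most pages link to it, and "a" ranks second only because "c" links to it.  That is the popularity-contest character of the scoring, and it also shows why adding links from throwaway pages is an obvious way to try to manipulate it.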


Finally, there must be consideration given to math and statistics.  Recognizing faces in crowds and patterns in data means gambling with some reasonable probabilities of success and failure.  Too many false matches incur costs; being too cautious means we miss opportunities, the so-called opportunity costs.  Some evolutionary biologists suggest that Homo sapiens is inclined toward false positives, perhaps to maximize food gathering and predator avoidance.  However, humans are not cognitively equipped to assess probabilities when data volumes are far from experience.  Some examples are the perceived dangers of automotive travel vs. aircraft travel, shark attack, alien abduction, etc.  In any case, in designing data analysis algorithms, wise use of statistics is imperative.
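As a small worked example of why intuition fails at unfamiliar scales, consider how the value of a "match" depends on how rare the real thing is.  The sketch below applies Bayes' rule; the function name and all of the numbers (a detector that catches 99% of true cases and falsely flags 1% of the rest) are hypothetical, chosen only to illustrate the point.

```python
def precision(base_rate, sensitivity, false_positive_rate):
    """Probability that a flagged item is a true match, by Bayes' rule.

    base_rate: fraction of all items that are true matches.
    sensitivity: fraction of true matches that get flagged.
    false_positive_rate: fraction of non-matches that get flagged anyway.
    """
    true_positives = base_rate * sensitivity
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Same hypothetical detector, two very different haystacks.
common = precision(base_rate=0.5, sensitivity=0.99, false_positive_rate=0.01)
rare = precision(base_rate=0.0001, sensitivity=0.99, false_positive_rate=0.01)
print(f"half of items are real matches: {common:.3f}")
print(f"1 in 10,000 items is a real match: {rare:.4f}")
```

With these assumed numbers, the detector is right about 99% of the time when matches are common, but when only 1 in 10,000 items is a real match, fewer than 1 in 100 of its flags is correct.  The detector did not change; only the data volume around the needle did.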

Friday, July 6, 2012


Thank you Linton Class of 1977


Thanks to the organizers of our 35th Linton High School reunion, thanks for the invitation and please accept my regrets for not attending.  I’ll be rejoining my family in France that day after a month apart so my excuse is solid.  Still I cannot believe I’m going to miss it.  I’ll be there in spirit and lately I’ve also been visiting back with our spirits of high school past.  I’ve been thinking about our class, our school, our times, and maybe it’s part of being at this advanced age, but I’m just so grateful.

We had a great school with so much to offer.  We learned about so many things, yes academic subjects but much more, from teachers, from coaches and advisors, and ourselves, our classmates.  Unfortunately I did not learn much European history but that was all my fault, and I want to apologize to Mr. Washington.  My bad.  (Now I even wish I had a tweed suit like his.)  And I didn’t need to be so annoying about it either.  Same to Mr.  Millard and American history.  Was I really that immature and shallow? (No need to answer.)  Dan W. correctly anticipated that when I began teaching high school in 1986 (in Redwood City, CA), my students would return the favor and teach me some lessons in empathy.

Our comprehensive Linton High School had diverse curricular offerings (from calculus to cooking, from sociology to statistics, from botany to business) and extra-curricular offerings, and now I regret that I didn’t explore more of those great programs: drama, music, art, journalism, community service.  Of course I also regret never passing Dating 101 (and its prerequisite Talking to Girls - remedial track)  but that’s another story.  We had diverse programs and a diverse student population, and it took me years to understand how blessed we were in that.  Our Linton and Schenectady community was a slice of America: new immigrants and Daughters of the American Revolution, many races and religions, and economics from Chevy to Cadillac.  Over time as I met the preppies and various cliques with narrower backgrounds, however privileged, I appreciated just how fortunate we were to receive the life lessons we did.  We even witnessed the rapid decline of American manufacturing as the GE workforce dropped, lessons in global economics which other Americans had yet to face or understand.  

What historic, special times we lived and high schooled in too.  We entered Linton in 1974, the month after Nixon resigned and Ford became president.  Then the long Vietnam war ended.  The country was in an extended state of shock from the war, the draft, protests, assassinations, Watergate, civil rights struggles, feminism, sex and drugs.   (Fortunately we were not yet facing that other national tragedy, disco.)  We remember watching Dan Rather on the CBS news reporting the bad news from Viet Nam as a non-embedded journalist.   I like to think we were privileged to learn from those dramatic events but not have to suffer their worst effects, not see our friends drafted and die in war.  We learned the idealism of those times, about peace and love and sacrifice, and the belief that the world can be changed for the better, in a safe place from which we yearned for adventure in the wide world.  But we also maybe absorbed the cynicism, the distrust of authority, the older generation, the establishment, the government.

Getting back to my thank yous, now I understand just how much work and dedication it takes to create and run a decent, functioning, complex organization like our Linton.  All the teachers and counselors (yes and even the administrators) who were dedicated and caring and created their own special havens where kids could learn and grow.  In my case, thanks to my cross-country, track and tennis coaches (Therrieault, Baker, Catino) for showing up, caring, setting high expectations.  Of course we didn’t appreciate it enough.

Being somewhat clueless and underdeveloped, maybe I just didn’t see our cliques.  But when I heard about other high schools and their jocks vs. geeks vs. preppies vs. heads subcults, I was thankful for Linton.  I’m not saying everything was perfect.  But I think we shared something, something of value, in so many ways.

Thanks to Mrs. Allen for tolerating Rob M. and me as we sang Surfer Girl in 10th grade English during her lectures.

Thanks to Mr. Nolan for caring enough to attempt to impress us with his vocabulary and literary references every day (now I can admit I was impressed).

Thanks to Ms. Bruce for believing that Ulysses is a great novel and telling us so with all her heart.  And for being brave enough to take us to Broadway to see Grease, which was amazing.  And wearing a halter top and looking like Diane Keaton.

Thanks to Mr. Mead for inviting me to learn probability and statistics in his class.  Sorry to Mr. Mead for refusing -- finally I understand the importance of probability and statistics -- you were right and I was wrong!

Thanks to Mr. Norton for caring so much about significant figures.  The scientific world still needs to learn this lesson: now in the “information age” we have access to so much data, much of which is insignificant.

Thanks to Mr. Kidd and Mr. Della Salla, and the Plaza Players for inspiring us with their art.  Have we learned that the world must be seen through science and art?  

Thanks to my friends who taught me so much as we grew up together.  Thanks to my cross-country and track teammates for running all over Schenectady with me, from Rexford Bridge (past “Susan is a pinhead”) to Mohawk Mall, from Collins Park to Central Park.  



Thanks everyone.

Sincerely,

Jeremy Yang
Los Ranchos, New Mexico
July 2012