Saturday, September 28, 2024

Reading Randomly

I choose the books I read or listen to pretty randomly.

I don't choose books from magazines' best-seller lists or famous people's recommendations.

For leisure, I listen to books from local libraries. I usually pick nonfiction titles in audio format that are available now, and occasionally join the waiting list for books recommended by friends or colleagues.

For my area of expertise and adjacent fields, my selection of books not directly related to ongoing work is even more random. A book might come from checking the source of an interesting magazine article, from trying to find the definition of a new concept presented in a seminar, or just from fact-checking some posts on LinkedIn.

The advantage of reading "randomly" is that I can cover a wide range of topics with limited reading, sometimes far outside my regular choices, and so become aware of different perspectives, new concepts, approaches, theories, or philosophies.

Here are two books and one article I read or listened to recently: "Hidden Potential", AI Model Collapse, and "Black Swan".

Hidden Potential is a book by Adam Grant that discusses how people can unlock their hidden potential. I picked it from the local library catalog - nonfiction, audio format.

AI model collapse is the subject of an article in Nature (July 2024), full title "AI models collapse when trained on recursively generated data", which addresses the degeneration of AI models. I came across it purely by chance: I scanned the tables of contents of the last three issues of an online magazine on digital engineering and read an article on applications of artificial intelligence (LLMs in particular) in engineering. AI model collapse, which was a new concept to me, was listed there as a risk factor. I did a Google search to learn what it was, and then found the Nature article for an in-depth understanding.

Black Swan is a book by Nassim Nicholas Taleb exploring the profound impact of rare and unpredictable events. A chain of events that started with fact-checking a LinkedIn post on how to interpret statistical observations led me to the concept of survivorship bias, and then to the book Black Swan.

An example of survivorship bias (Wikipedia) 

What do these three items have in common?

They all involve the tails of statistical distributions. In fact, reading randomly helps me cover the tails of the distribution of the books I read as well.

"Hidden Potential" uses extraordinary individuals and case studies to support its arguments and suggestions. While these examples are compelling, the book does not take into account "the silent majority" of similar cases, i.e. those who applied the same strategies but failed to achieve the same goals. The success stories are in the tails of the corresponding statistical distributions.

I listened to the audio version of the book during my daily commute. It did not occur to me that the book fell into survivorship bias until the end, where the example of José Hernández, a migrant worker turned astronaut, was given. Yes, he had the perseverance and grit to achieve his dream of becoming an astronaut, but in the end it was good luck that made it happen, even according to the book's own storytelling. Tens of thousands of driven people applied for the position, and many who were equally or more qualified did not get it.
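To make the "silent majority" point concrete, here is a minimal sketch of survivorship bias. It is a toy model of my own, not anything from the book: the function name simulate_selection, the idea that selection depends on skill plus luck, and all the numbers are assumptions chosen only for illustration.

```python
import random


def simulate_selection(n_applicants=10_000, n_selected=10, seed=0):
    """Toy model (an assumption, not real data): every applicant gets a
    skill score and a luck score, and the highest skill + luck totals win."""
    rng = random.Random(seed)
    # (skill, luck) pairs, both drawn from a standard normal distribution.
    applicants = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_applicants)]
    ranked = sorted(applicants, key=lambda a: a[0] + a[1], reverse=True)
    winners, rest = ranked[:n_selected], ranked[n_selected:]

    weakest_winner_skill = min(skill for skill, _ in winners)
    # Applicants who were not selected even though their skill matched or
    # exceeded that of the least skilled winner.
    overlooked = sum(1 for skill, _ in rest if skill >= weakest_winner_skill)
    mean_winner_luck = sum(luck for _, luck in winners) / n_selected
    return {
        "winners": winners,
        "overlooked": overlooked,
        "mean_winner_luck": mean_winner_luck,
    }


result = simulate_selection()
print("Unselected applicants at least as skilled as a winner:", result["overlooked"])
print("Average luck score of the winners:", round(result["mean_winner_luck"], 2))
```

If we only study the winners, we see skilled people and conclude that skill explains the outcome. The count of overlooked applicants is the part of the picture that success stories leave out.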

It is not just "Hidden Potential": the motivational books and "how-to" self-improvement guides I have read fall into survivorship bias as well. That said, readers can still benefit from these books by trying approaches they did not know before; just don't expect the same spectacular outcomes.

The main reason for generative AI model collapse is that when AI models are trained on synthetic data generated by earlier generations of models, the tails of the original content distribution disappear in later generations because of the resampling errors of the previous generations. The article used scenarios such as future models trained exclusively on synthetic data from AI models to demonstrate the fast deterioration of model performance. The paper did give an example of using only 90% synthetic data, which led to much slower and more minor degeneration. Here the authors used unlikely scenarios to highlight generative AI model collapse.
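The disappearing tails can be illustrated with a very small sketch. This is not the paper's experiment: it is a toy resampling loop in which each "generation" is simply a sample drawn from the previous generation's data. The function name resample_generations, the long-tailed token weights, and the real_fraction parameter (the share of original data mixed back in) are all assumptions made for illustration.

```python
import random


def resample_generations(n_tokens=100, sample_size=500, generations=30,
                         real_fraction=0.0, seed=0):
    """Return the number of distinct tokens present in each generation."""
    rng = random.Random(seed)
    vocab = list(range(n_tokens))
    # Long-tailed frequencies: token 0 is common, token 99 is rare.
    weights = [1.0 / (rank + 1) for rank in vocab]
    original = rng.choices(vocab, weights=weights, k=sample_size)

    data = original
    distinct_counts = [len(set(data))]
    n_real = int(real_fraction * sample_size)
    for _ in range(generations):
        # "Train" on the previous generation: sample from its empirical
        # distribution, so a token that has vanished can never come back.
        synthetic = rng.choices(data, k=sample_size - n_real)
        # Optionally mix in some of the original (real) data.
        real = rng.sample(original, n_real)
        data = synthetic + real
        distinct_counts.append(len(set(data)))
    return distinct_counts


counts = resample_generations()
print("Distinct tokens, first vs. last generation:", counts[0], counts[-1])
mixed = resample_generations(real_fraction=0.1)
print("Same, keeping 10% original data:", mixed[0], mixed[-1])
```

With purely synthetic data the number of distinct tokens can only shrink, and the rare tokens are the first to go. The second call is a rough analogue of the 90% synthetic scenario and is there for comparison only.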

With the current trend of using synthetic data to train AI models, this alarmist article title caught people's attention and could potentially help prevent a future catastrophe from generative AI models.

A "Black Swan" is, by definition, a rare, unpredictable event with huge consequences. The book provides an unconventional and unique insight into black swan events. It criticizes conventional risk and prediction models, especially in economics and finance, as fundamentally flawed because they mostly rely on the assumption of a Gaussian distribution; furthermore, most economists fail to recognize the limitations of their models and tend to rationalize failures after the fact. The book is provocative and stimulating.

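To get a feel for why the Gaussian assumption matters, here is a small worked comparison of tail probabilities. It is my own illustration, not a calculation from the book: the Student's t distribution with 3 degrees of freedom is just an assumed stand-in for a fat-tailed distribution, and the function names are hypothetical.

```python
import math


def gaussian_tail(k):
    """P(X > k standard deviations) for a Gaussian distribution."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def student_t3_tail(k):
    """P(X > k standard deviations) for a Student's t distribution with
    3 degrees of freedom, whose standard deviation is sqrt(3)."""
    # Closed-form CDF for 3 degrees of freedom, evaluated at t = k * sqrt(3):
    # F(t) = 1/2 + (1/pi) * (u / (1 + u^2) + atan(u)), with u = t / sqrt(3) = k.
    return 0.5 - (k / (1 + k * k) + math.atan(k)) / math.pi


for k in (3, 5, 10):
    print(f"{k} sigma: Gaussian {gaussian_tail(k):.3e}, "
          f"fat-tailed {student_t3_tail(k):.3e}")
```

Under the Gaussian model a 5-sigma move is practically impossible, while under the fat-tailed stand-in it is merely uncommon. A model that assumes the former will badly underestimate how often extreme events occur.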

What did I learn, at a high level, from reading these books?

All theories and models have limitations. Don't trust them blindly; be inquisitive and think critically!

Ask questions, analyze assumptions, examine evidence, infer through reasoning, and define problems, i.e. form a hypothesis and test it.

The key in this process is to test a hypothesis with the goal of finding out whether it is correct, not of proving that it is correct; the latter leads to confirmation bias and a slew of other biases.

======

Note

Some past blog posts on books:

I have listened 100 books

audio books

A peek into a different world 

Notes from recent audio book listening


