Just another blog
on Data Science and other things

TBSA Step 9. Song Titles and Words-Time Ratio

Three song parameters have been analyzed in the previous steps: track length, number of words in the title and number of words in the lyrics. Let's see how they depend on each other.

If we plot the number of words in lyrics against length in seconds, the dependency looks more or less linear. But the result of linear regression is not very impressive: the coefficient of determination (R squared) is only 0.2. For covers alone it is 0.15, and for originals 0.23.
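As a rough illustration of what the R squared figure measures, here is a minimal sketch of a simple least-squares fit in plain Python. The function name `r_squared` and the sample numbers are made up for this example; they are not the data or the code behind the figures above.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope and intercept of the ordinary least-squares line
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical (length in seconds, words in lyrics) pairs, NOT the real dataset
lengths = [120, 150, 165, 180, 240]
words = [140, 200, 150, 230, 210]
print(round(r_squared(lengths, words), 3))
```

A value near 1 means the line explains most of the spread in word counts; a value around 0.2 means that track length alone explains only a small part of it.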

Both the number of words in lyrics and the length in seconds tend to increase with the number of words in the title, at least from 1 to 6. The trend is more pronounced for the number of words and less so for length, and again it originates in the originals.

If we define the Words-Time Ratio (WTR) as the number of words in lyrics divided by track length in seconds, the increasing trend is more than clear, at least for titles of 1 to 4 words, and again its source lies in the originals.
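The WTR definition can be sketched in a few lines. The helper names (`wtr`, `mean_wtr_by_title_words`) and the sample tuples are assumptions for illustration only, not the original analysis code or data.

```python
from collections import defaultdict
from statistics import mean


def wtr(lyrics_words, length_seconds):
    """Words-Time Ratio: words in lyrics per second of track length."""
    return lyrics_words / length_seconds


def mean_wtr_by_title_words(songs):
    """Average WTR for each title length (number of words in the title)."""
    groups = defaultdict(list)
    for title_words, lyrics_words, length_seconds in songs:
        groups[title_words].append(wtr(lyrics_words, length_seconds))
    return {k: mean(v) for k, v in sorted(groups.items())}


# Hypothetical songs: (words in title, words in lyrics, length in seconds)
songs = [(1, 120, 150), (1, 90, 180), (2, 200, 160), (3, 240, 160)]
for title_words, value in mean_wtr_by_title_words(songs).items():
    print(title_words, round(value, 2))
```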

The Beatles linear regression

Exercise 1. Facebook friends list

Sometimes exercises come to my mind. I'm going to post them and their solutions here. The first one is as follows.

A friend of yours wants to clean his Facebook friend list and is going to remove one friend during a week. What is the probability that you will be removed from his friend list, which has 300 members?

exercises probability theory

TBSA Step 8. Parts of Speech 2

Here I explore the most frequent words in the songs of the Beatles according to their classes and semantics.

If one splits word frequencies by part of speech, one can see the following. For nouns, the leader is love (though with just under 5%). For pronouns, you is the leader (about 30%), and I is very close. As for verbs, while the leader is do (almost 6%), the top ten consists mostly of forms of the verb to be.

Lemmatization changes almost nothing for nouns. As for verbs, be is the leader now with almost 21%. Do with its forms has less than 7%.

Looking at the years, it's interesting to note that love, the overall leader, wasn't always in first place. In fact, love leads in only three years, and the other years have many other leaders: girl, day, what, sun, everybody. As for verbs, be is the absolute leader in every year. Do holds second place, but sometimes loses it to other words.

It's interesting that covers' nouns don't even list love in the top ten! The absolute leader there is baby.

Love is the leader among nouns for all three original authors. As for verbs, while be is the undisputed leader, covers have a surprisingly high frequency for get (second place!) and a low one (not even in the top ten) for know.

The Beatles POS tagging

TBSA Step 7. Parts of Speech

This step contains an analysis of part-of-speech (POS) frequencies in the Beatles songs.

The most frequent part of speech in the Beatles lyrics is the verb (24.7%). Pronouns (15.6%) and nouns (13.2%) take second and third place, respectively. Compared to other English texts and the English language corpus, verbs and pronouns play a significantly more important role in the Beatles lyrics.
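Shares like these come down to counting tags. Below is a minimal sketch that assumes the lyrics have already been run through some POS tagger and arrive as (word, tag) pairs; the tag names and the sample tokens are illustrative assumptions, not the tagger output used for this step.

```python
from collections import Counter


def pos_shares(tagged_tokens):
    """Share of each POS tag among all tagged tokens, most frequent first."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.most_common()}


# Hypothetical pre-tagged tokens; a real run would take these from a POS tagger
tagged = [
    ("love", "VERB"), ("me", "PRON"), ("do", "VERB"),
    ("you", "PRON"), ("know", "VERB"), ("girl", "NOUN"),
]
for tag, share in pos_shares(tagged).items():
    print(f"{tag}: {share:.1%}")
```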

In time dynamics, the verb is always the leader. Pronouns and nouns fight for the second place.

Songs by Lennon, McCartney and Harrison have verbs and pronouns in first and second place, respectively. But as for third place, Harrison favours adverbs while the others prefer nouns.

The verb is also the most common element in the cover songs. The main difference between covers and originals is the place of nouns and pronouns: the second and the third or vice versa. 

Verbs in their base form comprise more than a third of all verbs. Next come verbs in the non-3rd person singular present form (~23%) and the 3rd person singular present form (~12%). Other forms have less than 10% each. Singular nouns make up about 83% of nouns. Existential there and comparative adverbs are not numerous. Commas are the most common punctuation mark (almost 60%). Proper nouns are mostly singular.

The Beatles POS tagging

TBSA Step 6. Word Frequency 2: Love and Know

This step contains an analysis of word frequency in the Beatles songs after removing stopwords.

With stopwords removed, the most common words are love and know.
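For readers who want to try this, the idea can be sketched as follows. The stopword set here is a tiny hand-made list and the tokenizer is a simple regular expression; both are assumptions for illustration and are not the stopword list or tokenizer used in this step.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); real lists are much longer
STOPWORDS = {"i", "you", "the", "a", "and", "to", "me", "is", "it"}


def top_words(lyrics, stopwords=STOPWORDS, n=1):
    """Most common words in a text after dropping stopwords."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Made-up line of text, used only to exercise the function
print(top_words("Love me do, you know I love you"))
```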

Love is the absolute leader in the first two years, then it is overtaken by other words. No single year reproduces the overall top ten. Again we see that some words have very high frequencies just because they are repeated many times in a single song.

While original songs have word frequency distribution close to total, cover versions are quite different. The leaders there are baby and got.

When it comes to the words that appear in the largest number of songs, love and know are the leaders again (know is even more frequent). In cover songs love takes only third position, and know is even lower. The leaders there are oh, well and got.

The Beatles Zipfs Law

TBSA Step 5. Word Frequency: You and I

This step contains analysis of word frequency in the Beatles songs. 

There is no word that is common to all Beatles songs. But in terms of the number of songs in which a word occurs, the leader is the, found in 182 songs. Then come to (178 songs), you (173 songs), and (172 songs) and I (166 songs).
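Counting the number of songs in which a word occurs is different from counting its total occurrences: each song contributes at most one to a word's count. A minimal sketch, assuming lyrics are stored in a dictionary keyed by a song identifier (the function name, identifiers and texts below are hypothetical):

```python
import re
from collections import Counter


def song_counts(lyrics_by_song):
    """For each word, the number of songs in which it occurs at least once."""
    counts = Counter()
    for lyrics in lyrics_by_song.values():
        # set() makes repeated words within one song count only once
        counts.update(set(re.findall(r"[a-z']+", lyrics.lower())))
    return counts


# Hypothetical toy corpus, not real lyrics
toy = {"song_a": "the the you", "song_b": "the love", "song_c": "you and I"}
print(song_counts(toy).most_common(2))
```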

Word frequencies fall drastically with the word's rank. Even before rank 20 they drop below 1%. The leader is you (4.75%). Even when short verb forms are taken into account, you is still the leader. The distribution doesn't follow Zipf's law very well, but it's more or less close.
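One simple way to eyeball the fit to Zipf's law: if frequency is roughly proportional to 1/rank, then rank times frequency should stay roughly constant down the list. The sketch below uses that check on a made-up token list; the function names are assumptions, and this is not necessarily the method used for the comparison above.

```python
from collections import Counter


def rank_frequency(tokens):
    """List of (rank, word, share of all tokens), most frequent word first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [
        (rank, word, count / total)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


def zipf_products(table):
    """rank * share for each row; near-constant values suggest a Zipf-like shape."""
    return [rank * share for rank, _, share in table]


# Made-up tokens constructed to follow Zipf's law exactly (6 : 3 : 2)
tokens = ["you"] * 6 + ["i"] * 3 + ["the"] * 2
for row, product in zip(rank_frequency(tokens), zipf_products(rank_frequency(tokens))):
    print(row[0], row[1], round(row[2], 3), round(product, 3))
```

On real lyrics the products will drift; how much they drift is a quick measure of how far the distribution is from the ideal curve.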

The obvious leader every year is again you. Still, its frequency goes down with time. When looking at original songs and cover versions separately, you and I are the favourites in both cases, and then the differences begin.

In lemmatized lyrics, the only change in the top ten is that the words a and and swap places; everything else is the same.

The word frequencies don't correspond much to the general frequency leaders (neither in the American nor in the British variant), nor even to those of contemporary fiction and poetry, but they are rather close to TV/movie language.

The Beatles Zipfs Law

TBSA Step 4. Authors

Here I do some analysis of the authorship of the Beatles songs. Only official authorship data is analyzed, i.e., all non-cover songs belong to Lennon-McCartney, Harrison, or Starkey. Sure, the lion's share of the Beatles songs is attributed to Lennon-McCartney. But there is some interesting information behind it to reveal.

Of all Beatles songs, about 88% were written by the members, and of these about 87% were written by Lennon-McCartney. Harrison has about 12%, Ringo only 1%.

Lennon and McCartney as solo singers sang about the same amount of the songs they wrote together, about a third each. Other songs they wrote were sung by other Beatles members or in cooperation with them. Both Harrison and Starr sang songs by Lennon-McCartney (two and four, respectively), while the opposite never happened.

With time, the share of Lennon-McCartney songs gradually rises to a maximum of about 90% in 1967, and then falls a bit. But if we exclude cover versions, the share of Lennon-McCartney songs actually falls slightly with time, while Harrison's generally rises.


Median lengths differ only a little: songs by Starr are the longest, those by Lennon-McCartney the shortest, and Harrison's are in between. Songs of all authors have the same median number of words in titles: three. But all of them, except Starkey, have many songs with more. Harrison's songs have the shortest lyrics, Lennon-McCartney's come second, and the longest lyrics belong to Ringo's songs. Still, in all three parameters Lennon-McCartney songs have the most outliers and the widest range.

The Beatles

TBSA Step 3. Whose Voice is Soothing?

We know that all four Beatles members were singers. But they sang both alone and together in different songs. So, here I analyze vocals data.

Lennon is definitely the lead singer. He sang about a third of all songs alone, and participated as a singer in about 57%. McCartney holds the second place with a fourth of all songs and about 52%, respectively. 

About three fourths of all the songs were sung by one singer alone, with Lennon having 41% and McCartney 35% of them. But McCartney is the leader in teamwork: in almost 50% of the songs he sang on, he was not the only vocalist. Starr has the highest share of one-singer songs (71%). The share of songs sung by one member is never less than 60% (1963, the very first year), and rises to a maximum of more than 80% in 1966 and 1968.

The probability of hearing Lennon and McCartney together while listening to a random Beatles song is only 22%, but hearing either one or the other is much more likely (83%).
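Probabilities of this kind are just shares of songs whose set of vocalists satisfies a condition. A minimal sketch follows, assuming the vocals data is a mapping from a song identifier to a set of singer names; the identifiers and assignments below are invented for illustration, and "either" is read here as "at least one of the two", which is an assumption.

```python
def share(songs, predicate):
    """Fraction of songs whose vocalist set satisfies the predicate."""
    return sum(1 for vocalists in songs.values() if predicate(vocalists)) / len(songs)


PAIR = {"Lennon", "McCartney"}

# Hypothetical vocals data: song id -> set of vocalists (NOT the real data)
songs = {
    "song_1": {"Lennon"},
    "song_2": {"McCartney"},
    "song_3": {"Lennon", "McCartney"},
    "song_4": {"Harrison"},
    "song_5": {"Starr"},
}

both = share(songs, lambda v: PAIR <= v)          # both of them sing
either = share(songs, lambda v: bool(PAIR & v))   # at least one of them sings
print(both, either)
```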

As for cover songs, most of them are one-singer-songs, and most of them are sung by Lennon.

The Beatles

TBSA Step 2. Words, words, words...

Here I compare general statistics on the number of words in titles and lyrics, taking into account release year and originality.

The average song title consists of three words. Such songs comprise about one third of all songs. The minimum lies at 1 and the maximum at 10 (Everybody's Got Something To Hide Except Me And My Monkey from the White Album, 1968). The number of words in titles is more diverse in the second period (1967-1970). The covers have generally the same title lengths as the original songs, but they don't have outliers.

The average song contains 182 words. The number of words in lyrics is again more diverse in the second period. The songs are longest on average in 1967, and the year with the highest variance in the number of words is 1969. Still, the shortest song (Wild Honey Pie, 21 words) was released in 1968, and so was the longest (Hey Jude, 392 words). The covers don't differ much from the original songs; they have a lower range but a higher variance.

There are no songs combining median length, median number of words in title and median number of words in lyrics. Still, there are two songs combining median length and number of words in title: Hold Me Tight (1963) and For You Blue (1970).

The Beatles

TBSA Step 1. Second by Second: Length Comes First

Here I analyze statistical information about song length as a simple but interesting attribute.

The overall mean song length is about 162 seconds, i.e., about 2.7 minutes. But there are two distinct periods: 1963-1966 and 1967-1970. In the first, the mean lengths are short and slowly rise, and the songs are rather uniform. In the second, the mean lengths are long and fall slightly, but the variance is rather high. The shortest and the longest songs both fall in the second period (Her Majesty from Abbey Road, 1969, at 21 seconds, and I Want You from the same album, at 467 seconds).
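The per-period comparison can be sketched with the standard library alone. The period boundaries follow the text; the function names and the sample (year, seconds) pairs are hypothetical and only show the shape of the computation.

```python
from statistics import mean, pvariance


def period(year):
    """Assign a release year to one of the two periods named in the text."""
    return "1963-1966" if year <= 1966 else "1967-1970"


def length_stats_by_period(songs):
    """Mean and population variance of song length (seconds) per period."""
    groups = {}
    for year, seconds in songs:
        groups.setdefault(period(year), []).append(seconds)
    return {p: (mean(v), pvariance(v)) for p, v in groups.items()}


# Hypothetical (release year, length in seconds) pairs, NOT the real dataset
songs = [(1963, 120), (1965, 140), (1968, 100), (1969, 300)]
for name, (avg, var) in length_stats_by_period(songs).items():
    print(name, avg, var)

# Converting the overall mean from the text to minutes
print(round(162 / 60, 1))
```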

Cover versions are present almost entirely in the first period and in general are only a bit shorter and a bit more uniform. So they do not affect the statistical parameters to a significant degree. The most typical songs in terms of length (152 seconds) are For You Blue, A Hard Day's Night, And I Love Her and Hold Me Tight. The most productive year for the band in terms of length was 1968, while the least so was 1970.

The Beatles