Frequency of use. What do the indicators frq1, frq2 and LL-score mean in the dictionary of meaningful vocabulary?

Frequency of use

noun, number of synonyms: 1

usage (10)

- Vocabulary whose use is limited for certain extralinguistic reasons. Vocabulary of limited use includes: dialectisms, terms and professionalisms, jargon, colloquial words and expressions, vulgarisms...
Dictionary of sociolinguistic terms
General linguistics. Sociolinguistics: Dictionary-reference book
- translation of the German term Gebrauchstypen, introduced by Delbrück to designate established uses of grammatical forms. Types of usage include, for example, different types of syntactic usage...
Encyclopedic Dictionary of Brockhaus and Efron
- Vocabulary whose use is limited by extralinguistic reasons: 1) dialectisms, limited territorially; 2) terms used in scientific style...
Dictionary of linguistic terms by T.V. Zherebilo
- Uses that rule out drawing distinctions between one object and another: Living organisms cannot exist without...
- Uses that correlate with specific representatives of a given class of objects: I need to see this person...
Terms and concepts of general morphology: Dictionary-reference book
- 1) Options provided for by the rules for formatting complex non-union sentences: when explaining or motivating, a dash can be used instead of a colon: Separation is illusory - we will be together soon...
Syntax: Dictionary
- adverb, number of synonyms: 1 hidden...
Synonym dictionary
- adj., number of synonyms: 10 published, outdated, not meeting modern requirements, outdated, obsolete, relegated to the realm of legend...
Synonym dictionary
- adj., number of synonyms: 19 anachronistic archaic archaic out-of-print obsolete out-of-date dilapidated obsolete obsolete out-of-date retired to the region...
Synonym dictionary
- adj., number of synonyms: 2 unsuitable for use uncommon...
Synonym dictionary
- adj., number of synonyms: 3 left unused put aside put under cover...
Synonym dictionary
- 1) Options provided for by the rules for formatting complex non-union sentences: when explaining or motivating, a dash can be used instead of a colon: Separation is illusory - we will be together soon 2) With isolation...
Dictionary of linguistic terms by T.V. Zherebilo

"frequency of use" in books

Feeding frequency

by Harmar Hillery

Feeding frequency The required number of feedings per day for a puppy depends on the size of the breed. Most puppies thrive when fed every three hours day and night, but if they were born prematurely or weighed less than 85g at birth, they are likely to be

Feeding frequency

From the book Breeding Dogs by Harmar Hillery

Feeding frequency

From the book Dogs and Their Breeding [Dog Breeding] by Harmar Hillery

Frequency

From the book Real Estate. How to advertise it author Nazaikin Alexander

14.2.3. Interaction frequency

by Dimitri Nicola

14.2.3. Frequency of Interaction The more often the same group of competitors interact, the more sustainable the collusion becomes, since violations are punished more promptly. If, for example, firms compete less frequently, their ability to maintain collusion is lower.

15.4.6. Auction Frequency

From the book Purchasing Guide by Dimitri Nicola

15.4.6. Frequency of Auctions As discussed above, some auction rings may transfer funds between themselves after an auction for which they have colluded, or may only keep records of amounts due on an occasional basis.

8. The frequency of use of function words turns out to be an author’s invariant

From the book Book 2. We change dates - everything changes. [New chronology of Greece and the Bible. Mathematics reveals the deception of medieval chronologists] author Fomenko Anatoly Timofeevich

8. The frequency of use of function words turns out to be an author’s invariant. A remarkable exception is our parameter 3 - the frequency of use of all function words - PREPOSITIONS, CONJUNCTIONS AND PARTICLES. The evolution of this parameter depending on the growth of the sample size is shown

Frequency

From the book Great Soviet Encyclopedia (CA) by the author TSB

Frequency

author Nazaikin Alexander

Frequency

From the book Media Planning for 100 author Nazaikin Alexander

Frequency Television channels are broadcast on meter and decimeter frequencies. The meter ranges were the first to be mastered on television. In the 90s of the 20th century, decimeter channels began to actively work in Moscow. Previously, the frequency was of significant importance, since for receiving different channels

Frequency

From the book Media Planning for 100 author Nazaikin Alexander

Frequency The frequency of signal transmission determines its quality. To a greater extent, it is provided in the VHF bands (frequency modulation FM). Listeners prefer good sound, which is why VHF stations have significant audience ratings and are preferred

3.2. Frequency

author Ivanov Dmitry Olegovich

3.2. Frequency When discussing the significance of any pathology in medicine, then, in our opinion, it is important to talk not only about the etiology, pathogenesis, clinical picture and severity of the injuries and complications that have occurred or may occur, but also about the prevalence of this pathology. TO

4.2. Frequency

From the book Heat balance disorders in newborns author Ivanov Dmitry Olegovich

4.2. Frequency Hyperthermia in newborns is probably much less common than hypothermia. This is probably due to the fact that there are extremely few studies on hyperthermia in infants in the scientific literature. Maayan-Metzger A. et al. (2003) analyzed 42,313 case reports

Frequency

From the book Glucose Metabolism Disorders in Newborns author Ivanov Dmitry Olegovich

Frequency Corblant M., who defined hypoglycemia as a blood glucose concentration of less than 30 mg% (1.67 mmol/l) in the first 72 hours of life, found it in 4.4% of all live births. In 1971, Lubchenco L. O. and Bard N., using the Corblant M. criteria, identified hypoglycemia in newborns with greater

I wrote a fun little PHP script and ran all the texts on the Spectator through it to check the language. In total, 39,110 different word forms are used in the texts. Exactly how many different words that makes is quite difficult to determine. To get at least somewhat closer to this figure, I took only the first 5 letters of each word and compared them. The result was 14,373 such combinations. It would be a stretch to call this the “Spectator” vocabulary.

Then I took the words and examined them for letter frequency. Ideally, to complete the picture, you would take some kind of dictionary. Running texts will not do: you need only unique words, because in a text some words are repeated more often than others. The following results were obtained:

о (o) - 9.28%
а (a) - 8.66%
е (e) - 8.10%
и (i) - 7.45%
н (n) - 6.35%
т (t) - 6.30%
р (r) - 5.53%
с (s) - 5.45%
л (l) - 4.32%
в (v) - 4.19%
к (k) - 3.47%
п (p) - 3.35%
м (m) - 3.29%
у (u) - 2.90%
д (d) - 2.56%
я (ya) - 2.22%
ы (y) - 2.11%
ь (soft sign) - 1.90%
з (z) - 1.81%
б (b) - 1.51%
г (g) - 1.41%
й (short i) - 1.31%
ч (ch) - 1.27%
ю (yu) - 1.03%
х (kh) - 0.92%
ж (zh) - 0.78%
ш (sh) - 0.77%
ц (ts) - 0.52%
щ (shch) - 0.49%
ф (f) - 0.40%
э (e) - 0.17%
ъ (hard sign) - 0.04%

I advise those who go on “Field of Miracles” to memorize this table and name letters in that order. For example, it turns out that such a “familiar” letter as «б» (b) is used less often than the “rare” letter «ы» (y). We must also remember that a word has more than one vowel, and that once you have guessed one vowel, you should move on to the consonants. Besides, a word is recognized precisely by its consonants. Compare “**а**и*е” and “ср*вн*т*”. In both cases, the word is “сравните” (“compare”).

And one more consideration. How did you learn English? Remember? A pen, a pencil, a table. I sing about whatever I see. What’s the point? How often do you say the word “pencil” in normal life? If the task is to teach someone to speak as quickly and efficiently as possible, then you need to teach accordingly: analyze the language, single out the most commonly used words, and start learning from them. To speak English more or less, one and a half thousand words are enough.

Another bit of fun: forming words from letters at random, but taking into account their frequency of occurrence, so that the results look like normal words. Among the first ten “random” four-letter words, “donkey” popped up. Among the next fifty, the words “rushing” and “NATO”. But, alas, there are a lot of dissonant combinations, such as “bltt” or “nrro”.

Hence the next step. I split all the words into two-letter combinations and began to combine those at random (again taking into account the frequency of repetition). Words resembling “normal” ones started to come out in large numbers. For example: “koivdiot”, “voabma”, “apy”, “depoid”, “debyako”, “orfa”, “poesnavy”, “ozza”, “chenya”, “rhetoria”, “urdeed”, “utoichi”, “stikh”, “sapot”, “gravda”, “ababap”, “obarto”, “eleuet”, “lyarezy”, “myni”, “bromomer” and even “todebyst”.
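The two-letter approach can be sketched roughly as follows. This is a minimal illustration in Python, not the original PHP script: the function names are hypothetical, and it assumes that words are cut into non-overlapping pairs (a trailing single letter is dropped), which the post does not specify.

```python
import random
from collections import Counter


def bigram_counts(words):
    """Count non-overlapping two-letter chunks across a list of words."""
    counts = Counter()
    for word in words:
        # Step by 2; a trailing single letter is ignored (an assumption).
        for i in range(0, len(word) - 1, 2):
            counts[word[i:i + 2]] += 1
    return counts


def make_word(counts, n_pairs, rng):
    """Glue together n_pairs chunks, each drawn in proportion to its count."""
    pairs = list(counts)
    weights = [counts[p] for p in pairs]
    return "".join(rng.choices(pairs, weights=weights, k=n_pairs))


counts = bigram_counts(["мама", "рама"])
rng = random.Random(0)  # fixed seed so that runs are repeatable
print(make_word(counts, 3, rng))
```

Frequent chunks dominate the output, which is why the generated strings tend to look pronounceable.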

Where to apply... there are options. For example, write a generator of beautiful branded playful names. For yoghurts. Like, “memoliso” or “utororerto”. Or - the generator of futuristic poems "Burliuk-php": "opeldiy miaton, linoaz okmiaya... deesopen odesson."

And there is one more option. Need to try...

Some statistics on the use of Russian words:

The average word length is 5.28 characters.
The average sentence length is 10.38 words.
The 1000 most frequent lemmas cover 64.0708% of the text.
The 2000 most frequent lemmas cover 71.9521% of the text.
The 3000 most frequent lemmas cover 76.5104% of the text.
The 5000 most frequent lemmas cover 82.0604% of the text.
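
Coverage figures of this kind can be reproduced on any lemmatized text. The sketch below is only an illustration in Python: the `coverage` function and the toy token list are hypothetical, and the input is assumed to be already lemmatized.

```python
from collections import Counter


def coverage(lemmas, top_n):
    """Percentage of all tokens covered by the top_n most frequent lemmas."""
    counts = Counter(lemmas)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(top_n))
    return 100.0 * covered / total


# Toy data: 10 tokens, where "the" occurs 4 times and "and" twice.
tokens = ["the"] * 4 + ["and"] * 2 + ["cat", "dog", "bird", "fish"]
print(coverage(tokens, 1), coverage(tokens, 2))
```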

After the note I received this letter:

Hello Dmitry!
After reading the article “Language will bring you to Kyiv” and the part where you describe your program, I had an idea.
The script you wrote seems to me to be suited not so much to “Field of Miracles” as to something else.
The first and most reasonable use of its results is determining the order of letters when programming the buttons of mobile devices. Yes, yes, it is in mobile phones that all this is needed.
I distributed the letters into waves.
The distribution by buttons is as follows:
1. All letters from the first wave go to 4 buttons in the first row
2. All letters from the second wave go to 4 more buttons in the same first row
3. All letters from the third wave go to the remaining two buttons
4. Waves 4, 5 and 6 go to the second row
5. Waves 7, 8 and 9 go to the third row, and the 9th wave goes entirely (despite the seemingly large number of letters) to the third row of the 9th button, so that the 10th button is left for punctuation marks (period, comma, etc.).
I think everything is clear as it is, without detailed explanations. But still, could you process the following texts with your script (including punctuation marks):
And then post the statistics? It seemed to me that these texts reflect our modern speech as closely as possible, and after all we write SMS the way we speak.
Thank you very much in advance.

So, there are two ways to analyze the frequency of repetition of letters. Method 1. Take a text, find unique (non-repeating) word forms in it and analyze them. The method is good for building statistics based on words in the Russian language, and not on texts. Method 2. Do not look for unique words in the text, but go straight to counting the frequency of repetition of letters. We get the frequency of letters in Russian text, and not in Russian words. To create keyboards and other things, you need to use exactly this method: texts are typed on the keyboard.
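
The difference between the two methods can be sketched in a few lines of Python. This is an illustration only, not the original PHP script; the function names and the sample sentence are hypothetical.

```python
import re
from collections import Counter

# Runs of letters only (Unicode-aware), so Cyrillic text works as well.
WORD_RE = re.compile(r"[^\W\d_]+")


def letter_shares(words):
    """Percentage share of each letter in the given words."""
    counts = Counter(ch for word in words for ch in word)
    total = sum(counts.values())
    return {ch: 100.0 * n / total for ch, n in counts.most_common()}


def by_unique_word_forms(text):
    """Method 1: every distinct word form is counted once."""
    return letter_shares(set(WORD_RE.findall(text.lower())))


def by_running_text(text):
    """Method 2: every occurrence counts, as when typing on a keyboard."""
    return letter_shares(WORD_RE.findall(text.lower()))


sample = "мама мыла раму мама"
print(by_unique_word_forms(sample)["м"])  # share of "м" among unique forms
print(by_running_text(sample)["м"])       # share of "м" in the running text
```

In the sample the repeated word raises the share of its letters under method 2, which is the effect method 1 is meant to exclude.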

Keyboards should take into account not only the frequency of letters, but also the most frequent words (word forms). It is not hard to guess which words are used most often: first of all, function words, because their role is to serve always and everywhere, and pronouns, whose role is no less important: to stand in for any thing or person in speech (this, he, she). And, of course, the main verbs (to be, to say). Based on the results of the analysis of the texts listed above, I obtained the following “popular” words: “and, not, in, that, he, I, on, with, she, how, but, his, this, to, a, all, her, was, so, then, said, for, you, oh, at, him, me, only, for, me, yes, you, from, was, when, from, for, still, now, they, said, already, him, no, was, her, to be, well, nor, if, very, nothing, here, herself, so that, to herself, this, maybe, that, before, we, them, whether, were, is, than, or, her” and so on.

Returning to keyboards, it is obvious that in the keyboard the letter combinations “not”, “what”, “he”, “on” and others should be as close to each other as possible, or if not close, then in some optimal way. It is necessary to conduct research into exactly how the fingers move across the keyboard, find the most “comfortable” positions and place the most commonly used letters in them, without forgetting, however, about letter combinations.

The problem, as always, is the same: even if it is possible to create a Unique Keyboard, what will happen to the millions of people who are already accustomed to QWERTY/ЙЦУКЕН?

As for mobile devices... Probably it makes sense. At the very least, the letters "o", "a", "e" and "i" must be exactly on the same key. Punctuation marks in order of frequency of use: , . - ? ! " ; :) (

I would like to warn you that the information presented in this article is somewhat outdated. I did not rewrite it so that later I could compare how SEO standards change over time. You can find up-to-date information on this topic in new materials:

Hello, dear readers of the blog. Today’s article is again devoted to search engine optimization of websites (SEO). We have already touched on many issues related to this topic.

Today I want to continue the conversation about internal SEO, while clarifying some points raised earlier, as well as talk about what we have not yet discussed. If you are able to write good unique texts, but do not pay enough attention to how they are perceived by search engines, then they will not be able to make their way to the top of search results for queries related to the subject of your wonderful articles.

What affects the relevance of text to a search query?

And this is very sad, because in this way you will not realize the full potential of your project, which can turn out to be very impressive. You need to understand that search engines for the most part are stupid and straightforward programs that are not able to go beyond their capabilities and look at your project with human eyes.

They will not see much of everything that is good and necessary in your project (that you have prepared for visitors). They only know how to analyze a text, taking into account many components, but they are still very far from human perception.

Therefore, we will need to step, at least temporarily, into the shoes of search robots and understand what they focus on when ranking various texts for various search queries. And for this you need some background, for which you will need to read the article provided.

Usually keywords are used in the page title and in some internal headings, and are distributed evenly and as naturally as possible throughout the article. Highlighting keys in the text can of course also be used, but do not forget about the over-optimization this may lead to.

The density of keys in the text also matters, but nowadays it is less a factor to pursue than a warning: one should not overdo it.

Determining the density of occurrence of a keyword in a document is quite simple. In fact, this is the frequency of its use in the text, which is determined by dividing the number of its occurrences in the document by the length of the document in words. Previously, the position of the site in the search results directly depended on this.
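
The division described above fits in a few lines. The Python sketch below is only an illustration: the `keyword_density` function is hypothetical, it treats the key as a single word and matches exact word forms only, which real tools may handle differently.

```python
import re


def keyword_density(text, keyword):
    """Occurrences of keyword divided by the text length in words, in percent."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


text = "SEO tips: good SEO needs good readable text"
print(keyword_density(text, "seo"))  # 2 occurrences out of 8 words
```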

But you probably understand that it will not be possible to compile all the material only from the keys, because it will be unreadable, and thank God this is not necessary. Why, you ask? Yes, because there is a limit to the frequency of using a keyword in the text, after which the relevance of a document for a query containing this keyword will no longer increase.

I.e., it is enough for us to reach a certain frequency, and the text will then be optimized as much as possible. Or we will overdo it and fall under the filter.

It remains to solve two questions (and maybe three): what is this maximum density of keyword occurrence, after which it is already dangerous to increase it, and also to find out.

The fact is that keywords highlighted with emphasis tags or enclosed in the TITLE tag carry more search weight than the same keywords simply appearing in the text. But webmasters began to exploit this and thoroughly spammed the factor, so its importance has decreased, and abuse of strong tags can even lead to a ban of the entire site.

Keys in TITLE are still relevant, but it is better not to repeat them there and not to cram too much into one page title. If the keywords are in the TITLE, then we can significantly reduce their number in the article (and therefore make it easier to read and more suitable for people, not for search engines), achieving the same relevance but without the risk of falling under the filter.

I think everything is clear with this question: the more keys are enclosed in emphasis and TITLE tags, the greater the chance of losing everything at once. But if you do not use them at all, you will not achieve anything either. The most important criterion is how naturally the keywords are introduced into the text. If they are there but the reader does not stumble over them, then everything is great.

Now it remains to figure out what frequency of use of a keyword in a document is optimal, which allows you to make the page as relevant as possible, and will not entail sanctions. Let's first remember the formula that most (probably even all) search engines use for ranking.

How to determine the permissible frequency of using a key

We have already talked about the mathematical model in the article mentioned just above. Its essence for this particular search query is expressed by one simplified formula: TF*IDF. Where TF is the direct frequency of occurrence of this query in the text of the document (the frequency with which words appear in it).

IDF is the inverse frequency of occurrence (rarity) of a given query in all other Internet documents indexed by a given search engine (in the collection).

This formula allows you to determine the correspondence (relevance) of a document to a search query. The higher the value of the product TF*IDF, the more relevant the document will be and the higher it will rank, all other things being equal.

I.e., the weight of the document for a given query (its relevance) will be greater the more often the keys from this query are used in the text, and the less often these keys are found in other Internet documents.
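
The simplified TF*IDF product can be illustrated with the textbook variant of the formula. The Python sketch below assumes TF is the share of the term in the document and IDF is the logarithm of the collection size divided by the number of documents containing the term; this is a common classroom definition, not the formula of any particular search engine, and the toy collection is hypothetical.

```python
import math


def tf(term, doc):
    """Share of the term among the words of one document."""
    return doc.count(term) / len(doc) if doc else 0.0


def idf(term, docs):
    """Rarity of the term: log(collection size / documents containing it)."""
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing) if containing else 0.0


def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)


# Toy collection of three tokenized documents.
docs = [["seo", "text", "seo"], ["text", "length"], ["text", "keys"]]
print(tf_idf("seo", docs[0], docs))   # rare in the collection: positive weight
print(tf_idf("text", docs[0], docs))  # present in every document: weight 0
```

A word that occurs in every document gets zero weight however often it is repeated, which is why only TF can be influenced by the author of a page.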

It is clear that we cannot influence the IDF, except by choosing another query to optimize for. But we can and will influence TF, because we want to grab our share (and not a small one) of traffic from Yandex and Google search results for the queries we need.

But the fact is that search algorithms calculate the TF value using a rather cunning formula, which takes into account the increase in the frequency of keyword use in the text only up to a certain limit, after which the growth of TF practically stops, despite the fact that you increase the frequency. This is a kind of antispam filter.
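
The real formula is not public, so the saturation effect can only be illustrated with a stand-in. The Python sketch below uses the function x / (x + k), the same general shape as the term-frequency component of BM25; both the function and the value of `k` are assumptions chosen for illustration, not what Yandex or Google actually compute.

```python
def saturating_tf(density_percent, k=1.0):
    """Stand-in for a saturating TF: grows with density but never reaches 1."""
    return density_percent / (density_percent + k)


for density in (1, 2, 3, 5, 10):
    print(density, round(saturating_tf(density), 3))
```

Each additional percent of density adds less than the previous one, so beyond some point extra keywords bring almost no gain in TF.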

Relatively long ago (until about 2005), the TF value was calculated using a fairly simple formula and was in fact equal to the density of occurrence of the keyword. Search engines were not entirely happy with the relevance computed this way, because it played into the hands of spammers.

Then the TF formula became more complicated: the concept of page “nausea” appeared, and TF began to depend not only on the frequency of occurrence of the key, but also on the frequency of use of other words in the same text. The optimal TF value could be achieved if the key turned out to be the most frequently used word.

It was also possible to increase the TF value by increasing the text size while maintaining the percentage of occurrence. The longer the wall of text with the same percentage of keys, the higher the document would rank.

Now the TF formula has become even more complicated, but at the same time we no longer need to push the density to the point where the text becomes unreadable and search engines ban our project for spam. And there is no longer any need to write disproportionately long walls of text either.

While maintaining the same ideal density (we will determine it just below from the corresponding graph), increasing the size of the article in words will improve its position in the search results only up to a certain length. Once you have the ideal length, further increasing it will not affect relevance (more precisely, it will, but very, very little).

All this can be seen clearly if you plot a graph based on this tricky TF (direct occurrence frequency). If one axis of the graph shows TF and the other shows the percentage frequency of occurrence of the keyword in the text, the result is a so-called hyperbola:

The graph is, of course, approximate, because few people know the real TF formula that Yandex or Google use. But qualitatively it allows us to determine the optimal range in which the frequency should lie: approximately 2-3 percent of the total number of words.

If you consider that you will also enclose some of the keys in emphasis tags and the TITLE heading, then this is the limit beyond which a further increase in density may be fraught with a ban. It is no longer profitable to saturate and disfigure the text with a large number of keywords, because there will be more minuses than pluses.

What length of text will be sufficient for promotion?

Based on the same assumed TF, one can plot its value versus length in words. In this case, you can take the frequency of keywords constant for any length and equal, for example, to any value from the optimal range (from 2 to 3 percent).

What is noteworthy is that we get a graph of exactly the same shape as the one discussed above, only with the length of the text in thousands of words plotted along the x-axis. From it, one can draw a conclusion about the optimal length range, at which an almost maximal TF value is already achieved.

As a result, it turns out that it will be in the range of 1000 to 2000 words. With a further increase, relevance will practically not increase, and with a shorter length it will drop quite sharply.

Thus, we can conclude that for your articles to rank high in search results, you need to use keywords in the text with a frequency of about 2-3%. This is the first and main conclusion. The second is that it is now not at all necessary to write very voluminous articles in order to get to the Top.

It is enough to reach a length of 1000-2000 words and include 2-3% of keywords. That is the whole recipe for the perfect text, one able to compete for a place in the top for low-frequency queries even without external optimization (purchasing links to the article with anchors that include the keys). Still, it does no harm to look around a little on Miralinks, GGL, Rotapost or GetGoodLink, because it will help your project.

Let me remind you once again that you can find out the length of the text you wrote, as well as the frequency of certain keywords in it, using specialized programs or online services for text analysis. One of these services is ISTIO, which I have written about.

Everything I said above is not one hundred percent reliable, but very similar to the truth. In any case, my personal experience confirms this theory. But the algorithms of Yandex and Google are constantly undergoing changes, and few people know how it will be tomorrow, except those who are close to their development or developers.

Good luck to you! See you soon on the pages of the blog site

How to find a word in the dictionary?

The two main sections of the dictionary are a list of words, ordered alphabetically and by overall frequency of use in the corpus. All words are given in their original (initial) form: for names this is the nominative case form (for nouns, as a rule, the singular form, for adjectives - the full masculine form), for verbs - the infinitive form.

The alphabetical list contains 60 thousand of the most frequent word forms. To find information about the desired word, go to the section, select the first letter of the word and find the desired word in the table. To quickly find a word, you can also use the search box, for example:

Word: great

In this way, you can find information not only about a specific word, but also about a group of words that begin or end in the same way. To do this, in the search window, use an asterisk (*) after the typed sequence of letters (“all words starting with...”) or before the sequence of letters (“all words ending with...”). For example, if you want to find all words starting with re-, type in the search box:

Word: re*

If you want to find all words ending with -no, type in the search box:

Word: *no
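
The same kind of wildcard matching can be tried offline on any word list. The Python sketch below uses the standard `fnmatch` module; the `search` helper and the word list are hypothetical and say nothing about how the dictionary site itself implements its search.

```python
from fnmatch import fnmatchcase


def search(words, pattern):
    """Return the words that match a pattern in which * stands for any letters."""
    return [word for word in words if fnmatchcase(word, pattern)]


words = ["rebuild", "reread", "piano", "casino", "great"]
print(search(words, "re*"))  # words starting with re-
print(search(words, "*no"))  # words ending with -no
```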

In the frequency list of lemmas, words are ordered by the overall frequency of use in the corpus of the modern Russian literary language. The frequency list includes 20,000 of the most commonly used lemmas.

To find information about the desired word, go to the section and find the desired word in the table. To search for information about individual words, it is best to use the quick word search window.

Why can't I find a word in the dictionary even though I can find it in the corpus?

This could be due to several reasons. Firstly, the word may have a low frequency (for example, only 3 occurrences in the corpus) or be used only in texts written before 1950. Secondly, a word may occur many times, but in only one or two texts: such lemmas were deliberately excluded from the dictionary. Thirdly, we cannot rule out an error in the automatic determination of the word's initial form or part of speech, or that the word was erroneously classified as a proper noun. The site presents a “test” version of the frequency dictionary, and we are going to continue working to refine its lexical composition.

What information can you get about the use of a word?

In the dictionary you can get the following information about the use of a word in the corpus:

the total number of uses of the lemma (total frequency in ipm units), see sections, frequency dictionaries of fiction and other functional styles; frequency dictionaries of nouns, verbs and other parts of speech

frequency rank of the word (that is, the serial number in the general frequency list), see sections, frequency dictionaries of nouns, verbs and other parts of speech.

number of texts in which the word appeared (number of documents), see section;

coefficient of variation D, see sections and frequency dictionaries of nouns, verbs and other parts of speech

distribution of word usage in texts created in different decades (1950s, 1960s, etc.), see section;

general frequency of use of individual word forms, see section Alphabetical list of word forms.

In dictionaries of meaningful vocabulary, you can also obtain information about the comparative frequency of a word in the general corpus and in the subcorpus of texts of a certain functional style (fiction, journalism, etc.), along with the LL-score (log-likelihood) indicator.
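
The text does not define the LL-score, so the following is only a sketch of the log-likelihood statistic commonly used in corpus linguistics to compare a word's frequency in two corpora. It is an assumption that the dictionary computes its indicator this way, and that frq1 and frq2 are the two frequencies being compared; the function and parameter names are hypothetical.

```python
import math


def log_likelihood(freq_sub, freq_ref, size_sub, size_ref):
    """Log-likelihood of a word's counts in a subcorpus vs a reference corpus."""
    total = freq_sub + freq_ref
    # Expected counts if the word were equally frequent in both corpora.
    expected_sub = size_sub * total / (size_sub + size_ref)
    expected_ref = size_ref * total / (size_sub + size_ref)
    ll = 0.0
    if freq_sub > 0:
        ll += freq_sub * math.log(freq_sub / expected_sub)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * ll


print(log_likelihood(10, 10, 1000, 1000))  # same relative frequency
print(log_likelihood(20, 5, 1000, 1000))   # more typical of the subcorpus
```

The larger the value, the less likely it is that the difference between the two frequencies is accidental.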

In addition to quantitative indicators, the part of speech is indicated for the word. This is done in order to separate words of different parts of speech that have the same initial form (cf. печь, which is both a noun, “stove”, and a verb, “to bake”).

What is ipm?

The overall frequency characterizes the number of occurrences per million words of the corpus, or ipm (instances per million words). This is a unit of frequency measurement generally accepted in world practice, which simplifies the comparison of word frequency in different frequency dictionaries and in different corpora. The point is that the text samples on which frequency is measured can differ greatly in size. For example, if the word power occurs 55 times in a corpus of 400 thousand words, 364 times in a corpus of one million words, 40598 times in a corpus of 100 million words of the modern Russian language and 55673 times in the large 135-million-word NKRY corpus, then its frequency in ipm will be 137.5, 364.0, 372.06 and 412.39, respectively.
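
The conversion to ipm is a single division, sketched here in Python with a hypothetical `ipm` function. Three of the four figures in the example follow directly from the stated corpus sizes; the third, 372.06, would correspond to a corpus of about 109 million words rather than exactly 100 million, so the stated size is presumably rounded.

```python
def ipm(count, corpus_size):
    """Instances per million words."""
    return count * 1_000_000 / corpus_size


print(ipm(55, 400_000))
print(ipm(364, 1_000_000))
print(round(ipm(55673, 135_000_000), 2))
```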

The frequency dictionaries edited by L.N. Zasorina and L. Lenngren were each built on a sample of one million word usages, so we can assume that the absolute figures given there are also in ipm.

What is the coefficient of variation D?

Coefficient D, introduced by A. Juilland (Juilland et al. 1970), is used in many frequency dictionaries (the Russian dictionary by L. Lenngren, the dictionary of the British National Corpus, a dictionary of French business vocabulary). This coefficient shows how evenly a word is distributed across different texts.

The coefficient ranges from 0 to 100. For example, the word and is found in almost all texts of the corpus, and its D value is close to 100. The word commissurotomy occurs 5 times in the corpus, but only in one text; it has a D value of about 0.

Giving the coefficient D for each word makes it possible to assess how specific the word is to individual subject areas. For example, the words overripe and implant have approximately the same frequency (0.56 ipm), but D is 90 for overripe and 0 for implant. This means that the first word occurs evenly in texts of different kinds and is relevant to a large number of subject areas, while the word implant is present only in a few texts on the topic of "medicine and health".
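As a rough illustration, Juilland's D is usually written as D = 100 × (1 − V / √(n − 1)), where V is the coefficient of variation (standard deviation divided by mean) of the word's frequency over n corpus parts. The sketch below assumes the parts are of equal size and uses the population standard deviation; how the dictionary itself partitions the corpus is not stated here, and the function name is hypothetical.

```python
import math


def juilland_d(part_freqs: list[float]) -> float:
    """Juilland's D on a 0..100 scale for frequencies in n equal-sized parts."""
    n = len(part_freqs)
    if n < 2:
        raise ValueError("at least two corpus parts are required")
    mean = sum(part_freqs) / n
    if mean == 0:
        raise ValueError("the word does not occur in any part")
    # Population standard deviation of the per-part frequencies
    std = math.sqrt(sum((f - mean) ** 2 for f in part_freqs) / n)
    v = std / mean  # coefficient of variation
    # Clamp at 0 to hide tiny negative values caused by floating-point rounding
    return max(0.0, 100 * (1 - v / math.sqrt(n - 1)))


even = juilland_d([5, 5, 5, 5])       # spread evenly over all parts: D is 100
one_part = juilland_d([5, 0, 0, 0])   # all 5 occurrences in one part: D is about 0
print(even, one_part)
```

A word concentrated in a single part gets V = √(n − 1), so the formula yields 0, which matches the commissurotomy example above.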

What can you learn about the history of the use of the word in different periods?

Information on the distribution of word frequency across the decades of the second half of the 20th century and the beginning of the 21st century is available in the dictionary. For example, you can trace how the use of the word perestroika developed over time.

The sharp surge in its use in the 1980s is readily explained by the socio-historical realities of that time; from a linguistic point of view, the same fact can be interpreted as follows: the word perestroika was enriched with a new meaning, which became dominant in subsequent years.

Why are proper names and abbreviations included in a separate list?

Proper names are separated from the main part of the dictionary because they form a much less statistically stable group: their frequency depends largely on the choice of texts in the corpus and on their topic (in particular, on the place and time of the events described). Lenngren (1993) argued that including proper names in a frequency dictionary on the same footing as other words inevitably leads to its premature obsolescence.

The dictionary includes the core part of this list, comprising the 3,000 most frequent units. To find data on the use of first names, patronymics, last names, nicknames, toponyms, names of organizations and abbreviations, go to the section Alphabetical list of proper names and abbreviations, select the letter with which the word you are looking for begins and find it in the table. You can also use the quick word search window.

How can I get information about the use of individual forms of a word?

In addition to information about the use of a lemma (that is, a word in all forms of inflection), in the dictionary you can find out how individual word forms are used. Go to the Alphabetical list of word forms section, select the letter with which the word form begins and find it in the table. You can also use the quick search window, for example:

Word form: fly

To find all word forms that begin (or end) with a specific letter sequence, use the asterisk (*) in the search box. For example, all word forms starting with sleep- can be found by typing:

Word form: sleep*

All word forms ending in -ic can be found by typing:

Word form: *ic
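The asterisk search can be imitated offline with shell-style pattern matching. The sketch below uses Python's standard `fnmatch` module on a small hypothetical list of word forms; it only illustrates the idea and is not how the dictionary's search box is implemented. Note that `fnmatch` also gives special meaning to `?` and `[...]`, which the search box may not support.

```python
from fnmatch import fnmatchcase


def search_word_forms(word_forms: list[str], pattern: str) -> list[str]:
    """Return the word forms matching a pattern in which * stands for any letters."""
    return [w for w in word_forms if fnmatchcase(w, pattern)]


# Hypothetical sample data, for illustration only
forms = ["sleep", "sleeps", "sleeping", "asleep", "flying"]

print(search_word_forms(forms, "sleep*"))  # ['sleep', 'sleeps', 'sleeping']
print(search_word_forms(forms, "*ing"))    # ['sleeping', 'flying']
```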

The alphabetical list of word forms includes all word forms of the corpus with a frequency above 0.1 ipm (about 15 thousand in total) and contains information about their general frequency. Homonymous word forms are marked in the table with *.

How to find information about the “most common” words?

Using our dictionary, you can find information about classes of words that differ in general statistical characteristics. These are, in particular:

the most frequent words in the general sample from the corpus; average frequency words for the general sample, etc. (see section);

words most frequently found in the fiction subcorpus (see section Frequency Dictionary of Fiction);

words most frequently found in the subcorpus of journalism (see section Frequency Dictionary of Journalism);

words that appear most frequently in the subcorpus of other nonfiction (see section Frequency Dictionary of Other Nonfiction);

words that are most characteristic of oral speech (see section Frequency Dictionary of Living Oral Speech);

the most frequent nouns (see section Frequency list of nouns);

the most frequent verbs (see section Frequency list of verbs);

and other frequency lists of part-of-speech classes.

In addition to these classes, you can explore other groups of words yourself using the "General Alphabetical List" table in the Alphabetical List of Word Forms section. For example, you can look at the most frequent verbs with the prefix re-, at words found in more than 200 texts, and much more: how the words are grouped depends on your tasks and your imagination.

How to trace the frequency distribution in texts of different functional styles?

L.N. Zasorina's frequency dictionary provides data on the use of words in four types of texts: (I) newspaper and magazine texts, (II) drama, (III) scientific and journalistic texts, (IV) literary prose. In our dictionary you can get similar information in the section "Distribution of lemmas by functional styles".

The frequency dictionaries of functional styles are compiled from the subcorpora of fiction, journalism, other non-fiction and live oral speech. Compared with L.N. Zasorina's dictionary, the set of categories has been changed slightly: recordings of live oral speech and transcripts of film soundtracks are used instead of drama, and scientific literature is placed in a separate section together with official and business, church and other non-fiction texts.

The list includes the 5,000 most frequent lemmas of these subcorpora. For each lemma, the part of speech, frequency in the subcorpus and coefficient D are indicated.

What is a dictionary of meaningful vocabulary (fiction, etc.)?

Some words are used much more often in one functional style than in the others. For live oral speech, for example, such words are here, in general and OK. Indeed, it is difficult to imagine these words being used as often in scientific and technical literature as in everyday language.

The list of the most typical lemmas for each functional type of text was obtained by comparing the frequency of lemmas in the given subcorpus with their frequency in the rest of the corpus. Dictionaries of meaningful vocabulary include 500 lemmas.

What do the indicators frq1, frq2 and LL-score mean in the dictionary of meaningful vocabulary?

Frq1 is the overall frequency of the lemma in the entire corpus (in ipm), and frq2 is the frequency of the lemma in the given subcorpus (fiction, journalism, other non-fiction or live spoken language, respectively). LL-score is the log-likelihood ratio calculated from frq1 and frq2 according to the formula proposed by P. Rayson and A. Garside (see the Introduction to the dictionary for details). The higher the LL-score, the more characteristic the word is of the given functional style.
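For orientation, the log-likelihood statistic of Rayson and Garside compares the observed counts of a word in two corpora with the counts expected if the word were equally frequent in both. The sketch below assumes raw occurrence counts and corpus sizes as inputs (the dictionary itself reports ipm values); the function and parameter names and the sample numbers are hypothetical, and the exact procedure used by the dictionary is described in its Introduction.

```python
import math


def log_likelihood(count_sub: int, size_sub: int,
                   count_ref: int, size_ref: int) -> float:
    """Log-likelihood score for a word in a subcorpus versus a reference corpus."""
    if size_sub <= 0 or size_ref <= 0:
        raise ValueError("corpus sizes must be positive")
    total = count_sub + count_ref
    # Expected counts if the word had the same relative frequency in both corpora
    expected_sub = size_sub * total / (size_sub + size_ref)
    expected_ref = size_ref * total / (size_sub + size_ref)
    ll = 0.0
    for observed, expected in ((count_sub, expected_sub), (count_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Same relative frequency in both corpora: the score is 0
print(log_likelihood(10, 1_000, 20, 2_000))
# Word over-represented in the subcorpus: the score is clearly positive
print(log_likelihood(50, 1_000, 20, 2_000))
```

The score itself does not say in which corpus the word is over-represented; that is read off by comparing the two relative frequencies.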

How to get a list of the 100 most frequent verbs?

In the section "General Vocabulary: Parts of Speech", the frequency list of lemmas is divided into seven sublists: nouns, verbs, adjectives, adverbs and predicatives, pronouns, numerals and function words. For each lemma, its overall frequency and its rank (ordinal number) in the general list are given. Each list contains the 1,000 most frequent lemmas.

Thus, you can get a list of the 100 most frequent verbs by going to the Frequency list of verbs subsection and taking the first 100 verbs at the top of the list. Similarly, you can find out which adjective is the most frequent (according to the section Frequency list of adjectives, it is the adjective new) and learn many other interesting facts about the composition of the part-of-speech classes.
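If the frequency list is available as data, the same selection can be reproduced with a filter and a sort. The sketch below assumes each entry is a (lemma, part of speech, ipm) record; the record layout, the part-of-speech labels and the frequency values are made up for illustration and are not taken from the dictionary.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    lemma: str
    pos: str    # part-of-speech label, e.g. "verb" (hypothetical labels)
    ipm: float  # overall frequency in instances per million


def top_by_pos(entries: list[Entry], pos: str, n: int = 100) -> list[Entry]:
    """Return the n most frequent entries of the given part of speech."""
    selected = [e for e in entries if e.pos == pos]
    return sorted(selected, key=lambda e: e.ipm, reverse=True)[:n]


# Made-up frequencies, for illustration only
entries = [
    Entry("be", "verb", 9000.0),
    Entry("new", "adjective", 1200.0),
    Entry("say", "verb", 2500.0),
    Entry("year", "noun", 3000.0),
    Entry("know", "verb", 1800.0),
]

print([e.lemma for e in top_by_pos(entries, "verb", n=2)])  # ['be', 'say']
```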

How to use auxiliary tables?

Firstly, the auxiliary tables contain data on the frequency of parts of speech and other grammatical categories. These data were obtained from the NKRY subcorpus in which lexical and grammatical ambiguity has been removed manually (more than 6 million word occurrences). Since the statistics cover large classes of words, there is reason to believe that the proportions of parts of speech and other grammatical categories are the same across the whole corpus.

Secondly, this section provides information about text coverage by lexemes and about the average length of a word, a word form and a sentence.

Thirdly, here are frequency lists of uses of letters of the Russian alphabet, punctuation marks, as well as two-letter and multi-letter combinations.
