In a blog article titled “Linguistics of programming languages ― now an inquiry on my graduate thesis came from France”, I wrote that I had received an inquiry from a French person about the “Programming Linguistics” that I proposed in my graduate thesis. Yesterday I received another inquiry, this time from a Japanese person, and so I searched for related work. As I wrote in the above article, there have been only a few studies that regard programs as human-written linguistic expressions. This time, however, I found a paper by Masaru Ohba and Katsuhiko Gondow that analyzes the structure of identifiers.
In my graduate thesis [Kan 81] (in Section 4.2), I pointed out that some identifiers consist of multiple elements (i.e., morphemes), and that these elements are delimited by white space, as in “towers of hanoi”, by underscores (“_”) or hyphens (“-”), or by capitalizing the first character of each element, as in “FileOfInteger”.
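The three conventions above can be illustrated with a small Python sketch. The function name `split_identifier` and the regular expressions are my own assumptions for illustration; they are not taken from the thesis or from any existing tool.

```python
import re

def split_identifier(identifier):
    """Split an identifier into its elements (morphemes).

    Hypothetical helper: handles white space, underscores, hyphens,
    and capitalization boundaries such as "FileOfInteger".
    """
    elements = []
    # First split on explicit separators: white space, "_" and "-".
    for chunk in re.split(r"[\s_\-]+", identifier):
        # Then split each chunk at capitalization boundaries.
        # Runs of capitals (e.g. "HTTP") are kept together as one element.
        elements.extend(
            re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk)
        )
    return elements

print(split_identifier("towers of hanoi"))
print(split_identifier("FileOfInteger"))
print(split_identifier("kbd_read_input"))
```

Real identifiers are messier than this (abbreviations, digits mixed into words, and so on), so the sketch should be read only as a first approximation of morpheme segmentation.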
Ohba et al. [Ohb 05] call such elements of identifiers “concept keywords”, and they tried to find concept keywords automatically. For this purpose, they developed the ckTF/IDF method, which is based on the so-called TF/IDF method that has been used for analyzing natural languages. A feature of the ckTF/IDF method is that prefixes such as “kbd_” are not regarded as concept keywords. This is probably because such prefixes often represent abbreviated module names and have little to do with the meaning of the identifiers.
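To give a feeling for the idea, here is a minimal Python sketch of ordinary TF/IDF applied to identifier elements, with known prefixes dropped before counting. It is not the ckTF/IDF method itself, whose details are not reproduced here. The function names, the hand-supplied `prefixes` set, and the sample file and identifier names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def elements_of(identifier, prefixes):
    """Split an identifier on "_", "-" or white space and drop a leading prefix."""
    parts = [p.lower() for p in re.split(r"[_\-\s]+", identifier) if p]
    # Assumption: a known prefix such as "kbd" carries no concept, so skip it.
    if len(parts) > 1 and parts[0] in prefixes:
        parts = parts[1:]
    return parts

def tf_idf(files, prefixes=frozenset()):
    """Plain TF/IDF: each file is a 'document', each element a 'term'."""
    counts = {
        name: Counter(e for ident in idents for e in elements_of(ident, prefixes))
        for name, idents in files.items()
    }
    n_files = len(counts)
    # Document frequency: in how many files does each element occur?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            e: (k / total) * math.log(n_files / df[e]) for e, k in c.items()
        }
    return scores

# Hypothetical input: identifiers grouped by the file in which they appear.
files = {
    "keyboard.c": ["kbd_read_status", "kbd_init"],
    "mouse.c": ["mse_read_button", "mse_init"],
}
scores = tf_idf(files, prefixes={"kbd", "mse"})
print(scores["keyboard.c"])
```

In this toy input, elements that occur in every file (such as “read” and “init”) get a score of zero, while elements specific to one file (such as “status”) get a positive score, and the prefixes never appear as candidates at all.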
The authors of this paper do not seem to have felt that they were studying human beings (i.e., they probably did not regard their work as part of the humanities). However, this work clearly belongs to the morphology of programming languages that I outlined in my graduate thesis.
References
- [Kan 81] Kanada, Y., “Toward Programming Linguistics”, Master's Thesis, University of Tokyo Graduate School, 1981 (in Japanese).
- [Ohb 05] Ohba, Masaru and Gondow, Katsuhiko, “Toward Mining "Concept Keywords" from Identifiers in Large Software Projects”, Int'l Workshop on Mining Software Repositories 2005, pp. 1-5, 2005.