Underscore

The symbol underscore, _, also called underline, underdash, low line, or low dash, is a character that originally appeared on the typewriter and was primarily used to underline words.[note 1] To produce an underlined word, the word was typed, the typewriter carriage was moved back to the beginning of the word, and the word was overtyped with the underscore character.

_ ⎁ ◌̲
Underscore
In Unicode:
U+005F _ LOW LINE (HTML &#95;)
U+2381 ⎁ CONTINUOUS UNDERLINE SYMBOL
U+0332 ◌̲ COMBINING LOW LINE
Related (see also):
U+2017 ‗ DOUBLE LOW LINE
U+2382 ⎂ DISCONTINUOUS UNDERLINE SYMBOL

This character is often used to create visual spacing within a sequence of characters where a whitespace character is not permitted (e.g., in computer filenames, email addresses, and URLs). Some computer applications automatically emphasize text surrounded by underscores, either by underlining or by italicizing it (e.g., _string_ may be rendered as the word "string" underlined or in italics). In contexts where no formatting is supported, such as IRC, instant messaging, or older email formats, the enclosing underscore markup is sometimes used as a proxy for underlining the enclosed word(s).

A variant, ⎁, is used in the Province of Quebec (Canada) to underline superscripts.[1]

In some languages, the mark is used as a combining diacritic and is called a "combining low line".

Diacritic

The underscore is used as a diacritic mark, the "combining low line" (◌̲), in some languages of Egypt, some languages using the Rapidolangue orthography in Gabon, Izere in Nigeria, and indigenous languages of the Americas such as Shoshoni and Kiowa.

Similar marks

The underscore is not the same character as the dash character, although one convention for text news wires is to use an underscore when an em-dash or en-dash is desired, or when other non-standard characters such as bullets would be appropriate.

The combining diacritic ◌̱ (macron below) is similar to the combining low line, but its mark is shorter. The difference between "macron below" and "low line" is that the latter results in an unbroken underline when it is run together: compare a̱ḇc̱ and a̲b̲c̲ (only the latter should look like "abc" with one continuous underline).[2]
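As a small illustration of the comparison above, the following Python sketch appends a combining mark to each character of a string. The helper name `underline` is hypothetical, and whether the result appears as one unbroken line depends on the font and the renderer.

```python
import unicodedata

COMBINING_LOW_LINE = "\u0332"      # U+0332 COMBINING LOW LINE
COMBINING_MACRON_BELOW = "\u0331"  # U+0331 COMBINING MACRON BELOW

def underline(text: str, mark: str = COMBINING_LOW_LINE) -> str:
    """Return text with the combining mark appended after every character."""
    return "".join(ch + mark for ch in text)

# Each base letter is followed by its own combining mark, so the
# resulting string is twice as long as the input.
print(underline("abc"))                          # low line: should join up
print(underline("abc", COMBINING_MACRON_BELOW))  # macron below: shorter marks
print(unicodedata.name(COMBINING_LOW_LINE))
```

The combining mark follows the base character it modifies, which is why the helper emits the letter first and the mark second.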

Modern use

In printed documents, underlining is generally avoided; italics or small caps are often used instead, or (especially in headings) capitalization or bold type. In a manuscript to be typeset, various forms of underlining were therefore conventionally used to indicate that text should be set in special type such as italics, part of a procedure known as markup.

A series of underscores (like __________ ) may be used to create a blank to be filled in on a form. It is also sometimes used to create a horizontal line; other symbols with similar graphemes, such as hyphens and dashes, are also used for this purpose.

Usage in computing

History

As early output devices (both CRTs and printers) could not produce more than one character at a location, it was not possible to underscore text, so common character sets of the 1950s had no underscore. IBM's EBCDIC character-coding system, introduced in 1964, added the underscore, which IBM referred to as the "break character". IBM's report on NPL (the early name of what is now called PL/I) leaves the character set undefined, but specifically mentions the break character, and gives RATE_OF_PAY as an example identifier.[3] By 1967 the underscore had spread to ASCII,[4] replacing the similarly shaped left-arrow character, ←, which previously resided at code point 95 (5F hex) in ASCII-1963 (see also: PIP). C, developed at Bell Labs in the early 1970s, allowed the underscore as an alphabetic character.[5]

Underscore predates the existence of lower-case letters in many systems, so often it had to be used to make multi-word identifiers, since CamelCase (see below) was not available.

Programming conventions

Underscores are very commonly inserted between words to form a multi-word identifier in languages that cannot handle spaces in identifiers. This convention is known as "snake case". The other popular method is camelCase, in which capital letters show where each word starts.
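A minimal sketch of converting between the two conventions is shown below. The function names `snake_to_camel` and `camel_to_snake` are hypothetical, and the code assumes simple ASCII identifiers without acronyms or consecutive capitals.

```python
import re

def snake_to_camel(name: str) -> str:
    """Convert rate_of_pay to rateOfPay (assumes a lower-case snake_case input)."""
    first, *rest = name.split("_")
    return first + "".join(word.capitalize() for word in rest)

def camel_to_snake(name: str) -> str:
    """Convert rateOfPay to rate_of_pay by inserting '_' before each capital."""
    # The lookahead finds every position before an upper-case letter;
    # the lookbehind skips the very start of the string.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

print(snake_to_camel("rate_of_pay"))  # rateOfPay
print(camel_to_snake("rateOfPay"))    # rate_of_pay
```

The round trip only holds for identifiers that follow these simple assumptions; a name such as parseHTTPResponse would need extra rules.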

An underscore as the first character of an identifier is often used to indicate an internal implementation detail that is not considered part of the API and should not be called by code outside that implementation. Python uses this convention for private member variables of classes, and it is common in other languages such as C++ even though those provide keywords to mark members as private. It is also used extensively to hide variables and functions used for implementations in header files. The single leading underscore became so common for this purpose that C compilers standardized on a double leading underscore (for instance __DATE__) for actual built-in names, to avoid conflicts with those in header files. Python uses a double leading underscore to "mangle" a private identifier, making it much harder to refer to from outside the class, and "PHP reserves all function names starting with __ as magical."[6]
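The Python behaviour described above can be sketched as follows. The class `Account` and its attribute names are hypothetical, chosen only for illustration.

```python
class Account:
    def __init__(self, balance):
        # Single leading underscore: "internal" by convention only;
        # Python does not prevent outside access.
        self._balance = balance
        # Double leading underscore: inside the class body the name is
        # mangled to _Account__audit_log.
        self.__audit_log = []

    def deposit(self, amount):
        self._balance += amount
        self.__audit_log.append(amount)

acct = Account(100)
acct.deposit(50)

print(acct._balance)                 # still reachable from outside
print(hasattr(acct, "__audit_log"))  # False: the unmangled name does not exist
print(acct._Account__audit_log)      # the mangled name is reachable
```

Mangling is a way of avoiding accidental name clashes, not an access-control mechanism: the attribute remains reachable under its mangled name.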

A variable named with just an underscore often has special meaning. $_ or _ is the previous command or result in many interactive shells, such as those of Python, Ruby, and Perl. In Perl, @_ is a special array variable that holds the arguments to a function. In Clojure, it indicates an argument whose value will be ignored.[7]

In some languages with pattern matching, such as Prolog, Standard ML, Scala, OCaml, Haskell, Erlang and Wolfram Language, the pattern _ matches any value, but does not perform binding.

Notes

  1. Underlining is a proofreading convention that says "set this text in italic type".

References

  1. "Clavier normalisé – CAN/CSA Z243.200-92 – Pictogrammes ISO 9995-7" (in French). Office québécois de la langue française. Retrieved 19 January 2015. See also ISO/IEC 9995-7.
  2. "6.2 General Punctuation" (PDF). The Unicode Standard. Version 11.0.0. Mountain View, CA: The Unicode Consortium. 2018. p. 273. ISBN 978-1-936213-19-1. Retrieved 2018-12-12. Spacing Overscores and Underscores. U+203E OVERLINE is the above-the-line counterpart to U+005F low line. It is a spacing character, not to be confused with U+0305 COMBINING OVERLINE. As with all overscores and underscores, a sequence of these characters should connect in an unbroken line. The overscoring characters also must be distinguished from U+0304 COMBINING MACRON, which does not connect horizontally in this way.
  3. NPL Technical Report (PDF). IBM. 1964. p. 23. Retrieved 2011-06-09.
  4. Fischer, Eric. "The Evolution of Character Codes, 1874-1968" (PDF). Retrieved 2016-11-16.
  5. Ritchie, Dennis (c. 1975). "C Reference Manual" (PDF). Retrieved 2011-06-09.
  6. "Magic Methods". php.net. August 28, 2004. Archived from the original on August 30, 2004. Retrieved February 3, 2020.
  7. Bozhidar Batsov. "The Clojure Style Guide". Retrieved 2019-09-05.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.