String and Regular Expression Objects
String Objects
- String objects are among the most commonly encountered objects in JavaScript.
- Whenever a value is quoted (with single quotes, double quotes, or backticks) it becomes a string. Strictly speaking it is a string primitive, but JavaScript wraps it in a String object on demand, so it has access to the methods and properties of that object type.
- Form data is always treated as a string object (even number inputs).
- JavaScript string methods return a new string that is a modified copy; the original string is never changed.
- DOM methods impact the original string (text node).
- The only property for String objects is length; it reflects the number of characters in the string.
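A minimal sketch of the points above. The variable names are illustrative, and the string "42" stands in for the value read from a hypothetical number input:

```javascript
// String methods return a new string; the original is left alone.
const original = "Hello World";
const shouted = original.toUpperCase();

console.log(original);        // still the original text
console.log(shouted);         // the modified copy
console.log(original.length); // number of characters

// Form values arrive as strings, so convert before doing math.
// "42" is a stand-in for the value of a hypothetical number input.
const formValue = "42";
const asNumber = Number(formValue);
console.log(typeof formValue, typeof asNumber);
```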
Changing String Content
| Method | Description | Code Example |
|---|---|---|
| replace() replaceAll() |
JavaScript methods that replace one character or sequence of characters with some other text.
Two parameters are passed. The first parameter is the string or regular expression being matched. The second parameter is the replacement. If a string is passed to replace(), only the first match is replaced; if a regular expression is used with the g flag, every match is replaced. Typically the regular expression is preferred because of that global update (often the number of potential matches varies). The second parameter can also be a function; its return value is used as the replacement text.
|
let url = "what we do";
url = url.replace(/\s/g,"_");
The value of url is: what_we_do
To achieve the same without using a regular expression:
let url = "what we do";
url = url.replaceAll(" ","_");
|
| toLowerCase() | JavaScript method that returns the value of the string (or substring) in all lowercase. |
let x = 'Hello World';
x = x.toLowerCase();
The value of x is: hello world |
| toUpperCase() |
JavaScript method that returns the value of the string (or substring) in all uppercase.
This method and |
let x = 'Hello World';
x = x.toUpperCase();
The value of x is: HELLO WORLD |
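The differences between the replacement approaches can be sketched as follows. The variable names are illustrative only:

```javascript
const title = "what we do";

// A string pattern with replace() changes only the first match.
const firstOnly = title.replace(" ", "_");

// A regular expression with the g flag changes every match.
const allRegex = title.replace(/\s/g, "_");

// replaceAll() with a string pattern also changes every match.
const allString = title.replaceAll(" ", "_");

// The replacement can be a function; its return value is inserted.
const capitalized = title.replace(/\b\w/g, (letter) => letter.toUpperCase());

console.log(firstOnly, allRegex, allString, capitalized);
```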
Splitting and Merging Strings / Text Nodes
| Method | Description | Code Example |
|---|---|---|
| concat() |
JavaScript method that joins together multiple strings.
An unlimited number of parameters can be passed; each needs to be a string. The original strings used in the concatenation are not modified. |
const x = "JavaScript";
const y = "great language!";
const z = x.concat(' is a ', y);
The value of z is: JavaScript is a great language! |
| split() |
JavaScript method for dividing up a string and storing the pieces in an array.
The parameter passed is a character (or regular expression) that will serve as the location of the split. An array is returned of the split apart values (the character used as the split location is not included in those values). |
const num = "734-111-2222";
const numArray = num.split("-");
The value of numArray is: ["734", "111", "2222"] |
| splitText() |
DOM method that splits a text node into two text nodes.
The single parameter passed is the position at which to split the text node. |
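textNode.splitText(4);
const newNode = textNode.nextSibling;
The above code splits the text node after its fourth character; newNode references the new text node that holds the remaining text. |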
| normalize() |
DOM method that merges adjacent text nodes.
Always called from a parent element node and impacts its child text nodes (but not descendants further down). Serves as the opposite of splitText(). |
parentElementNode.normalize(); |
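As a sketch of splitting and merging on the string side (the DOM methods need a browser, so they are left out here), the following splits on a regular expression and then rebuilds the value. The sample phone number with mixed separators is made up for illustration, and join() is an Array method rather than a String method:

```javascript
// A phone number typed with inconsistent separators (made-up sample).
const messy = "734.111-2222";

// A regular expression lets split() break on either separator.
const parts = messy.split(/[.-]/);

// join() is the array-side counterpart of split().
const rejoined = parts.join("-");

// concat() accepts several strings and leaves the originals untouched.
const viaConcat = parts[0].concat("-", parts[1], "-", parts[2]);

console.log(parts, rejoined, viaConcat);
```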
Extracting Part of a String
| Method | Description | Code Example |
|---|---|---|
| substr() |
JavaScript method that isolates a subset of a string and returns that substring.
Typically two parameters are passed. The first parameter is the starting position and the second parameter is the number of characters to capture. If just one value is supplied, the rest of the string (from that point on) is captured. Note that substr() is a legacy method; substring() or slice() is preferred in new code. |
const stringEx = 'JavaScript';
const substrgEx = stringEx.substr(0,4);
The value of substrgEx is: Java
or:
const phoneEx = "734-111-2222";
const sansAreaCode = phoneEx.substr(4);
The value of sansAreaCode is: 111-2222 |
| substring() |
JavaScript method that isolates a subset of a string and returns that substring.
Typically two parameters are passed. The first parameter is the starting position and the second parameter is the ending position. Note that the final character captured is actually the character before the end position. If just one value is supplied, the rest of the string (after that point) is captured. |
const strg = 'JavaScript';
const justScript = strg.substring(4,10);
The value of justScript is: Script
or:
const phone = "734-111-2222";
const noAreaCode = phone.substring(4);
The value of noAreaCode is: 111-2222 |
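| slice() | JavaScript method that works like substring() when both positions are zero or greater. Unlike substring(), it also accepts negative positions, which count back from the end of the string. |
See substring() example. |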
| substringData() | DOM method that is identical to the JavaScript substr() method. |
See substr() example. |
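A short comparison of the extraction methods, using illustrative variable names. It shows that substring() and slice() take an end position, substr() takes a length, and only slice() counts negative positions from the end:

```javascript
const word = "JavaScript";

// Start position 4, end position 7 (the character at 7 is not included).
const viaSubstring = word.substring(4, 7);
const viaSlice = word.slice(4, 7);

// Start position 4, then a length of 3 characters.
const viaSubstr = word.substr(4, 3);

// slice() counts negative positions from the end of the string.
const fromEnd = word.slice(-6);

// substring() treats a negative position as 0.
const negativeSubstring = word.substring(-6);

console.log(viaSubstring, viaSlice, viaSubstr, fromEnd, negativeSubstring);
```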
Locating Character(s), a Position in a String, or Substring Existence
| Method | Description | Code Example |
|---|---|---|
| charAt() |
JavaScript method that returns the character at the indicated position.
Accepts just that one parameter, which is a number. The first character in a string is at position 0. |
const name = 'Jason';
const third = name.charAt(2);
The value of third is: s |
| endsWith() |
JavaScript method that returns a boolean based on whether the string parameter that is passed exists at the end of the string (or at the length passed).
This matching is case-sensitive. Accepts a second parameter (a number) for the length. If not passed, this defaults to the string length. This parameter allows you to check for a string based on where it would end (since you are passing the length, pass a number one greater than the final character's position). |
const str = 'Web Design and Development';
const e1 = str.endsWith('Development');
const e2 = str.endsWith('Design', 10);
The value of e1 is: true
The value of e2 is: true |
| includes() |
JavaScript method that returns a boolean based on whether the string parameter passed exists (or does not exist) as a substring.
This matching is case-sensitive. Accepts a second parameter (a number) for the character position where the lookup should start. If this number is negative, the entire string is matched against. Defaults to 0. |
const str = 'Web Design';
const exists1 = str.includes('Design');
const exists2 = str.includes('design');
The value of exists1 is: true
The value of exists2 is: false |
| indexOf() |
JavaScript method that begins at the start of the string and returns the position of the first matching character.
If just one parameter is passed, it is the string you are trying to match. If two parameters are passed the first parameter is the string you are trying to match and the second is the numeric position in the string to begin looking (the matching progresses from left to right from that position). If no match is found, -1 is returned. |
const phone = "734-111-2222";
const start = phone.indexOf("-") + 1;
const localPh = phone.substring(start);
The value of localPh is: 111-2222 |
| lastIndexOf() |
JavaScript method that begins at the end of the string and returns the position of the first matching character (working right to left).
If just one parameter is passed, it is the string you are trying to match. If two parameters are passed the first parameter is the string you are trying to match and the second is the position in the string to begin looking (the matching progresses from right to left from that position). If no match is found, -1 is returned. |
const phone = "734-111-2222";
const pos = phone.lastIndexOf("-") + 1;
const last4 = phone.substring(pos);
The value of last4 is: 2222 |
| match() |
JavaScript method that searches a string and returns the value matched, if indeed a match occurred.
If no match is found then null is returned. Either a string or a regular expression can be passed as the parameter. If a regular expression is used with the g flag, an array of every match is returned. |
const theTopic = "Web Development";
const lookWeb = theTopic.match(/Web/);
The value of lookWeb[0] is: Web |
| search() |
JavaScript method that searches a string and returns the position value of the first character matched, if indeed a match occurred.
If no match is found then -1 is returned. Either a string or a regular expression can be passed as the parameter. |
const theTopic = "Web Development";
const dev = theTopic.search(/Dev/);
The value of dev is: 4 |
| startsWith() |
JavaScript method that returns a boolean based on whether the string parameter exists at the specified position (which defaults to 0, which is the start of the string).
This matching is case-sensitive. Accepts a second parameter (a number) for the character position where the matching should start. If this number is negative, the entire string is matched against. Defaults to 0. |
const str = 'Web Design';
const s1 = str.startsWith('Web');
const s2 = str.startsWith('Design', 4);
The value of s1 is: true
The value of s2 is: true |
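The lookup methods report "not found" in different ways, which is worth handling before using the result. The sketch below uses a made-up sentence and a hypothetical helper named afterFirstDash:

```javascript
const sentence = "Web Design and Web Development";

const firstWeb = sentence.indexOf("Web");     // searches left to right
const lastWeb = sentence.lastIndexOf("Web");  // searches right to left
const missing = sentence.indexOf("PHP");      // -1 when nothing matches

const devPosition = sentence.search(/Dev/);   // position of the first match
const everyWeb = sentence.match(/Web/g);      // g flag: array of all matches
const noMatch = sentence.match(/PHP/);        // null when nothing matches

// Hypothetical helper: guard against -1 before extracting.
function afterFirstDash(value) {
  const dash = value.indexOf("-");
  return dash === -1 ? value : value.substring(dash + 1);
}

console.log(firstWeb, lastWeb, missing, devPosition, everyWeb, noMatch);
console.log(afterFirstDash("734-111-2222"), afterFirstDash("7341112222"));
```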
Chaining Together String Methods
- We can also use multiple String methods in sequence, applied to the same string using dot notation, saving us time and streamlining code:
const s = 'JavaScript'.toUpperCase().substring(4);
The value of s is: SCRIPT
Additional Examples and String Methods
- For additional examples and details on even more methods, MDN is always a good resource.
Regular Expression Objects
- In JavaScript there is a RegExp core object that can be combined with the String core object for pattern matching and search-replace functionality.
- Regular expressions describe a pattern of characters.
- The regular expression literal (which automatically creates a Regular Expression object) uses this pattern:
/ /
- Entire books have been written about regular expressions and they can become quite complex (and more powerful / precise as complexity increases). Our focus is on simpler implementations that suffice for form validation and string modification purposes.
Matching Patterns at the Level of Words and Sequences of Characters
- If we wanted to determine whether the word 'Internet' was used as part of a string, the regular expression would be:
/Internet/
- In this context we are dealing with literals, meaning that we are exactly matching that spelling and capitalization.
- Note that if you were searching for a literal / in the string, you would need to escape that with the \ character, coded as: \/
- This approach of escaping characters is one way to test for the presence of additional literals that would otherwise have a special meaning or interpretation in the regular expression.
- Other literals include:
\n // Matches new line
\f // Matches form feed
\r // Matches carriage return
\t // Matches tab
\v // Matches vertical tab
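A brief sketch of literal matching and escaping, using test() (covered later under Regular Expression Methods) to report whether a pattern was found. The sample strings are made up for illustration:

```javascript
// Literal matching is exact, including capitalization.
const hasInternet = /Internet/.test("The Internet is great");
const wrongCase = /Internet/.test("the internet is great");

// A literal slash must be escaped inside a regular expression literal.
const hasSlash = /\//.test("and/or");

// An escaped period matches only a period; unescaped it matches any character.
const escapedDot = /\./.test("wccnet");
const unescapedDot = /./.test("wccnet");

// \t matches a tab character.
const hasTab = /\t/.test("a\tb");

console.log(hasInternet, wrongCase, hasSlash, escapedDot, unescapedDot, hasTab);
```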
Matching Patterns at the Level of Individual Characters
- If we surround the regular expression value with square brackets [ ] we are able to match any one character within the brackets. If we had:
/[Internet]/
We would match any one of the letters: 'I', 'n', 't', 'e', or 'r'.
- If we wanted to match any characters but those (anything but what is contained in the square brackets), we use a caret ^ to start off the bracket content:
/[^Internet]/
- To match a range of characters we use a hyphen. If we wanted to match any alphanumeric characters we would specify:
/[a-zA-Z0-9]/
- Some of these patterns are extremely common and so some special characters are used to simplify coding:
\w = Any word character. Same as [a-zA-Z0-9_]
\W = Any non-word character. Same as [^a-zA-Z0-9_]
\s = Any whitespace character. Includes spaces, tabs, vertical tabs, newlines, carriage returns, and form feeds. Roughly equivalent to [ \t\r\n\v\f]
\S = Any non-whitespace character. Not the same as \W, because \S excludes only spaces, tabs, vertical tabs, newlines, carriage returns, and form feeds. Roughly equivalent to [^ \t\r\n\v\f]
\d = Any digit. Same as [0-9]
\D = Any non-digit character. Same as [^0-9]
. = Any character except newline.
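The character classes above can be tried out with a few quick checks. This is only a sketch; the sample strings are made up:

```javascript
// Square brackets match any ONE of the listed characters.
const anyListed = /[Internet]/.test("tap");   // "t" is in the list
const noneListed = /[Internet]/.test("xyz");  // no listed character present

// A leading caret negates the list.
const onlyListed = /[^Internet]/.test("net"); // every character is in the list
const hasOther = /[^Internet]/.test("nets");  // "s" is not in the list

// Ranges and shorthand classes.
const alphanumeric = /[a-zA-Z0-9]/.test("---");
const hasWhitespace = /\s/.test("what we do");
const hasDigit = /\d/.test("Route 66");
const hasNonWord = /\W/.test("snake_case");   // underscore is a word character

console.log(anyListed, noneListed, onlyListed, hasOther);
console.log(alphanumeric, hasWhitespace, hasDigit, hasNonWord);
```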
Matching Patterns Involving a Certain Number of Characters
- Let's assume that we wanted to look for a 5-digit pattern. We could specify that as:
/\d{5}/
This is equivalent to:
/\d\d\d\d\d/
- With this example we see that curly braces {} containing a single value match an exact number of occurrences.
- If we want to match a range of occurrences we can provide a second value. In this case the first number is the minimum number of occurrences and the second number is the maximum.
/\d{5,10}/
Now we would be matching at least 5 digits and no more than 10 digits. If we wanted to open up the maximum so that there was no cap, we would have a comma followed by no value (so there would need to be 5 digits, although there could be more):
/\d{5,}/
- If we want to match zero or one occurrences of an item we follow it with a question mark ?
- Zero or more (could be greater than one) occurrences of the previous item is signified by a *
- The + character matches one or more occurrences
- Some examples:
// 5 digits and optional single digit
/\d{5}\d?/
// 5 digits and zero or more digits
/\d{5}\d*/
// 5 digits and one or more digits
/\d{5}\d+/
- If we wanted to provide alternatives during matching, we would use the | (pipe) character, such as:
// 2 digits or 4 non-digit characters
/\d{2}|\D{4}/
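To see what each quantifier actually captures, the sketch below uses a small hypothetical helper, firstMatch, that returns the matched text (or null). The digit strings are made-up samples:

```javascript
// Hypothetical helper: return the text of the first match, or null.
const firstMatch = (pattern, text) => {
  const result = text.match(pattern);
  return result === null ? null : result[0];
};

const exactlyFive = firstMatch(/\d{5}/, "1234567");          // stops after 5 digits
const fiveToTen = firstMatch(/\d{5,10}/, "123456789012");    // takes at most 10
const fiveOrMore = firstMatch(/\d{5,}/, "123456789012");     // no upper limit
const optionalSixth = firstMatch(/\d{5}\d?/, "123456");      // sixth digit is optional
const needsSixth = firstMatch(/\d{5}\d+/, "12345");          // fails without a sixth digit
const lettersBranch = firstMatch(/\d{2}|\D{4}/, "abcd");     // second alternative
const digitsBranch = firstMatch(/\d{2}|\D{4}/, "ab42");      // first alternative

console.log(exactlyFive, fiveToTen, fiveOrMore, optionalSixth);
console.log(needsSixth, lettersBranch, digitsBranch);
```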
Grouping Sets of Characters
- To group a set of characters (referred to as a subexpression or register) the parentheses () are used.
- This can be useful when applying ?, *, +, or | to the group:
// 2 digits or 4 non-digit characters,
// plus one or more word character(s)
/(\d{2}|\D{4})\w+/
- Another advantage of grouping is that these subexpressions can be referenced again later using their position, starting from the left and numbered starting with 1, which is incremented as further subexpressions are added.
The subexpression above would be referenced as \1, and if we added another to the right of the first that would be \2.
Staying with nine or fewer of these (\1 through \9) avoids ambiguity: in JavaScript \10 is read as a reference to a tenth group only if the pattern actually contains at least ten groups.
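A small sketch of referencing a group by position. Inside the pattern the reference is written \1; in the replacement string of replace() the equivalent is $1, $2, and so on (a detail not covered in the text above). The sample strings are made up:

```javascript
// \1 must repeat whatever the first group captured.
const doubledLetter = /(\w)\1/;
const inBook = doubledLetter.test("book"); // "oo" repeats
const inBok = doubledLetter.test("bok");   // nothing repeats

// The same word twice in a row, separated by a space.
const repeatedWord = /(\w+) \1/;
const hasRepeat = repeatedWord.test("the the cat");
const noRepeat = repeatedWord.test("the cat");

// In a replacement string, groups are referenced as $1, $2, ...
const swapped = "Smith, John".replace(/(\w+), (\w+)/, "$2 $1");

console.log(inBook, inBok, hasRepeat, noRepeat, swapped);
```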
Matching Based on Position in String
- The ^ can also be used outside of square brackets to indicate that the pattern must start at the beginning of the string. A $ indicates that the pattern must exist at the end of the string. If we wanted to isolate the word 'Internet' as the exact value of the string (no other characters allowed), the regular expression would be:
/^Internet$/
- The importance of this in form validation is substantial; we do not want users to be able to sneak extra characters into a string at the start or the end (this would still validate because the middle portion would meet the regular expression criteria). Use of ^ and $ prevents that from happening.
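The effect of anchoring can be sketched with a five-digit check. The function name isFiveDigits and the sample inputs are hypothetical:

```javascript
// Without anchors, the pattern can match in the middle of the string.
const unanchored = /\d{5}/.test("abc12345xyz");

// Hypothetical validator: the whole string must be exactly five digits.
function isFiveDigits(value) {
  return /^\d{5}$/.test(value);
}

console.log(unanchored);
console.log(isFiveDigits("abc12345xyz"), isFiveDigits("12345"), isFiveDigits("123456"));
```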
Matching Based on Word Boundaries
- If we wanted to match based on where a word ends and a non-word character begins, we would use \b (word boundary), such as:
// Matches "The Internet is Great"
/net\b/
- The \B matches a non-word boundary:
// Matches "Internet" but not
// "It was a net gain"
/\Bnet/
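The two boundary examples above can be checked directly. The string "networking" is an extra made-up sample:

```javascript
// \b after "net": "net" must be at the end of a word.
const endsWord = /net\b/.test("The Internet is Great");
const midWord = /net\b/.test("networking");

// \B before "net": "net" must NOT be at the start of a word.
const insideWord = /\Bnet/.test("Internet");
const startsWord = /\Bnet/.test("It was a net gain");

console.log(endsWord, midWord, insideWord, startsWord);
```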
Case Insensitivity, Global Matching, and Multiline Mode Flags
- To do matching without regard to case, add an i outside the final / of the regular expression.
- To do a global match (to match every occurrence of the pattern in the string) add a g outside the final / of the regular expression. This is essential when searching for a pattern and, when you find a match, making changes to the string there; you do not want to miss any of the places where changes should occur. Without g only the first match would be changed.
// Matches INTERNET, Internet,
// INternet, internet, etc.
/Internet/gi
- To switch to multiline mode specify m after the final / of the regular expression. In this mode ^ matches the beginning of a line or the beginning of a string. $ matches the end of a line or the end of a string.
// Matches 'Internet' and
// 'On the Internet\nNo
// one knows you're a dog'
/Internet$/m
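A sketch of how the three flags change results. The sample text and the replacement word "Web" are made up for illustration:

```javascript
const text = "internet, Internet, INTERNET";

// i alone: case is ignored, but only the first match is replaced.
const firstOnly = text.replace(/internet/i, "Web");

// g and i together: every occurrence is replaced, regardless of case.
const everyOne = text.replace(/internet/gi, "Web");

// m: $ also matches at the end of each line, not just the end of the string.
const twoLines = "On the Internet\nNo one knows you're a dog";
const withoutM = /Internet$/.test(twoLines);
const withM = /Internet$/m.test(twoLines);

console.log(firstOnly, everyOne, withoutM, withM);
```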
Regular Expression Methods
| Method | Description | Code Example |
|---|---|---|
| test() |
JavaScript method that is passed a string and returns true if the string matches the regular expression.
If there is no match then false is returned. This is generally a fast way to examine a string when only a true/false answer is needed. |
const textEx = /\w+/.test("Internet");
const numberEx = /\d+/.test("Internet");
The value of textEx is: true
The value of numberEx is: false |
| exec() |
JavaScript method with identical syntax to test(), however the result is very different.
What gets returned is an array instead of a boolean value. We won't delve into the details of the array because it gets fairly involved. If a match is not found, null is returned. |
const coding = /\w{4,}/g.exec("Web Coding is Fun");
The value of coding[0] is: Coding |
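For readers who do want a peek at the exec() result, here is a minimal sketch. With the g flag, each call to exec() continues from where the previous match ended, so a loop can collect every match. The sample sentence and variable names are illustrative:

```javascript
const pattern = /\w{4,}/g; // words of four or more characters
const sentence = "Web Coding is Really Fun";

const found = [];
let result;

// exec() returns null once there are no further matches.
while ((result = pattern.exec(sentence)) !== null) {
  // result[0] is the matched text; result.index is where it starts.
  found.push({ text: result[0], index: result.index });
}

// test() only answers yes or no.
const hasLongWord = /\w{4,}/.test(sentence);

console.log(found, hasLongWord);
```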
Form Validation
- Form validation refers to checking form data prior to sending the data to a server-side script for processing and/or saving it to a cookie or localStorage (key-value storage within the browser).
- By catching errors and bad data at this stage, time is saved and server resources are saved.
- However there is always the risk that the user will have disabled JavaScript, eliminating this data checking.
- Because of this limitation server-side data checking is a necessity for critical data.
- There are two types of form validation:
- Data Existence: Has data been entered or a selection made?
- Data Correctness: Is the data valid and properly formatted?
- The form validation examples are primarily intended to catch accidents in data entry; users really wanting to enter bad data can still find a way to do so, such as making up a properly formed yet non-existent email address.
Data Existence Checking for Input Boxes and Textarea Boxes
- Use the value property of the form element.
- If that value is empty (value === "" or value === '') then the form field contains no data.
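A minimal sketch of that check. The function name hasValue is hypothetical, and trimming whitespace first is an added assumption (so that a field containing only spaces counts as empty). Plain objects stand in for form elements so the sketch can run outside a browser:

```javascript
// Hypothetical helper: true when the field holds something other than whitespace.
// "field" is any object with a value property, such as an input or textarea.
function hasValue(field) {
  return field.value.trim() !== "";
}

// Plain objects standing in for form elements.
const emptyField = { value: "" };
const spacesOnly = { value: "   " };
const filledField = { value: "Jason" };

console.log(hasValue(emptyField), hasValue(spacesOnly), hasValue(filledField));
```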
Data Existence Checking for Radio Buttons and Checkboxes
- If one of these is clicked by the user then the checked property is true for that element node.
- Typically the approach taken is to:
- Set a status variable to either true or false
- Loop through the form elements and if any of them are checked then flip that status variable to the opposite value.
- After the loop if the status variable is still at its initial value you know that none of the radio buttons or checkboxes were checked.
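The status-variable approach described above can be sketched as follows. The function name anyChecked is hypothetical, and plain objects with a checked property stand in for radio buttons or checkboxes so the sketch can run outside a browser:

```javascript
// Hypothetical helper: "inputs" is any iterable of objects with a checked
// property, such as the radio buttons that share a name.
function anyChecked(inputs) {
  let status = false;            // start by assuming nothing is checked
  for (const input of inputs) {
    if (input.checked) {
      status = true;             // flip the status on the first checked item
      break;
    }
  }
  return status;                 // still false: nothing was checked
}

// Plain objects standing in for form elements.
const untouched = [{ checked: false }, { checked: false }];
const answered = [{ checked: false }, { checked: true }];

console.log(anyChecked(untouched), anyChecked(answered));
```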
Data Existence Checking for Select Menus
- If the menu is showing multiple options (it is not a drop-down menu) and if no selections have been made then the selectedIndex is -1.
- If only one selection is possible (the standard drop-down menu is this way) then the options[0] position is automatically selected in some browsers but not necessarily in all browsers, so in that case you would want to check for a selectedIndex of less than or equal to 0. This is most effective if the first option is some default text, such as 'Please choose...', because that is not a valid selection.
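Both select menu cases can be sketched in one hypothetical helper. The name hasSelection and the firstIsPlaceholder parameter are assumptions made for illustration, and plain objects stand in for select elements:

```javascript
// Hypothetical helper: "select" is any object with a selectedIndex property.
// Pass firstIsPlaceholder = true when options[0] is text like "Please choose...".
function hasSelection(select, firstIsPlaceholder) {
  if (firstIsPlaceholder) {
    return select.selectedIndex > 0;   // 0 and -1 both mean "nothing chosen"
  }
  return select.selectedIndex !== -1;  // -1 means no option is selected
}

// Plain objects standing in for select elements.
const dropDownUntouched = { selectedIndex: 0 };
const dropDownChosen = { selectedIndex: 2 };
const listUntouched = { selectedIndex: -1 };

console.log(hasSelection(dropDownUntouched, true));
console.log(hasSelection(dropDownChosen, true));
console.log(hasSelection(listUntouched, false));
```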
Data Existence Checking Example
See the Pen Student Profile by Jason Withrow (@jwithrow) on CodePen.
Data Correctness Checking Example
The structural markup for this example is very similar to the prior example. The two fields for name have been replaced with a single field for name and an email field.
The JavaScript code is also similar, with the changes being regular expressions for the two fields that were changed.
See the Pen Data Correctness Checking by Jason Withrow (@jwithrow) on CodePen.
Regular Expressions for Name and Email Address
- 'Name' is:
- Expected to be at least two words, separated by a space.
- Hyphens are allowed in both first and last name.
- The regular expression also handles additional characters after the last name, such as II (e.g., John Smith II) and Jr. (e.g., John Smith, Jr.), as well as the commas and period.
- Middle names will also be accepted.
- The first name must have at least one letter and can include uppercase and lowercase letters, as well as hyphens:
[a-zA-Z\-]+
- This is followed by at least one space:
\s+
- The last name must also contain at least one character, which can be a letter (uppercase or lowercase), a hyphen, a period, a comma, or a space:
[a-zA-Z\-\.\,\s]+
- The email address:
- Starts with at least one alphanumeric character (including underscores), one period, or one hyphen:
([\w\.\-])+
- The next character is a single @ sign:
\@
- Following the @ sign is at least a single alphanumeric character (including underscores) or a hyphen, as well as at least one period. This is set up so that there can be multiple sub-domains (e.g., space.wccnet.):
(([\w\-])+\.)+
- Then we reach the top-level domain name, which ranges in size from 2-6 characters (there are .museum and .travel top-level domains):
([\w]{2,6})+
- Note that the characters after @ include digits, in case the person enters an email address where the part after the @ sign is an IP address (this is valid, since all domain names map to IP addresses ultimately).
- In both cases we check the entire string, from the start of the string ^ to the last character in the string $
- It's worth noting that there is a pattern attribute for input elements that accepts regular expressions and will check data against that pattern. The type="email" input already uses a regular expression for checking. And these are widely supported (going back to Internet Explorer 10).
- While this attribute may seem to rule out the need for JavaScript, think how trivial it is to change the form element to be something else (using the developer tools), to use an old non-supportive browser, etc. There are many ways to bypass it. In contrast, bypassing JavaScript often breaks the site (although I don't recommend it for accessibility reasons, sometimes JavaScript is used to add the submit button to the form, so messing with JavaScript could result in a form with no submit button).
- You are also at the mercy of the browser's implementation for pattern and their error messages. You cannot change them. With JavaScript you have all the control you desire over the user experience.
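As a sketch, the name and email pieces described earlier in this section can be joined and anchored with ^ and $. This assembly is an assumption based on the pieces listed; the exact expressions used in the CodePen example may differ, and the sample values are made up:

```javascript
// Assembled from the pieces described above (assumed, not copied from the Pen).
const namePattern = /^[a-zA-Z\-]+\s+[a-zA-Z\-\.\,\s]+$/;
const emailPattern = /^([\w\.\-])+\@(([\w\-])+\.)+([\w]{2,6})+$/;

// Made-up sample values.
const nameResults = {
  twoWords: namePattern.test("John Smith"),
  withSuffix: namePattern.test("John Smith, Jr."),
  oneWord: namePattern.test("John"),
};

const emailResults = {
  withSubdomain: emailPattern.test("jsmith@space.wccnet.edu"),
  noTopLevelDomain: emailPattern.test("jsmith@wccnet"),
  noAtSign: emailPattern.test("jsmith.wccnet.edu"),
};

console.log(nameResults, emailResults);
```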