| Intermediate LPC |
| Descartes of Borg |
| November 1993 |
| |
| Chapter 5: Advanced String Handling |
| |
| 5.1 What a String Is |
| The LPC Basics textbook taught strings as simple data types. LPC |
| generally deals with strings in such a manner. The underlying driver |
| program, however, is written in C, which has no string data type. The |
| driver in fact sees strings as a complex data type made up of an array of |
| characters, a simple C data type. LPC, on the other hand, does not |
| recognize a character data type (there may actually be a driver or two out |
| there which do recognize the character as a data type, but in general not). |
| The net effect is that there are some array-like things you can do with |
| strings that you cannot do with other LPC data types. |
| |
| The first efun regarding strings you should learn is the strlen() efun. |
| This efun returns the length in characters of an LPC string, and is thus |
| the string equivalent to sizeof() for arrays. Just from the behaviour of |
| this efun, you can see that the driver treats a string as if it were made up |
| of smaller elements. In this chapter, you will learn how to deal with |
| strings on a more basic level, as characters and substrings. |
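| |
| A minimal sketch of the parallel between strlen() and sizeof(). The |
| function name show_lengths() is invented for illustration, and the sketch |
| assumes your driver provides write() and lets you add an int to a string. |

```lpc
void show_lengths() {
    string str;
    int *nums;

    str = "sword";
    nums = ({ 1, 2, 3 });
    /* strlen() counts characters; sizeof() counts array elements */
    write("characters: " + strlen(str) + "\n");
    write("elements: " + sizeof(nums) + "\n");
}
```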
| |
| 5.2 Strings as Character Arrays |
| You can do nearly anything with strings that you can do with arrays, |
| except assign values on a character basis. At the most basic, you can |
| actually refer to character constants by enclosing them in '' (single |
| quotes). 'a' and "a" are therefore very different things in LPC. 'a' |
| represents a character which cannot be used in assignment statements or |
| any other operations except comparison evaluations. "a" on the other |
| hand is a string made up of a single character. You can add and subtract |
| other strings to it and assign it as a value to a variable. |
| |
| With string variables, you can access the individual characters to run |
| comparisons against character constants using exactly the same syntax |
| that is used with arrays. In other words, the statement: |
| if(str[2] == 'a') |
| is a valid LPC statement comparing the third character (index 2) of str |
| to the character 'a'. You have to be very careful that you are not |
| comparing elements of arrays to characters, nor are you comparing |
| characters of strings to strings. |
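| |
| As an illustration of character comparison, the sketch below counts how |
| often the character 'a' occurs in a string. The function name |
| count_char_a() is hypothetical and not part of any mudlib. |

```lpc
int count_char_a(string str) {
    int i, count;

    if(!str) return 0;
    count = 0;
    /* Walk the string one index at a time, as you would an array */
    for(i = 0; i < strlen(str); i++) {
        /* Compare a character of the string to a character constant */
        if(str[i] == 'a') count++;
    }
    return count;
}
```

| Note that the comparison uses 'a' and not "a", since str[i] is a |
| character rather than a string. |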
| |
| LPC also allows you to access several characters together using LPC's |
| range operator ..: |
| if(str[0..1] == "ab") |
| In other words, you can look for the string which is formed by the |
| characters 0 through 1 in the string str. As with arrays, you must be |
| careful when using indexing or range operators so that you do not try to |
| reference an index number larger than the last index. Doing so will |
| result in an error. |
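| |
| One way to avoid such an error is to check the length with strlen() |
| before using the range operator. This is only a sketch; the function name |
| starts_with_ab() is made up for the example. |

```lpc
int starts_with_ab(string str) {
    /* Indexes 0 and 1 only exist if the string has at least 2 characters */
    if(!str || strlen(str) < 2) return 0;
    return str[0..1] == "ab";
}
```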
| |
| Now you can see a couple of similarities between strings and arrays: |
| 1) You may index on both to access the values of individual elements. |
| a) The individual elements of strings are characters |
| b) The individual elements of arrays match the data type of the |
| array. |
| 2) You may operate on a range of values |
| a) Ex: "abcdef"[1..3] is the string "bcd" |
| b) Ex: ({ 1, 2, 3, 4, 5 })[1..3] is the int array ({ 2, 3, 4 }) |
| <* NOTE Highlander@MorgenGrauen |
| Also possible in MorgenGrauen (and in Amylaar-driver LPMuds in general): |
| "abcdef"[2..] -> "cdef" and |
| "abcdef"[1..<2] -> "bcde" (< means counting from the end, starting at 1) |
| *> |
| |
| And of course, you should always keep in mind the fundamental |
| difference: a string is not made up of a more fundamental LPC data type. |
| In other words, you may not act on the individual characters by |
| assigning them values. |
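| |
| Since characters cannot be assigned, one possible workaround is to build |
| a new string out of ranges of the old one. The sketch below is an |
| assumption about how you might do this, not a standard function; the name |
| set_char() and its arguments are hypothetical. The replacement is passed |
| as a string, not as a character. |

```lpc
string set_char(string str, int pos, string ch) {
    int len;
    string head, tail;

    len = strlen(str);
    /* Leave the string alone if pos is not a valid index */
    if(pos < 0 || pos >= len) return str;
    head = "";
    tail = "";
    /* Only take a range when it is guaranteed to be inside the string */
    if(pos > 0) head = str[0..pos-1];
    if(pos < len - 1) tail = str[pos+1..len-1];
    return head + ch + tail;
}
```

| For instance, set_char("sward", 2, "o") builds the string "sword" without |
| ever assigning to an individual character. |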
| |
| 5.3 The Efun sscanf() |
| You cannot do any decent string handling in LPC without using |
| sscanf(). Without it, you are left trying to play with the full strings |
| passed by command statements to the command functions. In other |
| words, you could not handle a command like: "give sword to leo", since |
| you would have no way of separating "sword to leo" into its constituent |
| parts. Commands such as these therefore use this efun in order to use |
| commands with multiple arguments or to make commands more |
| "English-like". |
| |
| Most people find the manual entries for sscanf() to be rather difficult |
| reading. The function does not lend itself well to the format used by |
| manual entries. As I said above, the function is used to take a string and |
| break it into usable parts. Technically it is supposed to take a string and |
| scan it into one or more variables of varying types. Take the example |
| above: |
| |
| int give(string str) { |
|     string what, whom; |
| |
|     if(!str) return notify_fail("Give what to whom?\n"); |
|     if(sscanf(str, "%s to %s", what, whom) != 2) |
|         return notify_fail("Give what to whom?\n"); |
|     /* ... rest of give code ... */ |
| } |
| |
| The efun sscanf() takes three or more arguments. The first argument is |
| the string you want scanned. The second argument is called a control |
| string. The control string is a model which demonstrates in what form |
| the original string is written, and how it should be divided up. The rest |
| of the arguments are variables to which you will assign values based |
| upon the control string. |
| |
| The control string is made up of three different types of elements: 1) |
| constants, 2) variable arguments to be scanned, and 3) variable |
| arguments to be discarded. You must have as many of the variable |
| arguments in sscanf() as you have elements of type 2 in your control |
| string. In the above example, the control string was "%s to %s", which |
| is a three-element control string made up of one constant part (" to "), |
| and two variable arguments to be scanned ("%s"). There were no |
| variables to be discarded. |
| |
| The control string basically indicates that the function should find the |
| string " to " in the string str. Whatever comes before that constant will |
| be placed into the first variable argument as a string. The same thing |
| will happen to whatever comes after the constant. |
| |
| Variable elements are noted by a "%" sign followed by a code for |
| decoding them. If the variable element is to be discarded, the "%" sign |
| is followed by the "*" as well as the code for decoding the variable. |
| Common codes for variable element decoding are "s" for strings and "d" |
| for integers. In addition, your driver may support other conversion |
| codes, such as "f" for float. So in the example above, the "%s" in |
| the control string indicates that whatever lies in the original string in the |
| corresponding place will be scanned into a new variable as a string. |
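| |
| The sketch below combines the "d" code with a discarded "%*s" element. |
| The command function pay() and its messages are invented for |
| illustration, and write() is assumed to exist on your driver. Only two |
| variables are passed, because the discarded element needs none. |

```lpc
int pay(string str) {
    int amount;
    string whom;

    if(!str) return notify_fail("Pay how much to whom?\n");
    /* "%d" scans an integer; "%*s" matches a word but stores it nowhere */
    if(sscanf(str, "%d %*s to %s", amount, whom) != 2)
        return notify_fail("Pay how much to whom?\n");
    write("You pay " + amount + " to " + whom + ".\n");
    return 1;
}
```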
| |
| A simple exercise. How would you turn the string "145" into an |
| integer? |
| |
| Answer: |
| int x; |
| sscanf("145", "%d", x); |
| |
| After the sscanf() function, x will equal the integer 145. |
| |
| Whenever you scan a string against a control string, the function |
| searches the original string for the first instance of the first constant |
| in the control string. For example, if your string is "magic attack 100" and |
| you have the following: |
| int improve(string str) { |
|     string skill; |
|     int x; |
| |
|     if(sscanf(str, "%s %d", skill, x) != 2) return 0; |
|     /* ... */ |
| } |
| you would find that you have come up with the wrong return value for |
| sscanf() (more on the return values later). The control string, "%s %d", |
| is made up of two variables to be scanned and one constant. The constant |
| is " ". So the function searches the original string for the first instance |
| of " ", placing whatever comes before the " " into skill, and trying to |
| place whatever comes after the " " into x. This separates "magic attack |
| 100" into the components "magic" and "attack 100". The function, |
| however, cannot make heads or tails of "attack 100" as an integer, so it |
| returns 1, meaning that 1 variable value was successfully scanned |
| ("magic" into skill). |
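| |
| One possible way around this, assuming the number is always the last |
| word, is to find the last space yourself with character indexing and |
| scan only what follows it. This is a sketch rather than the way any |
| particular mudlib does it, and the closing write() call is there only to |
| show the result. |

```lpc
int improve(string str) {
    string skill;
    int x, i;

    if(!str) return 0;
    /* Walk backwards from the last character to the last space */
    i = strlen(str) - 1;
    while(i > 0 && str[i] != ' ') i--;
    /* Fail if no space was found, or if nothing follows the space */
    if(i < 1 || i == strlen(str) - 1) return 0;
    skill = str[0..i-1];
    /* Scan only the part after the last space as an integer */
    if(sscanf(str[i+1..strlen(str)-1], "%d", x) != 1) return 0;
    write("skill: " + skill + ", amount: " + x + "\n");
    return 1;
}
```

| With "magic attack 100", skill becomes "magic attack" and x becomes 100. |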
| |
| Perhaps you guessed from the above examples, but the efun sscanf() |
| returns an int, which is the number of variables into which values from |
| the original string were successfully scanned. Some examples with |
| return values for you to examine: |
| |
| sscanf("swo rd descartes", "%s to %s", str1, str2) return: 0 |
| sscanf("swo rd descartes", "%s %s", str1, str2) return: 2 |
| sscanf("200 gold to descartes", "%d %s to %s", x, str1, str2) return: 3 |
| sscanf("200 gold to descartes", "%d %*s to %s", x, str1) return: 2 |
| where x is an int and str1 and str2 are strings. |
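| |
| Because the return value tells you how many variables were filled, you |
| can try several control strings in turn and act on the first one that |
| matches fully. The following is a sketch under that assumption; the |
| messages are placeholders and write() is assumed to exist on your driver. |

```lpc
int give(string str) {
    string what, whom;
    int amount;

    if(!str) return notify_fail("Give what to whom?\n");
    /* Try the more specific form first: "200 gold to descartes" */
    if(sscanf(str, "%d %s to %s", amount, what, whom) == 3) {
        write("Giving " + amount + " " + what + " to " + whom + ".\n");
        return 1;
    }
    /* Fall back to the plain form: "sword to descartes" */
    if(sscanf(str, "%s to %s", what, whom) == 2) {
        write("Giving " + what + " to " + whom + ".\n");
        return 1;
    }
    return notify_fail("Give what to whom?\n");
}
```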
| |
| 5.4 Summary |
| LPC strings can be thought of as arrays of characters, as long as you |
| keep in mind that LPC does not have a character data type (with |
| most, but not all drivers). Since the character is not a true LPC data |
| type, you cannot act upon individual characters in an LPC string in the |
| same manner you would act upon values of other data types. Noticing the |
| intimate relationship between strings and arrays nevertheless makes it |
| easier to understand such concepts as the range operator and indexing on |
| strings. |
| |
| There are efuns other than sscanf() which involve advanced string |
| handling; however, they are not needed nearly as often. You should |
| check on your mud for man or help files on the efuns: explode(), |
| implode(), replace_string(), sprintf(). All of these are very valuable |
| tools, especially if you intend to do coding at the mudlib level. |
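| |
| As a small taste of two of these efuns, the sketch below splits a |
| command into words with explode() and joins them again with implode(). |
| The function name show_words() is invented for the example; check your |
| own mud's help files for how these efuns treat edge cases such as |
| leading or repeated delimiters. |

```lpc
void show_words() {
    string *words;
    string joined;

    /* explode() splits a string on a delimiter into an array of strings */
    words = explode("give sword to leo", " ");
    /* implode() joins an array of strings back into one string */
    joined = implode(words, " ");
    write("words: " + sizeof(words) + ", joined: " + joined + "\n");
}
```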
| |
| Copyright (c) George Reese 1993 |