PHP 5 Unleashed

Perl-Compatible Regular Expressions (PCRE)

Perl-Compatible Regular Expressions (PCRE) are much more powerful than their POSIX counterparts and, consequently, also more complex and difficult to use.

PCRE adds its own character classes to the extended regular expression rules that we saw earlier:

  • \w represents a "word" character and is equivalent to the expression [A-Za-z0-9_] (note that the underscore is included).

  • \W represents the opposite of \w and is equivalent to [^A-Za-z0-9_].

  • \s represents a whitespace character.

  • \S represents a nonwhitespace character.

  • \d represents a digit and is equivalent to [0-9].

  • \D represents a nondigit character and is equivalent to [^0-9].

  • \n represents a newline character.

  • \r represents a return character.

  • \t represents a tab character.

As you can see, PCRE patterns are significantly more concise than their POSIX counterparts. In fact, our simple email validation regex can now be written as

/\w+@\w+\.\w{2,4}/

But wait a minute: what are those slash characters at the beginning and at the end of the regex string? PCRE requires that the actual regular expression be enclosed between two delimiter characters. By convention, two forward slashes are used, although any character that is not alphanumeric, whitespace, or a backslash would do just as well.

Naturally, regardless of which character you choose, you will be required to escape the delimiter whenever you use it as part of the regex itself. For example:

/face\/off/

is the equivalent of the regular expression face/off.
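To make the delimiter rule concrete, here is a minimal sketch that matches the same text with two different delimiters and then tries the simple email pattern shown earlier. The variable names and the sample strings are assumptions chosen only for illustration.

```php
<?php

    // Illustrative input; the variable names are assumptions made for this sketch.
    $title = 'face/off';

    // With slashes as delimiters, the slash inside the pattern must be escaped.
    $withSlashes = preg_match ('/face\/off/', $title);

    // With # as the delimiter, the slash needs no escaping.
    $withHashes = preg_match ('#face/off#', $title);

    // The simple email pattern from above; \w also accepts the underscore.
    $emailOk  = preg_match ('/\w+@\w+\.\w{2,4}/', 'marco_t@tabini.ca');
    $emailBad = preg_match ('/\w+@\w+\.\w{2,4}/', 'not an email');

    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    var_dump ($withSlashes, $withHashes, $emailOk, $emailBad);
```

Choosing a delimiter that does not occur in the pattern, as in the second call, usually keeps the expression easier to read.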

PCRE also expands on the concept of references, making them useful not only as a byproduct of the regex operation, but as part of the operation itself.

In PCRE, it is possible to use a reference that was defined previously in a regular expression as part of the expression itself. Let's look at an example. Suppose that you have to verify that, in each of the following strings,

Marco is a programmer. Marco's specialty is programming.
John is a programmer. John's specialty is programming.

the name of the person to whom the sentence refers is the same in both positions (that is, "Marco" or "John"). Using a normal search-and-replace operation would take a significant effort, and so would using a POSIX regex, because you do not know the name of the person a priori.

With a PCRE, however, this operation is trivial. You start by matching the first portion of the string. The name is the first word:

/^(\w+) is a programmer\.

Next, you specify the name again. As you can see, we included it in parentheses in the preceding expression, which means that we create a reference to it. We can now recall that reference inside the regex itself and use it to our advantage:

/^(\w+) is a programmer\. \1's specialty is programming\.$/

If you try to match the preceding regex against the following sentence:

Marco is a programmer. Marco's specialty is programming.

everything will work fine. However, if you try it against this sentence:

Marco is a programmer. John's specialty is programming.

the regex engine will not report a match, because the reference \1 holds "Marco" and the string contains "John" at that position.
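The two cases just described can be tried with preg_match(), which is introduced later in this section. This is a minimal sketch; the variable names are assumptions, and the literal periods are escaped so that they match only a period.

```php
<?php

    // \1 recalls whatever the first parenthesized group matched.
    // The literal periods are escaped so that they match only a period.
    $pattern = '/^(\w+) is a programmer\. \1\'s specialty is programming\.$/';

    $same      = 'Marco is a programmer. Marco\'s specialty is programming.';
    $different = 'Marco is a programmer. John\'s specialty is programming.';

    // preg_match() returns 1 on a match and 0 otherwise.
    $sameResult      = preg_match ($pattern, $same, $matches);
    $differentResult = preg_match ($pattern, $different);

    echo $sameResult ? "MATCH ({$matches[1]})\n" : "NO MATCH\n";
    echo $differentResult ? "MATCH\n" : "NO MATCH\n";
```

The first sentence produces a match with "Marco" stored in reference 1; the second does not match at all.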

To give you an idea of how powerful PCREs are and why it's worth trying to learn them, let me give you an alternative to the simple one-line expression using POSIX:

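<?php

    // Note: ereg() was deprecated in PHP 5.3 and removed in PHP 7.0.
    $s = 'Marco is a programmer. Marco\'s specialty is programming.';

    // Step 1: capture the name at the start of the string.
    if (ereg ('^([[:alpha:]]+) is a programmer', $s, $matches)) {
      // Step 2: capture the name in front of "'s specialty".
      if (ereg ('([[:alpha:]]+)\'s specialty is programming.$', $s, $matches2)) {
        // Step 3: compare the two captured names.
        if ($matches[1] === $matches2[1]) {
          echo "MATCH\n";
        } else {
          echo "NO MATCH\n";
        }
      } else {
        echo "NO MATCH\n";
      }
    } else {
      echo "NO MATCH\n";
    }

?>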

Now, this is a simple example, and the POSIX solution is definitely not as elegant as it could be, but you can see here that it takes three separate operations to approximate the power of just one PCRE.

I should note that the inability to use references within the regex itself is actually a limitation of PHP, rather than of the POSIX standard, which, unfortunately, means that the PHP implementation of regex is not POSIX compliant.

The main PCRE function in PHP is preg_match():

preg_match (pattern, string[, matches[, flags]]);

As in the case of ereg(), this function matches the regular expression stored in pattern against string and stores any matched references in matches. In PHP 5, the optional flags parameter accepts only one value, PREG_OFFSET_CAPTURE. If it is specified, preg_match() changes the format of matches so that it contains both the text and the position of each reference inside string. Here's an example:

<?php

    $s = 'Another beautiful day';

    preg_match ('/beautiful/', $s, $matches, PREG_OFFSET_CAPTURE);

    var_dump ($matches);

?>

If you execute this script, you should receive the following output:

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(9) "beautiful"
    [1]=>
    int(8)
  }
}

As you can see, the $matches array now contains another array for each reference. The latter, in turn, contains both the string matched and its position within $s.
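The same structure applies to every reference in the pattern. The following sketch adds one parenthesized reference to the previous example and then uses its reported position with substr(); the variable names are assumptions introduced only for this illustration.

```php
<?php

    $s = 'Another beautiful day';

    // One parenthesized reference, so $matches gets elements 0 and 1.
    preg_match ('/beautiful (\w+)/', $s, $matches, PREG_OFFSET_CAPTURE);

    // Each element is a two-item array: the text, then its offset in $s.
    $word   = $matches[1][0];
    $offset = $matches[1][1];

    // The offset can be passed to substr() to get the text before the reference.
    $before = substr ($s, 0, $offset);

    var_dump ($word, $offset, $before);
```

Here the reference captures the word "day", which starts at position 18 of $s.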

Another function of the PCRE family is preg_match_all(), which has the same syntax as preg_match() but searches a string for all the occurrences of a regular expression, rather than only the first one. Here's an example:

<?php

    $s = 'A beautiful day and a beauty of a lake';

    preg_match_all ('/beaut[^ ]+/', $s, $matches);

    var_dump ($matches);

?>

If you execute this script, it will output the following:

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(9) "beautiful"
    [1]=>
    string(6) "beauty"
  }
}

As you can see, the $matches array contains an array whose elements are arrays that correspond to the matches found for each of the references. In this case, because no reference was specified, only the 0th element of the array is present, but it contains both the string "beautiful" and "beauty". By contrast, if you had executed this regex using preg_match(), only the word "beautiful" would have been returned.
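To see what happens when the pattern does contain a reference, consider this variation on the example above. It is only a sketch: the added parentheses and the $count variable are assumptions made for illustration.

```php
<?php

    $s = 'A beautiful day and a beauty of a lake';

    // The parentheses create reference 1: whatever follows "beaut".
    // preg_match_all() returns the number of full matches it found.
    $count = preg_match_all ('/beaut([^ ]+)/', $s, $matches);

    // $matches[0] holds the full matches, $matches[1] the text of reference 1.
    var_dump ($count, $matches[0], $matches[1]);
```

With this pattern, $matches[0] still contains "beautiful" and "beauty", while $matches[1] contains the corresponding endings "iful" and "y".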

Search-and-replace operations in the world of PCRE are handled by the preg_replace function:

preg_replace (pattern, replacement, string[, limit]);

Much like ereg_replace(), this function applies the regex pattern to string and then substitutes the placeholders in replacement with the references captured by pattern. The limit parameter can be used to cap the number of replacements performed. Here's an example, which will output marcot at tabini dot ca:

<?php

    $s = 'marcot@tabini.ca';

    echo preg_replace ('/^(\w+)@(\w+)\.(\w{2,4})/', '\1 at \2 dot \3', $s);

?>

Keep in mind that this is only one way of using preg_replace(), in which the entire input string is substituted by the replacement string. In fact, you can use this function to replace only small portions of text:

<?php

    $s = 'The pen is on the table';

    echo preg_replace ('/on/', 'over', $s);

?>

If you execute this script, preg_replace() will replace the word "on" with the word "over" in $s, resulting in the output The pen is over the table.
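The effect of the limit parameter is easiest to see on a string in which the pattern occurs more than once. The sample string and variable names in this sketch are assumptions chosen for illustration.

```php
<?php

    $s = 'on and on and on';

    // No limit: every occurrence is replaced.
    $all = preg_replace ('/on/', 'over', $s);

    // A limit of 1 stops after the first replacement.
    $first = preg_replace ('/on/', 'over', $s, 1);

    echo $all, "\n", $first, "\n";
```

The first call yields over and over and over, whereas the second yields over and on and on.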

The last function that I want to bring to your attention is preg_split(), which is somewhat equivalent to the explode() function that we discussed earlier, with the difference that it takes a regular expression as a delimiter, rather than a straight string, and that it includes a few additional features:

preg_split (pattern, string[, limit[, flags]]);

The preg_split() function works by breaking string into substrings separated by sequences of characters that match pattern. The optional limit parameter can be used to cap the number of substrings returned, in which case the last one contains the unsplit remainder of string. The flags parameter, on the other hand, can be used to modify the behavior of the function as described in Table 3.2.

Table 3.2. preg_split() Flags

Flag                         Description

PREG_SPLIT_NO_EMPTY          Causes empty substrings to be discarded.

PREG_SPLIT_DELIM_CAPTURE     Causes any references inside pattern to be captured and returned as part of the function's output.

PREG_SPLIT_OFFSET_CAPTURE    Causes the position of each substring to be returned as part of the function's output (similar to PREG_OFFSET_CAPTURE in preg_match()).


Here's an example of how preg_split() can be used:

<?php

    $s = 'Ten times he called, and ten times nobody answered';

    var_dump (preg_split ('/[ ,]/', $s));

?>

This script causes the string $s to be split whenever either a space or a comma is found, resulting in the following output:

array(10) {
  [0]=>
  string(3) "Ten"
  [1]=>
  string(5) "times"
  [2]=>
  string(2) "he"
  [3]=>
  string(6) "called"
  [4]=>
  string(0) ""
  [5]=>
  string(3) "and"
  [6]=>
  string(3) "ten"
  [7]=>
  string(5) "times"
  [8]=>
  string(6) "nobody"
  [9]=>
  string(8) "answered"
}

As you can imagine, the explode() function by itself would have been inadequate in this case, because it can split $s only on a single fixed delimiter string.
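The empty element at index 4 of the output above comes from the comma being immediately followed by a space. As a sketch of how the flags in Table 3.2 change the result, the following passes -1 (no limit) so that a flag can be supplied; the second input string is an assumption made up for this illustration.

```php
<?php

    $s = 'Ten times he called, and ten times nobody answered';

    // -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops the empty string
    // produced by the comma that is immediately followed by a space.
    $words = preg_split ('/[ ,]/', $s, -1, PREG_SPLIT_NO_EMPTY);

    // With a parenthesized delimiter and PREG_SPLIT_DELIM_CAPTURE, the
    // delimiters themselves are returned between the pieces.
    $parts = preg_split ('/(,)/', 'a,b', -1, PREG_SPLIT_DELIM_CAPTURE);

    var_dump (count ($words), $parts);
```

The first call returns nine words instead of ten elements, and the second returns the three elements "a", "," and "b".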

Named Patterns

An excellent and very useful addition to PCRE is the concept of named capturing groups (which everybody always refers to as named patterns). A named capturing group lets you refer to a subpattern of your expression by an arbitrary name, rather than by its position inside the regular expression. For example, consider the following regex:

/^Name=(.+)$/

Now, you would normally address the (.+) subpattern as element 1 of the match array returned by preg_match() (or as $1 in a substitution performed through a call to preg_replace()).

That's all well and good, at least as long as you have only a limited number of subpatterns whose position never changes. Heaven forbid, however, that you should ever find yourself in a position to have to add a capturing subpattern at the beginning of a regex that already has six of them!

Luckily, this problem can be solved once and for all by assigning a "name" to each of your subpatterns. Take a look at the following:

/^Name=(?P<thename>.+)$/

This will create a backreference inside your expression that can be explicitly retrieved by using the name thename. If you run this regex through preg_match(), the backreference will be inserted in the match array both by number (using the normal numbering rules) and by name. To recall the group later in the same pattern, enclose its name in parentheses and prefix it with ?P=, as in (?P=thename). The replacement string of preg_replace(), on the other hand, understands only numbered references, so a named group must still be referred to there by its position:

preg_replace ('/^Name=(?P<thename>.+)$/', 'My name is \1', $value);
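The following sketch puts named groups to work. The sample input, the second group name (word), and the variable names are assumptions made for illustration; the replacement uses a numbered reference, which is the form preg_replace() accepts in replacement strings.

```php
<?php

    $value = 'Name=Marco';

    // The group is reachable both by name and by number.
    preg_match ('/^Name=(?P<thename>.+)$/', $value, $matches);
    $byName   = $matches['thename'];
    $byNumber = $matches[1];

    // Inside a pattern, (?P=word) recalls the named group, much like \1.
    $repeated = preg_match ('/^(?P<word>\w+) (?P=word)$/', 'echo echo');

    // In the replacement string, this sketch uses the numbered form \1.
    $sentence = preg_replace ('/^Name=(?P<thename>.+)$/', 'My name is \1', $value);

    var_dump ($byName, $byNumber, $repeated, $sentence);
```

Because the match array is indexed by name, code that reads $matches['thename'] keeps working even if another capturing subpattern is later added in front of it.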