php pattern matching

Posted by: MajorGeek
Posted on: 2006-12-05 13:53:00

Ok, I got past my problem with file_get_contents (http://discussion.dreamhost.com/showthreaded.pl?Cat=&Board=forum_programming&Number=62841) by using curl instead. But I can't figure out pattern matching with eregi. I'd just like to pull the dates and flows into arrays to plot with my data, but in this test script I can't grab the flows with anything I've tried for $pattern. Is eregi the right function, or should I be using something else?

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Prosser Flow</title>
</head>

<body>
<?php

$theurl="http://www.usbr.gov/pn-bin/yak/arc3.pl"
."?station=YRPW&year=2006&month=4&day=1&year=2006&month=7&day=31&pcode=QD";

// if (!($contents = file_get_contents($theurl)))
//{
// echo 'Could not open URL';
// exit;
// }
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, $theurl);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$contents = curl_exec($ch);
curl_close($ch);

// display file
echo $contents;

// find the part of the page we want and output it
//$pattern = '([0-9]+.[0-9]+)';
// $pattern = '(2122.08)';
//$pattern = '(^[0-9]+.[0-9]+$)';
//$pattern = '[[:digit:]]{4}.[[:digit:]]{2}';
$pattern = '^[[:digit:]]{4}.[[:digit:]]{2}$';
if (eregi($pattern, $content, $flow))
{
echo "<p>$flow is: ";
echo $flow[1];
echo '</p>';
}
else
{
echo '<p>Nothing matched</p>';
};

?>
</body>
</html>


Re: php pattern matching

Posted by: silkrooster
Posted on: 2006-12-05 20:32:00

I see a typo:
if (eregi($pattern, $content, $flow))
should be:
if (eregi($pattern, $contents, $flow))
Silk
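
Beyond the variable name, the active pattern has a few other obstacles. In eregi, ^ and $ anchor to the start and end of the whole string, so a full page never matches. The pattern has no parentheses, so $flow[1] is never set (the whole match lands in $flow[0]). The unescaped dot also matches any character, not just a decimal point. Note too that eregi was deprecated in PHP 5.3 and removed in PHP 7.0. Below is a minimal sketch using preg_match_all instead, which collects every match in one pass. The sample text and the date layout are assumptions for illustration, since the real feed may be formatted differently.

```php
<?php
// Hypothetical sample standing in for the fetched page; the real feed layout may differ.
$contents = "BEGIN DATA\n"
    . "04/01/2006    2122.08\n"
    . "04/02/2006    2075.50\n"
    . "END DATA\n";

// The "m" flag makes ^ and $ match at every line boundary.
// Group 1 captures the date, group 2 the flow; \. is a literal decimal point.
$pattern = '/^(\d{2}\/\d{2}\/\d{4})\s+(\d+\.\d+)\s*$/m';

$dates = array();
$flows = array();

// PREG_SET_ORDER yields one array per matched line: [0] whole match, [1] date, [2] flow.
if (preg_match_all($pattern, $contents, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $dates[] = $match[1];
        $flows[] = (float) $match[2];
    }
}

print_r($dates);
print_r($flows);
```

The marker lines match nothing and are simply ignored, so the two arrays stay aligned index by index for plotting.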

Re: php pattern matching

Posted by: pangea33
Posted on: 2006-12-05 20:38:00

I think that maybe you're approaching the task from too complicated a direction. Regular Expressions are very powerful and vital when working with huge data sets. They can also be notoriously problematic when working with data whose formatting isn't 100% guaranteed. In my experience there is almost ALWAYS chaff in my data wheat. I wrote this script that worked just fine with this particular data sample. Maybe you'll find it useful.

Yeah, I might have pegged the needle on the dorkness meter when I chose to do this for some sort of obscure entertainment, albeit entertainment that is useful for strengthening a skill set.

<?php
$theurl="http://www.usbr.gov/pn-bin/yak/arc3.pl"
."?station=YRPW&year=2006&month=4&day=1&year=2006&month=7&day=31&pcode=QD";

$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, $theurl);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$contents = curl_exec($ch);
curl_close($ch);

/******************************************
// I chose to save the content off to a file while working with it, rather than keep hitting the server

file_put_contents("/var/www/curl/sess.txt", $contents);
$contents = file_get_contents("/var/www/curl/sess.txt");

*******************************************/

// The feed appears to have useful parsing point identifiers.
// I just used them to get everything between them (including flags)
$cStartStr = "BEGIN DATA";
$cEndStr = "END DATA";
$cPageTail = stristr($contents, $cStartStr);
$nUsefulDataEndPos = strpos($cPageTail, $cEndStr);
$cUsefulData = substr($cPageTail, 0, $nUsefulDataEndPos);

// explode the content using newlines as delimiters
$aContents = explode(chr(10), $cUsefulData);

// I'll be putting the line items into an array. Two array shapes are built; choose one according to your preference
$aDateQD1 = array();
$aDateQD2 = array();

// skip the leading and trailing junk
// Prolly don't want to do all these assignments in the loop. They're just used for readability
for ($i=3; $i<count($aContents)-1; $i++) {

    // Trim once so stray whitespace (e.g. a trailing carriage return) can't skew the offsets below
    $cLine = trim($aContents[$i]);

    // Dates are formatted as 10 characters
    $cDateStr = substr($cLine, 0, 10);

    // QD is everything in the trimmed line after the last space
    $nQDVal = substr($cLine, strrpos($cLine, chr(32)) + 1);

    // put the QD values into an array keyed with the date string
    $aDateQD1[ $cDateStr ] = $nQDVal;

    // put each date/QD combination into its own individual array element
    $aDateQD2[] = array($cDateStr, $nQDVal);
}

//peep scene
echo('<pre>');
print_r($aDateQD1);
print_r($aDateQD2);
echo('</pre>');

?>
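
Since the original goal was separate date and flow arrays for plotting, here is a rough sketch of the same marker-based approach wrapped in a function. It is a variant, not the script above: rather than skipping a fixed number of lines, it keeps only rows whose last field is numeric, which may cope better with the "chaff" mentioned earlier. The function name parse_flow_block, the sample text, and the "MISSING" placeholder are all hypothetical.

```php
<?php
// Hypothetical helper: returns array($dates, $flows) as parallel arrays.
function parse_flow_block($contents, $startMarker = 'BEGIN DATA', $endMarker = 'END DATA')
{
    $dates = array();
    $flows = array();

    // Bail out with empty arrays if either marker is missing.
    $start = stripos($contents, $startMarker);
    if ($start === false) {
        return array($dates, $flows);
    }
    $end = stripos($contents, $endMarker, $start);
    if ($end === false) {
        return array($dates, $flows);
    }

    // Keep only the text strictly between the two markers.
    $bodyStart = $start + strlen($startMarker);
    $block = substr($contents, $bodyStart, $end - $bodyStart);

    foreach (explode("\n", $block) as $line) {
        $line = trim($line);
        $lastSpace = strrpos($line, ' ');
        if ($lastSpace === false) {
            continue; // blank line or single token: not a data row
        }
        $value = substr($line, $lastSpace + 1);
        if (!is_numeric($value)) {
            continue; // header rows and non-numeric readings are skipped
        }
        $dates[] = substr($line, 0, 10); // assumes a 10-character date, as above
        $flows[] = (float) $value;
    }

    return array($dates, $flows);
}

// Hypothetical sample; the real feed layout may differ.
$sample = "junk before\nBEGIN DATA\nDATE          QD\n"
    . "04/01/2006    2122.08\n"
    . "04/02/2006    MISSING\n"
    . "04/03/2006    1980.00\n"
    . "END DATA\njunk after\n";

list($dates, $flows) = parse_flow_block($sample);
print_r($dates);
print_r($flows);
```

Skipping non-numeric rows silently drops bad readings; if gaps matter for the plot, storing null for those dates instead would be a reasonable alternative.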

Re: php pattern matching

Posted by: rlparker
Posted on: 2006-12-05 22:24:00

In reply to:

I might have pegged the needle on the dorkness meter when I chose to do this for some sort of obscure entertainment, albeit entertainment that is useful for strengthening a skill set.


Hey, "dorkness" be damned, that was a good and useful exercise, and no doubt very helpful to the original poster. Good job! :)

--rlparker

Re: php pattern matching

Posted by: kchrist
Posted on: 2006-12-06 06:55:00

In reply to:

Regular Expressions are very powerful and vital when working with huge data sets. They can also be notoriously problematic when working with data whose formatting isn't 100% guaranteed.


Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-- jwz