X-Cart forums, Dev Questions: HTML Catalog Cleaner - Removes excess white space (https://forum.x-cart.com/showthread.php?t=8260)

NuAlpha 06-28-2004 01:25 PM

All fixed! Tested it and it preserves the JavaScript while stripping the excess white space from the HTML. It doesn't hang anymore on my setup...the hang was the result of a stupid coding mistake. :P

Code:

<?php ini_set('zlib.output_compression', 'Off'); // Turn off zlib compression, if On, to prevent Mozilla output problems. ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>HTML Catalog Cleaner</title>
</head>
<body>
<?php
######################################################
##            ## HTML Catalog Cleaner ##            ##
######################################################
##                                                  ##
## Strips every file in the HTML catalog directory  ##
## of all excess white spaces.                      ##
######################################################
## Version: 1.1.2 (6/26/2004)
## Last updated: 6/28/2004

# Define the Constants #
########################
define('CATALOG_DIR', '/home/your-site-dir/path-to-xcart/catalog'); // Set the absolute directory path to your catalog.
define('BAR_LENGTH_REDUCER', 3); // If you have over 1000 HTML files in your catalog, you may wish to set this number higher.
# System constants.
define('MAX_ET', ini_get('max_execution_time'));
define('MAX_IT', ini_get('max_input_time'));

# Modify PHP Settings #
#######################
ini_set('max_execution_time', '14400'); // Make the maximum execution & input time 4 hours so that the script doesn't time-out.
ini_set('max_input_time', '14400');
ob_implicit_flush(1); // Show the progress in the browser.

# Initialize variables.
$successes = 0;
$failures = 0;
$filelength['init'] = 0;
$filelength['final'] = 0;
$cnt['tmp'] = 0; // Newline counter
$cnt['tot'] = 0; // Totals counter
$pblr = 0; // Progress bar length reducer variable

# Initialize regular expressions.
$regex = array(''=>'/[\t\n\r\f]+/', // Newlines and tabs
                          ' '=>'/ +/', // Excess spaces
                          '&nbsp;'=>'/&nbsp; /i', // Additional space after non-breaking space
                          '><'=>'/> </' // Space between HTML tags
                          );
$java_saver = '/(<script[^>]*>.*?<\/script>)/si';

# Function to Clean-up #
########################
function script_shutdown() {
        ini_set('max_execution_time', MAX_ET); // Reset the maximum execution time.
        ini_set('max_input_time', MAX_IT); // Reset the maximum input time.
        ob_implicit_flush(0); // Data should be kept in the buffer until ready.
}

register_shutdown_function('script_shutdown'); // Register the shutdown function.

# Pad with 256 bytes for Internet Explorer to show output immediately.
for ($pad=0; $pad < 256; $pad++) echo "\t";

echo "Stripping the HTML files of excess spaces...<br />\n";

# Open the directory and store the file list.
$file_list = array(); // Start empty so the foreach below is safe when no .htm files are found.
if (is_dir(CATALOG_DIR)) {
        if ($dh = opendir(CATALOG_DIR)) {
                # Iterate over file list.
                while (($filename = readdir($dh)) !== false) { // Use instead of scandir to skip some files.
                        if (strpos($filename,'.htm') !== false)
                                $file_list[] = $filename;
                }
                closedir($dh); // Close the directory.
        }
       
        # Perform specific operations on the files.
        foreach ($file_list as $file) {
                $file_contents = file_get_contents(CATALOG_DIR.DIRECTORY_SEPARATOR.$file);
                $filelength['init'] += strlen($file_contents);
                # Examine document for javascript code blocks and preserve them for restoration.
                if (preg_match_all($java_saver,$file_contents,$got_java,PREG_SET_ORDER)) {
                        foreach ($got_java as $java_chip) {
                                if (is_array($java_chip)) {
                                        $java_scripts[] = $java_chip[1];
                                }
                        }
                        foreach ($regex as $replace=>$finds) // Do each replacement.
                                $file_contents = preg_replace($finds,$replace,$file_contents);
                        # Reverse the damage to the javascripts.
                        if (preg_match_all($java_saver,$file_contents,$got_java,PREG_SET_ORDER)) {
                                foreach ($got_java as $stripped_java) {
                                        if (is_array($stripped_java)) {
                                                # Find the stripped java and replace it with the original code.
                                                $file_contents = str_replace($stripped_java[1],current($java_scripts),$file_contents);
                                                next($java_scripts);
                                        }
                                }
                        }
                } else {
                        foreach ($regex as $replace=>$finds) // Do each replacement.
                                $file_contents = preg_replace($finds,$replace,$file_contents);
                }
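                $fp = fopen(CATALOG_DIR.DIRECTORY_SEPARATOR.$file, 'w'); // Truncate file, then apply the modifications.
                if ($fp === false || fwrite($fp,$file_contents) === false) { // Could not open or write the file.
                        $failure_list[] = $file; // Log failures.
                        $failures++;
                } else {
                        $successes++;
                }
                if ($fp !== false) fclose($fp); // Only close a handle that was actually opened.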
                unset($java_scripts);
                $java_scripts = array();
                $filelength['final'] += strlen($file_contents);
                if ($pblr == BAR_LENGTH_REDUCER) { // Progress bar length reducer.
                        echo '|'; // Lengthen the progress bar.
                        $cnt['tmp']++; // Increment the newline counter.
                        $pblr = 0; // Reset pblr counter.
                } else {
                        $pblr++;
                }
                $cnt['tot']++; // Increment totals counter.
                if ($cnt['tmp']==300) { echo '<br />'; $cnt['tmp']=0; } // Start a new progress bar row and reset the counter.
        }
} else
        die(CATALOG_DIR.' is not a directory! Please check the path and try again.');

echo '<p>There were '.number_format($successes).' successful cleanings and '.number_format($failures).' failures out of a total of '.number_format($cnt['tot']).' files.</p>';
echo '<p>Your HTML Catalog files had a total combined length of '.number_format($filelength['init']).' characters.';
echo '<br />They now have a total length of '.number_format($filelength['final']).' characters.</p>';
echo '<p>That is a total of <u>'.number_format($filelength['init']-$filelength['final']).'</u> excess white spaces removed from your files.</p>';

if (isset($failure_list)) {
        echo '<p>The following files could not be written to:</p>';
        $c = 'Y'; // Init background color notifier.
        foreach ($failure_list as $fail) {
                # Show background color every other line for readability.
                if($c=='N') {$bgb=''; $bge=''; $c='Y';} else {$bgb='<font style="background-color:#E0E0E0">'; $bge='</font>'; $c='N';}
                echo $bgb.$fail.$bge.'<br />';
        }
}
?>
</body>
</html>


Let me know if you notice anything else that needs fixing. Enjoy!
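
For readers following the javascript-preserving step above, here is a minimal sketch of one alternative way to do it: swap each script block for a placeholder token before the whitespace rules run, then swap the originals back. This is not NuAlpha's code. The function name clean_html_keep_scripts, the @@SCRIPTn@@ token format, and the &nbsp; rule (inferred from the script's "non-breaking space" comment) are all assumptions for illustration, and the closure syntax needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper, not part of the posted script: collapse white space in
// an HTML string while leaving <script> blocks byte-for-byte intact.
function clean_html_keep_scripts($html) {
    $scripts = array(); // token => original script block

    // 1. Replace each <script>...</script> block with a unique placeholder.
    $html = preg_replace_callback(
        '/<script[^>]*>.*?<\/script>/si',
        function ($m) use (&$scripts) {
            // Token format is an assumption; it must not occur in the page itself.
            $token = '@@SCRIPT' . count($scripts) . '@@';
            $scripts[$token] = $m[0];
            return $token;
        },
        $html
    );

    // 2. Apply the whitespace rules in order (same rules as the $regex array).
    $patterns     = array('/[\t\n\r\f]+/', '/ +/', '/&nbsp; /i', '/> </');
    $replacements = array('',              ' ',    '&nbsp;',     '><');
    $html = preg_replace($patterns, $replacements, $html);

    // 3. Put the untouched script blocks back in place of their tokens.
    return str_replace(array_keys($scripts), array_values($scripts), $html);
}
```

It could be called once per file, e.g. $file_contents = clean_html_keep_scripts($file_contents);. Because the scripts are never run through the whitespace rules, there is no need to match the stripped version against the original afterwards.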

adpboss 06-28-2004 03:13 PM

I'll test soon.

Thanks NuAlpha, it's a great mod if we get it working right. :)

NuAlpha 06-28-2004 05:46 PM

Minor update:

Replace the code:
Code:

# Pad with 256 bytes for Internet Explorer to show output immediately.
for ($pad=0; $pad < 256; $pad++) echo "\t";


...with:
Code:

# Pad with 256 bytes for Internet Explorer to show output immediately.
if (isset($_SERVER['HTTP_USER_AGENT']) && strpos($_SERVER['HTTP_USER_AGENT'],'MSIE') !== false) {
        for ($pad=0; $pad < 256; $pad++) echo "\t";
        echo "\n";
}


:wink:

NuAlpha 06-29-2004 11:05 AM

Potential bug that needs fixing...

Replace:
Code:

<?php ini_set('zlib.output_compression', 'Off'); // Turn off zlib compression, if On, to prevent Mozilla output problems. ?>


With:
Code:

<?php
        ini_set('zlib.output_compression', 'Off'); // Turn off zlib compression, if On, to prevent Mozilla output problems.
?>


That terminating PHP tag on the same line as the comment can cause problems.
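
Some background on the same-line closing tag, as a small sketch rather than a diagnosis of what went wrong on any particular server: in PHP a "//" comment ends at the end of the line or at the first closing tag, whichever comes first, so the tag is not swallowed by the comment. The snippet below uses eval() and output buffering only to make that visible; the variable names are made up for the demonstration.

```php
<?php
// A one-line snippet shaped like the original first line: a statement, a
// trailing "//" comment, then a closing tag followed by plain text.
// eval() starts in PHP mode, so no opening tag is needed inside the string.
$snippet = '$flag = "set"; // trailing comment ?>AFTER';

ob_start();
eval($snippet);
$printed = ob_get_clean();

// $flag was assigned, and the text after the closing tag came out as plain
// output, so the comment stopped at the closing tag instead of hiding it.
var_dump($flag, $printed);
```

Putting the closing tag on its own line, as in the replacement above, avoids having to think about this rule at all.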

NuAlpha 06-30-2004 11:20 AM

Just ran this on our latest HTML catalog. Stripped a total of 14,606,179 excess white spaces from all of the catalog files. Javascript was left untouched and everything seems to work great! 8)

adpboss 06-30-2004 07:22 PM

Works with my JavaScript pop-up stuff, it's relatively fast, and the script terminates properly with the report at the end.

This includes all of the bug fixes and updates up to the time of this post.

Using version 3.4.14.

GREAT JOB NUALPHA!

Code:

<?php
  ini_set('zlib.output_compression', 'Off'); // Turn off zlib compression, if On, to prevent Mozilla output problems.
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>HTML Catalog Cleaner</title>
</head>
<body>
<?php
######################################################
##            ## HTML Catalog Cleaner ##            ##
######################################################
##                                                  ##
## Strips every file in the HTML catalog directory  ##
## of all excess white spaces.                      ##
######################################################
## Version: 1.1.2 (6/26/2004)
## Last updated: 6/28/2004

# Define the Constants #
########################
define('CATALOG_DIR', '/home/your-site-dir/path-to-xcart/catalog'); // Set the absolute directory path to your catalog.
define('BAR_LENGTH_REDUCER', 3); // If you have over 1000 HTML files in your catalog, you may wish to set this number higher.
# System constants.
define('MAX_ET', ini_get('max_execution_time'));
define('MAX_IT', ini_get('max_input_time'));

# Modify PHP Settings #
#######################
ini_set('max_execution_time', '14400'); // Make the maximum execution & input time 4 hours so that the script doesn't time-out.
ini_set('max_input_time', '14400');
ob_implicit_flush(1); // Show the progress in the browser.

# Initialize variables.
$successes = 0;
$failures = 0;
$filelength['init'] = 0;
$filelength['final'] = 0;
$cnt['tmp'] = 0; // Newline counter
$cnt['tot'] = 0; // Totals counter
$pblr = 0; // Progress bar length reducer variable

# Initialize regular expressions.
$regex = array(''=>'/[\t\n\r\f]+/', // Newlines and tabs
            ' '=>'/ +/', // Excess spaces
            '&nbsp;'=>'/&nbsp; /i', // Additional space after non-breaking space
            '><'=>'/> </' // Space between HTML tags
            );
$java_saver = '/(<script[^>]*>.*?<\/script>)/si';

# Function to Clean-up #
########################
function script_shutdown() {
  ini_set('max_execution_time', MAX_ET); // Reset the maximum execution time.
  ini_set('max_input_time', MAX_IT); // Reset the maximum input time.
  ob_implicit_flush(0); // Data should be kept in the buffer until ready.
}

register_shutdown_function('script_shutdown'); // Register the shutdown function.

# Pad with 256 bytes for Internet Explorer to show output immediately.
if (isset($_SERVER['HTTP_USER_AGENT']) && strpos($_SERVER['HTTP_USER_AGENT'],'MSIE') !== false) {
  for ($pad=0; $pad < 256; $pad++) echo "\t";
  echo "\n";
}

echo "Stripping the HTML files of excess spaces...<br />\n";

# Open the directory and store the file list.
$file_list = array(); // Start empty so the foreach below is safe when no .htm files are found.
if (is_dir(CATALOG_DIR)) {
  if ($dh = opendir(CATALOG_DIR)) {
      # Iterate over file list.
      while (($filename = readdir($dh)) !== false) { // Use instead of scandir to skip some files.
        if (strpos($filename,'.htm') !== false)
            $file_list[] = $filename;
      }
      closedir($dh); // Close the directory.
  }

  # Perform specific operations on the files.
  foreach ($file_list as $file) {
      $file_contents = file_get_contents(CATALOG_DIR.DIRECTORY_SEPARATOR.$file);
      $filelength['init'] += strlen($file_contents);
      # Examine document for javascript code blocks and preserve them for restoration.
      if (preg_match_all($java_saver,$file_contents,$got_java,PREG_SET_ORDER)) {
        foreach ($got_java as $java_chip) {
            if (is_array($java_chip)) {
              $java_scripts[] = $java_chip[1];
            }
        }
        foreach ($regex as $replace=>$finds) // Do each replacement.
            $file_contents = preg_replace($finds,$replace,$file_contents);
        # Reverse the damage to the javascripts.
        if (preg_match_all($java_saver,$file_contents,$got_java,PREG_SET_ORDER)) {
            foreach ($got_java as $stripped_java) {
              if (is_array($stripped_java)) {
                  # Find the stripped java and replace it with the original code.
                  $file_contents = str_replace($stripped_java[1],current($java_scripts),$file_contents);
                  next($java_scripts);
              }
            }
        }
      } else {
        foreach ($regex as $replace=>$finds) // Do each replacement.
            $file_contents = preg_replace($finds,$replace,$file_contents);
      }
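      $fp = fopen(CATALOG_DIR.DIRECTORY_SEPARATOR.$file, 'w'); // Truncate file, then apply the modifications.
      if ($fp === false || fwrite($fp,$file_contents) === false) { // Could not open or write the file.
        $failure_list[] = $file; // Log failures.
        $failures++;
      } else {
        $successes++;
      }
      if ($fp !== false) fclose($fp); // Only close a handle that was actually opened.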
      unset($java_scripts);
      $java_scripts = array();
      $filelength['final'] += strlen($file_contents);
      if ($pblr == BAR_LENGTH_REDUCER) { // Progress bar length reducer.
        echo '|'; // Lengthen the progress bar.
        $cnt['tmp']++; // Increment the newline counter.
        $pblr = 0; // Reset pblr counter.
      } else {
        $pblr++;
      }
      $cnt['tot']++; // Increment totals counter.
      if ($cnt['tmp']==300) { echo '<br />'; $cnt['tmp']=0; } // Start a new progress bar row and reset the counter.
  }
} else
  die(CATALOG_DIR.' is not a directory! Please check the path and try again.');

echo '<p>There were '.number_format($successes).' successful cleanings and '.number_format($failures).' failures out of a total of '.number_format($cnt['tot']).' files.</p>';
echo '<p>Your HTML Catalog files had a total combined length of '.number_format($filelength['init']).' characters.';
echo '<br />They now have a total length of '.number_format($filelength['final']).' characters.</p>';
echo '<p>That is a total of <u>'.number_format($filelength['init']-$filelength['final']).'</u> excess white spaces removed from your files.</p>';

if (isset($failure_list)) {
  echo '<p>The following files could not be written to:</p>';
  $c = 'Y'; // Init background color notifier.
  foreach ($failure_list as $fail) {
      # Show background color every other line for readability.
      if($c=='N') {$bgb=''; $bge=''; $c='Y';} else {$bgb='<font style="background-color:#E0E0E0">'; $bge='</font>'; $c='N';}
      echo $bgb.$fail.$bge.'<br />';
  }
}
?>
</body>
</html>


jburba2000 08-12-2004 11:14 AM

Anyone wanna tell me why I am getting this error? The only thing I have changed is my directory path???

Quote:

Warning: Unexpected character in input: '\' (ASCII=92) state=1 in /home/admin/sitedir/admin/catalog_cleanup.php on line 2

Parse error: parse error in /home/admin/sitedir/admin/catalog_cleanup.php on line 2

jburba2000 08-12-2004 11:44 AM

never mind, i figured it out, and with a total of 1,111,686 excess white spaces removed from my code.

wow, kudos bro, much appreciated mod, hope i can pay ya back later...

NuAlpha 08-12-2004 12:34 PM

Quote:

Originally Posted by jburba2000
never mind, i figured it out, and with a total of 1,111,686 excess white spaces removed from my code.

wow, kudos bro, much appreciated mod, hope i can pay ya back later...


Welcome! :wink:

john99 08-21-2004 08:12 PM

Quote:

Originally Posted by NuAlpha
Quote:

Originally Posted by jburba2000
never mind, i figured it out, and with a total of 1,111,686 excess white spaces removed from my code.

wow, kudos bro, much appreciated mod, hope i can pay ya back later...


Welcome! :wink:


Hi jburba2000, I hit the same problem and wonder if you could let me know how you fixed it. Thanks.


All times are GMT -8.