
Stripping the tags from an HTML page

function html2txt($document){
        $search = array('@<script[^>]*?>.*?</script>@si', // Strip out javascript
        '@<style[^>]*?>.*?</style>@si', // Strip style blocks (no U modifier: it would turn .*? greedy)
        '@<[?]php[^>].*?[?]>@si', // PHP scripts
        '@<[?][^>].*?[?]>@si', // Other <? ... ?> blocks (short tags, processing instructions)
        '@<[\/\!]*?[^<>]*?>@si', // Strip out HTML tags
        '@<![\s\S]*?--[ \t\n\r]*>@', // Strip multi-line comments including CDATA
        );
        // Patterns are applied in order; every match is replaced with an empty string
        $text = preg_replace($search, '', $document);
        return $text;
}
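
A quick way to see what the function does is to run the same kind of pattern list on a small sample. The sketch below is only an illustration: `strip_markup_demo` and `$sample` are hypothetical names, and the helper carries a trimmed copy of the patterns so that it runs on its own.

```php
<?php
// Hypothetical helper for illustration: a trimmed copy of the pattern list.
function strip_markup_demo(string $document): string
{
    $search = array(
        '@<script[^>]*?>.*?</script>@si', // script blocks, including their content
        '@<style[^>]*?>.*?</style>@si',   // style blocks, including their content
        '@<[\/\!]*?[^<>]*?>@si',          // any remaining tag
    );
    // preg_replace applies the patterns in array order, each on the previous result.
    return preg_replace($search, '', $document);
}

// Made-up sample page, not taken from the original post.
$sample = '<html><head><style>p { color: red; }</style>'
        . '<script>alert("x");</script></head>'
        . '<body><p>Hello <b>world</b></p><!-- note --></body></html>';

echo strip_markup_demo($sample), PHP_EOL; // prints "Hello world"
```

The script and style patterns come first so that their content disappears along with the tags. Entities such as `&amp;` are left untouched by this approach.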

Note

There is still room for improvement…