I process some text files with the code below:
$file = file($files);
$lines = str_replace("'", '', $file);
$noMultipleSpace = removeMultipleSpaces($lines);
$fileContents = array();
foreach ($noMultipleSpace as $line) {
    if (isLatin($line) && count(preg_split('/\s+/', $line)) > 25) {
        $newContent = preg_split('/\\.\\s*/', $line);
        foreach ($newContent as $newsContent) {
            $pos1 = stripos($newsContent, ':');
            // Strict comparison: stripos() returns 0 for a match at position 0.
            if ($pos1 === false && count(preg_split('/\s+/', $newsContent)) > 3 && isLatin($newsContent)) {
                $fileContents[] = $newsContent;
            }
        }
        $content = implode('.', $fileContents);
    }
}
with these functions:
function isLatin($string) {
    return preg_match('/^\\s*[a-z,A-Z]/', $string) > 0;
}
function removeMultipleSpaces($string) {
    return preg_replace('/\s+/', ' ', $string);
}
But after the implode step, the dot ends up attached to the next sentence. For example, I get: sentence1 .Sentence2. What I expect is: sentence1. Sentence2. What's wrong? Thank you 🙂
The input is a set of text files, for example:
ChengXiang Zhai
Department of Computer Science University of Illinois at Urbana Champaign
ABSTRACT
Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text
information collected over time. Since most text information bears some time stamps, TTM has many applications in multiple domains, such as summarizing events in news articles and
revealing research trends in scientific literature. In this paper, we study a particular TTM
task discovering and summarizing the evolutionary patterns of themes in a text stream. We
define this new text mining problem and present general probabilistic methods for solving
this problem through (1) discovering latent themes from text; (2) constructing an evolution
graph of themes; and (3) analyzing life cycles of themes. Evaluation of the proposed methods
on two different domains (i.e., news articles and literature) shows that the proposed
methods can discover interesting evolutionary theme patterns effectively. Categories and
Subject Descriptors: H.3.3 [Information Search and Retrieval]: Clustering General Terms:
Algorithms Keywords: Temporal text mining, evolutionary theme patterns, theme threads,
clustering
1.
INTRODUCTION
I want to keep only the important sentences, from "Temporal Text Mining (TTM)..." through "...effectively."
Your intermediate sentences appear to have a trailing space, causing the imploded delimiter to appear off.
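To see where that space likely comes from, here is a small reproduction. The two-line input is made up for illustration. file() keeps each line's trailing newline, the whitespace replacement turns it into a single space, and the split pattern only strips whitespace that follows a dot.

```php
<?php
// Hypothetical two-line input, shaped like what file() returns (newlines kept).
$file = array("First sentence here. Second one wraps\n", "onto this line. Third sentence\n");

// Same normalisation as removeMultipleSpaces(): each "\n" becomes one space.
$lines = preg_replace('/\s+/', ' ', $file);

$fragments = array();
foreach ($lines as $line) {
    // The pattern only eats whitespace AFTER a dot, so the space left at
    // the end of the line stays attached to the last fragment.
    foreach (preg_split('/\.\s*/', $line) as $fragment) {
        $fragments[] = $fragment;
    }
}

$joined = implode('.', $fragments);
echo $joined, PHP_EOL;
// The joined text contains "wraps .onto": the space sits before the dot.
```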
Try this:
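The code that originally followed is not included here. As a minimal sketch of one way to apply the advice: trim each fragment before collecting it, drop empty fragments, and join with a dot plus a space. The helper name joinSentences() and the sample input are hypothetical, and the question's isLatin() and word-count filters are left out for brevity.

```php
<?php
// Hypothetical helper: split normalised lines into sentences, trim each one,
// and join them as "A. B. C."
function joinSentences(array $lines) {
    $sentences = array();
    foreach ($lines as $line) {
        foreach (preg_split('/\.\s*/', $line) as $fragment) {
            $fragment = trim($fragment);   // removes the stray trailing space
            if ($fragment !== '') {        // skip empty pieces left after a final dot
                $sentences[] = $fragment;
            }
        }
    }
    if (count($sentences) === 0) {
        return '';
    }
    return implode('. ', $sentences) . '.';
}

// Made-up input, already passed through removeMultipleSpaces().
$lines = array('Sentence1 is here. Sentence2 follows. ', 'Sentence3 ends the text. ');
$content = joinSentences($lines);
echo $content, PHP_EOL;
```

This sketch assumes every line holds whole sentences. If sentences wrap across lines, as in the sample input, it may be better to join the lines into one string before splitting.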