Friday, 23 August 2013

Scalable Solution for Splitting Words in a Document?

I have a document whose words need to be separated and extracted, using a
blank space as the delimiter. For that purpose I used the following code:
string[] words = s.Split(' ');
The problem is that I am going to use this code in the parser of a search
engine, so there will be hundreds of thousands, if not millions, of web
pages to split into words.
Is my concern justified that with the above code the process could take a
very long time, or is it unfounded? If it is justified, any suggestions for
a more scalable alternative would be welcome.
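
For reference, here is a minimal sketch of one possible alternative, not a measured recommendation. string.Split is already linear in the length of the input; what it costs is holding the whole page in memory as one string, plus an array and one new string per word. Note also that s.Split(' ') returns empty strings for consecutive spaces and does not split on tabs or newlines. The sketch below reads from a TextReader and yields words lazily instead. The names WordSplitter and ReadWords are hypothetical, chosen only for illustration, and treating any whitespace character as a delimiter is an assumption that goes slightly beyond the single blank space used above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper class, named for illustration only.
public static class WordSplitter
{
    // Lazily yields whitespace-delimited words from a reader, one at a time,
    // so a page never has to be materialized as a full string[] of words.
    public static IEnumerable<string> ReadWords(TextReader reader)
    {
        var word = new StringBuilder();
        int c;

        // TextReader.Read() returns -1 at the end of the input.
        while ((c = reader.Read()) != -1)
        {
            if (char.IsWhiteSpace((char)c))
            {
                // A delimiter ends the current word; runs of whitespace
                // are skipped because the buffer is empty by then.
                if (word.Length > 0)
                {
                    yield return word.ToString();
                    word.Clear();
                }
            }
            else
            {
                word.Append((char)c);
            }
        }

        // Emit the last word if the input did not end with whitespace.
        if (word.Length > 0)
        {
            yield return word.ToString();
        }
    }
}
```

It could be used as foreach (string w in WordSplitter.ReadWords(new StringReader(s))) { ... }, or with a StreamReader over a file or network stream. If keeping the whole page in memory is acceptable, s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) is a smaller change that at least drops the empty entries. Either way, it is worth profiling first, since splitting may not be the slowest stage of the parser.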
