Let's start from a practical bash example. The problem is this: I want to produce an ideal summary for a set of newswire articles about a particular topic. Each topic contains about ten documents, each covering one specific aspect, and the documents are connected by a time sequence. However, the chronological order of a topic is only embedded in the content of the documents. When you concatenate all the sub-topic documents into one large topic set for the subsequent summary task, the unordered document set becomes confusing. That is exactly the problem we aim to solve.

A Particular and Important Script

Shell scripting languages are the same in their essentials but differ in minor points. To keep this article accurate, I limit my notes to the bash shell. The purpose of this post is therefore to draw some conclusions from my experience of using bash scripts. It is an amazing feeling to write beautiful code skilfully, especially when you encounter a particular problem and find that the computer can solve it elegantly and efficiently. That is where the sense of achievement comes from.

Now, let's cut the cackle and turn to our particular requirement and its solution. First we need to divide the problem into basic modules. There are three of them: extract the date from each document; format the date from the extracted date string; sort by date, concatenate the documents, and generate the result.
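The second module, putting an extracted date string into a fixed format, could be sketched as follows. This is only an illustration: the function name normalize_date is hypothetical, and I assume the raw string starts with a four-digit year followed by a month and a day, separated by any non-digit characters (for example 2008/3/9 or 2008-03-09). The real documents may need a different pattern.

```bash
#!/usr/bin/env bash

# Hypothetical helper: turn "YYYY<sep>M<sep>D" into "YYYY-MM-DD".
# ASSUMPTION: year, month and day appear in that order, separated by
# one or more non-digit characters.
normalize_date() {
    local raw=$1
    local re='^([0-9]{4})[^0-9]+([0-9]{1,2})[^0-9]+([0-9]{1,2})'

    if [[ $raw =~ $re ]]; then
        # The 10# prefix forces base 10, so "08" and "09" are not
        # rejected as invalid octal numbers.
        printf '%04d-%02d-%02d\n' \
            "$((10#${BASH_REMATCH[1]}))" \
            "$((10#${BASH_REMATCH[2]}))" \
            "$((10#${BASH_REMATCH[3]}))"
    else
        # No recognisable date: report failure to the caller.
        return 1
    fi
}
```

With zero-padded fields in year, month, day order, comparing two dates as plain strings gives the same result as comparing them chronologically, which makes the later sort step simple.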

Normalize the Text and Extract the Date String

The human translation from English to Chinese was carried out by three groups of people. Because their editors used different encodings, the resulting files came in inconsistent encodings such as UTF-8, UTF-8 with BOM, ASCII and so on. So the first thing I needed to do was normalize the encoding of all the translation files. I wrote a simple script to check the encoding of each file, find the ones that did not match, and then convert them automatically or manually.

Next comes the step of extracting the date string from the translated texts. The first thing to do is delete the blank lines in the texts. This trick is also important: it made me realize again that sed can execute other commands, such as deletion, besides the commonly used substitution. After removing the blank lines, I used grep with the -o option to extract the date information from the text.

Along the way I also learned how to build arrays from the output of bash substitutions, and I took note of some useful bash test conditions: -z is true for an empty string, and -n is true for a non-empty string. I also learned how to iterate over an array with for file in "${array[@]}", and the logical operators !, -o and -a. Perhaps the most important lesson was the use of functions: how to define a function in bash, how to pass parameters to it, how to return values from it, and how local and global variables behave.

All in all, this attempt made writing code easier for me. A consistent coding style is the most important habit I should develop. Write code the way you would compose a poem; that is the philosophy of the Perl language.
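The pieces mentioned above (a BOM check, blank-line deletion with sed, grep -o, arrays, the -n test, and functions with local variables) can be combined in a small sketch like the one below. All function names are hypothetical, and the date pattern YYYY-MM-DD is an assumption made for illustration; the actual translated texts may use another date format.

```bash
#!/usr/bin/env bash

# Succeed if the file starts with the UTF-8 byte order mark (EF BB BF).
has_bom() {
    local file=$1
    local head_bytes
    head_bytes=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
    [ "$head_bytes" = "efbbbf" ]
}

# Print the file without empty or whitespace-only lines.
# The sed "d" command deletes every line that matches the address.
strip_blank_lines() {
    sed '/^[[:space:]]*$/d' "$1"
}

# Print the first date found in the file, or nothing if there is none.
# ASSUMPTION: dates are written as YYYY-MM-DD.
extract_date() {
    local file=$1
    strip_blank_lines "$file" \
        | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' \
        | head -n 1
}

# Print "date<TAB>file" for each given file that contains a date.
# Files without a date are reported on stderr and skipped.
list_dated_docs() {
    local files=("$@")      # collect the arguments into an array
    local file date
    for file in "${files[@]}"; do
        date=$(extract_date "$file")
        if [ -n "$date" ]; then          # -n: string is non-empty
            printf '%s\t%s\n' "$date" "$file"
        else                             # same as: [ -z "$date" ]
            echo "no date found in $file" >&2
        fi
    done
}
```

A call such as list_dated_docs topic01/*.txt (the directory name is only an example) would then print one tab-separated line per document. Quoting "${files[@]}" keeps file names with spaces intact, and declaring variables with local keeps them from leaking into the caller's scope.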

Sort According to the Date

The last thing to do is sort the documents of a particular topic by their extracted dates. Here I also found the most effective way to learn how to use sort: apply it to a real problem. None of this can be finished in one go. Once you have run into such problems yourself, you will find that only by combining systematic study of a tutorial with solving practical problems can you truly master the essentials. That is the difference between being proficient and merely being able to get by.
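As a rough sketch of this last step, the functions below read "date<TAB>path" lines, order them by date with sort, and concatenate the documents in that order. The function names and the input format are assumptions chosen for illustration, and the dates are assumed to be already normalized to YYYY-MM-DD so that a plain text sort is also a chronological sort.

```bash
#!/usr/bin/env bash

# Read "date<TAB>path" lines on stdin and print the paths ordered by date.
# ASSUMPTION: dates are zero-padded YYYY-MM-DD strings.
sort_by_date() {
    # -t sets the field separator to a tab; -k1,1 sorts on the first
    # field only. LC_ALL=C gives a byte-wise, locale-independent order.
    LC_ALL=C sort -t $'\t' -k1,1 | cut -f2-
}

# Concatenate the documents of one topic in chronological order.
# Input on stdin: "date<TAB>path" lines. Output: the merged text.
concat_topic() {
    local path
    sort_by_date | while IFS= read -r path; do
        if [ -n "$path" ]; then
            cat "$path"
        fi
    done
}
```

The merged output can be redirected into a single file per topic, which then serves as the input for the summary task.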


Advanced Bash-Scripting Guide