1. Split content into lines

Create a new file named walden.txt in PyCharm and paste the content into it.

path = '/Users/osx/PycharmProjects/lesson2/waldenn.txt'
with open(path, 'r') as f:
    lines = f.readlines()
    for index, line in enumerate(lines):
        data = {
            'index':index,
            'line' :line,
            'words':len(line.split())
        }
        print(data)

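If the text file is saved in a different encoding than the one Python uses to open it (for example ASCII versus a Unicode encoding such as UTF-8), Python cannot decode it: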

Error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
Process finished with exit code 1
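One way around this error is to pass the file's real encoding to open(). The sketch below is only an illustration: it assumes the file is UTF-16 (a byte 0xff at position 0 is consistent with a UTF-16 byte order mark, but that is a guess about walden.txt), and the helper read_lines and the temporary sample file are made up for the demo.

```python
import os
import tempfile


def read_lines(path, encoding):
    """Read all lines from path, decoding with the given encoding."""
    with open(path, 'r', encoding=encoding) as f:
        return f.readlines()


# Demo on a throwaway file (hypothetical stand-in for walden.txt).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.txt')
    # Save the sample as UTF-16, which starts with a byte order mark.
    with open(sample, 'w', encoding='utf-16') as f:
        f.write('first line\nsecond line\n')

    # Decoding as ASCII fails, like the error above.
    try:
        read_lines(sample, 'ascii')
    except UnicodeDecodeError as err:
        print('ascii failed:', err)

    # Decoding with the matching encoding works.
    lines = read_lines(sample, 'utf-16')
    print(lines)
```

If the file turns out to be in another encoding, the same call works with that name instead, for example encoding='utf-8'.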

2. Save the split lines into MongoDB

import pymongo

client = pymongo.MongoClient('localhost', 27017)
workbook = client['workbook']
sheet_tab = workbook['sheet_tab']

path = '/Users/osx/PycharmProjects/lesson2/waldenn.txt'
with open(path, 'r') as f:
    lines = f.readlines()
    for index, line in enumerate(lines):
        data = {
            'index':index,
            'line' :line,
            'words':len(line.split())
        }
        sheet_tab.insert_one(data)

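If the pymongo package is not installed in this project, add it first, for example with pip install pymongo or through PyCharm's project interpreter settings.

Also important: start the MongoDB service by running the following command in iTerm: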

mongod
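The loop in this step calls insert_one once per line. A possible variation, sketched here under assumptions, builds all the records first and sends them with a single insert_many call (a real pymongo collection method). The helper names build_records and save_records are made up for this sketch, and the collection argument is expected to be something like sheet_tab.

```python
def build_records(lines):
    """Turn an iterable of text lines into one dict per line."""
    return [
        {'index': index, 'line': line, 'words': len(line.split())}
        for index, line in enumerate(lines)
    ]


def save_records(collection, records):
    """Insert all records in one call and return how many were inserted."""
    if not records:
        # insert_many rejects an empty list, so skip the call entirely.
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)


# Building records needs no database, so it can be checked on its own.
sample_lines = ['I went to the woods\n', '\n', 'to live deliberately\n']
for record in build_records(sample_lines):
    print(record)
```

With the connection from this step, the call would be save_records(sheet_tab, build_records(lines)).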

3. Check the data saved in the database

3.1 Use code

import pymongo

client = pymongo.MongoClient('localhost', 27017)
workbook = client['workbook']
sheet_tab = workbook['sheet_tab']

# path = '/Users/osx/PycharmProjects/lesson2/waldenn.txt'
# with open(path, 'r') as f:
#     lines = f.readlines()
#     for index, line in enumerate(lines):
#         data = {
#             'index':index,
#             'line' :line,
#             'words':len(line.split())
#         }
#         sheet_tab.insert_one(data)

for item in sheet_tab.find():
    print(item)

3.2 Use the plug-in tool in PyCharm

It seems that the plug-in tool can only show 300 records. Querying with code or from the command line finds all the records in MongoDB. (2018-1-10)
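To confirm how many records are really stored, the collection can be asked for a count instead of relying on what the plug-in displays. This is a minimal sketch: count_records is a made-up helper name, and it assumes a pymongo version that provides count_documents (3.7 or later).

```python
def count_records(collection, query=None):
    """Return how many documents match query (all documents if query is None)."""
    return collection.count_documents(query or {})
```

With the connection from 3.1, count_records(sheet_tab) gives the total to compare against the 300 rows shown in the plug-in.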

4. Organize data in order

4.1 Filter commands / query the db

> $lt/$lte/$gt/$gte/$ne are equivalent to </<=/>/>=/!= respectively (l stands for less, g for greater, e for equal, n for not).

example:

```Python
for item in sheet_tab.find({'words': {'$lt': 5}}):
    print(item)  # find the lines that have fewer than 5 words
```
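The comparison operators can also be combined in one filter to select a range. The sketch below only builds the filter dictionary, so it runs without a database; words_filter is a made-up helper name, not part of pymongo.

```python
def words_filter(min_words=None, max_words=None):
    """Build a filter on 'words' meaning min_words <= words < max_words."""
    bounds = {}
    if min_words is not None:
        bounds['$gte'] = min_words
    if max_words is not None:
        bounds['$lt'] = max_words
    # An empty filter matches every document.
    return {'words': bounds} if bounds else {}


print(words_filter(max_words=5))  # {'words': {'$lt': 5}}
print(words_filter(5, 10))        # {'words': {'$gte': 5, '$lt': 10}}
```

Passed to the collection, this might look like sheet_tab.find(words_filter(5, 10)).sort('words', pymongo.ASCENDING), which also returns the matching lines ordered by word count.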