averell

averell.core.export_corpora(corpus_ids, granularity, corpora_folder, filename, no_download=False)

Generates a single JSON file with the chosen granularity for all of the selected corpora.

Parameters:
    corpus_ids – IDs of the corpora to export
    granularity – Level of parsing granularity (stanza, line, word or syllable)
    corpora_folder – Local folder where the corpora are located
    filename – Name of the output file
    no_download – If True, corpora that are missing locally are not downloaded
Returns:	Python dict with the chosen granularity for all of the selected corpora

averell.core.get_corpora(corpus_indices=None, output_folder=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/averell/checkouts/fix-documentation/docs/corpora'))

Download and uncompress selected corpora

Parameters:
    corpus_indices – Indices of the corpora to download
    output_folder – Local folder where the corpora will be uncompressed
Returns:	Python dict with all corpora features

averell.utils.download_corpora(corpus_indices=None, output_folder=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/averell/checkouts/fix-documentation/docs/corpora'))

Download corpora from a list of sources to a local folder

Parameters:
    corpus_indices (list) – Indices into CORPORA_SOURCES selecting which corpora to download
    output_folder (str) – Folder where the corpora will be saved

averell.utils.download_corpus(url, filename=None)

Download the corpus ZIP file from an external source

Parameters:	url (str) – URL of the corpus file
Returns:	Local filename of the corpus
Return type:	str
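A minimal sketch of what a download helper with this signature might look like, using only the standard library's `urllib.request.urlretrieve`. Deriving the default filename from the last segment of the URL path is an assumption for illustration, not documented averell behaviour.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_corpus(url, filename=None):
    """Download a corpus ZIP file from `url` and return its local filename.

    Sketch only: the default-filename rule below is an assumption.
    """
    if filename is None:
        # Hypothetical default: reuse the file name at the end of the URL path
        filename = os.path.basename(urlparse(url).path) or "corpus.zip"
    # urlretrieve copies the resource to `filename` and returns (path, headers)
    local_path, _headers = urlretrieve(url, filename)
    return local_path
```

Because `urlretrieve` also accepts `file://` URLs, the sketch can be exercised offline against a local file.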

averell.utils.filter_corpus_features(corpus_features, corpus_id, granularity)

Get the features at the chosen granularity for each poem in the corpus

Parameters:
    corpus_features (list of dicts) – List of the corpus poems as Python dicts
    corpus_id (int) – ID of the corpus to be filtered
    granularity (str) – Level at which to filter the poem (stanza, line, word or syllable)
Returns:	List of rows with the corpus granularity information
Return type:	list

averell.utils.filter_features(features, corpus_index, granularity=None)

Select the features of a poem at the chosen granularity

Parameters:
    features (dict) – Poem as a Python dict
    corpus_index (int) – Index of the corpus to be filtered
    granularity (str) – Level at which to filter the poem (stanza, line, word or syllable)
Returns:	List of rows with the poem granularity information
Return type:	list
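Selecting one granularity level can be sketched as a walk down a nested poem dict. The schema used here is a hypothetical illustration: children live under `stanzas`, `lines`, `words` and `syllables` keys, and each unit has text fields such as `line_text`. averell's real poem format may use different keys.

```python
# Hypothetical schema: each level stores its children under a plural key.
LEVELS = ("stanza", "line", "word", "syllable")
CHILD_KEY = {level: level + "s" for level in LEVELS}


def filter_features(features, corpus_index, granularity=None):
    """Return one row per unit of `granularity` in a poem dict.

    Sketch only: the nested layout is an assumption, not averell's schema.
    """
    if granularity is None:
        # No granularity requested: the whole poem becomes a single row
        return [{"corpus": corpus_index, **features}]
    if granularity not in LEVELS:
        raise ValueError(f"unknown granularity: {granularity!r}")
    nodes = [features]
    # Descend one level at a time until the requested granularity is reached
    for level in LEVELS[: LEVELS.index(granularity) + 1]:
        nodes = [child for node in nodes for child in node.get(CHILD_KEY[level], [])]
    rows = []
    for node in nodes:
        # Drop nested child lists so each row holds only this level's fields
        fields = {k: v for k, v in node.items() if k not in CHILD_KEY.values()}
        rows.append({"corpus": corpus_index,
                     "poem_title": features.get("poem_title"), **fields})
    return rows
```

Flattening to rows keeps every granularity in the same tabular shape, which is convenient when several corpora are later combined into one output file.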

averell.utils.get_ids(values)

Transform numeric identifiers, corpus shortcodes (slugs), and two-letter ISO language codes into their corresponding numeric identifiers, following the order in CORPORA_SOURCES.

Returns:	List of indices in CORPORA_SOURCES
Return type:	list
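A sketch of the lookup described above, run against a small hypothetical `CORPORA_SOURCES` list whose entries carry `slug` and `lang` fields. Treating a numeric identifier as the list index itself, and matching case-insensitively, are assumptions; the real list and numbering scheme may differ.

```python
# Hypothetical stand-in for averell's CORPORA_SOURCES; the real entries differ.
CORPORA_SOURCES = [
    {"slug": "corpus-a", "lang": "es"},
    {"slug": "corpus-b", "lang": "es"},
    {"slug": "corpus-c", "lang": "en"},
]


def get_ids(values):
    """Map numeric ids, slugs and two-letter language codes to list indices.

    Sketch only: numeric identifiers are assumed to be the index itself.
    """
    indices = set()
    for value in values:
        value = str(value).strip().lower()
        if value.isdigit():
            # Numeric identifier: keep it only if it points at an existing entry
            if int(value) < len(CORPORA_SOURCES):
                indices.add(int(value))
            continue
        for index, source in enumerate(CORPORA_SOURCES):
            # A slug selects one corpus; a language code selects every corpus in it
            if value in (source["slug"], source["lang"]):
                indices.add(index)
    # Sorted, de-duplicated indices in CORPORA_SOURCES order
    return sorted(indices)
```

For example, `get_ids(["es"])` selects every Spanish corpus in the hypothetical list, while mixing a slug and a number returns the union of both selections.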

averell.utils.get_line_features(features)

Filter the line features of a poem

Parameters:	features (dict) – Poem dictionary
Returns:	List of line dicts
Return type:	list

averell.utils.get_main_corpora_info()

Create a dict with the main corpora information stored in CORPORA_SOURCES

Returns:	Dictionary with the corpora info to be shown
Return type:	dict

averell.utils.get_stanza_features(poem_features)

Filter the stanza features of a poem

Parameters:	poem_features (dict) – Poem dictionary
Returns:	List of stanza dicts
Return type:	list

averell.utils.get_syllable_features(features)

Filter the syllable features of a poem

Parameters:	features (dict) – Poem dictionary
Returns:	List of syllable dicts
Return type:	list

averell.utils.get_word_features(features)

Filter the word features of a poem

Parameters:	features (dict) – Poem dictionary
Returns:	List of word dicts
Return type:	list

averell.utils.pretty_string(text, num_words)

Insert a line break after every num_words words of text, to create multiline table cells for get_main_corpora_info()

Parameters:
    text – String to be split
    num_words – Number of words after which a line break is inserted
Returns:	String with a line break after every num_words words
Return type:	str
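One straightforward way to get this behaviour is to split on whitespace and rejoin fixed-size groups of words. This is a sketch, not necessarily averell's code; note that it collapses any runs of whitespace in the input into single spaces.

```python
def pretty_string(text, num_words):
    """Insert a line break after every `num_words` words of `text`."""
    words = text.split()
    # Group the words into chunks of num_words and join each chunk with spaces
    chunks = [" ".join(words[i:i + num_words])
              for i in range(0, len(words), num_words)]
    # One chunk per line
    return "\n".join(chunks)
```

For example, `pretty_string("one two three four five", 2)` yields three lines: "one two", "three four" and "five".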

averell.utils.progress_bar(t)

Wraps a tqdm instance. Adapted from https://gist.github.com/leimao/37ff6e990b3226c2c9670a2cd1e4a6f5. Remember to call close() or __exit__() on the tqdm instance when you are done; the easiest way is a with statement.
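The referenced gist follows a common tqdm recipe: turning a progress bar into a `reporthook` for `urllib.request.urlretrieve`, which calls the hook with `(block_num, block_size, total_size)` and passes `-1` as the size when it is unknown. The sketch below reproduces that pattern on the assumption that `progress_bar` is used this way. It only relies on `t` having a writable `total` attribute and an `update(n)` method.

```python
def progress_bar(t):
    """Return a urlretrieve-style reporthook that advances the bar `t`.

    Sketch modelled on the tqdm reporthook recipe, not averell's exact code.
    """
    last_block = 0

    def update_to(block_num=1, block_size=1, total_size=None):
        nonlocal last_block
        if total_size not in (None, -1):
            # Size reported by the server, when known
            t.total = total_size
        # Advance by the bytes received since the previous call
        t.update((block_num - last_block) * block_size)
        last_block = block_num

    return update_to
```

With tqdm installed, create the bar in a `with tqdm(unit="B", unit_scale=True) as t:` block and pass `reporthook=progress_bar(t)` to `urlretrieve`. The bar is then closed automatically when the block exits.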

averell.utils.read_features(corpus_folder)

Read the dictionary of each poem in corpus_folder and return them as a list of Python dicts

Parameters:	corpus_folder – Local folder where the corpus is located
Returns:	List of Python dicts with the poems' features
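A sketch of such a reader. It assumes, which is not stated above, that each poem is stored as its own UTF-8 `*.json` file somewhere under `corpus_folder`.

```python
import json
from pathlib import Path


def read_features(corpus_folder):
    """Load every poem JSON file under `corpus_folder` into a list of dicts.

    Sketch only: assumes one poem per *.json file, possibly in subfolders.
    """
    features = []
    # sorted() gives a deterministic order regardless of filesystem listing order
    for path in sorted(Path(corpus_folder).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            features.append(json.load(handle))
    return features
```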

averell.utils.uncompress_corpus(filename, save_dir)

Uncompress the corpus ZIP file

Parameters:
    filename (str) – ZIP file to uncompress
    save_dir (str) – Folder where the corpus will be uncompressed
Returns:	Filename of the uncompressed corpus
Return type:	str
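A standard-library sketch using `zipfile`. What "filename of the uncompressed corpus" refers to is not specified, so returning the path of the archive's first top-level entry is an assumption made for illustration.

```python
import os
import zipfile


def uncompress_corpus(filename, save_dir):
    """Extract the ZIP archive `filename` into `save_dir`.

    Sketch only: the returned path is an assumption about averell's behaviour.
    """
    with zipfile.ZipFile(filename) as archive:
        # extractall creates save_dir and any intermediate folders as needed
        archive.extractall(save_dir)
        # Top-level folder (or file) name of the first member in the archive
        top_level = archive.namelist()[0].split("/")[0]
    return os.path.join(save_dir, top_level)
```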

averell.utils.write_json(poem_dict, filename)

Save data in JSON format

Parameters:
    poem_dict (dict) – Python dict with the poem data
    filename (str) – Name of the JSON file the poem data will be written to
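A sketch of such a writer using the standard `json` module. Writing UTF-8 with `ensure_ascii=False` keeps accented characters such as those in "canción" readable in the output file instead of escaping them. The encoding and indentation choices are assumptions, not documented averell behaviour.

```python
import json


def write_json(poem_dict, filename):
    """Save `poem_dict` to `filename` as UTF-8 JSON (sketch)."""
    with open(filename, "w", encoding="utf-8") as handle:
        # ensure_ascii=False writes non-ASCII characters as-is, not as escapes
        json.dump(poem_dict, handle, ensure_ascii=False, indent=2)
```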
