averell
-
averell.core.export_corpora(corpus_ids, granularity, corpora_folder, filename, no_download=False)
Generates a single JSON file with the chosen granularity for all of the selected corpora.
Parameters: - corpus_ids – IDs of the corpora that will be exported
- granularity – Level of parsing granularity
- corpora_folder – Local folder where the corpora are located
- filename – Name of the output file
- no_download – Whether to skip downloading a corpus when it is missing
Returns: Python dict with the chosen granularity for all of the selected corpora
-
averell.core.get_corpora(corpus_indices=None, output_folder=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/averell/checkouts/fix-documentation/docs/corpora'))
Download and uncompress the selected corpora.
Parameters: - corpus_indices – Indices of the corpora that will be downloaded
- output_folder – Local folder where the corpora are going to be uncompressed
Returns: Python dict with all corpora features
-
averell.utils.download_corpora(corpus_indices=None, output_folder=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/averell/checkouts/fix-documentation/docs/corpora'))
Download corpora from a list of sources to a local folder.
Parameters: - corpus_indices (list) – Indices into CORPORA_SOURCES that select which corpora are going to be downloaded
- output_folder (string) – The folder where the corpora are going to be saved
-
averell.utils.download_corpus(url, filename=None)
Download the corpus zip file from an external source.
Parameters: url (string) – URL of the corpus file
Returns: Local filename of the corpus
Return type: string
-
averell.utils.filter_corpus_features(corpus_features, corpus_id, granularity)
Get the features at the chosen granularity for each poem in a corpus.
Parameters: - corpus_features (list of dicts) – List of Python dicts, one per poem in the corpus
- corpus_id (int) – ID of the corpus to be filtered
- granularity (string) – Level at which to filter each poem (stanza, line, word or syllable)
Returns: List of rows with the corpus granularity info
Return type: list
-
averell.utils.filter_features(features, corpus_index, granularity=None)
Select the features of a single poem at the chosen granularity.
Parameters: - features (dict) – Poem Python dict
- corpus_index (int) – Index of the corpus to be filtered
- granularity (string) – Level at which to filter the poem (stanza, line, word or syllable)
Returns: List of rows with the poem granularity info
Return type: list
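To make the idea of granularity concrete, the following is a minimal sketch of flattening one poem into rows at the stanza, line, or word level. It does not call averell: `SAMPLE_POEM`, its key names, and `rows_for` are hypothetical stand-ins, and the real poem dictionaries and row layout may differ.

```python
# Hypothetical poem layout; the key names are assumptions, not averell's schema.
SAMPLE_POEM = {
    "title": "Sample",
    "stanzas": [
        {"lines": [{"text": "first line here"}, {"text": "second line"}]},
        {"lines": [{"text": "third line"}]},
    ],
}

SUPPORTED = ("stanza", "line", "word")


def rows_for(poem, granularity):
    """Flatten `poem` into one row per stanza, line, or word."""
    if granularity not in SUPPORTED:
        raise ValueError(f"unsupported granularity: {granularity}")
    rows = []
    for stanza_number, stanza in enumerate(poem["stanzas"]):
        if granularity == "stanza":
            # One row per stanza, with its lines joined by newlines.
            text = "\n".join(line["text"] for line in stanza["lines"])
            rows.append({"stanza": stanza_number, "text": text})
            continue
        for line_number, line in enumerate(stanza["lines"]):
            if granularity == "line":
                rows.append(
                    {"stanza": stanza_number, "line": line_number, "text": line["text"]}
                )
            else:
                # Word level: split each line on whitespace.
                for word in line["text"].split():
                    rows.append(
                        {"stanza": stanza_number, "line": line_number, "text": word}
                    )
    return rows


for level in SUPPORTED:
    print(level, len(rows_for(SAMPLE_POEM, level)))
```

Finer granularities produce more rows from the same poem, which is why the export functions take the level as an explicit argument.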
-
averell.utils.get_ids(values)
Transform numeric identifiers, corpus shortcodes (slugs), and two-letter ISO language codes into their corresponding numeric identifiers, following the order in CORPORA_SOURCES.
Returns: List of indices in CORPORA_SOURCES
Return type: list
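The lookup described above can be sketched as follows. This is an illustration only: `SAMPLE_SOURCES` is a hypothetical stand-in for CORPORA_SOURCES, the `slug` and `language` keys are assumed, and zero-based indexing is an assumption rather than something the documentation states.

```python
# Hypothetical stand-in for CORPORA_SOURCES; real entries and keys may differ.
SAMPLE_SOURCES = [
    {"slug": "corpus-a", "language": "es"},
    {"slug": "corpus-b", "language": "es"},
    {"slug": "corpus-c", "language": "en"},
]


def resolve_ids(values, sources=SAMPLE_SOURCES):
    """Map numeric ids, slugs, and language codes to indices in `sources`."""
    indices = []
    for value in values:
        text = str(value)
        if text.isdigit():
            # Numeric identifiers are taken as zero-based positions (assumption).
            matches = [int(text)] if int(text) < len(sources) else []
        else:
            # Slugs match one entry; language codes may match several.
            matches = [
                index
                for index, source in enumerate(sources)
                if text in (source["slug"], source["language"])
            ]
        for index in matches:
            if index not in indices:
                indices.append(index)
    return indices


print(resolve_ids(["en", 0, "corpus-b"]))  # [2, 0, 1]
```

A language code expands to every corpus in that language, so the result can be longer than the input list; duplicates are dropped while the first-seen order is kept.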
-
averell.utils.get_line_features(features)
Filter the line features of a poem.
Parameters: features (dict) – Poem dictionary
Returns: List of line dicts
Return type: list of dicts
-
averell.utils.get_main_corpora_info()
Create a dict with the main corpora info saved in CORPORA_SOURCES.
Returns: Dictionary with the corpora info to be shown
Return type: dict
-
averell.utils.get_stanza_features(poem_features)
Filter the stanza features of a poem.
Parameters: poem_features (dict) – Poem dictionary
Returns: List of stanza dicts
Return type: list of dicts
-
averell.utils.get_syllable_features(features)
Filter the syllable features of a poem.
Parameters: features (dict) – Poem dictionary
Returns: List of syllable dicts
Return type: list of dicts
-
averell.utils.get_word_features(features)
Filter the word features of a poem.
Parameters: features (dict) – Poem dictionary
Returns: List of word dicts
Return type: list of dicts
-
averell.utils.pretty_string(text, num_words)
Add a line break after every num_words words of a text, to create multiline cells for use in get_main_corpora_info().
Parameters: - text – String to be split
- num_words – Number of words to add a line break after
Returns: String with a line break after every num_words words
Return type: str
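The wrapping behaviour can be sketched in a few lines. `break_every` is a hypothetical name, and this version collapses runs of whitespace, which may not match what pretty_string does with its input.

```python
def break_every(text, num_words):
    """Return `text` with a newline inserted after every `num_words` words."""
    words = text.split()
    # Group the words into chunks of `num_words` and join each chunk with spaces.
    lines = [
        " ".join(words[start:start + num_words])
        for start in range(0, len(words), num_words)
    ]
    return "\n".join(lines)


print(break_every("one two three four five", 2))
# one two
# three four
# five
```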
-
averell.utils.progress_bar(t)
Wraps a tqdm instance (adapted from https://gist.github.com/leimao/37ff6e990b3226c2c9670a2cd1e4a6f5). Don’t forget to call close() or __exit__() on the tqdm instance once you’re done (easiest using the with syntax).
-
averell.utils.read_features(corpus_folder)
Read the dictionary of each poem in “corpus_folder” and return the list of Python dictionaries.
Parameters: corpus_folder – Local folder where the corpus is located
Returns: List of Python dictionaries with the poems' features
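As an illustration of reading one dictionary per poem, here is a minimal sketch that assumes each poem is stored as its own `.json` file somewhere under the corpus folder. That on-disk layout and the name `read_poem_dicts` are assumptions; the documentation does not describe how averell stores the poems.

```python
import json
import tempfile
from pathlib import Path


def read_poem_dicts(corpus_folder):
    """Load every *.json file under `corpus_folder` into a list of dicts."""
    poems = []
    # Sort the paths so the order of the result is deterministic.
    for json_path in sorted(Path(corpus_folder).rglob("*.json")):
        with json_path.open(encoding="utf-8") as handle:
            poems.append(json.load(handle))
    return poems


# Demo on a throwaway folder holding two tiny poem files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a", "b"):
        poem_path = Path(folder) / f"{name}.json"
        poem_path.write_text(json.dumps({"title": name}), encoding="utf-8")
    loaded = read_poem_dicts(folder)

print(loaded)  # [{'title': 'a'}, {'title': 'b'}]
```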
-
averell.utils.uncompress_corpus(filename, save_dir)
Uncompress the corpus zip file.
Parameters: - filename (string) – The file that is going to be uncompressed
- save_dir (string) – The folder where the corpus is going to be uncompressed
Returns: Filename of the uncompressed corpus
Return type: string
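Extracting a zip archive into a target folder can be sketched with the standard library's zipfile module. `unzip_to` is a hypothetical name, and returning the archive's top-level entry is an assumption about what "filename of the uncompressed corpus" means.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_to(filename, save_dir):
    """Extract `filename` into `save_dir` and return its top-level entry name."""
    with zipfile.ZipFile(filename) as archive:
        archive.extractall(save_dir)
        names = archive.namelist()
    # Zip member names always use "/" as the separator.
    return names[0].split("/")[0] if names else None


# Demo: build a small archive, extract it, and check the result.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "sample.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("sample_corpus/poem.json", "{}")
    top_level = unzip_to(archive_path, Path(tmp) / "out")
    extracted = (Path(tmp) / "out" / "sample_corpus" / "poem.json").exists()

print(top_level, extracted)  # sample_corpus True
```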